How would you defend an LLM application against prompt injection?
No single fix is complete, so defenses are layered: separate trusted instructions from untrusted data, constrain and least-privilege the tools and actions the model can take, validate and sanitize inputs and tool outputs, add output guardrails and injection classifiers, and keep a human in the loop for sensitive actions. Treat all external or retrieved content as untrusted.
How to think about it
No single fix is complete, so defenses are layered: separate trusted instructions from untrusted data, constrain and least-privilege the tools and actions the model can take, validate and sanitize inputs and tool outputs, add output guardrails and injection classifiers, and keep a human in the loop for sensitive actions. Treat all external or retrieved content as untrusted.