The SQL injection analogy is instructive, but the framing matters. SQL injection wasn't fixed by teaching databases to recognize hostile SQL; it was fixed by parameterized queries, which removed the trust boundary from the data path entirely. The fix wasn't smarter parsing; it was structural separation.
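The structural separation can be sketched in a few lines (Python's sqlite3 here, purely illustrative): the query shape is fixed at write time, and user input travels as bound data, never as SQL text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES (?, ?)", ("alice", "admin"))

# Hostile input stays inert: it is bound as a literal value,
# not parsed as part of the SQL statement.
payload = "alice' OR '1'='1"
rows = conn.execute(
    "SELECT role FROM users WHERE name = ?", (payload,)
).fetchall()
print(rows)  # [] — the classic injection string matches nothing
```

The database never has to judge whether the input "looks malicious"; the structure makes the question moot.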
The same category of fix exists for agent security today, without waiting for models to get better at detecting injection. Assume the LLM will be compromised — it's processing untrusted input. The constraint lives at the tool call boundary: before execution, a deterministic policy evaluates whether this specific action (npm install, bash, git push) is permitted in this context. The model's intent doesn't matter. The policy doesn't ask 'does this look malicious?' — it enforces what's allowed, period. Fail-closed.
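A deterministic, fail-closed policy at the tool-call boundary can be as simple as a lookup. The sketch below is hypothetical (no particular framework's API; tool and context names are made up): anything not explicitly allowed is denied, regardless of what the model intended.

```python
# Explicit allow-list of (tool, context) pairs. No classifier, no
# "does this look malicious?" — just a deterministic lookup.
ALLOWED = {
    ("read_file", "trusted-repo"),
    ("read_file", "untrusted-pr"),   # reading is fine anywhere
    ("bash", "trusted-repo"),        # shell only in trusted contexts
}

def permit(tool: str, context: str) -> bool:
    # Fail-closed: absence from the allow-list means denial.
    return (tool, context) in ALLOWED

assert permit("read_file", "untrusted-pr")
assert not permit("bash", "untrusted-pr")
assert not permit("git_push", "untrusted-pr")  # unknown tool: denied
```

The check runs before execution, so a compromised model can change what it asks for, but not what gets through.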
The Cline config tells the full story. allowed_non_write_users='*' combined with unrestricted Bash is not a model safety failure. It's an authorization architecture failure. The agent was configured to allow arbitrary code execution triggered by any GitHub account. Prompt injection just exercised what was already permitted.
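The dangerous combination is mechanically detectable before any agent runs. A hypothetical config lint (key names beyond `allowed_non_write_users` are illustrative, not Cline's actual schema) might look like:

```python
# Sketch of a pre-deployment audit for the failure mode described above:
# wildcard non-write triggers plus an unrestricted shell tool.
def audit(config: dict) -> list[str]:
    findings = []
    if config.get("allowed_non_write_users") == "*":
        findings.append("any GitHub account can trigger the agent")
    if "Bash" in config.get("unrestricted_tools", []):
        findings.append("agent may run arbitrary shell commands")
    if len(findings) == 2:
        findings.append("combined: arbitrary code execution for anyone")
    return findings

print(audit({"allowed_non_write_users": "*",
             "unrestricted_tools": ["Bash"]}))
```

Either setting alone is a choice; together they are the authorization failure, and a linter can refuse the pairing outright.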
Enforcement has to live outside the context window. Anything inside it — system prompt rules, safety instructions, 'don't run npm install from untrusted repos' — becomes part of the attack surface the moment injection succeeds. The fix isn't better prompting. It's deterministic enforcement at the execution boundary, independent of whatever the model was convinced to do.
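The "outside the context window" point can be made concrete with a dispatcher sketch (names hypothetical, not a real framework): the model only proposes tool calls; a deterministic executor enforces policy, and nothing injected into the prompt can reach that check.

```python
# The allow-list lives in the executor's code, not in the prompt.
# Injection can rewrite the model's plan, but the same check runs
# on every proposed call either way.
POLICY = {"read_file", "grep"}  # tools permitted in this context

def execute(tool_call: dict) -> str:
    tool = tool_call["tool"]
    if tool not in POLICY:  # fail-closed
        raise PermissionError(f"{tool} not permitted in this context")
    return f"ran {tool}"

print(execute({"tool": "read_file"}))   # proceeds
try:
    execute({"tool": "bash"})           # injected or not, it is refused
except PermissionError as e:
    print(e)
```

A system-prompt rule saying "don't run bash" is text the attacker can argue with; this check is not.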