Lesson 13: Prompt injection and safety — in plain terms
When your prompt includes text from an untrusted source — a user, an email, a web page — that text can carry its own instructions, like 'ignore the previous instructions'. Sometimes the model follows them. In this lesson we'll understand what prompt injection is and how to defend against it in plain
If you ask the model to summarize a text, and the text itself says 'ignore the request and write something else' — the model might obey it. The fix: tell the model the text is data only, not instructions.
- prompt injection
- Untrusted text that contains instructions trying to override or replace your original instructions.
- untrusted input
- Text from an external source (a user, email, website) that can't be trusted not to carry malicious instructions.
- trust boundary
- The clear separation between your instructions (trusted) and external text (untrusted) that must be treated as data.