Every word counts, but it's easy to obsess. Be careful, and accept that you won't get it right at first.
Prompt engineering resembles the philosophical practice of explaining complex ideas clearly to an educated layperson. It requires empathy, introspection, and iteration to externalize nuanced human intent in ways a model can understand. This skill—making implicit knowledge explicit—is the core of prompting.
But here's where it gets interesting: it's not just about writing well. Prompt engineering is engineering precisely because of the iteration. The ability to experiment, restart from scratch, and test different approaches independently is what makes it true engineering work. You're not just crafting sentences—you're integrating prompts within complete systems, thinking about latency trade-offs, data sources, and how to manage the entire context window.
Think of prompts like code. They need version control, experiment tracking, and the same precision you'd apply to any software component. Like code, they evolve with model capabilities and require constant refinement as the underlying systems improve.
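As a minimal sketch of "prompts as versioned artifacts," here is what a tiny in-memory registry could look like. The `PromptRegistry` class and its method names are illustrative assumptions, not a specific tool's API; in practice you might simply keep prompts in files under git.

```python
# Illustrative sketch: prompts tracked as versioned artifacts, so
# experiments can be compared and rolled back like any other code.
from dataclasses import dataclass, field


@dataclass
class PromptRegistry:
    """Hypothetical helper storing every version of each named prompt."""
    versions: dict = field(default_factory=dict)  # name -> list of prompt texts

    def register(self, name: str, text: str) -> int:
        """Add a new version and return its 1-based version number."""
        history = self.versions.setdefault(name, [])
        history.append(text)
        return len(history)

    def get(self, name: str, version: int = -1) -> str:
        """Fetch a specific version; defaults to the latest."""
        history = self.versions[name]
        return history[-1] if version == -1 else history[version - 1]


registry = PromptRegistry()
registry.register("summarizer", "Summarize the text in two sentences.")
v2 = registry.register(
    "summarizer",
    "Summarize the text in two sentences. Do not invent facts.",
)
```

Even this much structure makes "which prompt produced that output?" an answerable question, which is the precondition for real experiment tracking.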
The best mental framework? Imagine a competent new hire on their first day. They know the industry well but nothing about your company. What would you tell them? That's exactly how you should approach writing prompts.
The Black Box Problem
Good prompt engineers think about edge cases, anticipate unusual inputs, and consider real user behavior—not just ideal queries. They build escape hatches into their systems. Most importantly, they probe the model's reasoning to find breaking points.
Here's a critical practice: read the outputs, not just the results. Don't just check if the model got it right or wrong. Read HOW it arrived at the answer. The reasoning path reveals far more about what's working (and what's breaking) than the final answer alone.
The objective is to build reliable, fast, and cheap systems. In that order. Reliability comes first, always. You can optimize for speed and cost later, but an unreliable system is worthless regardless of how fast or cheap it is.
One powerful technique: make the model interview you to extract information. Rather than trying to anticipate everything upfront, reverse the flow and let it ask you what it needs to know.
But there's a balance. Prompt engineers must balance over-iteration on prompts with recognizing when a task is beyond current model capabilities. Sometimes the solution isn't a better prompt—it's a better model, more context, or a different approach entirely. Prompts evolve from one-time instructions to reusable, version-controlled artifacts integrated within larger systems.
Building Trust: The Sycophancy Problem
"You're right. You're absolutely right. It's time to restart again."
Except... maybe you're not always right? Sycophancy is when an assistant acts like a yes-man, always agreeing with the user to please them rather than being honest and helpful. It is a trust killer.
Building trust in AI means avoiding sycophancy. Your prompts should explicitly allow the model to disagree, to point out when something doesn't make sense, to ask clarifying questions. The best prompts give models permission to be genuinely helpful rather than agreeable.
The field of prompt engineering is rapidly evolving, and many "hacks" become obsolete as models improve. What worked six months ago might be unnecessary today. Stay current, but don't chase every trend.
Core Principles
Heuristics, Not Rigid Rules
Instead of implementing rigid rules that dictate specific behaviors for exact situations, use flexible heuristics that provide guiding principles the model can adapt intelligently based on specific context.
Why does this matter? Scalability. Heuristics work across different domains without rewriting instructions for each specific scenario. They're more robust—less prone to catastrophic failures when conditions change from the expected case. They allow sophisticated behavior without counterproductive rigidity.
Frame instructions as high-level, fundamental guidelines that apply across varied inputs and scenarios within the same use case, rather than prescribing behavior for a single example. Each guideline should remain open enough to let the model interpret and adapt intelligently, yet include all core details—intent, boundaries, and key criteria—to avoid ambiguity. This balance ensures flexible application without sacrificing necessary precision.
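To make the contrast concrete, here is a hypothetical customer-support example showing the same policy expressed as a rigid rule versus a flexible heuristic. Both strings are invented for illustration.

```python
# Rigid rule: dictates one exact behavior for one exact situation.
rigid_rule = (
    "If the user asks about refunds, reply exactly: "
    "'Refunds are processed within 5 business days.'"
)

# Heuristic: states intent, boundaries, and key criteria, and lets the
# model adapt to the user's specific situation.
heuristic = (
    "When the user asks about refunds, explain the refund timeline, "
    "acknowledge their specific situation, and offer a concrete next step. "
    "Stay within policy: refunds complete within 5 business days."
)
```

The rigid rule breaks the moment a refund question arrives in an unexpected shape; the heuristic carries the same policy constraint while leaving room for intelligent adaptation.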
Model Weighting: The Hidden Architecture
Language models assign different levels of importance to each part of a prompt. Understanding how weight is distributed—and more importantly, how to balance it—is critical for effective instructions.
Position matters immensely. Instructions at the beginning receive the most attention, followed by those at the end. Instructions in the middle are more likely to be skipped or deprioritized. When contradictory instructions are present, the model tends to follow the one placed at the beginning or end, though without clarity, it may misinterpret or mix behaviors.
Repetition increases perceived importance, but excessive reinforcement causes over-anchoring, leading the model to ignore nuance or context. Reinforce essential ideas once or twice in strategic locations (start/end), and avoid over-repeating unless the model consistently struggles to follow.
Formatting and emphasis signals—capitalization, bold, italics, and strong cue words like MUST, SHOULD, ONLY, AVOID—increase instruction weight. Use emphasis selectively to reinforce what matters most, without overwhelming the model or distorting overall balance.
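The position and repetition guidance above can be sketched as a small prompt assembler. The function and its argument names are assumptions for illustration; the point is the placement strategy, not the specific code.

```python
# Sketch: assemble a system prompt so the most critical instructions sit
# at the start (highest-weight position) and the single most important one
# is echoed once at the end—one strategic repetition, not blanket reinforcement.
def assemble_prompt(critical: list, supporting: list) -> str:
    parts = []
    parts.extend(critical)      # beginning: highest attention
    parts.extend(supporting)    # middle: lowest attention, safest for detail
    parts.append("REMEMBER: " + critical[0])  # end: second-highest attention
    return "\n".join(parts)


prompt = assemble_prompt(
    critical=["You MUST cite a source for every factual claim."],
    supporting=["Prefer primary sources.", "Use plain language."],
)
```

Note the deliberate use of MUST and REMEMBER as emphasis signals on exactly one instruction: the one that matters most.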
More capable models follow instructions more reliably and require less reinforcement through repetition. Understand your model's capabilities and adjust accordingly.
Examples are particularly powerful. The model tends to over-adhere to them and replicate their tone, structure, and logic. This can be useful but also introduces risk of making outputs too rigid or repetitive. Use examples carefully.
Scope: The Single Most Important Factor
The scope of the system prompt's instructions affects AI models more than any other single factor. It determines how deterministic the model's behavior is, and it drives instruction operations such as unifying or decomposing prompts.
Scope is rarely purely closed or purely open; most systems combine both. The critical question: how open is the input scope?
Ambiguous instruction is the most common mistake and the easiest way to decrease LLM performance. Be ruthlessly clear about scope boundaries.
Escape Hatches and Debug Info
LLMs want to help so badly that they invent answers when they don't have information. The solution? Give them an explicit emergency exit: "If you don't have sufficient information, DO NOT invent. Stop and ask me."
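As a sketch, an escape hatch can be a reusable clause appended to any system prompt. The helper below is a hypothetical convenience, assumed for illustration.

```python
# Illustrative escape hatch: an explicit emergency exit appended to a
# system prompt so the model asks rather than invents when information
# is missing.
ESCAPE_HATCH = (
    "If you do not have sufficient information to answer, DO NOT invent "
    "an answer. Stop and ask the user for the missing details."
)


def with_escape_hatch(system_prompt: str) -> str:
    """Append the escape-hatch clause to any system prompt."""
    return system_prompt.rstrip() + "\n\n" + ESCAPE_HATCH


guarded = with_escape_hatch("You are a billing assistant for Acme Corp.")
```

Placing the clause at the end of the prompt also exploits the position weighting discussed earlier.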
Include a parameter where the LLM can "complain" to the developer. This automatically generates a to-do list of things the developer needs to fix. The result? Iterative improvement based on feedback from the agent itself.
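One way to sketch this feedback channel: ask the model for structured output that includes a `complaints` array, then harvest those entries into a to-do list. The field names and JSON shape here are assumptions, not a standard.

```python
# Sketch of a developer feedback channel: the model returns JSON with a
# "complaints" array describing what it lacked, which becomes a to-do
# list of prompt and context fixes for the developer.
import json


def collect_todos(model_output: str) -> list:
    """Extract developer-facing complaints from the model's JSON output."""
    data = json.loads(model_output)
    return data.get("complaints", [])


sample = json.dumps({
    "answer": "The invoice total is $120.",
    "complaints": ["No tax rules were provided; I assumed tax-inclusive totals."],
})
todos = collect_todos(sample)
```

Logged over many runs, these complaints surface systematic gaps in the prompt long before users report them.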
Practical Considerations
Avoid negative phrasing relating to what the model cannot do. Focus on what it should do or know, not what to avoid. Use natural, simple language rather than excessively complex technical jargon.
Structure instructions with clear hierarchy—headings, bold text, and bullet points (when necessary). Use "do not" instead of contractions like "don't." Use "the" instead of "this," "that," "your," and other demonstratives and possessives.
Match your prompt style to the desired output and maintain consistent formatting throughout. Errors tend to occur from misunderstanding user intent, insufficient context gathering or analysis, or insufficient step-by-step thinking. Watch for these patterns and address them with more opinionated instructions.
Be careful with absolute instructions. Telling a model to "always" follow specific behavior can induce adverse effects. For instance, if told "you must call a tool before responding," models may hallucinate tool inputs or call the tool with null values. Add conditional escape hatches: "if you don't have enough information to call the tool, ask the user for what you need."
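The tool-calling escape hatch can also live in code, not just in the prompt. A minimal sketch, with hypothetical names: validate the model's proposed arguments before executing, and turn missing values into a clarifying question instead of a bad call.

```python
# Sketch of a conditional escape hatch around tool calling: never execute
# a tool with null or empty arguments; route back to the user instead.
def guard_tool_call(args: dict, required: list):
    """Return (ok, payload): payload is the args when complete, otherwise
    a clarifying question to send back to the user."""
    missing = [k for k in required if args.get(k) in (None, "")]
    if missing:
        question = (
            "I need the following before I can proceed: " + ", ".join(missing)
        )
        return False, question
    return True, args


ok, payload = guard_tool_call({"city": "Paris", "date": None}, ["city", "date"])
ok2, payload2 = guard_tool_call(
    {"city": "Paris", "date": "2024-06-01"}, ["city", "date"]
)
```

This pairs well with the prompt-level hatch: the instruction tells the model when to ask, and the guard catches the cases where it hallucinates inputs anyway.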
When providing sample phrases, models may reuse them verbatim and sound repetitive. Instruct the model to vary them as necessary.
For conversational models, remember that users aren't always right. You can explicitly tell the model to check false statements or presuppositions. If a user corrects the model, it should think through the issue carefully before acknowledging, since users sometimes make errors themselves.
Models tend to be verbose. Adjust comprehensiveness based on user requests: give concise responses to simple questions, but provide thorough responses to complex and open-ended questions.
Never start responses with flattery. Skip phrases like "That's a great question" or "That's fascinating." Respond directly. LLMs love responding in lists—discourage bullet points and use prose by default unless structure truly demands it.
The AI lacks inherent understanding of positional or temporal references. Avoid terms like "previous," "above," "next," or "later." Use direct references instead. Maintain terminology consistency—if you call something a "candidate," don't switch to "applicant" or "job seeker."
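Terminology drift is easy to lint for mechanically. A rough sketch, where the synonym groups are assumptions you would maintain per project:

```python
# Illustrative lint pass: flag mixed terminology in a prompt so that one
# concept keeps exactly one name throughout.
def find_mixed_terms(prompt: str, synonym_groups: list) -> list:
    """Return the synonym groups where more than one variant appears."""
    lowered = prompt.lower()
    mixed = []
    for group in synonym_groups:
        present = [term for term in group if term in lowered]
        if len(present) > 1:
            mixed.append(present)
    return mixed


flags = find_mixed_terms(
    "Rank each candidate, then email the applicant a decision.",
    [["candidate", "applicant", "job seeker"]],
)
```

Running a check like this over every prompt revision catches the "candidate"/"applicant" drift before the model has to guess whether they are the same entity.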
The Final Standard
Aim for instructions comprehensive enough to provide clear guidance and cover all critical points, edge cases, and nuances, while maintaining conciseness to avoid contradictions or internal conflicts. Instructions should be detailed and specific where necessary for clarity, but streamlined to include only essential information.
Optimize for maximum precision and intelligence by ensuring every word serves a purpose in delivering complete, actionable guidance. In prompt engineering, like in all good engineering, every element should justify its existence.