Prompt engineering lessons from 2 years of LLM work

Practical insights and patterns I've learned from building LLM-powered products, from basic prompting to advanced RAG architectures.

Tags: LLMs · Prompt Engineering · RAG · AI

Over the past two years, I've built LLM-powered features into multiple products — from SEO content briefs at Swiftbrief to virtual assistants handling sensitive company data at knecon. Here are the patterns and anti-patterns I've discovered.

Start Simple, Then Iterate

My biggest early mistake was over-engineering prompts from the start. I'd write elaborate system prompts with dozens of rules, only to find they confused the model more than they helped. Now I start with the simplest possible prompt and add complexity only when I have concrete failure cases to address.

Structure Your Outputs

Asking for JSON or structured output formats dramatically improves reliability. Models are much better at following a schema than producing consistently formatted free text. Combined with output validation, this makes LLM outputs much more predictable in production systems.
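As a minimal sketch of output validation, the snippet below parses a model response as JSON and checks it against an expected schema. The field names and the SEO-brief framing are illustrative assumptions, not the actual schema used in any of the products mentioned.

```python
import json

# Hypothetical schema for an SEO-brief extraction task (illustrative only).
REQUIRED_FIELDS = {"title": str, "keywords": list, "word_count": int}

def validate_brief(raw: str) -> dict:
    """Parse a model response and check it against the expected schema."""
    data = json.loads(raw)  # raises a ValueError subclass on malformed JSON
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        if not isinstance(data[field], expected_type):
            raise ValueError(f"{field} should be {expected_type.__name__}")
    return data

# A well-formed model response passes validation:
response = '{"title": "Best CRM Tools", "keywords": ["crm", "sales"], "word_count": 1500}'
brief = validate_brief(response)
```

In production you would typically retry the LLM call (or fall back) when validation raises, instead of letting malformed output reach downstream code.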

Chain of Thought Isn't Always the Answer

Chain-of-thought prompting is powerful for complex reasoning tasks, but it adds latency and token costs. For straightforward tasks like classification or simple extraction, direct prompting often works just as well and is much faster. Match the technique to the complexity of the task.
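One way to operationalize "match the technique to the task" is a small prompt builder that only adds a chain-of-thought instruction for complex tasks. The task categories and template wording here are assumptions for illustration:

```python
# Tasks simple enough that a direct prompt usually suffices (illustrative set).
SIMPLE_TASKS = {"classification", "extraction"}

def build_prompt(task_type: str, instruction: str) -> str:
    """Return a direct prompt for simple tasks, a CoT prompt otherwise."""
    if task_type in SIMPLE_TASKS:
        # Direct prompting: fewer tokens, lower latency.
        return f"{instruction}\nAnswer with the label only."
    # Chain-of-thought: worth the extra tokens for multi-step reasoning.
    return f"{instruction}\nThink step by step, then give a final answer."
```

The routing could equally live in a config file; the point is that the CoT overhead is an explicit, per-task decision rather than a default.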

RAG: Quality Over Quantity

When building RAG systems, I initially focused on retrieving as much relevant context as possible. But more context often means more noise. Now I invest heavily in retrieval quality — better chunking strategies, hybrid search, and re-ranking — rather than just throwing more documents at the model.
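A hybrid-search step can be sketched as blending a keyword-overlap score with a vector-similarity score, then keeping only the top-k chunks. The corpus, the blend weight, and the stubbed vector scores below are all illustrative assumptions:

```python
def keyword_score(query: str, chunk: str) -> float:
    """Fraction of query terms that appear in the chunk (crude lexical score)."""
    q_terms = set(query.lower().split())
    c_terms = set(chunk.lower().split())
    return len(q_terms & c_terms) / max(len(q_terms), 1)

def hybrid_rank(query: str, chunks: list[str], vector_scores: list[float],
                alpha: float = 0.5, top_k: int = 2) -> list[str]:
    """Score = alpha * keyword + (1 - alpha) * vector; return the top_k chunks."""
    scored = [
        (alpha * keyword_score(query, c) + (1 - alpha) * v, c)
        for c, v in zip(chunks, vector_scores)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:top_k]]

# Toy example: vector scores would come from an embedding model in practice.
chunks = ["rag chunking strategies", "unrelated cooking recipe", "hybrid search notes"]
top = hybrid_rank("chunking strategies", chunks, vector_scores=[0.9, 0.1, 0.8])
```

A re-ranking stage (e.g. a cross-encoder over the top candidates) would slot in after `hybrid_rank`, further trading recall volume for precision.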

Key Takeaways

  • Test with real users: Synthetic test cases never capture the full range of inputs your system will see. Get real usage data as early as possible.
  • Version your prompts: Treat prompts like code. Store them in version control, track changes, and A/B test improvements.
  • Build evaluation frameworks: You can't improve what you can't measure. Automated evaluation — even imperfect — beats manual review at scale.
  • Plan for model changes: Models get updated, deprecated, and replaced. Design your system so prompts and model choices are easily swappable.
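The versioning and swappability points above can be sketched as a small prompt-config object that lives in version control next to the code. All names, the version string, and the model identifier are placeholders, not a specific provider's API:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PromptConfig:
    """A versioned prompt bound to a swappable model choice (illustrative)."""
    name: str
    version: str   # bumped and tracked in version control like code
    model: str     # easy to swap when a model is updated or deprecated
    template: str

    def render(self, **kwargs) -> str:
        return self.template.format(**kwargs)

# Placeholder config; the model id is a stand-in, not a recommendation.
SUMMARIZE_V2 = PromptConfig(
    name="summarize",
    version="2.1.0",
    model="example-model-id",
    template="Summarize the following text in one sentence:\n{text}",
)
```

Because the config is immutable and versioned, an A/B test is just two `PromptConfig` instances routed to different traffic slices.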

The field is moving incredibly fast, and what works today might be outdated in six months. But these foundational principles — starting simple, measuring rigorously, and iterating based on real usage — remain constant.