What We Learned from a Year of Building with LLMs

We’ve identified crucial yet often neglected lessons and methodologies, informed by machine learning, for developing products based on LLMs. Knowing them can give you a competitive advantage over most others in the field, without requiring ML expertise.

<aside> 💡 One of the best articles in this space → here

</aside>

Prompting
- Focus on getting the most out of fundamental prompting techniques
- Structure your inputs and outputs
- Have small prompts that do one thing, and only one thing, well
- Craft your context tokens
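
As a rough illustration of "structure your inputs and outputs", the sketch below wraps each input in its own tag and asks for a machine-checkable reply. The tag names, the JSON keys (`answer`, `quote`), and the helpers `build_prompt` and `parse_answer` are assumptions made for this example, not something the article prescribes.

```python
import json
import re
from typing import Optional


def build_prompt(document: str, question: str) -> str:
    """Wrap each input in its own tag so the model can tell them apart."""
    return (
        "Answer the question using only the document.\n"
        f"<document>\n{document}\n</document>\n"
        f"<question>\n{question}\n</question>\n"
        "Reply with JSON inside <answer> tags, e.g. "
        '<answer>{"answer": "...", "quote": "..."}</answer>'
    )


def parse_answer(raw: str) -> Optional[dict]:
    """Extract and validate the structured reply; return None if malformed."""
    match = re.search(r"<answer>(.*?)</answer>", raw, re.DOTALL)
    if match is None:
        return None
    try:
        data = json.loads(match.group(1))
    except json.JSONDecodeError:
        return None
    # Hypothetical schema: both keys must be present.
    if not isinstance(data, dict) or not {"answer", "quote"} <= data.keys():
        return None
    return data
```

Returning `None` for a malformed reply lets the caller decide whether to retry or fall back, instead of passing unparsed text downstream.
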
Information Retrieval/RAG
- The quality of your RAG’s output depends on the quality of the retrieved documents, which can be judged on a few factors: relevance, information density, and level of detail.
- Don’t forget keyword search; use it as a baseline and in hybrid search.
- Prefer RAG over fine-tuning for new knowledge
- Long-context models won’t make RAG obsolete
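
One possible way to combine keyword search with another retriever is reciprocal rank fusion, sketched below. This is a minimal example under stated assumptions: `keyword_rank` is a crude term-count baseline (not BM25), the second ranking is assumed to come from some embedding search that is not shown, and `k = 60` is a commonly used default rather than a tuned value.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def keyword_rank(query: str, docs: Dict[str, str]) -> List[str]:
    """Rank doc ids by total occurrences of the query terms (crude baseline)."""
    terms = query.lower().split()
    scores = {}
    for doc_id, text in docs.items():
        counts = Counter(text.lower().split())
        score = sum(counts[t] for t in terms)
        if score > 0:  # drop documents with no keyword match at all
            scores[doc_id] = score
    return sorted(scores, key=lambda d: (-scores[d], d))


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists: each list adds 1 / (k + rank) per document."""
    fused = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(fused, key=lambda d: (-fused[d], d))


docs = {
    "a": "error code E1234 in payment service",
    "b": "how to handle payment failures gracefully",
    "c": "holiday schedule",
}
keyword_hits = keyword_rank("E1234 payment", docs)
embedding_hits = ["b", "c", "a"]  # hypothetical output of a vector search
hybrid = reciprocal_rank_fusion([keyword_hits, embedding_hits])
```

Running the keyword ranking on its own also gives a baseline to compare the hybrid results against.
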
Tuning and optimizing workflows
- Step-by-step, multi-turn “flows” can give large boosts
- Prioritize deterministic workflows for now
- Get more diverse outputs by going beyond temperature
- Caching is underrated.
- When to fine-tune
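
To make the caching point concrete, here is a minimal sketch of an in-memory response cache keyed on a hash of the prompt. `ResponseCache` and the `generate` callable are hypothetical names; a real system would plug in its own model call and probably a persistent store, and would only cache where reusing an earlier response is acceptable.

```python
import hashlib
from typing import Callable, Dict


class ResponseCache:
    """Serve repeated prompts from memory instead of calling the model again."""

    def __init__(self, generate: Callable[[str], str]) -> None:
        self._generate = generate  # hypothetical model call, injected by the caller
        self._store: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(prompt: str) -> str:
        # Hash the exact prompt text; any change in wording is a new key.
        return hashlib.sha256(prompt.encode("utf-8")).hexdigest()

    def get(self, prompt: str) -> str:
        key = self._key(prompt)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        response = self._generate(prompt)
        self._store[key] = response
        return response
```

Tracking `hits` and `misses` makes it easy to see how much latency and cost the cache is actually saving.
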
Evaluation & Monitoring
- Create a few assertion-based unit tests from real input/output samples
- LLM-as-Judge can work (somewhat), but it’s not a silver bullet
- The “intern test” for evaluating generations
- Overemphasizing certain evals can hurt overall performance
- Simplify annotation to binary tasks or pairwise comparisons
- (Reference-free) evals and guardrails can be used interchangeably
- LLMs will return output even when they shouldn’t
- Hallucinations are a stubborn problem.
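
The assertion-based unit tests mentioned above could look something like the sketch below. The `Case` fields (`must_include`, `must_exclude`, `max_words`) and the `check` helper are illustrative assumptions; the actual assertions should be derived from real input/output samples of your own product.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Case:
    """One test case built from a real input and expectations about its output."""
    input_text: str
    must_include: List[str] = field(default_factory=list)
    must_exclude: List[str] = field(default_factory=list)
    max_words: int = 100


def check(case: Case, output: str) -> List[str]:
    """Return the failed assertions for this output (empty list means pass)."""
    failures = []
    lowered = output.lower()
    for phrase in case.must_include:
        if phrase.lower() not in lowered:
            failures.append(f"missing: {phrase}")
    for phrase in case.must_exclude:
        if phrase.lower() in lowered:
            failures.append(f"forbidden: {phrase}")
    if len(output.split()) > case.max_words:
        failures.append(f"too long: more than {case.max_words} words")
    return failures


case = Case(
    input_text="Summarize the refund policy.",
    must_include=["30 days"],
    must_exclude=["as an AI"],
    max_words=20,
)
good = check(case, "Refunds are accepted within 30 days of purchase.")
bad = check(case, "As an AI, I cannot help.")
```

Because each check is a plain function, the same cases can run as unit tests during development and as lightweight guardrails on live outputs.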