Evaluating LLM-based Applications
Yesterday’s post posed a question: how do you measure prompts? If you are building an AI-powered application, how do you know whether your app is production-ready, or whether it just works on the handful of examples you’ve tested it on?