1 article tagged with Eval.
Most LLM eval suites correlate poorly with what real users experience. The eval patterns we run that move with prod metrics — and the ones that lied to us.