Eval
What are evals?
Evals are automated tests for AI features that measure how accurately, safely and stably they perform on a fixed set of representative examples.
Without evals, you do not know whether your AI feature works well; you only know that it returns something. An eval set is a collection of realistic examples, each paired with its desired result, that lets you measure objectively whether the system is right and whether a change improves one thing or quietly breaks another.
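As a rough illustration of that idea, here is a minimal sketch of an eval set and a comparison between two versions of a system. Everything in it is hypothetical: the names `EvalCase`, `run_eval` and `compare`, the example cases, and the two stub systems, which stand in for real calls to an AI feature. It uses exact string matching only to keep the sketch small.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EvalCase:
    case_id: str
    prompt: str
    expected: str  # the desired result for this example


# Hypothetical eval set; real cases would come from realistic user inputs.
EVAL_SET = [
    EvalCase("invoice-total", "Total of 19.99 and 5.01?", "25.00"),
    EvalCase("currency", "Which currency code is used in Japan?", "JPY"),
    EvalCase("vat-amount", "VAT on 100.00 at 21%?", "21.00"),
]


def run_eval(system: Callable[[str], str], cases: list[EvalCase]) -> dict[str, bool]:
    """Run every case through the system and record pass/fail per case id."""
    return {c.case_id: system(c.prompt).strip() == c.expected for c in cases}


def accuracy(results: dict[str, bool]) -> float:
    """Fraction of cases that passed."""
    return sum(results.values()) / len(results)


def compare(baseline: dict[str, bool], candidate: dict[str, bool]) -> dict[str, list[str]]:
    """List the cases a change fixed and the cases it broke."""
    fixed = sorted(k for k, ok in candidate.items() if ok and not baseline.get(k, False))
    broken = sorted(k for k, ok in candidate.items() if not ok and baseline.get(k, False))
    return {"fixed": fixed, "broken": broken}


# Stub systems standing in for two versions of an AI feature (assumptions).
def system_v1(prompt: str) -> str:
    answers = {
        "Total of 19.99 and 5.01?": "25.00",
        "Which currency code is used in Japan?": "Yen",
    }
    return answers.get(prompt, "")


def system_v2(prompt: str) -> str:
    answers = {
        "Total of 19.99 and 5.01?": "about 25",
        "Which currency code is used in Japan?": "JPY",
        "VAT on 100.00 at 21%?": "21.00",
    }
    return answers.get(prompt, "")


baseline = run_eval(system_v1, EVAL_SET)
candidate = run_eval(system_v2, EVAL_SET)
print(accuracy(baseline), accuracy(candidate))
print(compare(baseline, candidate))
```

In this toy run the second version scores higher overall, yet `compare` still reports `invoice-total` as broken. That is the kind of quiet regression a single headline number can hide.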
Checks can be hard, meaning exact and deterministic (is the amount exactly right?), judged by a model (is this answer helpful and tidy?), or reviewed by a human for the borderline cases. Together they produce a score that you recompute with every change.
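The three kinds of checks can be sketched as follows. This is an assumption-laden illustration, not a prescribed design: the function names, the thresholds `PASS_AT` and `FAIL_AT`, and `stub_judge` are all hypothetical. In particular, `stub_judge` is a crude word-count placeholder for a real model call that would rate an answer between 0 and 1.

```python
from __future__ import annotations

from decimal import Decimal, InvalidOperation
from typing import Callable, Optional

# Hypothetical thresholds: judge scores between them count as borderline.
PASS_AT = 0.8
FAIL_AT = 0.4


def exact_amount_check(output: str, expected: str) -> bool:
    """Hard check: the output must parse to exactly the expected amount."""
    try:
        return Decimal(output.strip()) == Decimal(expected)
    except InvalidOperation:
        return False  # not a number at all


def judged_check(output: str, judge: Callable[[str], float]) -> Optional[bool]:
    """Model-judged check: True or False when clear, None when borderline."""
    score = judge(output)
    if score >= PASS_AT:
        return True
    if score <= FAIL_AT:
        return False
    return None  # borderline: hand over to a human reviewer


def summarize(verdicts: dict[str, Optional[bool]]) -> dict:
    """Score the decided cases and list those awaiting human review."""
    decided = [v for v in verdicts.values() if v is not None]
    needs_review = sorted(k for k, v in verdicts.items() if v is None)
    score = sum(decided) / len(decided) if decided else 0.0
    return {"score": score, "needs_review": needs_review}


def stub_judge(output: str) -> float:
    """Placeholder for a model call; rates answers by word count only."""
    words = len(output.split())
    if words == 0:
        return 0.0
    if words <= 3:
        return 0.5
    return 0.9


verdicts = {
    "amount": exact_amount_check("25.0", "25.00"),
    "wrong-amount": exact_amount_check("24.99", "25.00"),
    "helpful": judged_check("Your refund of 25.00 was issued today.", stub_judge),
    "terse": judged_check("Refund issued.", stub_judge),
}
summary = summarize(verdicts)
print(summary)
```

Here the score covers only the cases that were decided automatically; the borderline case is set aside in `needs_review` instead of being counted as a pass or a fail. Whether undecided cases should lower the score is a design choice this sketch leaves open.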
Honestly: evals are the most underrated part of most AI projects. Everyone wants to build; almost no one wants to measure. That is precisely why evals are our dividing line between a demo that "mostly works" and a system you dare to run in production. Build the eval set from day one.
Last updated: 18 June 2026