
Eval

What are evals?

Evals are automated tests for AI features that measure how accurately, safely and stably they perform on a fixed set of representative examples.

Also known as: evals, evaluation, AI evaluation

Without evals, you do not know whether your AI feature works well; you only know that it returns something. An eval set is a collection of realistic examples, each paired with the desired result. It lets you measure objectively whether the system is right, and whether a change improves one thing or quietly breaks another.
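A minimal sketch of that idea, assuming a hypothetical feature that extracts the total from an invoice line. The eval set, the two `extract_total_*` functions and `run_evals` are illustrative stand-ins, not part of any real library; in practice the two functions would be two versions of your actual AI feature.

```python
import re

# Hypothetical eval set: realistic inputs, each paired with the desired result.
EVAL_SET = [
    {"id": "plain", "input": "Total: 120.50 EUR", "expected": "120.50"},
    {"id": "comma", "input": "Total: 99,95 EUR", "expected": "99.95"},
    {"id": "two-amounts", "input": "Subtotal: 80.00 EUR, Total: 96.80 EUR", "expected": "96.80"},
]


def extract_total_v1(text):
    """Stand-in for the current system: returns the first amount it finds."""
    match = re.search(r"\d+[.,]\d{2}", text)
    return match.group(0).replace(",", ".") if match else None


def extract_total_v2(text):
    """Stand-in for the changed system: reads the amount after 'Total:',
    but no longer handles comma decimals."""
    match = re.search(r"\bTotal: (\d+\.\d{2})", text)
    return match.group(1) if match else None


def run_evals(system):
    """Run one system over the whole eval set; map case id to pass/fail."""
    return {case["id"]: system(case["input"]) == case["expected"] for case in EVAL_SET}


before = run_evals(extract_total_v1)
after = run_evals(extract_total_v2)

# Compare case by case, not only the overall score.
fixed = [cid for cid in before if not before[cid] and after[cid]]
broken = [cid for cid in before if before[cid] and not after[cid]]

print("passed before:", sum(before.values()), "of", len(EVAL_SET))
print("passed after: ", sum(after.values()), "of", len(EVAL_SET))
print("fixed:", fixed, "broken:", broken)
```

In this toy run both versions pass two of the three cases, so the overall score does not move, yet the change fixed one case and broke another. That is the kind of quiet regression a per-example comparison is meant to surface.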

Checks can be hard, with one exact right answer (is the amount exactly right?), judged by a model (is this answer helpful and well structured?), or reviewed by a human for the borderline cases. Together they produce a score, and you rerun the whole set with every change to see how that score moves.
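The three kinds of check could be combined roughly as follows. This is a sketch under stated assumptions: `hard_check`, `judged_check`, `summarize`, the thresholds and especially `fake_judge` are hypothetical. A real setup would replace `fake_judge` with a call to a model that rates the answer; here it is a deterministic placeholder so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class CheckResult:
    kind: str                # "hard", "judge" or "human"
    passed: Optional[bool]   # None means: undecided, send to a human reviewer


def hard_check(output: str, expected: str) -> CheckResult:
    """Exact comparison: the answer is either right or wrong."""
    return CheckResult("hard", output.strip() == expected)


def judged_check(output: str, judge: Callable[[str], float],
                 low: float = 0.4, high: float = 0.7) -> CheckResult:
    """Model-judged check. Clear ratings are decided automatically,
    borderline ratings are routed to human review. Thresholds are assumptions."""
    rating = judge(output)
    if rating >= high:
        return CheckResult("judge", True)
    if rating < low:
        return CheckResult("judge", False)
    return CheckResult("human", None)


def summarize(results: List[CheckResult]) -> dict:
    """Pass rate over decided checks, plus the size of the human review queue."""
    decided = [r for r in results if r.passed is not None]
    pass_rate = sum(r.passed for r in decided) / len(decided) if decided else 0.0
    return {"pass_rate": pass_rate, "needs_review": len(results) - len(decided)}


def fake_judge(text: str) -> float:
    """Placeholder for a model call: rates longer answers higher. Not a real judge."""
    return min(len(text.split()) / 10, 1.0)


results = [
    hard_check("120.50", "120.50"),
    hard_check("80.00", "96.80"),
    judged_check("Your invoice total is 96.80 EUR, due within 30 days of receipt.", fake_judge),
    judged_check("It is 96.80 EUR, I think.", fake_judge),
    judged_check("Dunno.", fake_judge),
]
summary = summarize(results)
print(summary)
```

Keeping undecided cases out of the pass rate and counting them separately is one possible design choice; it keeps the score reproducible between runs while making the human workload visible.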

Honestly: evals are the most underrated part of most AI projects. Everyone wants to build; almost no one wants to measure. That is precisely why we treat them as the dividing line between a demo that "mostly works" and a system you dare to run in production. Build the eval set from day one.

Last updated: 18 June 2026

Related terms

Read on in the glossary


Agentic workflow (AI technique): An agentic workflow is a process in which an AI agent, not a human, plans the steps, calls tools, checks intermediate results and continues until the goal is reached.

Hallucination (AI technique): A hallucination is when a language model confidently gives information that is incorrect or made up, presented as if it were a fact.

LLM (Large Language Model) (AI technique): An LLM is a large language model trained on enormous amounts of text, which lets it understand and generate language; it is the engine under tools like ChatGPT, Claude and Gemini.

AI agent (AI technique): An AI agent is software that pursues a goal on its own: it plans the steps, uses tools or systems, checks the result and adjusts course, without a human directing every step.

