Evaluator runs without new generation

You can now evaluate existing LLM outputs without generating new responses. Previously, running an evaluator required creating a new output every time. With this update, you can retroactively score any response already stored in your dataset.

Why this matters

Quality: Evaluate historical responses to ensure you’re assessing the true source of truth, not just new outputs.
Flexibility: Apply evaluations to both** single responses and full conversation chains**, adapting to your specific review needs.

How it works

Prepare your dataset including LLM outputs in the messages column.
Set up an experiment and select one or more evaluators - do not select a prompt.
The evaluator will analyze the responses already present in the “messages” column of your dataset.