added
Evaluator runs without new generation
13 days ago by Cormick Marskamp
You can now evaluate existing LLM outputs without generating new responses. Previously, running an evaluator required creating a new output every time. With this update, you can retroactively score any response already stored in your dataset.
Why this matters
- Quality: Evaluate historical responses to ensure you’re assessing the true source of truth, not just new outputs.
- Flexibility: Apply evaluations to both** single responses and full conversation chains**, adapting to your specific review needs.
How it works
- Prepare your dataset including LLM outputs in the messages column.
- Set up an experiment and select one or more evaluators - do not select a prompt.
- The evaluator will analyze the responses already present in the “messages” column of your dataset.

Evaluator selection