added

Evaluator runs without new generation

You can now evaluate existing LLM outputs without generating new responses. Previously, running an evaluator required creating a new output every time. With this update, you can retroactively score any response already stored in your dataset.

Why this matters

  • Quality: Evaluate historical responses to ensure you’re assessing the true source of truth, not just new outputs.
  • Flexibility: Apply evaluations to both** single responses and full conversation chains**, adapting to your specific review needs.

How it works

  1. Prepare your dataset including LLM outputs in the messages column.
  2. Set up an experiment and select one or more evaluators - do not select a prompt.
  3. The evaluator will analyze the responses already present in the “messages” column of your dataset.

Evaluator selection