Experiment

Experiments enable efficient testing of multiple scenarios across different prompts, models, and evaluators simultaneously.

  • Each Run executes model generations using configured Inputs and Messages from a Dataset.
  • After a Run completes, Latency and Cost metrics are recorded for each generation.
  • Results can be reviewed manually or scored automatically with Evaluators, allowing outputs to be compared against an Expected Output.
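The run loop above can be sketched in Python. This is a minimal illustration, not the platform's actual API: the names (`run_experiment`, `generate`, `exact_match`, the dataset fields) are hypothetical, the model call is a stand-in stub, and the cost metric is a toy per-character estimate.

```python
import time

def generate(model: str, messages: list[dict]) -> str:
    # Stand-in for a real model call; returns a canned response.
    return f"{model}: " + messages[-1]["content"].upper()

def exact_match(output: str, expected: str) -> float:
    # A simple Evaluator comparing output to the Expected Output.
    return 1.0 if output == expected else 0.0

def run_experiment(model: str, dataset: list[dict], evaluator) -> list[dict]:
    results = []
    for item in dataset:
        # Each Run generates from the item's configured Messages...
        start = time.perf_counter()
        output = generate(model, item["messages"])
        latency = time.perf_counter() - start
        cost = 0.000001 * len(output)  # toy cost metric (hypothetical)
        # ...then records Latency and Cost, and scores with the Evaluator.
        results.append({
            "output": output,
            "latency": latency,
            "cost": cost,
            "score": evaluator(output, item["expected_output"]),
        })
    return results

dataset = [
    {"messages": [{"role": "user", "content": "hello"}],
     "expected_output": "demo-model: HELLO"},
]
results = run_experiment("demo-model", dataset, exact_match)
```

Running the same dataset through several models or prompts and comparing the resulting scores side by side is what makes Experiments useful for regression testing.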

See what running an Experiment looks like in the presentation below.