Experiment

Experiments enable efficient testing of multiple scenarios across different prompts, models, and evaluators simultaneously.

  • Each Run executes model generations using configured Inputs and Messages from a Dataset.
  • After a Run completes, Latency and Cost metrics are recorded for each generation.
  • Results can be reviewed manually or scored automatically with Evaluators, allowing outputs to be compared against an Expected Output.
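The run loop above can be sketched in Python. This is a minimal illustration, not the platform's actual API: the names (`run_experiment`, `generate`, `exact_match`, the dataset fields) are hypothetical, the model call is a stand-in stub, and the cost metric is a toy per-character estimate.

```python
import time

def generate(model: str, messages: list[dict]) -> str:
    # Stand-in for a real model call; returns a canned response.
    return f"{model}: " + messages[-1]["content"].upper()

def exact_match(output: str, expected: str) -> float:
    # A simple Evaluator comparing output to the Expected Output.
    return 1.0 if output == expected else 0.0

def run_experiment(model: str, dataset: list[dict], evaluator) -> list[dict]:
    results = []
    for item in dataset:
        # Each Run generates from the item's configured Messages...
        start = time.perf_counter()
        output = generate(model, item["messages"])
        latency = time.perf_counter() - start
        cost = 0.000001 * len(output)  # toy cost metric (hypothetical)
        # ...then records Latency and Cost, and scores with the Evaluator.
        results.append({
            "output": output,
            "latency": latency,
            "cost": cost,
            "score": evaluator(output, item["expected_output"]),
        })
    return results

dataset = [
    {"messages": [{"role": "user", "content": "hello"}],
     "expected_output": "demo-model: HELLO"},
]
results = run_experiment("demo-model", dataset, exact_match)
```

Running the same dataset through several models or prompts and comparing the resulting scores side by side is what makes Experiments useful for regression testing.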

See what running an Experiment looks like in the presentation below.