Experiment
Experiments let you test multiple scenarios at once, across different prompts, models, and evaluators.
- Each Run executes model generations using configured Inputs and Messages from a Dataset.
- After a Run completes, Latency and Cost metrics are recorded for each generation.
- Results can be reviewed manually or validated automatically with Evaluators, allowing for comparison against an Expected Output (see the sketch after this list).
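To make the flow concrete, here is a minimal sketch of an Experiment loop in Python. The names used here (`Datapoint`, `call_model`, `exact_match_evaluator`, and the model identifiers) are illustrative assumptions rather than this platform's API; the sketch only shows how Runs, Latency and Cost metrics, and an Evaluator comparing against an Expected Output fit together.

```python
import time
from dataclasses import dataclass

# Illustrative structures only; these are assumptions, not the platform's API.

@dataclass
class Datapoint:
    inputs: dict          # template variables for the prompt
    messages: list        # chat messages sent to the model
    expected_output: str  # target used by the Evaluator

@dataclass
class Generation:
    output: str
    latency_s: float
    cost_usd: float
    score: float | None = None

def call_model(model: str, inputs: dict, messages: list) -> tuple[str, float]:
    """Placeholder for a real model call; returns (output, cost in USD)."""
    return f"[{model}] response for {inputs}", 0.0002

def exact_match_evaluator(output: str, expected: str) -> float:
    """Automatic Evaluator: 1.0 if the generation matches the Expected Output."""
    return 1.0 if output.strip() == expected.strip() else 0.0

def run_experiment(models: list[str], dataset: list[Datapoint]) -> dict[str, list[Generation]]:
    """One Run per model: generate for each Datapoint, record Latency/Cost, then evaluate."""
    results: dict[str, list[Generation]] = {}
    for model in models:
        generations = []
        for dp in dataset:
            start = time.perf_counter()
            output, cost = call_model(model, dp.inputs, dp.messages)
            latency = time.perf_counter() - start
            gen = Generation(output=output, latency_s=latency, cost_usd=cost)
            gen.score = exact_match_evaluator(output, dp.expected_output)
            generations.append(gen)
        results[model] = generations
    return results

# Example usage with a one-item Dataset and two hypothetical model names.
dataset = [Datapoint(inputs={"topic": "pricing"},
                     messages=[{"role": "user", "content": "Summarize our pricing."}],
                     expected_output="A short pricing summary.")]
for model, gens in run_experiment(["model-a", "model-b"], dataset).items():
    avg_score = sum(g.score for g in gens) / len(gens)
    total_cost = sum(g.cost_usd for g in gens)
    print(model, f"avg score={avg_score:.2f}", f"total cost=${total_cost:.4f}")
```

In a real Experiment the platform handles the generation, metric recording, and evaluation for you; the loop above is only meant to show how those pieces relate.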
See what running an Experiment looks like in the presentation below.