Running an Experiment
Prerequisite
First, configure your Experiment; see Creating an Experiment.
Running an Experiment
Once configured, you can run the Experiment using the Run button.
Depending on the Dataset size, it may take a few minutes to run all prompt generations across the selected models.
Once successful, the Experiment Run status will change to Completed, and you can then view the Experiment Results.
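If you trigger runs programmatically rather than through the Run button, you can poll the run status until it leaves Running. The sketch below is illustrative only: the base URL, endpoint path, and status field are assumptions, not the documented API; check the API reference for the real contract.

```python
import time

import requests

# Hypothetical endpoint -- the base URL, path, and "status" field are
# assumptions for illustration, not the documented API.
BASE_URL = "https://api.example.com/v1"

def wait_for_completion(experiment_id: str, run_id: str, api_key: str, poll_s: float = 10.0) -> dict:
    """Poll a (hypothetical) run endpoint until its status is no longer Running."""
    url = f"{BASE_URL}/experiments/{experiment_id}/runs/{run_id}"
    headers = {"Authorization": f"Bearer {api_key}"}
    while True:
        run = requests.get(url, headers=headers, timeout=30).json()
        if run["status"] != "Running":  # e.g. "Completed" or "Failed"
            return run
        time.sleep(poll_s)
```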
Only Evaluating Existing Dataset Outputs
If you want to test evaluators on datasets that already contain generated responses, you can run an evaluation-only experiment:
- Set up your experiment with the dataset containing existing outputs in the "messages" column
- Do not select a prompt during experiment setup
- Add your desired evaluators
- Run the experiment
This mode will evaluate the existing responses without generating new outputs, allowing you to retroactively score historical responses and conversation chains that are already stored in your dataset.
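As a concrete illustration, one dataset row for an evaluation-only experiment could look like the JSONL record written below. Only the "messages" column name comes from this page; the other field names are assumptions about your dataset schema.

```python
import json

# Illustrative row for an evaluation-only experiment. Apart from the
# "messages" column, the field names are assumptions, not a documented schema.
row = {
    "input": "What is the capital of France?",
    "messages": [  # the pre-generated responses the evaluators will score
        {"role": "user", "content": "What is the capital of France?"},
        {"role": "assistant", "content": "The capital of France is Paris."},
    ],
}

with open("dataset.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(row) + "\n")
```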
To run another iteration of the Experiment with different prompts or data, use the New Run button. A new Experiment Run will be created in the Draft state.
Seeing Experiment Results
Report
Once an Experiment has finished running, its status will change from Running to Completed.

The total cost and runtime for the Experiment will be displayed.
The right side of the table will be filled with results.

The results for Prompts A and B
Under each Prompt's column, you can see results for the following (sketched as a data structure after this list):
- Latency.
- Costs.
- Evaluators.
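As a rough mental model, each table row pairs one dataset item with a block of metrics per Prompt. The Python literal below sketches that shape; every field name here is an assumption for illustration, not an export format.

```python
# Illustrative shape of one results row; all field names are assumptions.
result_row = {
    "input": "What is the capital of France?",
    "prompt_a": {
        "output": "Paris.",
        "latency_ms": 640,          # Latency
        "cost_usd": 0.00042,        # Costs
        "evaluators": {"correctness": 1.0, "conciseness": 0.9},
    },
    "prompt_b": {
        "output": "The capital of France is Paris.",
        "latency_ms": 910,
        "cost_usd": 0.00078,
        "evaluators": {"correctness": 1.0, "conciseness": 0.7},
    },
}
```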
Viewing Multiple Experiment Runs
Within the Runs tab, you can visualize all previous runs for an Experiment.
Through this view, all Evaluator results are visible at a glance, making it easy to compare results and see progress between multiple Runs.

See at a glance how results evolved between two experiment runs.
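You can also compare runs outside the UI by exporting each run (see Export below) and diffing the evaluator averages yourself. A minimal sketch, assuming each export is a JSONL file with an `evaluators` mapping per row; that field name is an assumption about the export schema.

```python
import json
from collections import defaultdict

def mean_scores(path: str) -> dict:
    """Average each evaluator's score across all rows of an exported run."""
    totals, counts = defaultdict(float), defaultdict(int)
    with open(path, encoding="utf-8") as f:
        for line in f:
            row = json.loads(line)
            for name, score in row.get("evaluators", {}).items():  # assumed field name
                totals[name] += score
                counts[name] += 1
    return {name: totals[name] / counts[name] for name in totals}

before = mean_scores("run_1.jsonl")  # placeholder file names
after = mean_scores("run_2.jsonl")
for name in sorted(set(before) | set(after)):
    print(f"{name}: {before.get(name, float('nan')):.3f} -> {after.get(name, float('nan')):.3f}")
```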
Logs
Switching to the Logs tab lets you see the details of each call.
Within Logs, you can process Feedback and build Curated Datasets.
By hovering over a cell, you can also directly access the related log using the See log button.
For easy comparison between models and prompts, you can also click the Show Comparison button. This opens the panel below, which lets you quickly compare outputs side by side and navigate through the different rows.

Comparing Prompt versions A and B across GPT-4o and Claude 3.5 Sonnet
Export
You can export datasets and results in CSV, JSON and JSONL formats using the Export button.
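Exported files load directly into standard tooling. For example, with pandas (the file names below are placeholders for whatever you downloaded):

```python
import pandas as pd

# Each export format loads with a standard pandas reader.
df_csv = pd.read_csv("experiment_results.csv")
df_jsonl = pd.read_json("experiment_results.jsonl", lines=True)  # one record per line
df_json = pd.read_json("experiment_results.json")  # assumes a records-style JSON array

print(df_csv.head())
```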