Prerequisite
First configure your Experiment; see Creating an Experiment.
Running an Experiment
Once configured, you can run the Experiment using the Run button. Depending on the Dataset size, it may take a few minutes to generate outputs for all Prompts. Once successful, your Experiment Run status will change to Completed. You can then see the Experiment Results.
Only Evaluating Existing Dataset Outputs
If you want to test evaluators on datasets that already contain generated responses, you can run an evaluation-only experiment (see the illustrative dataset sketch after this list):
- Set up your experiment with the dataset containing existing outputs in the “messages” column
- Do not select a prompt during experiment setup
- Add your desired evaluators
- Run the experiment
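For reference, here is a minimal sketch of what a dataset with existing outputs might look like, assembled with plain Python. Only the “messages” column name comes from the steps above; the other column names, the chat-style JSON shape of each entry, the file name, and the choice of CSV are illustrative assumptions rather than platform requirements.

```python
# Minimal sketch of a dataset that already contains generated responses.
# Only the "messages" column name comes from the docs above; everything
# else (file name, other columns, JSON shape) is an assumption.
import csv
import json

rows = [
    {
        "question": "What is the capital of France?",
        "expected_output": "Paris",
        # Previously generated model output, stored as a chat-style message list.
        "messages": json.dumps([
            {"role": "user", "content": "What is the capital of France?"},
            {"role": "assistant", "content": "The capital of France is Paris."},
        ]),
    },
    {
        "question": "What is 2 + 2?",
        "expected_output": "4",
        "messages": json.dumps([
            {"role": "user", "content": "What is 2 + 2?"},
            {"role": "assistant", "content": "2 + 2 equals 4."},
        ]),
    },
]

with open("existing_outputs_dataset.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["question", "expected_output", "messages"])
    writer.writeheader()
    writer.writerows(rows)
```

Because no prompt is selected during setup, the evaluators score the stored responses directly instead of triggering new generations.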
To run another iteration of the Experiment, with different prompts or data, use the New Run button. A new Experiment Run will be created in Draft state.
Running a single Prompt
It is often useful to add an extra prompt after running an experiment, to tweak a configuration or try a different version. Once the new Prompt is added, select it and choose Run to run it against the existing Dataset.
Seeing Experiment Results
Report
Once an Experiment has run, its status will change from Running to Completed.
The total cost and runtime for the Experiment will be displayed.

The results for Prompts A and B
For each Prompt, the report shows:
- Latency
- Costs
- Evaluator results
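As a rough illustration of what these per-Prompt numbers represent, the sketch below aggregates latency, cost, and an evaluator score from a handful of invented result rows. The field names, values, and the “accuracy” evaluator are hypothetical and are not the platform's actual data model.

```python
# Hypothetical per-Prompt summary of latency, cost, and an evaluator score.
# All records and field names below are invented for illustration.
from statistics import mean

results = [
    {"prompt": "A", "latency_ms": 820, "cost_usd": 0.0021, "accuracy": 1.0},
    {"prompt": "A", "latency_ms": 910, "cost_usd": 0.0024, "accuracy": 0.0},
    {"prompt": "B", "latency_ms": 640, "cost_usd": 0.0018, "accuracy": 1.0},
    {"prompt": "B", "latency_ms": 700, "cost_usd": 0.0019, "accuracy": 1.0},
]

for prompt in ("A", "B"):
    rows = [r for r in results if r["prompt"] == prompt]
    print(
        f"Prompt {prompt}: "
        f"avg latency {mean(r['latency_ms'] for r in rows):.0f} ms, "
        f"total cost ${sum(r['cost_usd'] for r in rows):.4f}, "
        f"avg accuracy {mean(r['accuracy'] for r in rows):.2f}"
    )
```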
Comparing Model Performance
Use the Compare tab to visualize multiple model executions.
View multiple model generations side-by-side.
The variables and expected outputs are displayed on the left for better context, especially when working with large inputs or detailed test cases. At the bottom of the screen, the evaluators section provides scores and feedback for each result, helping you assess model quality and performance at a glance. You can also use this screen to apply Feedback and Human Review to each output, letting you evaluate and review Experiment results efficiently.

Feedback and Human Review are available at a click.
Viewing Multiple Experiment Runs
Within the Runs tab, you can visualize all previous runs of an Experiment. In this view, all Evaluator results are visible at a glance, making it easy to compare results and see progress across multiple Runs.
See at a glance how results evolved between two experiment runs.
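To make the run-over-run comparison concrete, here is a tiny hypothetical sketch of the kind of delta this view lets you eyeball; the run labels, evaluator names, and scores are invented for illustration.

```python
# Hypothetical evaluator averages for two Experiment Runs; all values invented.
runs = {
    "Run 1": {"accuracy": 0.72, "helpfulness": 0.65},
    "Run 2": {"accuracy": 0.81, "helpfulness": 0.70},
}

baseline, latest = runs["Run 1"], runs["Run 2"]
for evaluator, score in latest.items():
    delta = score - baseline[evaluator]
    print(f"{evaluator}: {baseline[evaluator]:.2f} -> {score:.2f} ({delta:+.2f})")
```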
Compare Results
For easy comparison between models and prompts, click the Compare tab. This opens the comparison view, where you can analyze outputs side by side across multiple models or configurations.

Comparing two Prompts and their results