Python Evaluator

What are Python Evaluators?

Python Evaluators enable users to write custom Python code to create tailored evaluations, oﬀering maximum flexibility for assessing text or data.

From simple validations (e.g. regex patterns, data formatting) to complex analyses (e.g. statistical checks, custom scoring algorithms), they execute user-defined logic to measure specific criteria..

Why use Python Evaluators?

Python Evaluators are ideal when you need complete control over evaluation logic or require custom checks that predefined tools can’t cover.

They shine in scenarios demanding tailored validation (e.g., domain-specific formatting), complex data analysis (e.g., statistical benchmarks), or integration with external systems (e.g., APIs, databases).

Example

Suppose your AI generates product descriptions in JSON format. A Python Evaluator could:

Validate the JSON structure.
Check for required fields (SKU, price).
Ensure prices fall within a valid range.
Verify image URLs link to active, on-brand assets.

With Python, you combine these checks into a single script, even adding logic like flagging prices 20% above competitors. This flexibility ensures outputs meet both technical and business criteria.

To get started:

see Creating a Python Evaluator.
see Using Evaluator in Experiment.