What are Python Evaluators?

Python Evaluators let you write custom Python code to build tailored evaluations, offering maximum flexibility for assessing text or data. They run user-defined logic to measure specific criteria, from simple validations (e.g., regex patterns, data formatting) to complex analyses (e.g., statistical checks, custom scoring algorithms).
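At the simple end of that range, a regex-based check might look like the sketch below. It uses only the re module, which the sandbox allows. The evaluate name, its single-string argument, and the score/reason dictionary it returns are hypothetical, not a documented interface.

```python
import re

# Hypothetical evaluator: the name, signature, and return shape are
# illustrative assumptions, not a documented platform interface.
DATE_PATTERN = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")     # ISO-style date, e.g. 2024-05-31
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")  # rough e-mail address match


def evaluate(output: str) -> dict:
    """Pass only if the text contains an ISO-style date and no e-mail address."""
    has_date = bool(DATE_PATTERN.search(output))
    has_email = bool(EMAIL_PATTERN.search(output))
    if has_date and not has_email:
        return {"score": 1.0, "reason": "ok"}
    reason = "missing ISO date" if not has_date else "contains an e-mail address"
    return {"score": 0.0, "reason": reason}
```

For example, evaluate("Order shipped on 2024-05-31.") passes, while an output that also includes an e-mail address fails with the reason "contains an e-mail address".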

Why use Python Evaluators?

Python Evaluators are ideal when you need complete control over evaluation logic or require custom checks that predefined tools can’t cover. They shine in scenarios demanding tailored validation (e.g., domain-specific formatting), complex data analysis (e.g., statistical benchmarks), or integration with external systems (e.g., APIs, databases).

Example

Suppose your AI generates product descriptions in JSON format. A Python Evaluator could:
  1. Validate the JSON structure.
  2. Check for required fields (SKU, price).
  3. Ensure prices fall within a valid range.
  4. Verify image URLs link to active, on-brand assets.
With Python, you can combine these checks into a single script and even add business logic, such as flagging prices more than 20% above a competitor's. This flexibility helps ensure outputs meet both technical and business criteria.
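A minimal sketch of the product-description checks above, using only the json and re modules. The field names, price bounds, approved asset host (assets.example.com), the images field, and the competitor_price argument are all hypothetical. Because the sandbox blocks network access, step 4 here checks only that each URL points at an approved host, not that the asset is live, and the competitor price has to be passed in rather than fetched.

```python
import json
import re

# Hypothetical catalogue rules; adjust names and bounds to your own data.
REQUIRED_FIELDS = ("sku", "price")
PRICE_RANGE = (0.01, 10_000.0)
# Hypothetical approved asset host. With no network access, this checks
# the URL's shape only; it cannot confirm the asset is actually live.
IMAGE_URL = re.compile(r"^https://assets\.example\.com/[\w/.-]+\.(?:png|jpe?g|webp)$")


def evaluate(output, competitor_price=None):
    """Score a JSON product description. competitor_price is an optional,
    hypothetical input: it must be supplied because it cannot be fetched."""
    # 1. Validate the JSON structure.
    try:
        product = json.loads(output)
    except json.JSONDecodeError as exc:
        return {"score": 0.0, "problems": [f"invalid JSON: {exc.msg}"]}
    if not isinstance(product, dict):
        return {"score": 0.0, "problems": ["top-level value must be an object"]}

    # 2. Check for required fields.
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS if f not in product]

    # 3. Ensure the price is a number within range, then apply the
    #    business rule: flag prices more than 20% above a competitor's.
    if "price" in product:
        price = product["price"]
        if isinstance(price, bool) or not isinstance(price, (int, float)):
            problems.append("price is not a number")
        elif not PRICE_RANGE[0] <= price <= PRICE_RANGE[1]:
            problems.append(f"price {price} outside allowed range")
        elif competitor_price is not None and price > 1.2 * competitor_price:
            problems.append("price is more than 20% above competitor")

    # 4. Check that image URLs point at the approved asset host.
    images = product.get("images", [])
    if not isinstance(images, list):
        problems.append("images must be a list")
        images = []
    for url in images:
        if not isinstance(url, str) or not IMAGE_URL.match(url):
            problems.append(f"off-brand or malformed image URL: {url!r}")

    return {"score": 0.0 if problems else 1.0, "problems": problems}
```

The function collects every problem instead of stopping at the first one, so a single run reports all failures. Adapt the return value to whatever shape your evaluator setup expects.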
Security and Execution Environment

Python Evaluators run in a secure, sandboxed environment that uses RestrictedPython to protect against malicious code execution. This means:
  • Limited modules: Only pre-approved libraries are available (json, re, numpy, nltk)
  • No file system access: open(), file operations, and path traversal are blocked
  • No network access: requests, urllib, socket are not allowed
  • No system calls: os, sys, subprocess modules are restricted
  • Execution limits: Code must complete within 5 seconds and use less than 256 MB of memory
  • Restricted imports: Dynamic imports and dangerous modules like pickle or marshal are blocked
If your code violates these restrictions, you’ll receive a clear error message indicating which pattern was detected.
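Before pasting code into an evaluator, you might screen it locally for the patterns listed above. The sketch below uses Python's standard ast module to flag imports of the named modules and calls to open(). The blocklist, including treating importlib and __import__ as "dynamic imports", is an assumption drawn from this page, not the sandbox's actual rule set. It is meant to run on your own machine, not inside the evaluator, and it will miss indirect access such as getattr tricks.

```python
import ast

# Illustrative blocklists based on the restrictions above; the sandbox's
# real rules may differ, so treat this as a local sanity check only.
BLOCKED_MODULES = {"os", "sys", "subprocess", "requests", "urllib",
                   "socket", "pickle", "marshal", "importlib"}
BLOCKED_CALLS = {"open", "__import__"}


def find_violations(source: str) -> list[str]:
    """Return a description of each blocked import or direct call in source."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]  # module is None for "from . import x"
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in BLOCKED_CALLS:
                violations.append(f"line {node.lineno}: call to {node.func.id}()")
            continue
        else:
            continue
        # Compare the top-level package, so "os.path" counts as "os".
        for name in names:
            if name.split(".")[0] in BLOCKED_MODULES:
                violations.append(f"line {node.lineno}: import of {name}")
    return violations
```

For example, find_violations("import os") returns ['line 1: import of os'], while code that only imports json and re returns an empty list.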
To get started, see Creating a Python Evaluator.