What are Python Evaluators?
Python Evaluators enable users to write custom Python code to create tailored evaluations, offering maximum flexibility for assessing text or data. From simple validations (e.g., regex patterns, data formatting) to complex analyses (e.g., statistical checks, custom scoring algorithms), they execute user-defined logic to measure specific criteria.
Why use Python Evaluators?
Python Evaluators are ideal when you need complete control over evaluation logic or require custom checks that predefined tools can't cover. They shine in scenarios demanding tailored validation (e.g., domain-specific formatting), complex data analysis (e.g., statistical benchmarks), or integration with external systems (e.g., APIs, databases).
Example
Suppose your AI generates product descriptions in JSON format. A Python Evaluator could:
- Validate the JSON structure.
- Check for required fields (SKU, price).
- Ensure prices fall within a valid range.
- Verify image URLs link to active, on-brand assets.
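
The checks listed above can be sketched as a single function. This is a minimal illustration, not the platform's actual evaluator interface: the function name `evaluate_product_description`, the returned dictionary shape, the lowercase field names `sku`, `price`, and `image_url`, the price bounds, and the URL pattern are all assumptions made for the example. Because the sandbox does not allow network access, the sketch only checks that the image URL is well formed; it cannot confirm that the asset is live or on-brand.

```python
import json
import re

# Assumed field names and bounds; adjust them to your own schema.
REQUIRED_FIELDS = ("sku", "price")
MIN_PRICE = 0.01
MAX_PRICE = 10000.00

# Assumed URL shape: HTTPS, a host, a path, and a common image extension.
IMAGE_URL_PATTERN = re.compile(
    r"^https://[\w.-]+/\S+\.(?:png|jpe?g|webp)$", re.IGNORECASE
)


def evaluate_product_description(output_text):
    """Hypothetical evaluator: returns a pass flag and a list of failures."""
    # 1. Validate the JSON structure.
    try:
        product = json.loads(output_text)
    except json.JSONDecodeError as exc:
        return {"passed": False, "failures": ["invalid JSON: " + exc.msg]}
    if not isinstance(product, dict):
        return {"passed": False, "failures": ["top-level value is not an object"]}

    failures = []

    # 2. Check for required fields.
    for field in REQUIRED_FIELDS:
        if field not in product:
            failures.append("missing field: " + field)

    # 3. Ensure the price is numeric and within the assumed range.
    price = product.get("price")
    if price is not None:
        if isinstance(price, bool) or not isinstance(price, (int, float)):
            failures.append("price is not a number")
        elif not MIN_PRICE <= price <= MAX_PRICE:
            failures.append("price out of range")

    # 4. Check the image URL format only (no network call is made).
    image_url = product.get("image_url")
    if image_url is not None:
        if not (isinstance(image_url, str) and IMAGE_URL_PATTERN.match(image_url)):
            failures.append("image_url is malformed")

    return {"passed": not failures, "failures": failures}
```

Returning the list of failures alongside the pass flag makes it easier to see why a given output was rejected, rather than only that it was.
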
Security and Execution Environment
Python Evaluators run in a secure, sandboxed environment using RestrictedPython to protect against malicious code execution. This means:
- Limited modules: Only pre-approved libraries are available (`json`, `re`, `numpy`, `nltk`)
- No file system access: `open()`, file operations, and path traversal are blocked
- No network access: `requests`, `urllib`, `socket` are not allowed
- No system calls: `os`, `sys`, `subprocess` modules are restricted
- Execution limits: Code must complete within 5 seconds and use less than 256MB of memory
- Restricted imports: Dynamic imports and dangerous modules like `pickle` or `marshal` are blocked
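
Before submitting an evaluator, it can help to check locally whether its imports stay within the permitted modules. The sketch below is a simple static lint built on the standard `ast` module. It is not how the sandbox enforces its rules, and it will not catch dynamic imports such as `__import__("os")`. The helper name `find_disallowed_imports` is hypothetical, and the allowlist assumes the four modules named above are the complete set.

```python
import ast

# Assumption: the four modules named above are the full allowlist.
ALLOWED_MODULES = {"json", "re", "numpy", "nltk"}


def find_disallowed_imports(source):
    """Return the sorted top-level module names imported outside the allowlist."""
    tree = ast.parse(source)
    disallowed = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            if node.level > 0 or node.module is None:
                # Relative imports have no place in a standalone evaluator.
                disallowed.add("<relative import>")
                continue
            names = [node.module]
        else:
            continue
        for name in names:
            # "numpy.linalg" is judged by its top-level package, "numpy".
            root = name.split(".")[0]
            if root not in ALLOWED_MODULES:
                disallowed.add(root)
    return sorted(disallowed)
```

For example, passing a source string that imports `json`, `os`, and `urllib` would report `os` and `urllib`, while a source string that imports only `re` and `json` would produce an empty list.
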