Creating LLM-as-a-judge Evaluator
To start building an LLM-as-a-judge Evaluator, head to a Project, use the +
button, and select Evaluator.
The following modal opens:

Select the LLM-as-a-judge type
Configure Model & Output
Then select the model you would like to use to evaluate the output (the model needs to be enabled in your Model Garden).
Choose which type of output your model evaluation will provide:
- Boolean, if the evaluation generates a True/False response.
- Number, if the evaluation generates a Score.
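The output type determines how the judge model's raw text reply is interpreted. As a rough illustration only (not the platform's actual implementation), the conversion could look like the following hypothetical helper:

```python
def parse_judge_output(raw: str, output_type: str):
    """Interpret the judge model's raw reply according to the configured output type.

    Hypothetical helper for illustration; the platform performs this conversion internally.
    """
    text = raw.strip().lower()
    if output_type == "boolean":
        # A Boolean evaluator is expected to answer True/False.
        return text in ("true", "yes")
    if output_type == "number":
        # A Number evaluator is expected to answer with a score, e.g. 1-10.
        return float(text)
    raise ValueError(f"unknown output type: {output_type}")

print(parse_judge_output("True", "boolean"))  # True
print(parse_judge_output("8", "number"))      # 8.0
```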
Configure Prompt
Your prompt has access to two variables:
- {{log.input}} contains the input message sent to the evaluated model
- {{log.output}} contains the output response generated by the evaluated model
Examples
Evaluating the familiarity of an output
Evaluate the familiarity of the [OUTPUT], give a score between 1 and 10, 1 being very formal, 10 being very familiar. Only output the score.
[OUTPUT] {{log.output}}
Evaluating the accuracy of a response
Evaluate how accurate a response [OUTPUT] is compared to the query [INPUT]. Give a score between 1 and 10, 1 being not accurate at all, 10 being perfectly accurate. Only output the score.
[INPUT] {{log.input}}
[OUTPUT] {{log.output}}
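To make the template mechanics concrete, here is a minimal sketch of how the accuracy prompt above could be rendered and sent to a judge model outside the platform. It assumes an OpenAI-compatible Python client, a hypothetical log record, and an arbitrary model name; inside the platform, the variable substitution and model call are handled for you.

```python
from openai import OpenAI  # assumes an OpenAI-compatible judge model

# Hypothetical log record captured from the evaluated model.
log = {
    "input": "How do I reset my password?",
    "output": "Click 'Forgot password' on the sign-in page and follow the email link.",
}

# The accuracy prompt from the example above, with the template variables filled in.
prompt = (
    "Evaluate how accurate a response [OUTPUT] is compared to the query [INPUT]. "
    "Give a score between 1 and 10, 1 being not accurate at all, 10 being perfectly accurate. "
    "Only output the score.\n"
    f"[INPUT] {log['input']}\n"
    f"[OUTPUT] {log['output']}"
)

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",  # any judge model enabled in your Model Garden
    messages=[{"role": "user", "content": prompt}],
)
score = float(response.choices[0].message.content.strip())
print(score)  # e.g. 9.0
```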
Guardrail Configuration
Within a Deployment, you can use your LLM-as-a-judge Evaluator as a Guardrail, enabling validation of the input and output of a deployment generation.
Enabling the Guardrail toggle will block payloads that don't meet the required score or expected boolean response.
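Conceptually, the guardrail check reduces to comparing the evaluator's result against a score threshold or an expected boolean. A minimal sketch with hypothetical names; the platform applies this decision internally when the toggle is enabled:

```python
from typing import Optional, Union

def guardrail_allows(
    result: Union[bool, float],
    min_score: Optional[float] = None,
    expected: Optional[bool] = None,
) -> bool:
    """Return True if the payload may pass the guardrail (hypothetical helper)."""
    if expected is not None:   # Boolean evaluator: result must match the expected value.
        return result is expected
    if min_score is not None:  # Number evaluator: result must reach the threshold.
        return result >= min_score
    return True

# Example: block generations whose accuracy score falls below 7.
if not guardrail_allows(6.0, min_score=7.0):
    print("Payload blocked by guardrail")
```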
Once created, the Evaluator will be available to use in Deployments. To learn more, see Evaluators & Guardrails in Deployments.