Logging & Monitoring LLMs
In this guide, you will learn how to log different LLM metrics, such as metadata, latency, score, LLM response, and economics (prompt_tokens, completion_tokens, total_tokens), to orq.ai.
The reserved keywords are:
- Scoring
- Metadata
- Latency
- LLM response
- Economics
You need to have the orq.ai Python or Node.js SDK installed.
# pip installation
pip install orq-ai-sdk
// Node installation
npm install @orq-ai/node --save
yarn add @orq-ai/node
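All the snippets in this guide call add_metrics (Python) or addMetrics (Node) on a deployment object. As a minimal Python sketch of where that object comes from, assuming the client is initialized with the OrqAI class and a deployments.invoke call (the class name, environment value, and deployment key below are assumptions; check the SDK reference for your version):
import os
from orq_ai_sdk import OrqAI  # assumed entry point; the class name may differ per SDK version

# Initialize the client with your orq.ai API key
client = OrqAI(
    api_key=os.environ["ORQ_API_KEY"],
    environment="production",  # assumed environment name
)

# "customer_support" is a hypothetical deployment key
deployment = client.deployments.invoke(key="customer_support")

# The add_metrics calls in the sections below are made on this deployment object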
1. Scoring
Scoring serves as the compass that guides the LLM's responses toward coherence, relevance, and accuracy. Common ways to collect it include thumbs up/down, rating stars, and other end-user feedback. The score is of type int
and represents the feedback the end user provides, ranging between 0 and 100. You can implement your own logic for what a good score looks like.
deployment.add_metrics(
feedback = { "score": 100 }
)
deployment.addMetrics({
feedback: {
score: 100
}
})
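Because the score is just an integer between 0 and 100, you decide how your product's feedback signals map onto it. As a minimal Python sketch (the thumbs_to_score helper and its mapping are hypothetical, not part of the SDK), thumbs up/down feedback could be logged like this:
def thumbs_to_score(thumbs_up: bool) -> int:
    # Hypothetical mapping: thumbs up -> 100, thumbs down -> 0
    return 100 if thumbs_up else 0

# Log the end user's thumbs feedback as a score
deployment.add_metrics(
    feedback={"score": thumbs_to_score(True)}
)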
2. Metadata
Metadata refers to additional information or context associated with a text input or output. This information typically details the text, its source, or its purpose. It has a datatype of Dict
and holds key-value pairs of custom fields to attach to the generated logs.
deployment.add_metrics(
metadata={
"custom": "custom_metadata",
"chain_id": "ad1231xsdaABw",
}
)
deployment.addMetrics({
metadata: {
custom: "custom_metadata",
chain_id: "ad1231xsdaABw"
}
})
3. Latency
Latency refers to the time lag or delay between sending a request to the LLM and receiving the corresponding response. It is of type int
and represents the total time of the request to the LLM provider API, in milliseconds.
Note: Logging latency is only needed when using orq.ai as a configuration manager. When using orq.ai as an AI Gateway, this is handled for you.
First, record the start and end times of the completion request. The difference between the two is the latency, expressed in milliseconds. You then log it to orq.ai through the performance field of add_metrics.
Calculate the latency:
import time

# Start time of the completion request
start_time = time.time()

# ... perform the completion request to your LLM provider here ...

# End time of the completion request
end_time = time.time()

# Calculate the difference (latency) in milliseconds
latency = (end_time - start_time) * 1000
// Start time of the completion request
const startTime = new Date().getTime();

// ... perform the completion request to your LLM provider here ...

// End time of the completion request
const endTime = new Date().getTime();

// Calculate the difference (latency) in milliseconds
const latency = endTime - startTime;
Log the metric:
# log the latency metric
deployment.add_metrics(
performance={
"latency": latency,
"time_to_first_token": 250,
}
)
// Log the latency metric
deployment.addMetrics({
performance: {
latency: latency,
time_to_first_token: 250
}
})
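The time_to_first_token value logged above is assumed here to be in milliseconds as well, matching latency. If your provider supports streaming, one way to measure it is to timestamp the arrival of the first chunk; in this Python sketch, call_llm_stream is a hypothetical placeholder for your provider's streaming call:
import time

start_time = time.time()
first_token_time = None

# call_llm_stream is a hypothetical placeholder for your provider's streaming API
for chunk in call_llm_stream(prompt="Hello"):
    if first_token_time is None:
        # Record the moment the first token arrives
        first_token_time = time.time()

end_time = time.time()

deployment.add_metrics(
    performance={
        "latency": (end_time - start_time) * 1000,
        "time_to_first_token": (first_token_time - start_time) * 1000,
    }
)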
4. Economics
Note: Logging economics is only needed when using orq.ai Prompts. When using orq.ai Deployments, we handle all of this. The tokens_per_second metric gets logged automatically by orq.ai.
- prompt_tokens: the total tokens input into the model.
- completion_tokens: the tokens output by the model.
- total_tokens: the sum of the prompt tokens and the completion tokens.
Log these values through the usage field of add_metrics, which holds the prompt tokens, completion tokens, and total tokens.
deployment.add_metrics(
usage={
"prompt_tokens": 100,
"completion_tokens": 900,
"total_tokens": 1000,
}
)
deployment.addMetrics({
usage: {
prompt_tokens: 100,
completion_tokens: 900,
total_tokens: 1000
}
})
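In practice, these counts usually come straight from the provider's response. As a sketch using the OpenAI Python SDK (the model name is illustrative, and other providers expose similar usage fields), you could forward them like this:
from openai import OpenAI

openai_client = OpenAI()

response = openai_client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello"}],
)

# Forward the provider-reported token counts to orq.ai
deployment.add_metrics(
    usage={
        "prompt_tokens": response.usage.prompt_tokens,
        "completion_tokens": response.usage.completion_tokens,
        "total_tokens": response.usage.total_tokens,
    }
)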
5. Additional information
You can also log any other additional information of your choice, such as the chain_id, conversation_id, and user_id.
deployment.add_metrics(
chain_id="c4a75b53-62fa-401b-8e97-493f3d299316",
conversation_id="ee7b0c8c-eeb2-43cf-83e9-a4a49f8f13ea",
user_id="e3a202a6-461b-447c-abe2-018ba4d04cd0"
)
deployment.addMetrics({
chain_id: "c4a75b53-62fa-401b-8e97-493f3d299316",
conversation_id: "ee7b0c8c-eeb2-43cf-83e9-a4a49f8f13ea",
user_id: "e3a202a6-461b-447c-abe2-018ba4d04cd0"
})
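You are not limited to one metric group per call. Assuming add_metrics accepts these fields together, as the separate examples above suggest, a single Python call can combine them (all values below are illustrative):
deployment.add_metrics(
    chain_id="c4a75b53-62fa-401b-8e97-493f3d299316",
    feedback={"score": 100},
    metadata={"custom": "custom_metadata"},
    performance={"latency": 1800, "time_to_first_token": 250},
    usage={"prompt_tokens": 100, "completion_tokens": 900, "total_tokens": 1000},
)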