Logging & monitoring LLM metrics

In this guide, you will learn how to log different LLM metrics, such as metadata, latency, score, LLM response, and economics (prompt_tokens, completion_tokens, total_tokens), to orq.ai.

The reserved keywords are:

  1. Scoring
  2. Metadata
  3. Latency
  4. Economics
  5. Additional information

You need to have the orq.ai Python or Node.js SDK installed.

# pip installation

pip install orquesta-sdk

// Node installation

npm install @orquesta/node --save
yarn add @orquesta/node

1. Scoring

Scoring serves as the compass that guides the LLM's responses toward coherence, relevance, and accuracy. Common ways to collect it include thumbs up/down, star ratings, and end-user feedback forms. The score is of type int, represents the feedback provided by the end user, and ranges between 0 and 100. You can implement your own logic for what a good response looks like.

deployment.add_metrics(
  feedback = { "score": 100 }
)
deployment.addMetrics({
  feedback: {
    score: 100
  }
})
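Since the score is a single integer between 0 and 100, feedback widgets need to be mapped onto that range before logging. A minimal sketch of such a mapping (the helper names below are illustrative, not part of the SDK):

```python
def score_from_thumbs(thumbs_up: bool) -> int:
    """Map a thumbs up/down to the extremes of the 0-100 range."""
    return 100 if thumbs_up else 0


def score_from_stars(stars: int, max_stars: int = 5) -> int:
    """Map a 1..max_stars rating linearly onto 0-100."""
    if not 1 <= stars <= max_stars:
        raise ValueError(f"stars must be between 1 and {max_stars}")
    return round((stars - 1) / (max_stars - 1) * 100)
```

The resulting integer can then be passed as the `score` value in `feedback`, e.g. `deployment.add_metrics(feedback={"score": score_from_stars(4)})`.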

2. Metadata

Metadata refers to additional information or context associated with a text input or output. This information typically details the text, its source, or its purpose. It has a datatype of Dict and holds key-value pairs of custom fields to attach to the generated logs.

deployment.add_metrics(
  metadata={
      "custom": "custom_metadata",
      "chain_id": "ad1231xsdaABw",
  }
)
deployment.addMetrics({
  metadata: {
    custom: "custom_metadata",
    chain_id: "ad1231xsdaABw"
  }
})
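Because the metadata fields are free-form key-value pairs, it can be convenient to assemble them in one place before logging. A hypothetical sketch (the helper and the `python_version` key are illustrative; none of these keys are required by orq.ai):

```python
import platform


def build_metadata(chain_id: str, **custom) -> dict:
    """Assemble a metadata dict of custom fields to attach to a log entry."""
    metadata = {
        "chain_id": chain_id,
        # Example of an automatically captured field; any key-value pair works
        "python_version": platform.python_version(),
    }
    metadata.update(custom)
    return metadata
```

The returned dict can be passed directly as `deployment.add_metrics(metadata=build_metadata("ad1231xsdaABw", custom="custom_metadata"))`.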

3. Latency

Latency refers to the delay between sending a request to the LLM and receiving the corresponding response. It is of type int and represents the total time of the request to the LLM provider API in milliseconds.

🚧

Note: Logging latency is only needed when using orq.ai as a configuration manager. When using orq.ai as AI Gateway, this is all handled by us.

First, you need to record the start and end times of the completion request. The difference between the two, expressed in milliseconds, is the latency. To log the latency metric to orq.ai, you must import the OrquestaPromptMetrics class into your project.

Calculate the latency

import time

# Start time of the completion request
start_time = time.time()

# ... call your LLM provider here ...

# End time of the completion request
end_time = time.time()

# Calculate the difference (latency) in milliseconds
latency = (end_time - start_time) * 1000
// Start time of the completion request
const startTime = new Date().getTime();

// ... call your LLM provider here ...

// End time of the completion request
const endTime = new Date().getTime();

// Calculate the difference (latency) in milliseconds
const latency = endTime - startTime;

Log the metric:

# Log the latency metric
deployment.add_metrics(
  performance={
      "latency": latency,
      "time_to_first_token": 250,
  }
)
// Log the latency metric
deployment.addMetrics({
  performance: {
    latency: latency,
    time_to_first_token: 250
  }
})
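The start/end pattern above can be packaged into a reusable context manager so every completion call is timed the same way. A minimal sketch (the helper name is illustrative, not part of the SDK; `time.monotonic()` is used instead of `time.time()` because it is not affected by system clock adjustments):

```python
import time
from contextlib import contextmanager


@contextmanager
def measure_latency(result: dict):
    """Populate result["latency"] with the elapsed time in milliseconds."""
    start = time.monotonic()
    try:
        yield
    finally:
        result["latency"] = (time.monotonic() - start) * 1000


# Usage (the completion call is a placeholder):
perf = {}
with measure_latency(perf):
    pass  # call your LLM provider here
# perf["latency"] is now ready for deployment.add_metrics(performance=perf)
```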

4. Economics

🚧

Note: Logging economics is only needed when using orq.ai Prompts. When using orq.ai Deployments, we handle all of this. The tokens_per_second metric is logged automatically by orq.ai.

  • prompt_tokens: the total tokens input into the model.
  • completion_tokens: the tokens output by the model.
  • total_tokens: the sum of the prompt tokens and the completion tokens.

This is made available through the OrquestaPromptMetricsEconomics class, which contains information about the prompt tokens, completion tokens, and total tokens.

deployment.add_metrics(
  usage={
      "prompt_tokens": 100,
      "completion_tokens": 900,
      "total_tokens": 1000,
  }
)
deployment.addMetrics({
  usage: {
    prompt_tokens: 100,
    completion_tokens: 900,
    total_tokens: 1000
  }
})
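Most providers return the token counts alongside the completion. A hedged sketch of deriving the usage dict from an OpenAI-style response, where a `usage` object carries the counts (the field names follow a common provider convention and may differ for your provider):

```python
def usage_from_response(response: dict) -> dict:
    """Build the usage dict for add_metrics from a provider response."""
    usage = response["usage"]
    prompt = usage["prompt_tokens"]
    completion = usage["completion_tokens"]
    return {
        "prompt_tokens": prompt,
        "completion_tokens": completion,
        # total_tokens is the sum of the prompt and completion counts
        "total_tokens": prompt + completion,
    }
```

The result can be passed directly as `deployment.add_metrics(usage=usage_from_response(response))`.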

5. Additional information

You can also log any other additional information of your choice, such as the chain_id, conversation_id, and user_id.

deployment.add_metrics(
  chain_id="c4a75b53-62fa-401b-8e97-493f3d299316",
  conversation_id="ee7b0c8c-eeb2-43cf-83e9-a4a49f8f13ea",
  user_id="e3a202a6-461b-447c-abe2-018ba4d04cd0"
)
deployment.addMetrics({
  chain_id: "c4a75b53-62fa-401b-8e97-493f3d299316",
  conversation_id: "ee7b0c8c-eeb2-43cf-83e9-a4a49f8f13ea",
  user_id: "e3a202a6-461b-447c-abe2-018ba4d04cd0"
})
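The identifiers in the example above are UUIDs. A sketch of generating fresh ones, assuming your application does not already assign them (whether you reuse an existing identifier or generate a new one depends on your tracing needs):

```python
import uuid


def new_trace_ids() -> dict:
    """Generate fresh UUIDs for the common tracing identifiers."""
    return {
        "chain_id": str(uuid.uuid4()),
        "conversation_id": str(uuid.uuid4()),
        "user_id": str(uuid.uuid4()),
    }
```

In practice, chain_id and conversation_id would be created once per chain or conversation and reused across the related log entries, e.g. `deployment.add_metrics(**new_trace_ids())`.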