Understand the core concepts and glossary of topics we use throughout orq.ai.
The Context is the environment (dev-test, production, etc.) and setting in which the subject requesting the Deployment operates. Your own team defines the Context, which consists of a set of Fields and mirrors your Data Model. Our evaluation engine matches the conditions set in the Configuration Matrix as strictly as possible against the provided Context to return the correct Return Value.
A Field is a central concept of the Data Model: it mirrors a field that occurs within your own data model. Each Field has a unique key and a type: Boolean, Date, List, Number, or String. A set of Fields can be combined into a specific Context and sent as the payload of an Evaluation.
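A Context payload might look like the following sketch; the Field names and values are hypothetical, chosen only to illustrate the five Field types:

```python
# Illustrative Context payload. Each key is a Field defined by your
# own Data Model; all names and values here are hypothetical.
context = {
    "environment": "production",          # String
    "is_beta_user": True,                 # Boolean
    "signup_date": "2023-06-01",          # Date (ISO 8601 string)
    "feature_flags": ["search", "chat"],  # List
    "org_size": 250,                      # Number
}
```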
Domains enable you to group your Deployments logically. Additionally, you can set specific read, write, and admin permissions per Domain for collaboration. For example, a Backend domain might grant Backend Engineers write permissions while Frontend Engineers and Product Owners have read-only permissions.
An Evaluation happens each time an API call with a Context is sent to orq.ai. Thanks to our distributed architecture, the Evaluation happens on the Edge, and the result is returned to your systems in milliseconds, globally.
The workspace is where you collaborate with your team in setting up Deployments. Each workspace is an isolated environment with unique Deployments, Domains, and Teams.
The AI Gateway is a component that provides access to LLMs. It is responsible for authenticating users and routing requests to the appropriate LLM while handling all DevOps needs, such as logging, security, fallbacks, and retries.
Orq.ai is LLM-agnostic. We stay out of your way when you work with public, private, and custom LLM Providers and Models. Within the Model Garden, you can enable the providers and specific models your product works with.
Orq.ai provides a playground where you can quickly assess each model, generate a response, and evaluate the relevance, coherence, and accuracy of each model's answer. You can also compare how well each model understands the nuance or context behind a given prompt, and tuning a model's parameters can help improve the response quality of the selected models.
You can create multiple playgrounds to identify the limitations or flaws in each model's understanding or output generation by creating a side-by-side comparison, making it easier to spot differences in quality and efficiency.
The playground gives you an immediate way to judge performance metrics like speed and accuracy; everything is auto-saved continuously, so you never have to save manually.
In orq.ai, you maintain and manage the entire lifecycle of the Deployments of your AI-infused products. Each Deployment is a combination of the actual prompt, provider, model, and a set of hyperparameters. Deployments can also be consumed as a config or as an invocation. This allows you to design, experiment with, and optimize prompts for different use cases as new models continuously emerge.
A Rule is configured with our highly intuitive Configuration Matrix; anyone who knows Excel can contribute to managing and operating your systems. The matrix contains Fields that are evaluated against a Context. Each combination of Fields can have a unique Return Value. The Evaluation of each request happens top-to-bottom, and the Return Value of the first matching rule is returned.
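The top-to-bottom, first-match behavior (including the fallback to a default value when nothing matches) can be sketched as follows; the rules, Field names, and Return Values are illustrative, not orq.ai's internal implementation:

```python
# Minimal sketch of first-match rule evaluation over a Configuration
# Matrix. All rule conditions and return values are invented examples.
rules = [
    # (conditions, return value) -- evaluated top-to-bottom
    ({"environment": "production", "locale": "de"}, "prompt-v3-german"),
    ({"environment": "production"}, "prompt-v3"),
    ({"environment": "dev-test"}, "prompt-experimental"),
]
DEFAULT_VALUE = "prompt-v1"  # returned when no rule matches

def evaluate(context: dict) -> str:
    for conditions, return_value in rules:
        if all(context.get(k) == v for k, v in conditions.items()):
            return return_value  # first match wins
    return DEFAULT_VALUE

print(evaluate({"environment": "production", "locale": "de"}))  # prompt-v3-german
print(evaluate({"environment": "staging"}))                     # prompt-v1 (default)
```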
A default value is a value that is assigned to a parameter if no value is explicitly provided. Default values ensure that parameters are always assigned a value, even if the user forgets to specify one.
Small blocks of reusable code are provided to demonstrate how to integrate the selected Deployment in various programming environments like Node, Python, or cURL. These snippets serve as a quick reference for developers to understand the required syntax and parameters for making API calls.
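As a rough illustration of what such a snippet conveys, the request below is built with Python's standard library; the endpoint URL, header names, and payload shape are assumptions for this sketch, not the documented orq.ai API, so consult the generated snippets for the real syntax:

```python
# Hedged sketch: constructing (not sending) a Deployment invocation
# request. The URL, headers, and body fields are hypothetical.
import json
import urllib.request

payload = {
    "key": "my_deployment",                    # hypothetical Deployment key
    "context": {"environment": "production"},  # the Context to evaluate
}
req = urllib.request.Request(
    "https://api.example.com/v2/deployments/invoke",  # placeholder URL
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": "Bearer <API_KEY>",
        "Content-Type": "application/json",
    },
)
# urllib.request.urlopen(req) would send it; omitted in this sketch.
```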
A variant is a variation of a prompt and model configuration based on the Context. Variants are used to improve the quality and diversity of the Deployment. They are created by adding business rules to the Context.
A large language model (LLM) is a type of artificial intelligence that can generate and understand human-like text. LLMs are trained on massive datasets of text and code, which allows them to learn the statistical relationships between words and phrases. This knowledge allows them to perform a wide range of tasks, such as:
- Translation: LLMs can translate text from one language to another.
- Summarization: LLMs can generate summaries of long or complex texts.
- Creative writing: LLMs can generate creative text formats, such as poems, code, scripts, musical pieces, emails, letters, etc.
- Question answering: LLMs can answer questions in a comprehensive and informative way.
An LLM Provider is a company or organization that provides access to large language models (LLMs). LLM providers include OpenAI, Cohere, HuggingFace, Anthropic, Google, and Replicate.
The simulator lets you test your business rules based on a provided Context. It provides the capability to verify the configuration matrix to aid in debugging and validating the behavior of your prompts before or after deploying them into a live environment.
Each unique state of a prompt is saved as a different version. This allows you to track changes over time and roll back to previous configurations if necessary. Versions are usually numbered sequentially and may include metadata like who made changes and when.
The average time it takes for the server to return a configuration after it has been requested.
Cost helps teams and organizations understand and manage their LLMOps costs more effectively across multiple providers, models, and use cases.
The ratio of requests whose Context matched a configured rule, i.e., requests for which the default value was not returned.
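As a small worked example (over an invented evaluation log), the ratio is the share of evaluations that did not fall back to the default value:

```python
# Hypothetical log of evaluation outcomes: "rule" means a matrix rule
# matched, "default" means the default value was returned.
results = ["rule", "rule", "default", "rule"]
hit_ratio = sum(r != "default" for r in results) / len(results)
print(hit_ratio)  # 0.75
```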
Latency is the time it takes for a large language model (LLM) to process a request and return a response. This is an important metric to consider when using LLMs. It can be affected by several factors, including:
- The size and complexity of the LLM
- The type of request (Chat or Completion) being made
- The size of your prompt and the number of tokens requested in the response
- The provider of the LLM
Monitoring and logging is the process of collecting and analyzing data about the performance and usage of LLMs. This data can be used to identify and troubleshoot problems, improve the performance of LLMs, and understand how LLMs are being used.
P50 is the percentile of latency at which 50% of requests are served. In other words, it is the time it takes for half of all requests to be completed. P50 is an important metric to track because it can help teams identify and address application performance bottlenecks. For example, if the P50 latency is high, it may indicate that the platform cannot handle the volume of requests it is receiving.
P99 refers to the 99th percentile of a given metric: a value at the 99th percentile is higher than 99% of all other values for that metric. P99 is often used to measure the performance of LLMs in production, but it can also be used to measure the quality of LLM outputs.
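Both percentiles can be computed from a set of recorded latencies. The nearest-rank method below is one common convention (not necessarily the one orq.ai uses), and the sample values are invented:

```python
import math

def percentile(values, p):
    """Nearest-rank percentile: the smallest value >= p% of samples."""
    ordered = sorted(values)
    k = max(0, math.ceil(p / 100 * len(ordered)) - 1)
    return ordered[k]

latencies_ms = [12, 15, 11, 240, 13, 14, 16, 12, 13, 900]
p50 = percentile(latencies_ms, 50)  # 13  -- half of requests are this fast
p99 = percentile(latencies_ms, 99)  # 900 -- the slow tail dominates P99
```

Note how a single slow outlier barely moves P50 but defines P99, which is why both are tracked.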
Score is a measure of the quality of an LLM's output or a metric used to evaluate a language model's performance in a deployment. The score can be calculated using various methods, depending on the specific task. For example, for a task such as answering questions, the score could be calculated based on the human feedback of an end user.
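For instance, one simple scoring method (an assumption for illustration, not necessarily how orq.ai aggregates feedback) averages binary end-user feedback into a score between 0 and 1:

```python
# Hypothetical end-user feedback: 1 = helpful, 0 = not helpful.
feedback = [1, 1, 0, 1, 1]
score = sum(feedback) / len(feedback)
print(score)  # 0.8
```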
The cumulative number of times a particular remote configuration has been requested over a specified period.