Experiments V2 - Evaluate your LLM Config
We’re excited to introduce Experiments V2, a major upgrade to our Experiments module that makes testing, evaluating, and benchmarking models and prompts more intuitive and flexible than ever before.
DeepSeek Models Now Available: 67B Chat, R1, and V3
We are excited to announce the integration of DeepSeek’s latest AI models (67B Chat, R1, and V3) into our platform.
OpenAI’s Latest Small Reasoning Model – o3-mini
Start using OpenAI’s newest and most advanced ‘small’ reasoning model: o3-mini.
Llama 3.3 70B & Llama Guard 3 are now available through Together AI
Experience the power of the latest Llama 3.3 70B and Llama Guard 3 models on Orq, integrated via Together AI.
New Layout with Project Structure
We’re introducing a new project structure UI to help you organize and manage your resources more effectively. With projects, you can group your work by use case, environment, or any logical structure that suits your needs.
Online Guardrails in Live Deployments
Once you have added Guardrails to your Library, you can now configure them directly in Deployments > Settings for both input and output, giving you full control over Deployment responses.
HTTP and JSON Evaluators and Guardrails
You can now add HTTP and JSON Evaluators and Guardrails under the Evaluator tab and add them to your Deployment or Experiment.
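To give a feel for what a JSON-style check does, here is a minimal Python sketch. It is not Orq's implementation or API: the function name `json_evaluator` and the `required_keys` parameter are hypothetical, chosen only for illustration. It assumes the evaluator should pass when the model output is a JSON object containing a given set of keys.

```python
import json


def json_evaluator(output: str, required_keys: set[str]) -> bool:
    """Hypothetical check: pass if `output` is a JSON object with all required keys."""
    try:
        parsed = json.loads(output)
    except json.JSONDecodeError:
        # Output that is not valid JSON fails the check.
        return False
    # Only JSON objects (dicts) can carry named keys; arrays and scalars fail.
    return isinstance(parsed, dict) and required_keys.issubset(parsed.keys())
```

Used as a guardrail, a `False` result would block or flag the response; used as an evaluator, it would simply be recorded as a failed check.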
Master Your RAG with RAGAS Evals
The Ragas Evaluators are now available, providing specialized tools to evaluate retrieval-augmented generation (RAG) workflows. These evaluators make it easy to set up quality checks when integrating a Knowledge Base into a RAG system, and can be used in Experiments and Deployments to ensure responses are accurate, relevant, and safe.
Evaluator Library: 50+ Ready-to-Use and Tailorable Evaluators
Introducing the new Evaluator Library:
Improved LLMs as a Judge
LLM-as-a-Judge Enhancements:
We’ve significantly improved our existing LLM Evaluator feature to provide more robust evaluation capabilities and enforce type-safe outputs.
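As a rough illustration of what type-safe judge output can mean in practice, the sketch below parses a judge model's raw reply into a typed result and rejects anything that does not match. It is an assumption-laden example, not the platform's actual behavior: the `JudgeResult` class, the `parse_judge_output` function, and the `passed`/`score` fields are all hypothetical, and it assumes the judge replies with a JSON object.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class JudgeResult:
    """Hypothetical typed result: a boolean verdict and a score in [0, 1]."""
    passed: bool
    score: float


def parse_judge_output(raw: str) -> JudgeResult:
    """Parse a judge reply, raising ValueError if it is not well-typed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError("judge output is not valid JSON") from exc
    if not isinstance(data, dict):
        raise ValueError("judge output must be a JSON object")

    passed = data.get("passed")
    score = data.get("score")
    if not isinstance(passed, bool):
        raise ValueError("'passed' must be a boolean")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(score, bool) or not isinstance(score, (int, float)):
        raise ValueError("'score' must be a number")
    if not 0.0 <= score <= 1.0:
        raise ValueError("'score' must be between 0 and 1")
    return JudgeResult(passed=passed, score=float(score))
```

Rejecting malformed replies up front keeps free-form judge text from leaking into downstream metrics as if it were a valid score.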