OpenAI's latest models can be found in the model garden.

When to use what model

  • o-series models are purpose-built for reasoning and tool-use. Choose o3 when raw capability beats everything else; use o4-mini when you still need strong reasoning but want 10× cheaper tokens and snappier replies.
  • GPT-4.1 family is the evolution of the “generalist” GPT line. The full model gives you 1 M-token context with strong instruction following; mini trims cost/latency for mainstream workloads; nano pushes pricing and speed to the floor for simple or massive-scale tasks.
| Feature | GPT-4.1 | GPT-4.1 mini | GPT-4.1 nano | o3 | o4-mini |
|---|---|---|---|---|---|
| Training cutoff date | May 2024 | May 2024 | May 2024 | May 2024 | May 2024 |
| Context window | 1M | 1M | 1M | 200k | 200k |
| Max output tokens | 32,768 | 32,768 | 32,768 | 100k | 100k |
| Input price | $2/M tokens | $0.40/M tokens | $0.10/M tokens | $10/M tokens | $1.10/M tokens |
| Output price | $8/M tokens | $1.60/M tokens | $0.40/M tokens | $40/M tokens | $4.40/M tokens |
| Latency | Moderate | Fast | Very fast | Slow | Moderate |
| When to use? | Long-context chat, knowledge work, multimodal apps needing top GPT quality | Everyday product features, prototyping & chat where speed and price count | Real-time, latency-critical or large-scale batch jobs on a tight budget | Deep research, complex multi-step reasoning, high-stakes coding/science tasks | Reasoning workloads where cost & throughput matter; API agents, math & data science |

To ensure fair usage and consistent performance, we’ve introduced rate limits for all Orq.ai APIs on a per-account basis. This helps prevent server overload, reduces the risk of abuse, and keeps costs manageable.

If your account exceeds its rate limit, you’ll receive a 429 Too Many Requests response.

Rate limits vary by subscription tier:

| Subscription | Rate limit: Deployment API calls | Rate limit: other API calls | Log retention |
|---|---|---|---|
| Sandbox | 100 API calls/minute | 20 API calls/minute | 3 days |
| Team (Legacy) | 1,000 API calls/minute | 50 API calls/minute | 14 days |
| Pro | 2,000 API calls/minute | 100 API calls/minute | 30 days |
| Enterprise | Custom | Custom | Custom |
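If you call the API from code, a small retry loop that honors 429 responses keeps batch jobs inside these limits. Here’s a minimal sketch in Python; the endpoint URL and payload are placeholders, not exact Orq paths:

```python
# Minimal retry-with-backoff sketch for handling 429 responses.
# The endpoint and payload below are placeholders.
import time
import requests

def invoke_with_backoff(url: str, payload: dict, api_key: str, max_retries: int = 5):
    for attempt in range(max_retries):
        resp = requests.post(
            url,
            json=payload,
            headers={"Authorization": f"Bearer {api_key}"},
        )
        if resp.status_code != 429:
            resp.raise_for_status()
            return resp.json()
        # Honor Retry-After if the server sends it; otherwise back off exponentially.
        wait = float(resp.headers.get("Retry-After", 2 ** attempt))
        time.sleep(wait)
    raise RuntimeError("Rate limit still exceeded after retries")
```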

For more details, refer to our pricing page.

You can now enable Agentic RAG in your deployments to improve the relevance of retrieved context and the overall output quality.

Once toggled on, simply select a model to act as the agent. The agent will automatically check if the retrieved knowledge chunks are relevant to the user query. If they aren’t, it rewrites the query — preserving the original intent — to improve retrieval results.

This iterative refinement loop increases the chance of surfacing useful context from your Knowledge Base, giving the language model better grounding to generate high-quality, reliable responses. The setup includes two key components:

  • Document Grading Agent – determines if relevant chunks were retrieved.
  • Query Refinement Agent – rewrites the query if needed.
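Conceptually, the loop looks like the sketch below. It runs server-side in Orq once the feature is toggled on; the function names here are hypothetical stand-ins for Knowledge Base retrieval and the two agents, not an Orq API.

```python
# Conceptual sketch of the Agentic RAG loop. retrieve(), grade_documents(),
# and refine_query() are hypothetical stand-ins for the Knowledge Base
# retrieval step, the Document Grading Agent, and the Query Refinement Agent.
def agentic_rag(query: str, max_rounds: int = 3) -> list:
    chunks = []
    for _ in range(max_rounds):
        chunks = retrieve(query)              # fetch chunks from the Knowledge Base
        if grade_documents(query, chunks):    # Document Grading Agent: relevant?
            return chunks
        query = refine_query(query)           # Query Refinement Agent: rewrite the
                                              # query, preserving the original intent
    return chunks                             # fall back to the last retrieval
```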

See the screenshot below for how an input query gets refined: the input query 'is my suitcase too big?' is reformulated to 'luggage size requirements and restrictions for carry-on and checked baggage'.


How to enable this? Just toggle the Agentic RAG feature on and select your model.


This feature is part of our ongoing effort to help you ship more robust AI features. Let us know if you have any feedback!

We’ve added a new Threads view to help you make sense of multi-step conversations at a glance.

Each thread captures the full back-and-forth between the user and the assistant, showing just the inputs and outputs per step—without the technical breakdowns like embeddings, rerankers, or guardrails. It’s ideal for reviewing how the conversation unfolds across multiple steps.

The Thread Overview includes:

  • Start time and last update
  • Total duration
  • Number of traces (steps)
  • Total cost and tokens used
  • Project name and session ID
  • Custom tags to help categorize threads

You can customize which columns are visible to tailor the overview to your needs.

To narrow things down, you can filter threads by:

  • Project
  • Tags
  • Start date

This view complements the existing Traces tab, which remains the place for inspecting each individual LLM call in detail.


📘

To start using the Threads feature, add a thread ID when invoking a deployment. To learn more, see Threads.
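For example, with the Python SDK an invocation might look like the sketch below. The thread field name and response shape are assumptions; check the Threads docs for the exact contract.

```python
# Sketch: attach a thread ID when invoking a deployment so consecutive calls
# are grouped into one thread (field names are assumptions; see the Threads docs).
import os
from orq_ai_sdk import Orq

client = Orq(api_key=os.environ["ORQ_API_KEY"])

generation = client.deployments.invoke(
    key="customer-support-bot",                      # your deployment key
    inputs={"question": "Is my suitcase too big?"},
    thread={"id": "session-42"},                     # reuse the same ID per conversation
)
print(generation.choices[0].message.content)
```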


The threads overview


Multi-step conversation

We’ve added Jina AI as a new provider, bringing a suite of high-performance models to the platform:

  • Multilingual Embedding Models: Use Jina AI’s embedding models—like jina-embeddings-v3, supporting 89 languages and sequences up to 8,192 tokens.
  • Efficient Rerank Models: Choose from three reranker variations:
    • Tiny – optimized for speed and low memory use.
    • Turbo – a balance between speed and accuracy.
    • Base – best-in-class accuracy for critical use cases.

These models strengthen both multilingual coverage and reranking performance across a wide range of applications.


Previously, creating and managing a knowledge base required using the Orq UI. Users had to manually create a knowledge base, upload one file at a time, and define a chunking strategy per datasource. While simple, this process was time-consuming, especially for developers looking to scale or automate their workflows.

With the new Knowledge Base API, all major operations are now available programmatically. You can create and manage knowledge bases, upload datasources, generate or manage chunks, and perform retrieval-related actions, all through code.

This opens up much more flexibility, especially for teams with complex chunking needs. You can now:

  • Perform chunking on your own side, tailored to your data structure
  • Bring your own chunks, and even include your own embeddings
  • Let Orq embed the chunks if embeddings aren’t provided, using the model specified in your API call

Note: Attaching custom metadata to chunks isn’t supported yet but will be added soon.

Available operations:

  • Knowledge bases: create, list, retrieve, update, delete
  • Datasources: create, list, retrieve, update, delete
  • Chunks: create, list, retrieve, update, delete
  • Search: inspect what’s being retrieved
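As a sketch, the bring-your-own-chunks flow could look like this over REST. The endpoint paths and field names here are assumptions drawn from the operation list above; the exact contract is in the Knowledge API docs and SDKs.

```python
# Sketch of the bring-your-own-chunks flow (paths and fields are assumptions;
# consult the Knowledge API docs for the exact contract).
import requests

BASE = "https://api.orq.ai/v2"
HEADERS = {"Authorization": "Bearer <ORQ_API_KEY>"}

# 1. Create a knowledge base; the embedding model is used for any chunks
#    you upload without your own embeddings.
kb = requests.post(f"{BASE}/knowledge", headers=HEADERS, json={
    "key": "support-docs",
    "embedding_model": "jina-embeddings-v3",
}).json()

# 2. Register a datasource under it.
ds = requests.post(f"{BASE}/knowledge/{kb['id']}/datasources",
                   headers=HEADERS, json={"display_name": "faq.md"}).json()

# 3. Push chunks you created yourself, optionally with your own embeddings.
requests.post(
    f"{BASE}/knowledge/{kb['id']}/datasources/{ds['id']}/chunks",
    headers=HEADERS,
    json={"chunks": [{"text": "Carry-on bags must not exceed 55 x 40 x 23 cm."}]},
)
```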

Whether you’re working with a single file or a dynamic content pipeline, this update makes the knowledge base workflow faster, more flexible, and developer-friendly.

📘

Read more on the Knowledge API in our docs or in the SDKs.


We’ve added a new feature called the Hub — a library where you can browse and reuse common evaluators and prompts for your projects.

Instead of starting from a blank screen, you can now add ready-made components with one click. Some can be fully customized after adding them to your project (like prompts or classification evaluators), while others — such as traditional NLP metrics like cosine similarity — are fixed by design.

The goal: make it easier to start with off-the-shelf evals and prompts.

At launch, the Hub includes prompts and evaluators. Soon, you’ll also find datasets, tools, and other entities here to further speed up your workflow.

You can find the Hub in the left-hand menu of your workspace.

AI workflows can feel like a black box—when something goes wrong, it’s hard to know why. Tracing changes that by giving you full visibility into every step of your workflow. Instead of guessing why an LLM output is wrong, you can quickly check every step in the workflow—saving time and reducing frustration.

With this release, you can inspect all events that occur within a trace, including:

  • Retrieval – See which knowledge chunks were fetched.
  • Embedding & Reranking – Understand how inputs are processed and prioritized.
  • LLM Calls – Track prompts, responses, and latency.
  • Evaluation & Guardrails – Ensure quality control in real time.
  • Cache Usage – Spot inefficiencies in repeated queries.
  • Fallbacks & Retries – Detect when your system auto-recovers from failures.

This level of observability helps teams debug faster, optimize workflows, and make data-driven improvements.


Example of a trace from a RAG bot that has two evals


Billing impact - Event count

With the introduction of Traces, every event shown in the overview counts towards your event total, which has a direct impact on billing.

For example, a chat request with 2 evaluators was historically counted as 1 request, but is now counted as 3 events.

When building AI features, ensuring high-quality and reliable outputs is crucial. Orq.ai allows you to implement custom evaluators in Python, giving you full control over how AI-generated content is assessed and validated.

Benefits of Using Python for Evaluators

  • Flexibility & Customization

Python enables you to define evaluation logic that precisely matches your needs, whether it’s scoring relevance, detecting biases, or enforcing style guidelines.

  • Seamless Integration

Orq.ai supports direct integration of Python-based evaluators, allowing you to run checks on AI outputs within your workflow without extra tooling.

  • Preloaded with NumPy

Your Python evaluators can leverage NumPy (v1.26.4) for numerical computations, making it easier to apply statistical methods, calculate custom scores, or analyze AI responses efficiently.

  • Automated & Scalable

Python evaluators run automatically on AI responses, ensuring continuous quality control and reducing manual review efforts.
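As an illustration, here is a minimal evaluator. The entry-point name and parameters are assumptions for the sketch; the exact interface Orq expects is in the docs linked below.

```python
# Minimal custom Python evaluator sketch. The entry-point name and its
# parameters are illustrative assumptions; see the docs for the exact interface.
import numpy as np  # preloaded in the evaluator runtime (v1.26.4)

def evaluate(output: str, reference: str) -> float:
    """Score lexical overlap between the model output and a reference answer."""
    out_tokens = set(output.lower().split())
    ref_tokens = set(reference.lower().split())
    if not out_tokens or not ref_tokens:
        return 0.0
    # Jaccard similarity: shared tokens / all tokens, rounded for readability.
    score = len(out_tokens & ref_tokens) / len(out_tokens | ref_tokens)
    return float(np.round(score, 4))
```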


Example python eval



📘

To learn how to set up a custom Python Evaluator, see our docs.

Previously, once an entity was created, it was locked in place—you couldn’t move it to another project or directory. Now, you finally can.

With this new feature, users can seamlessly move entities and directories to different projects and directories, ensuring better organization and flexibility in managing data.

Key Details:

  • Move entities and directories freely between projects and directories.
  • Maintain project-level permissions—entities will only be visible to the selected teams assigned to that project.
  • Improve workflow efficiency by keeping related entities together where they belong.

How It Works

This update gives you more control over your workspace, making it easier to structure your data the way you need it.

Try it out and optimize your project structure today!