Orq MCP is live: Use natural language to interrogate traces, spot regressions, and experiment your way to optimal AI configurations. Available in Claude Desktop, Claude Code, Cursor, and more. Start now →
Cache identical LLM requests to reduce latency by 95% and cut API costs. Configure TTL, exact match caching, and optimize response times for repeated queries.
Use Cases
Eliminating redundant costs on repeated identical queries (FAQs, product lookups).
Speeding up development and test loops by caching fixture requests.
Serving the same prompt to many concurrent users without paying per call.
Reducing tail latency on frequently-called endpoints.
The examples below use the Chat Completions endpoint. The same cache parameter applies to the Responses API: replace chat.completions.create(...) with responses.create(...).
curl -X POST https://api.orq.ai/v3/router/chat/completions \ -H "Authorization: Bearer $ORQ_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "openai/gpt-4o", "messages": [ { "role": "user", "content": "Explain the benefits of renewable energy for businesses" } ], "cache": { "type": "exact_match", "ttl": 3600 } }'