Cache your LLM responses
Cache and reuse your LLM outputs for near-instant responses, reduced costs, and consistent results.
What is Caching?
With this release, you can now enable caching for your LLM calls. Caching works by storing both the input and the generated output for a configurable period (TTL – Time-to-Live). When an identical request is made within this period, the cached output is returned instantly, bypassing the need for the LLM to generate a new response.
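Conceptually, the cache behaves like a key-value store whose keys are derived from the exact request. The sketch below is illustrative only, not the platform's implementation: it hashes the full request payload, stores the response with a timestamp, and treats any entry older than the TTL as a miss.

```python
import hashlib
import json
import time

# Illustrative sketch of an exact-match response cache with a TTL.
# Any change to the model, messages, or parameters changes the key,
# so only identical requests produce cache hits.
class ResponseCache:
    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (timestamp, response)

    def _key(self, request: dict) -> str:
        # Serialize deterministically so identical requests hash identically.
        payload = json.dumps(request, sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

    def get(self, request: dict):
        entry = self._store.get(self._key(request))
        if entry is None:
            return None                      # miss: never cached
        timestamp, response = entry
        if time.time() - timestamp > self.ttl:
            return None                      # miss: entry expired (TTL elapsed)
        return response                      # hit: returned without calling the LLM

    def put(self, request: dict, response: str) -> None:
        self._store[self._key(request)] = (time.time(), response)
```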
Why Enable Caching?
- Cost Savings
By reusing previously generated responses, caching reduces the number of LLM calls, which can significantly cut down on usage costs.
- Time Savings
Cached responses are nearly instant, offering a much faster response time compared to generating a new output from the LLM.
- Improved Consistency
Caching mitigates the inherent variability of LLM responses by ensuring that identical inputs produce the same outputs during the cache period.
How to set up Caching:
Go to your Deployment > settings > enable caching > configure TTL
How to check the Cache status:
Logs > select a specific log > request tab > caching status
Here you can see whether the request was a cache hit or miss, along with the configured TTL (time-to-live) in seconds.
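To build intuition for when a log will show a hit versus a miss, the following toy example (again illustrative, with a stand-in for the real LLM call rather than the platform's internals) repeats the same request within the TTL and then after it has expired.

```python
import time

# Toy cache with a very short TTL, to show why an identical request can be
# a hit one moment and a miss once the TTL has elapsed.
TTL_SECONDS = 2
cache = {}  # prompt -> (timestamp, response)

def cached_generate(prompt: str) -> tuple[str, bool]:
    """Return (response, cache_hit). The 'LLM' here is a stand-in."""
    now = time.time()
    entry = cache.get(prompt)
    if entry and now - entry[0] <= TTL_SECONDS:
        return entry[1], True                          # hit: served from cache
    response = f"generated answer for: {prompt}"       # stand-in for a real LLM call
    cache[prompt] = (now, response)
    return response, False                             # miss: freshly generated and stored

print(cached_generate("Summarize our docs"))   # miss: first request
print(cached_generate("Summarize our docs"))   # hit: repeated within the TTL
time.sleep(TTL_SECONDS + 1)
print(cached_generate("Summarize our docs"))   # miss: TTL expired
```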