Cache your LLM responses
Cache and reuse your LLM outputs for near-instant responses, reduced costs, and consistent results.
What is Caching?
With this release, you can now enable caching for your LLM calls. Caching works by storing both the input and the generated output for a configurable period (TTL – Time-to-Live). When an identical request is made within this period, the cached output is returned instantly, bypassing the need for the LLM to generate a new response.
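Conceptually, the cache behaves like a key-value store whose keys are derived from the exact request. The sketch below is illustrative only, not the platform's implementation: it hashes the full request payload, stores the response with a timestamp, and treats any entry older than the TTL as a miss.

```python
import hashlib
import json
import time

# Illustrative sketch of an exact-match response cache with a TTL.
# Any change to the model, messages, or parameters changes the key,
# so only identical requests produce cache hits.
class ResponseCache:
    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (timestamp, response)

    def _key(self, request: dict) -> str:
        # Serialize deterministically so identical requests hash identically.
        payload = json.dumps(request, sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

    def get(self, request: dict):
        entry = self._store.get(self._key(request))
        if entry is None:
            return None                      # miss: never cached
        timestamp, response = entry
        if time.time() - timestamp > self.ttl:
            return None                      # miss: entry expired (TTL elapsed)
        return response                      # hit: returned without calling the LLM

    def put(self, request: dict, response: str) -> None:
        self._store[self._key(request)] = (time.time(), response)
```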
Why Enable Caching?
- Cost Savings
By reusing previously generated responses, caching reduces the number of LLM calls, which can significantly cut down on usage costs.
- Time Savings
Cached responses are nearly instant, offering a much faster response time compared to generating a new output from the LLM.
- Improved Consistency
Caching mitigates the inherent variability of LLM responses by ensuring that identical inputs produce the same outputs during the cache period.
How to set up Caching:
Go to your Deployment > settings > enable caching > configure TTL
How to check the Cache status:
Logs > select a specific log > request tab > caching status
Here you can see whether the request was a cache hit or miss, along with the configured TTL (time-to-live) in seconds.
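To build intuition for when a log will show a hit versus a miss, the following toy example (again illustrative, with a stand-in for the real LLM call rather than the platform's internals) repeats the same request within the TTL and then after it has expired.

```python
import time

# Toy cache with a very short TTL, to show why an identical request can be
# a hit one moment and a miss once the TTL has elapsed.
TTL_SECONDS = 2
cache = {}  # prompt -> (timestamp, response)

def cached_generate(prompt: str) -> tuple[str, bool]:
    """Return (response, cache_hit). The 'LLM' here is a stand-in."""
    now = time.time()
    entry = cache.get(prompt)
    if entry and now - entry[0] <= TTL_SECONDS:
        return entry[1], True                          # hit: served from cache
    response = f"generated answer for: {prompt}"       # stand-in for a real LLM call
    cache[prompt] = (now, response)
    return response, False                             # miss: freshly generated and stored

print(cached_generate("Summarize our docs"))   # miss: first request
print(cached_generate("Summarize our docs"))   # hit: repeated within the TTL
time.sleep(TTL_SECONDS + 1)
print(cached_generate("Summarize our docs"))   # miss: TTL expired
```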