Retries & Fallbacks
Set up your deployment to include a fallback model for situations where your primary model encounters provider or configuration issues.
When configuring a Primary Model on your Deployments, an additional setting is available: Retries.
What are retries & fallbacks
In the event of a failure, you can specify the number of times a query will be retried using the primary model.
Once the defined number of Retries on your Primary Model is exhausted, orq.ai will seamlessly switch to your Fallback Model.
Setting up retries and fallbacks makes your application more resilient. It is important to withstand issues with third-party APIs and to make sure that your deployment still responds to users when they occur.
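The retry-then-fallback behavior described above can be sketched in a few lines of Python. This is an illustration of the general pattern, not orq.ai's implementation: `ModelCallError`, `call_with_retries_and_fallback`, and the two stub models are hypothetical names, and the sketch assumes that "retries" means extra attempts after one initial call.

```python
from typing import Callable


class ModelCallError(Exception):
    """Hypothetical error standing in for any provider or configuration failure."""


def call_with_retries_and_fallback(
    prompt: str,
    primary: Callable[[str], str],
    fallback: Callable[[str], str],
    retries: int,
) -> str:
    # Assumption: one initial attempt on the primary model, plus `retries` more.
    for _ in range(1 + retries):
        try:
            return primary(prompt)
        except ModelCallError:
            continue
    # Every attempt on the primary model failed, so hand over to the fallback.
    return fallback(prompt)


# Stub models for illustration only.
attempts = []


def flaky_primary(prompt: str) -> str:
    attempts.append(prompt)
    raise ModelCallError("provider unavailable")


def stable_fallback(prompt: str) -> str:
    return f"fallback answer to: {prompt}"


result = call_with_retries_and_fallback(
    "Hello", flaky_primary, stable_fallback, retries=2
)
print(result)
print(len(attempts))
```

With `retries=2`, the stub primary is called three times before the fallback answers, so the caller receives a response rather than an error.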
Common use cases
Context Window
One common use case is having a Fallback Model able to handle a bigger context window than the Primary Model is capable of. The Fallback Model is usually more costly than the Primary Model. It will be triggered only if the user query cannot be handled within the Primary Model's context window.
This makes for an optimized configuration: your spending will be controlled depending on user inputs, and your users won't see any error if their query is sent to your fallback model.
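As a rough sketch of this use case, the Python below routes a query to a larger model only when the smaller one rejects it. Everything here is assumed for illustration: the model names, the token limits, the `ContextWindowExceeded` error, and the whitespace-based token count (real tokenizers count differently).

```python
from typing import Callable


class ContextWindowExceeded(Exception):
    """Hypothetical error raised when a prompt does not fit a model's context window."""


def make_model(name: str, max_tokens: int) -> Callable[[str], str]:
    def call(prompt: str) -> str:
        # Crude token estimate: split on whitespace. Real tokenizers differ.
        tokens = len(prompt.split())
        if tokens > max_tokens:
            raise ContextWindowExceeded(f"{name}: {tokens} tokens > {max_tokens}")
        return f"{name} handled {tokens} tokens"

    return call


def answer(prompt: str, primary: Callable[[str], str], fallback: Callable[[str], str]) -> str:
    try:
        return primary(prompt)
    except ContextWindowExceeded:
        # Only oversized queries reach the larger, more costly model.
        return fallback(prompt)


small = make_model("small-model", max_tokens=8)
large = make_model("large-model", max_tokens=32)

short_prompt = "summarize this sentence"
long_prompt = " ".join(["word"] * 20)

print(answer(short_prompt, small, large))
print(answer(long_prompt, small, large))
```

The short prompt stays on the cheaper model, while the long prompt is served by the larger one without the caller seeing an error.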
Provider Fault Tolerance
One model can be accessed through different providers (for instance, gpt-4 is available through OpenAI and Azure).
Having your main model from one provider with a fallback accessible on another ensures that if a third party is unreachable, your clients will still get a response.
Similarly, it can happen that you hit rate limits while querying one provider. Having a contingency plan in place is a great way to minimize production issues.
You can also use different API keys for the same provider by using our Multi API key selector.
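The idea of falling back across providers and API keys can be sketched as an ordered list of routes tried one after another. This is a generic illustration, not how orq.ai or its Multi API key selector works internally; `Route`, `first_successful`, the error classes, and the provider and key names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


class ProviderUnavailable(Exception):
    """Hypothetical error: the provider could not be reached."""


class RateLimited(Exception):
    """Hypothetical error: the provider rejected the call due to rate limits."""


@dataclass
class Route:
    provider: str
    api_key: str
    call: Callable[[str, str], str]  # (api_key, prompt) -> response


def first_successful(prompt: str, routes: Sequence[Route]) -> str:
    errors: List[str] = []
    for route in routes:
        try:
            return route.call(route.api_key, prompt)
        except (ProviderUnavailable, RateLimited) as exc:
            # Record the failure and move on to the next provider or key.
            errors.append(f"{route.provider}: {exc!r}")
    raise RuntimeError("all routes failed: " + "; ".join(errors))


# Stub provider calls for illustration only.
def rate_limited(api_key: str, prompt: str) -> str:
    raise RateLimited("too many requests")


def healthy(api_key: str, prompt: str) -> str:
    return f"answered with key {api_key}"


routes = [
    Route("provider-a", "key-1", rate_limited),
    Route("provider-a", "key-2", rate_limited),  # same provider, different key
    Route("provider-b", "key-3", healthy),       # same model via another provider
]

print(first_successful("Hello", routes))
```

Here both keys for the first provider are rate limited, so the request is answered through the second provider.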
Configuring a Fallback Model
The Fallback Model can have a different configuration than the Primary Model.