Cascade Failover
The Single-Provider Risk
Relying on a single LLM provider creates availability risks and limits flexibility:
- Outages: Every provider experiences downtime
- Rate limits: High-traffic applications can hit quota limits
- Regional issues: Some providers suffer degraded service confined to specific regions
- Cost optimization: Different providers may be cheaper for different use cases
Manual failover logic is error-prone and clutters application code.
Automatic Multi-Provider Failover
`LLMCascade` wraps multiple provider configurations and automatically fails over to the next provider when one raises a `ProviderError`:
- Tries providers in priority order
- Catches `ProviderError` exceptions and moves to the next provider
- Returns the first successful response
- Supports all LLM methods (`get_response`, `get_json_response`, `get_structured_json_response`)
Basic Usage
```python
import asyncio

from majordomo_llm import LLMCascade

# Providers are tried in list order: (provider name, model name)
cascade = LLMCascade([
    ("anthropic", "claude-sonnet-4-20250514"),  # Primary: preferred provider
    ("openai", "gpt-4o"),                       # Secondary: reliable fallback
    ("gemini", "gemini-2.5-flash"),             # Tertiary: cost-effective backup
])


async def main() -> None:
    # Automatically tries providers in order until one succeeds
    response = await cascade.get_response(
        user_prompt="Summarize this document",
        system_prompt="Be concise.",
    )
    print(response.content)


asyncio.run(main())
```
Failover Behavior
- The request is sent to Anthropic
- If Anthropic raises a `ProviderError` (rate limit, outage, etc.), the cascade tries OpenAI
- If OpenAI also fails, the cascade tries Gemini
- If all providers fail, the cascade raises the last `ProviderError`
Important: Only ProviderError triggers failover. Application-level errors (bad prompts, validation failures) are not retried.
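The failover sequence above can be sketched as a simple loop. This is a minimal, self-contained illustration under stated assumptions, not `LLMCascade`'s actual source: `ProviderError` here is a local stand-in class, and each provider is modeled as a hypothetical async callable that takes a prompt and returns text (`cascade_call`, `rate_limited`, and `healthy` are illustrative names).

```python
import asyncio
from typing import Awaitable, Callable, Optional, Sequence


class ProviderError(Exception):
    """Local stand-in for the library's provider-level error (assumption)."""


# Hypothetical provider signature: an async function from prompt to text.
Provider = Callable[[str], Awaitable[str]]


async def cascade_call(providers: Sequence[Provider], prompt: str) -> str:
    """Try providers in priority order and return the first successful result.

    Only ProviderError triggers failover; any other exception propagates as-is.
    """
    last_error: Optional[ProviderError] = None
    for provider in providers:
        try:
            return await provider(prompt)
        except ProviderError as exc:
            last_error = exc  # Remember it, then fall through to the next provider
    if last_error is None:
        raise ValueError("no providers configured")
    raise last_error  # Every provider failed: surface the last ProviderError


async def rate_limited(prompt: str) -> str:
    raise ProviderError("rate limited")


async def healthy(prompt: str) -> str:
    return f"summary of: {prompt}"


print(asyncio.run(cascade_call([rate_limited, healthy], "Summarize this document")))
# prints: summary of: Summarize this document
```

Catching only `ProviderError` is what keeps application-level bugs from being masked: a malformed prompt that fails on one provider would most likely fail on every other provider too, so retrying it elsewhere would only add latency.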
Strategy Tips
- Diversify vendors: Don't chain multiple models from the same provider, since a provider-wide outage would likely take them all down at once
- Consider capabilities: Ensure fallback models support your use case (structured outputs, context length, etc.)
- Monitor which provider serves requests: Log the provider used for capacity planning
Each provider already has built-in retries (3 attempts with exponential backoff). The cascade adds cross-provider resilience on top of these per-provider retries.
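The retry layering described above can be sketched as two nested loops: an inner per-provider retry loop with exponential backoff, and an outer loop across providers that also logs which provider served the request (per the monitoring tip). This is an illustrative sketch, not the library's implementation. `with_retries`, `cascade_with_retries`, the `base_delay` default, and the choice to retry on every `ProviderError` are all assumptions; only the figure of 3 attempts comes from the text.

```python
import asyncio
import logging
from typing import Awaitable, Callable, Optional, Sequence, Tuple

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("cascade")


class ProviderError(Exception):
    """Local stand-in for the library's provider-level error (assumption)."""


Provider = Callable[[str], Awaitable[str]]  # Hypothetical provider signature


async def with_retries(provider: Provider, prompt: str,
                       attempts: int = 3, base_delay: float = 0.1) -> str:
    """Call one provider, sleeping base_delay * 2**n between failed attempts."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return await provider(prompt)
        except ProviderError:
            if attempt == attempts - 1:
                raise  # Out of retries: let the cascade move to the next provider
            await asyncio.sleep(base_delay * 2 ** attempt)
    raise RuntimeError("unreachable")  # The loop always returns or raises


async def cascade_with_retries(providers: Sequence[Tuple[str, Provider]],
                               prompt: str,
                               base_delay: float = 0.1) -> Tuple[str, str]:
    """Return (provider_name, result) from the first provider that succeeds."""
    last_error: Optional[ProviderError] = None
    for name, provider in providers:
        try:
            result = await with_retries(provider, prompt, base_delay=base_delay)
        except ProviderError as exc:
            log.warning("provider %s exhausted its retries: %s", name, exc)
            last_error = exc
            continue
        log.info("request served by %s", name)  # Useful for capacity planning
        return name, result
    if last_error is None:
        raise ValueError("no providers configured")
    raise last_error


calls = {"flaky": 0}


async def always_down(prompt: str) -> str:
    raise ProviderError("outage")


async def flaky(prompt: str) -> str:
    calls["flaky"] += 1  # Fails twice, then succeeds on the third attempt
    if calls["flaky"] < 3:
        raise ProviderError("transient")
    return "ok"


name, result = asyncio.run(cascade_with_retries(
    [("anthropic", always_down), ("openai", flaky)], "hi", base_delay=0.001))
print(name, result)  # prints: openai ok
```

Keeping retries inside the per-provider call means a brief blip is absorbed by the preferred provider, while a sustained outage costs only a few short sleeps before the cascade moves on.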
Next Steps
See the Cascade Failover recipe for more configuration examples.