majordomo-llm
A unified async Python interface for multiple LLM providers with built-in cost tracking, automatic retries, and structured outputs.
Why majordomo-llm?
Building with LLMs often means dealing with:
- Different APIs for each provider — OpenAI, Anthropic, and Gemini all have different client libraries and response formats
- Hidden costs — Token usage and spending are hard to track across providers
- Fragile integrations — When one provider goes down, your application goes down
- Inconsistent structured outputs — Each provider handles JSON schemas differently
majordomo-llm addresses these problems with a single, consistent interface across OpenAI, Anthropic, Gemini, DeepSeek, and Cohere.
Quick Example
import asyncio

from pydantic import BaseModel

from majordomo_llm import get_llm_instance


class Summary(BaseModel):
    title: str
    key_points: list[str]
    word_count: int


async def main():
    # Works with any provider: openai, anthropic, gemini, deepseek, cohere
    llm = get_llm_instance("anthropic", "claude-sonnet-4-20250514")
    response = await llm.get_structured_json_response(
        response_model=Summary,
        user_prompt="Summarize the benefits of async programming in Python",
    )
    print(response.content.title)
    print(response.content.key_points)
    print(f"Cost: ${response.total_cost:.6f}")


asyncio.run(main())
Key Features
Unified Provider Interface
Write once, run on any provider. Switch between OpenAI, Anthropic, Gemini, DeepSeek, and Cohere with a single line change.
llm = get_llm_instance("openai", "gpt-4o")
llm = get_llm_instance("anthropic", "claude-sonnet-4-20250514")
llm = get_llm_instance("gemini", "gemini-2.5-flash")
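One way to make the provider switch a deployment setting rather than a code change is to read the provider and model from configuration. The sketch below is only an illustration: the `provider:model` string format, the `parse_llm_spec` helper, and the `LLM_SPEC` environment variable are hypothetical and not part of majordomo-llm. The provider names are the ones listed in the Quick Example comment.

```python
import os

# Provider keys as listed in the Quick Example comment.
KNOWN_PROVIDERS = {"openai", "anthropic", "gemini", "deepseek", "cohere"}


def parse_llm_spec(spec: str) -> tuple[str, str]:
    """Split a hypothetical 'provider:model' string into (provider, model)."""
    provider, sep, model = spec.partition(":")
    if not sep or not model:
        raise ValueError(f"expected 'provider:model', got {spec!r}")
    if provider not in KNOWN_PROVIDERS:
        raise ValueError(f"unknown provider {provider!r}")
    return provider, model


# LLM_SPEC is a made-up variable name used only in this sketch.
spec = os.environ.get("LLM_SPEC", "openai:gpt-4o")
provider, model = parse_llm_spec(spec)
# With majordomo-llm installed, the pair can be passed straight through:
# llm = get_llm_instance(provider, model)
```

Validating the provider name up front surfaces a typo in configuration before any request is made.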
Structured Outputs with Pydantic
Get validated, typed Python objects instead of raw JSON. Provider-specific implementation details are handled internally.
response = await llm.get_structured_json_response(
    response_model=MyPydanticModel,
    user_prompt="Extract data from this text...",
)
result: MyPydanticModel = response.content  # Fully typed
Built-in Cost Tracking
Every response includes token counts and calculated costs. No external tracking needed.
print(f"Tokens: {response.input_tokens} in / {response.output_tokens} out")
print(f"Cost: ${response.total_cost:.6f}")
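To make the arithmetic behind a per-response cost concrete, here is a minimal sketch of how such a figure can be derived from token counts and per-million-token prices. The `Usage` class, the `cost_usd` helper, and the prices are hypothetical placeholders; they are not majordomo-llm's internals or real provider rates. Only the field names `input_tokens` and `output_tokens` mirror the response attributes shown above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Usage:
    """Hypothetical stand-in for the token counts carried by a response."""

    input_tokens: int
    output_tokens: int


def cost_usd(
    usage: Usage,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
) -> float:
    """Return the cost in USD given prices quoted per one million tokens."""
    input_cost = usage.input_tokens / 1_000_000 * input_price_per_mtok
    output_cost = usage.output_tokens / 1_000_000 * output_price_per_mtok
    return input_cost + output_cost


# Placeholder prices (USD per million tokens), chosen only for the example.
usage = Usage(input_tokens=1_200, output_tokens=350)
total = cost_usd(usage, input_price_per_mtok=3.0, output_price_per_mtok=15.0)
print(f"Cost: ${total:.6f}")
```

Input and output tokens are priced separately because providers typically charge different rates for each.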
Cascade Failover
Automatically fall back to alternative providers when one fails.
from majordomo_llm import LLMCascade
cascade = LLMCascade([
    ("anthropic", "claude-sonnet-4-20250514"),
    ("openai", "gpt-4o"),
    ("gemini", "gemini-2.5-flash"),
])
response = await cascade.get_response("Hello!") # Tries each until one succeeds
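The general "try each until one succeeds" pattern can be sketched in plain asyncio. This is not how `LLMCascade` is implemented; the provider functions and the `first_success` helper below are hypothetical stand-ins that only illustrate ordered failover.

```python
import asyncio
from collections.abc import Awaitable, Callable


# Hypothetical stand-ins for provider calls; a real client would hit the network.
async def flaky_provider(prompt: str) -> str:
    raise ConnectionError("provider unavailable")


async def healthy_provider(prompt: str) -> str:
    return f"echo: {prompt}"


async def first_success(
    providers: list[tuple[str, Callable[[str], Awaitable[str]]]],
    prompt: str,
) -> tuple[str, str]:
    """Try each provider in order; return (name, reply) from the first success."""
    errors: list[str] = []
    for name, call in providers:
        try:
            return name, await call(prompt)
        except Exception as exc:  # a real cascade would likely catch narrower errors
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


name, reply = asyncio.run(
    first_success(
        [("primary", flaky_provider), ("backup", healthy_provider)],
        "Hello!",
    )
)
print(name, reply)
```

Collecting the per-provider errors and raising them together when every attempt fails keeps the root causes visible instead of reporting only the last failure.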
Optional Request Logging
Persist all requests for analytics, debugging, and compliance with pluggable database and storage adapters.
from majordomo_llm.logging import LoggingLLM, SqliteAdapter, FileStorageAdapter
db = await SqliteAdapter.create("logs.db")
storage = await FileStorageAdapter.create("./request_logs")
logged_llm = LoggingLLM(llm, db, storage)
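Conceptually, a logging wrapper forwards each call to the wrapped client and records the exchange before returning the result. The sketch below shows that idea with the standard-library `sqlite3` module. `RecordingClient`, `fake_llm`, and the table layout are hypothetical and unrelated to the actual `LoggingLLM`, `SqliteAdapter`, or `FileStorageAdapter` classes.

```python
import asyncio
import sqlite3
import time


class RecordingClient:
    """Hypothetical wrapper that logs each prompt/reply pair to SQLite."""

    def __init__(self, call, conn: sqlite3.Connection) -> None:
        self._call = call  # any async callable: prompt -> reply
        self._conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS requests "
            "(prompt TEXT, reply TEXT, latency_s REAL)"
        )

    async def get_response(self, prompt: str) -> str:
        start = time.perf_counter()
        reply = await self._call(prompt)
        latency = time.perf_counter() - start
        # Blocking write; acceptable for a sketch, not for a busy event loop.
        self._conn.execute(
            "INSERT INTO requests VALUES (?, ?, ?)", (prompt, reply, latency)
        )
        self._conn.commit()
        return reply


async def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call."""
    return prompt.upper()


conn = sqlite3.connect(":memory:")
client = RecordingClient(fake_llm, conn)
asyncio.run(client.get_response("hello"))
rows = conn.execute("SELECT prompt, reply FROM requests").fetchall()
print(rows)
```

Because the wrapper exposes the same method as the client it wraps, calling code does not need to know whether logging is enabled.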
Supported Providers
| Provider | Recent Models |
|---|---|
| OpenAI | gpt-5, gpt-5-mini, gpt-4.1, gpt-4.1-mini, gpt-4o |
| Anthropic | claude-sonnet-4.5, claude-opus-4.1, claude-sonnet-4, claude-3.5-haiku |
| Google Gemini | gemini-2.5-flash, gemini-2.0-flash |
| DeepSeek | deepseek-chat, deepseek-reasoner |
| Cohere | command-a, command-r-plus, command-r |
All providers support structured outputs. Additional models are available—see llm_config.yaml for the complete list with pricing.
Next Steps
- Getting Started — Installation and quickstart
- Core Concepts — Understand the key capabilities
- Recipes — Practical examples and patterns
- API Reference — Detailed API documentation