Base Classes

LLM

Bases: ABC

Abstract base class for LLM provider implementations.

Provides a unified interface for interacting with different LLM providers (OpenAI, Anthropic, Gemini) with automatic retry logic and cost tracking.

Subclasses must implement the get_response() method. Other methods have default implementations that can be overridden for provider-specific optimizations.

Attributes:

    provider: The LLM provider name (e.g., "openai", "anthropic", "gemini").
    model: The specific model identifier (e.g., "gpt-4o", "claude-sonnet-4-20250514").
    input_cost: Cost per million input tokens in USD.
    output_cost: Cost per million output tokens in USD.
    supports_temperature_top_p: Whether the model supports temperature/top_p params.
    use_web_search: Whether to enable web search (Anthropic only).
    api_key_hash: Truncated SHA256 hash of the API key (for logging).
    api_key_alias: Optional human-readable name for the API key.

Example:

    >>> from majordomo_llm import get_llm_instance
    >>> llm = get_llm_instance("anthropic", "claude-sonnet-4-20250514")
    >>> response = await llm.get_response("What is 2+2?")
    >>> print(response.content)
    4
    >>> print(f"Cost: ${response.total_cost:.6f}")

Source code in src/majordomo_llm/base.py
class LLM(ABC):
    """Abstract base class for LLM provider implementations.

    Provides a unified interface for interacting with different LLM providers
    (OpenAI, Anthropic, Gemini) with automatic retry logic and cost tracking.

    Subclasses must implement the :meth:`get_response` method. Other methods
    have default implementations that can be overridden for provider-specific
    optimizations.

    Attributes:
        provider: The LLM provider name (e.g., "openai", "anthropic", "gemini").
        model: The specific model identifier (e.g., "gpt-4o", "claude-sonnet-4-20250514").
        input_cost: Cost per million input tokens in USD.
        output_cost: Cost per million output tokens in USD.
        supports_temperature_top_p: Whether the model supports temperature/top_p params.
        use_web_search: Whether to enable web search (Anthropic only).
        api_key_hash: Truncated SHA256 hash of the API key (for logging).
        api_key_alias: Optional human-readable name for the API key.

    Example:
        >>> from majordomo_llm import get_llm_instance
        >>> llm = get_llm_instance("anthropic", "claude-sonnet-4-20250514")
        >>> response = await llm.get_response("What is 2+2?")
        >>> print(response.content)
        4
        >>> print(f"Cost: ${response.total_cost:.6f}")
    """

    def __init__(
        self,
        provider: str,
        model: str,
        input_cost: float,
        output_cost: float,
        supports_temperature_top_p: bool = True,
        use_web_search: bool = False,
        api_key: str | None = None,
        api_key_alias: str | None = None,
    ) -> None:
        """Initialize the LLM instance.

        Args:
            provider: The LLM provider name.
            model: The model identifier.
            input_cost: Cost per million input tokens in USD.
            output_cost: Cost per million output tokens in USD.
            supports_temperature_top_p: Whether temperature/top_p are supported.
            use_web_search: Enable web search capability (Anthropic only).
            api_key: The API key (used to compute hash for logging).
            api_key_alias: Optional human-readable name for the API key.
        """
        self.provider = provider
        self.model = model
        self.input_cost = input_cost
        self.output_cost = output_cost
        self.supports_temperature_top_p = supports_temperature_top_p
        self.use_web_search = use_web_search
        self.api_key_hash = _hash_api_key(api_key) if api_key else None
        self.api_key_alias = api_key_alias

    def get_full_model_name(self) -> str:
        """Get the fully qualified model name.

        Returns:
            Model name in the format "provider:model" (e.g., "anthropic:claude-sonnet-4-20250514").
        """
        return f"{self.provider}:{self.model}"

    def _calculate_costs(
        self, input_tokens: int, output_tokens: int
    ) -> tuple[float, float, float]:
        """Calculate costs for a request.

        Args:
            input_tokens: Number of input tokens.
            output_tokens: Number of output tokens.

        Returns:
            Tuple of (input_cost, output_cost, total_cost) in USD.
        """
        input_cost = (input_tokens * self.input_cost) / TOKENS_PER_MILLION
        output_cost = (output_tokens * self.output_cost) / TOKENS_PER_MILLION
        return input_cost, output_cost, input_cost + output_cost

    @abstractmethod
    async def get_response(
        self,
        user_prompt: str,
        system_prompt: str | None = None,
        temperature: float = 0.3,
        top_p: float = 1.0,
    ) -> LLMResponse:
        """Get a plain text response from the LLM.

        Args:
            user_prompt: The user's input prompt.
            system_prompt: Optional system prompt to set context/behavior.
            temperature: Sampling temperature (0.0-2.0). Lower is more deterministic.
            top_p: Nucleus sampling parameter (0.0-1.0).

        Returns:
            LLMResponse containing the text content and usage metrics.

        Raises:
            Exception: If the API request fails after retries.
        """
        raise NotImplementedError()

    @retry(wait=wait_random_exponential(min=0.2, max=1), stop=stop_after_attempt(3))
    async def get_json_response(
        self,
        user_prompt: str,
        system_prompt: str | None = None,
        temperature: float = 0.3,
        top_p: float = 1.0,
    ) -> LLMJSONResponse:
        """Get a JSON response from the LLM.

        Automatically parses the LLM's text response as JSON.

        Args:
            user_prompt: The user's input prompt.
            system_prompt: Optional system prompt to set context/behavior.
            temperature: Sampling temperature (0.0-2.0). Lower is more deterministic.
            top_p: Nucleus sampling parameter (0.0-1.0).

        Returns:
            LLMJSONResponse containing the parsed JSON dict and usage metrics.

        Raises:
            ResponseParsingError: If the response cannot be parsed as JSON.
            Exception: If the API request fails after retries.
        """
        response = await self.get_response(user_prompt, system_prompt, temperature, top_p)
        # Strip markdown code fencing if present
        content = response.content.replace("```json", "").replace("```", "").strip()
        try:
            parsed_content = json.loads(content)
        except json.JSONDecodeError as e:
            raise ResponseParsingError(
                f"Failed to parse JSON response: {e}",
                raw_content=response.content,
            ) from e
        return LLMJSONResponse(
            content=parsed_content,
            input_tokens=response.input_tokens,
            output_tokens=response.output_tokens,
            cached_tokens=response.cached_tokens,
            input_cost=response.input_cost,
            output_cost=response.output_cost,
            total_cost=response.total_cost,
            response_time=response.response_time,
        )

    async def get_structured_json_response(
        self,
        response_model: type[T],
        user_prompt: str,
        system_prompt: str | None = None,
        temperature: float = 0.3,
        top_p: float = 1.0,
    ) -> LLMStructuredResponse:
        """Get a structured response validated against a Pydantic model.

        Uses provider-specific mechanisms (tool calling, response schemas) to
        ensure the response conforms to the specified Pydantic model schema.

        Args:
            response_model: Pydantic model class defining the expected structure.
            user_prompt: The user's input prompt.
            system_prompt: Optional system prompt to set context/behavior.
            temperature: Sampling temperature (0.0-2.0). Lower is more deterministic.
            top_p: Nucleus sampling parameter (0.0-1.0).

        Returns:
            LLMStructuredResponse containing the validated Pydantic model instance.

        Raises:
            pydantic.ValidationError: If the response doesn't match the model schema.
            Exception: If the API request fails after retries.

        Example:
            >>> from pydantic import BaseModel
            >>> class Person(BaseModel):
            ...     name: str
            ...     age: int
            >>> response = await llm.get_structured_json_response(
            ...     response_model=Person,
            ...     user_prompt="Extract: John is 30 years old",
            ... )
            >>> print(response.content.name)
            John
        """
        response = await self._get_structured_response(
            response_model=response_model,
            user_prompt=user_prompt,
            system_prompt=system_prompt,
            temperature=temperature,
            top_p=top_p,
        )
        parsed_content = response_model.model_validate(response.content)

        return LLMStructuredResponse(
            content=parsed_content,
            input_tokens=response.input_tokens,
            output_tokens=response.output_tokens,
            cached_tokens=response.cached_tokens,
            input_cost=response.input_cost,
            output_cost=response.output_cost,
            total_cost=response.total_cost,
            response_time=response.response_time,
        )

    async def _get_structured_response(
        self,
        response_model: type[T],
        user_prompt: str,
        system_prompt: str | None = None,
        temperature: float = 0.3,
        top_p: float = 1.0,
    ) -> LLMJSONResponse:
        """Provider-specific implementation for structured responses.

        Default implementation injects the JSON schema into the system prompt.
        Providers should override this to use native structured output features.

        Args:
            response_model: Pydantic model class defining the expected structure.
            user_prompt: The user's input prompt.
            system_prompt: Optional system prompt to set context/behavior.
            temperature: Sampling temperature (0.0-2.0).
            top_p: Nucleus sampling parameter (0.0-1.0).

        Returns:
            LLMJSONResponse containing the parsed JSON content.
        """
        schema = response_model.model_json_schema()
        combined_system_prompt = build_schema_prompt(schema, system_prompt)

        if self.supports_temperature_top_p:
            return await self.get_json_response(
                user_prompt, combined_system_prompt, temperature, top_p
            )
        else:
            return await self.get_json_response(user_prompt, combined_system_prompt)
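The per-request arithmetic in _calculate_costs reduces to simple per-million-token scaling. A standalone sketch of the same computation; the value of TOKENS_PER_MILLION and the rates used below are assumptions for illustration, not values taken from the library:

```python
TOKENS_PER_MILLION = 1_000_000  # assumed value of the module-level constant


def calculate_costs(
    input_tokens: int,
    output_tokens: int,
    input_rate: float,   # USD per million input tokens
    output_rate: float,  # USD per million output tokens
) -> tuple[float, float, float]:
    """Mirror of LLM._calculate_costs: returns (input_cost, output_cost, total)."""
    input_cost = (input_tokens * input_rate) / TOKENS_PER_MILLION
    output_cost = (output_tokens * output_rate) / TOKENS_PER_MILLION
    return input_cost, output_cost, input_cost + output_cost


# Hypothetical rates: $3/M input, $15/M output
print(calculate_costs(1_000, 500, 3.0, 15.0))
```

A 1,000-token prompt at $3 per million tokens costs $0.003; the 500 output tokens at $15 per million add $0.0075.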

__init__

__init__(
    provider,
    model,
    input_cost,
    output_cost,
    supports_temperature_top_p=True,
    use_web_search=False,
    api_key=None,
    api_key_alias=None,
)

Initialize the LLM instance.

Parameters:

    provider (str): The LLM provider name. Required.
    model (str): The model identifier. Required.
    input_cost (float): Cost per million input tokens in USD. Required.
    output_cost (float): Cost per million output tokens in USD. Required.
    supports_temperature_top_p (bool): Whether temperature/top_p are supported. Default: True.
    use_web_search (bool): Enable web search capability (Anthropic only). Default: False.
    api_key (str | None): The API key (used to compute hash for logging). Default: None.
    api_key_alias (str | None): Optional human-readable name for the API key. Default: None.
Source code in src/majordomo_llm/base.py
def __init__(
    self,
    provider: str,
    model: str,
    input_cost: float,
    output_cost: float,
    supports_temperature_top_p: bool = True,
    use_web_search: bool = False,
    api_key: str | None = None,
    api_key_alias: str | None = None,
) -> None:
    """Initialize the LLM instance.

    Args:
        provider: The LLM provider name.
        model: The model identifier.
        input_cost: Cost per million input tokens in USD.
        output_cost: Cost per million output tokens in USD.
        supports_temperature_top_p: Whether temperature/top_p are supported.
        use_web_search: Enable web search capability (Anthropic only).
        api_key: The API key (used to compute hash for logging).
        api_key_alias: Optional human-readable name for the API key.
    """
    self.provider = provider
    self.model = model
    self.input_cost = input_cost
    self.output_cost = output_cost
    self.supports_temperature_top_p = supports_temperature_top_p
    self.use_web_search = use_web_search
    self.api_key_hash = _hash_api_key(api_key) if api_key else None
    self.api_key_alias = api_key_alias

get_full_model_name

get_full_model_name()

Get the fully qualified model name.

Returns:

    str: Model name in the format "provider:model" (e.g., "anthropic:claude-sonnet-4-20250514").

Source code in src/majordomo_llm/base.py
def get_full_model_name(self) -> str:
    """Get the fully qualified model name.

    Returns:
        Model name in the format "provider:model" (e.g., "anthropic:claude-sonnet-4-20250514").
    """
    return f"{self.provider}:{self.model}"

get_json_response async

get_json_response(
    user_prompt,
    system_prompt=None,
    temperature=0.3,
    top_p=1.0,
)

Get a JSON response from the LLM.

Automatically parses the LLM's text response as JSON.

Parameters:

    user_prompt (str): The user's input prompt. Required.
    system_prompt (str | None): Optional system prompt to set context/behavior. Default: None.
    temperature (float): Sampling temperature (0.0-2.0). Lower is more deterministic. Default: 0.3.
    top_p (float): Nucleus sampling parameter (0.0-1.0). Default: 1.0.

Returns:

    LLMJSONResponse: The parsed JSON dict and usage metrics.

Raises:

    ResponseParsingError: If the response cannot be parsed as JSON.
    Exception: If the API request fails after retries.

Source code in src/majordomo_llm/base.py
@retry(wait=wait_random_exponential(min=0.2, max=1), stop=stop_after_attempt(3))
async def get_json_response(
    self,
    user_prompt: str,
    system_prompt: str | None = None,
    temperature: float = 0.3,
    top_p: float = 1.0,
) -> LLMJSONResponse:
    """Get a JSON response from the LLM.

    Automatically parses the LLM's text response as JSON.

    Args:
        user_prompt: The user's input prompt.
        system_prompt: Optional system prompt to set context/behavior.
        temperature: Sampling temperature (0.0-2.0). Lower is more deterministic.
        top_p: Nucleus sampling parameter (0.0-1.0).

    Returns:
        LLMJSONResponse containing the parsed JSON dict and usage metrics.

    Raises:
        ResponseParsingError: If the response cannot be parsed as JSON.
        Exception: If the API request fails after retries.
    """
    response = await self.get_response(user_prompt, system_prompt, temperature, top_p)
    # Strip markdown code fencing if present
    content = response.content.replace("```json", "").replace("```", "").strip()
    try:
        parsed_content = json.loads(content)
    except json.JSONDecodeError as e:
        raise ResponseParsingError(
            f"Failed to parse JSON response: {e}",
            raw_content=response.content,
        ) from e
    return LLMJSONResponse(
        content=parsed_content,
        input_tokens=response.input_tokens,
        output_tokens=response.output_tokens,
        cached_tokens=response.cached_tokens,
        input_cost=response.input_cost,
        output_cost=response.output_cost,
        total_cost=response.total_cost,
        response_time=response.response_time,
    )
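The parse path above first strips any markdown code fencing before calling json.loads. That step can be exercised in isolation; this is a minimal sketch of the same logic with an illustrative function name (the library raises ResponseParsingError where this sketch raises ValueError):

```python
import json


def parse_json_reply(raw: str) -> dict:
    """Strip markdown code fencing if present, then parse as JSON."""
    content = raw.replace("```json", "").replace("```", "").strip()
    try:
        return json.loads(content)
    except json.JSONDecodeError as e:
        # get_json_response raises ResponseParsingError here, preserving raw content.
        raise ValueError(f"Failed to parse JSON response: {e}") from e


reply = '```json\n{"answer": 4}\n```'
print(parse_json_reply(reply))  # {'answer': 4}
```

Both fenced and bare JSON replies parse to the same dict, which is why the default implementation tolerates models that wrap JSON in markdown.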

get_response abstractmethod async

get_response(
    user_prompt,
    system_prompt=None,
    temperature=0.3,
    top_p=1.0,
)

Get a plain text response from the LLM.

Parameters:

    user_prompt (str): The user's input prompt. Required.
    system_prompt (str | None): Optional system prompt to set context/behavior. Default: None.
    temperature (float): Sampling temperature (0.0-2.0). Lower is more deterministic. Default: 0.3.
    top_p (float): Nucleus sampling parameter (0.0-1.0). Default: 1.0.

Returns:

    LLMResponse: The text content and usage metrics.

Raises:

    Exception: If the API request fails after retries.

Source code in src/majordomo_llm/base.py
@abstractmethod
async def get_response(
    self,
    user_prompt: str,
    system_prompt: str | None = None,
    temperature: float = 0.3,
    top_p: float = 1.0,
) -> LLMResponse:
    """Get a plain text response from the LLM.

    Args:
        user_prompt: The user's input prompt.
        system_prompt: Optional system prompt to set context/behavior.
        temperature: Sampling temperature (0.0-2.0). Lower is more deterministic.
        top_p: Nucleus sampling parameter (0.0-1.0).

    Returns:
        LLMResponse containing the text content and usage metrics.

    Raises:
        Exception: If the API request fails after retries.
    """
    raise NotImplementedError()

get_structured_json_response async

get_structured_json_response(
    response_model,
    user_prompt,
    system_prompt=None,
    temperature=0.3,
    top_p=1.0,
)

Get a structured response validated against a Pydantic model.

Uses provider-specific mechanisms (tool calling, response schemas) to ensure the response conforms to the specified Pydantic model schema.

Parameters:

    response_model (type[T]): Pydantic model class defining the expected structure. Required.
    user_prompt (str): The user's input prompt. Required.
    system_prompt (str | None): Optional system prompt to set context/behavior. Default: None.
    temperature (float): Sampling temperature (0.0-2.0). Lower is more deterministic. Default: 0.3.
    top_p (float): Nucleus sampling parameter (0.0-1.0). Default: 1.0.

Returns:

    LLMStructuredResponse: The validated Pydantic model instance plus usage metrics.

Raises:

    ValidationError: If the response doesn't match the model schema.
    Exception: If the API request fails after retries.

Example:

    >>> from pydantic import BaseModel
    >>> class Person(BaseModel):
    ...     name: str
    ...     age: int
    >>> response = await llm.get_structured_json_response(
    ...     response_model=Person,
    ...     user_prompt="Extract: John is 30 years old",
    ... )
    >>> print(response.content.name)
    John

Source code in src/majordomo_llm/base.py
async def get_structured_json_response(
    self,
    response_model: type[T],
    user_prompt: str,
    system_prompt: str | None = None,
    temperature: float = 0.3,
    top_p: float = 1.0,
) -> LLMStructuredResponse:
    """Get a structured response validated against a Pydantic model.

    Uses provider-specific mechanisms (tool calling, response schemas) to
    ensure the response conforms to the specified Pydantic model schema.

    Args:
        response_model: Pydantic model class defining the expected structure.
        user_prompt: The user's input prompt.
        system_prompt: Optional system prompt to set context/behavior.
        temperature: Sampling temperature (0.0-2.0). Lower is more deterministic.
        top_p: Nucleus sampling parameter (0.0-1.0).

    Returns:
        LLMStructuredResponse containing the validated Pydantic model instance.

    Raises:
        pydantic.ValidationError: If the response doesn't match the model schema.
        Exception: If the API request fails after retries.

    Example:
        >>> from pydantic import BaseModel
        >>> class Person(BaseModel):
        ...     name: str
        ...     age: int
        >>> response = await llm.get_structured_json_response(
        ...     response_model=Person,
        ...     user_prompt="Extract: John is 30 years old",
        ... )
        >>> print(response.content.name)
        John
    """
    response = await self._get_structured_response(
        response_model=response_model,
        user_prompt=user_prompt,
        system_prompt=system_prompt,
        temperature=temperature,
        top_p=top_p,
    )
    parsed_content = response_model.model_validate(response.content)

    return LLMStructuredResponse(
        content=parsed_content,
        input_tokens=response.input_tokens,
        output_tokens=response.output_tokens,
        cached_tokens=response.cached_tokens,
        input_cost=response.input_cost,
        output_cost=response.output_cost,
        total_cost=response.total_cost,
        response_time=response.response_time,
    )

Bases: Usage

Response from an LLM containing plain text content.

Inherits all usage metrics from Usage.

Attributes:

    content (str): The text content of the LLM response.

Source code in src/majordomo_llm/base.py
@dataclass
class LLMResponse(Usage):
    """Response from an LLM containing plain text content.

    Inherits all usage metrics from :class:`Usage`.

    Attributes:
        content: The text content of the LLM response.
    """

    content: str

Bases: Usage

Response from an LLM containing parsed JSON content.

Inherits all usage metrics from Usage.

Attributes:

    content (dict[str, Any]): The parsed JSON content as a Python dict.

Source code in src/majordomo_llm/base.py
@dataclass
class LLMJSONResponse(Usage):
    """Response from an LLM containing parsed JSON content.

    Inherits all usage metrics from :class:`Usage`.

    Attributes:
        content: The parsed JSON content as a Python dict.
    """

    content: dict[str, Any]

Bases: Usage

Response from an LLM containing a validated Pydantic model.

Inherits all usage metrics from Usage.

Attributes:

    content (BaseModel): The validated Pydantic model instance.

Source code in src/majordomo_llm/base.py
@dataclass
class LLMStructuredResponse(Usage):
    """Response from an LLM containing a validated Pydantic model.

    Inherits all usage metrics from :class:`Usage`.

    Attributes:
        content: The validated Pydantic model instance.
    """

    content: BaseModel