Base Classes

LLM

Bases: ABC

Abstract base class for LLM provider implementations.

Provides a unified interface for interacting with different LLM providers (OpenAI, Anthropic, Gemini) with automatic retry logic and cost tracking.

Subclasses must implement the get_response() method. Other methods have default implementations that can be overridden for provider-specific optimizations.

Attributes:

    provider: The LLM provider name (e.g., "openai", "anthropic", "gemini").
    model: The specific model identifier (e.g., "gpt-4o", "claude-sonnet-4-20250514").
    input_cost: Cost per million input tokens in USD.
    output_cost: Cost per million output tokens in USD.
    supports_temperature_top_p: Whether the model supports temperature/top_p params.
    use_web_search: Whether to enable web search (Anthropic only).
    api_key_hash: Truncated SHA256 hash of the API key (for logging).
    api_key_alias: Optional human-readable name for the API key.

Example:

    >>> from majordomo_llm import get_llm_instance
    >>> llm = get_llm_instance("anthropic", "claude-sonnet-4-20250514")
    >>> response = await llm.get_response("What is 2+2?")
    >>> print(response.content)
    4
    >>> print(f"Cost: ${response.total_cost:.6f}")

Source code in src/majordomo_llm/base.py
class LLM(ABC):
    """Abstract base class for LLM provider implementations.

    Provides a unified interface for interacting with different LLM providers
    (OpenAI, Anthropic, Gemini) with automatic retry logic and cost tracking.

    Subclasses must implement the :meth:`get_response` method. Other methods
    have default implementations that can be overridden for provider-specific
    optimizations.

    Attributes:
        provider: The LLM provider name (e.g., "openai", "anthropic", "gemini").
        model: The specific model identifier (e.g., "gpt-4o", "claude-sonnet-4-20250514").
        input_cost: Cost per million input tokens in USD.
        output_cost: Cost per million output tokens in USD.
        supports_temperature_top_p: Whether the model supports temperature/top_p params.
        use_web_search: Whether to enable web search (Anthropic only).
        api_key_hash: Truncated SHA256 hash of the API key (for logging).
        api_key_alias: Optional human-readable name for the API key.

    Example:
        >>> from majordomo_llm import get_llm_instance
        >>> llm = get_llm_instance("anthropic", "claude-sonnet-4-20250514")
        >>> response = await llm.get_response("What is 2+2?")
        >>> print(response.content)
        4
        >>> print(f"Cost: ${response.total_cost:.6f}")
    """

    def __init__(
        self,
        provider: str,
        model: str,
        input_cost: float,
        output_cost: float,
        supports_temperature_top_p: bool = True,
        use_web_search: bool = False,
        api_key: str | None = None,
        api_key_alias: str | None = None,
    ) -> None:
        """Initialize the LLM instance.

        Args:
            provider: The LLM provider name.
            model: The model identifier.
            input_cost: Cost per million input tokens in USD.
            output_cost: Cost per million output tokens in USD.
            supports_temperature_top_p: Whether temperature/top_p are supported.
            use_web_search: Enable web search capability (Anthropic only).
            api_key: The API key (used to compute hash for logging).
            api_key_alias: Optional human-readable name for the API key.
        """
        self.provider = provider
        self.model = model
        self.input_cost = input_cost
        self.output_cost = output_cost
        self.supports_temperature_top_p = supports_temperature_top_p
        self.use_web_search = use_web_search
        self.api_key_hash = _hash_api_key(api_key) if api_key else None
        self.api_key_alias = api_key_alias

    def get_full_model_name(self) -> str:
        """Get the fully qualified model name.

        Returns:
            Model name in the format "provider:model" (e.g., "anthropic:claude-sonnet-4-20250514").
        """
        return f"{self.provider}:{self.model}"

    def _calculate_costs(
        self, input_tokens: int, output_tokens: int
    ) -> tuple[float, float, float]:
        """Calculate costs for a request.

        Args:
            input_tokens: Number of input tokens.
            output_tokens: Number of output tokens.

        Returns:
            Tuple of (input_cost, output_cost, total_cost) in USD.
        """
        input_cost = (input_tokens * self.input_cost) / TOKENS_PER_MILLION
        output_cost = (output_tokens * self.output_cost) / TOKENS_PER_MILLION
        return input_cost, output_cost, input_cost + output_cost

    @abstractmethod
    async def get_response(
        self,
        user_prompt: str,
        system_prompt: str | None = None,
        temperature: float = 0.3,
        top_p: float = 1.0,
    ) -> LLMResponse:
        """Get a plain text response from the LLM.

        Args:
            user_prompt: The user's input prompt.
            system_prompt: Optional system prompt to set context/behavior.
            temperature: Sampling temperature (0.0-2.0). Lower is more deterministic.
            top_p: Nucleus sampling parameter (0.0-1.0).

        Returns:
            LLMResponse containing the text content and usage metrics.

        Raises:
            Exception: If the API request fails after retries.
        """
        raise NotImplementedError()

    @retry(wait=wait_random_exponential(min=0.2, max=1), stop=stop_after_attempt(3))
    async def get_json_response(
        self,
        user_prompt: str,
        system_prompt: str | None = None,
        temperature: float = 0.3,
        top_p: float = 1.0,
    ) -> LLMJSONResponse:
        """Get a JSON response from the LLM.

        Automatically parses the LLM's text response as JSON.

        Args:
            user_prompt: The user's input prompt.
            system_prompt: Optional system prompt to set context/behavior.
            temperature: Sampling temperature (0.0-2.0). Lower is more deterministic.
            top_p: Nucleus sampling parameter (0.0-1.0).

        Returns:
            LLMJSONResponse containing the parsed JSON dict and usage metrics.

        Raises:
            ResponseParsingError: If the response cannot be parsed as JSON.
            Exception: If the API request fails after retries.
        """
        response = await self.get_response(user_prompt, system_prompt, temperature, top_p)
        # Strip markdown code fencing if present
        content = response.content.replace("```json", "").replace("```", "").strip()
        try:
            parsed_content = json.loads(content)
        except json.JSONDecodeError as e:
            raise ResponseParsingError(
                f"Failed to parse JSON response: {e}",
                raw_content=response.content,
            ) from e
        return LLMJSONResponse(
            content=parsed_content,
            input_tokens=response.input_tokens,
            output_tokens=response.output_tokens,
            cached_tokens=response.cached_tokens,
            input_cost=response.input_cost,
            output_cost=response.output_cost,
            total_cost=response.total_cost,
            response_time=response.response_time,
        )

    async def get_structured_json_response(
        self,
        response_model: type[T],
        user_prompt: str,
        system_prompt: str | None = None,
        temperature: float = 0.3,
        top_p: float = 1.0,
    ) -> LLMStructuredResponse:
        """Get a structured response validated against a Pydantic model.

        Uses provider-specific mechanisms (tool calling, response schemas) to
        ensure the response conforms to the specified Pydantic model schema.

        Args:
            response_model: Pydantic model class defining the expected structure.
            user_prompt: The user's input prompt.
            system_prompt: Optional system prompt to set context/behavior.
            temperature: Sampling temperature (0.0-2.0). Lower is more deterministic.
            top_p: Nucleus sampling parameter (0.0-1.0).

        Returns:
            LLMStructuredResponse containing the validated Pydantic model instance.

        Raises:
            pydantic.ValidationError: If the response doesn't match the model schema.
            Exception: If the API request fails after retries.

        Example:
            >>> from pydantic import BaseModel
            >>> class Person(BaseModel):
            ...     name: str
            ...     age: int
            >>> response = await llm.get_structured_json_response(
            ...     response_model=Person,
            ...     user_prompt="Extract: John is 30 years old",
            ... )
            >>> print(response.content.name)
            John
        """
        response = await self._get_structured_response(
            response_model=response_model,
            user_prompt=user_prompt,
            system_prompt=system_prompt,
            temperature=temperature,
            top_p=top_p,
        )
        parsed_content = response_model.model_validate(response.content)

        return LLMStructuredResponse(
            content=parsed_content,
            input_tokens=response.input_tokens,
            output_tokens=response.output_tokens,
            cached_tokens=response.cached_tokens,
            input_cost=response.input_cost,
            output_cost=response.output_cost,
            total_cost=response.total_cost,
            response_time=response.response_time,
        )

    async def _get_structured_response(
        self,
        response_model: type[T],
        user_prompt: str,
        system_prompt: str | None = None,
        temperature: float = 0.3,
        top_p: float = 1.0,
    ) -> LLMJSONResponse:
        """Provider-specific implementation for structured responses.

        Default implementation injects the JSON schema into the system prompt.
        Providers should override this to use native structured output features.

        Args:
            response_model: Pydantic model class defining the expected structure.
            user_prompt: The user's input prompt.
            system_prompt: Optional system prompt to set context/behavior.
            temperature: Sampling temperature (0.0-2.0).
            top_p: Nucleus sampling parameter (0.0-1.0).

        Returns:
            LLMJSONResponse containing the parsed JSON content.
        """
        schema = response_model.model_json_schema()
        combined_system_prompt = build_schema_prompt(schema, system_prompt)

        if self.supports_temperature_top_p:
            return await self.get_json_response(
                user_prompt, combined_system_prompt, temperature, top_p
            )
        else:
            return await self.get_json_response(user_prompt, combined_system_prompt)
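The per-request arithmetic in _calculate_costs reduces to simple per-million-token scaling. A standalone sketch of the same computation; the value of TOKENS_PER_MILLION and the rates used below are assumptions for illustration, not values taken from the library:

```python
TOKENS_PER_MILLION = 1_000_000  # assumed value of the module-level constant


def calculate_costs(
    input_tokens: int,
    output_tokens: int,
    input_rate: float,   # USD per million input tokens
    output_rate: float,  # USD per million output tokens
) -> tuple[float, float, float]:
    """Mirror of LLM._calculate_costs: returns (input_cost, output_cost, total)."""
    input_cost = (input_tokens * input_rate) / TOKENS_PER_MILLION
    output_cost = (output_tokens * output_rate) / TOKENS_PER_MILLION
    return input_cost, output_cost, input_cost + output_cost


# Hypothetical rates: $3/M input, $15/M output
print(calculate_costs(1_000, 500, 3.0, 15.0))
```

A 1,000-token prompt at $3 per million tokens costs $0.003; the 500 output tokens at $15 per million add $0.0075.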

__init__

__init__(
    provider,
    model,
    input_cost,
    output_cost,
    supports_temperature_top_p=True,
    use_web_search=False,
    api_key=None,
    api_key_alias=None,
)

Initialize the LLM instance.

Parameters:

    provider (str): The LLM provider name. Required.
    model (str): The model identifier. Required.
    input_cost (float): Cost per million input tokens in USD. Required.
    output_cost (float): Cost per million output tokens in USD. Required.
    supports_temperature_top_p (bool): Whether temperature/top_p are supported. Default: True.
    use_web_search (bool): Enable web search capability (Anthropic only). Default: False.
    api_key (str | None): The API key (used to compute hash for logging). Default: None.
    api_key_alias (str | None): Optional human-readable name for the API key. Default: None.
Source code in src/majordomo_llm/base.py
def __init__(
    self,
    provider: str,
    model: str,
    input_cost: float,
    output_cost: float,
    supports_temperature_top_p: bool = True,
    use_web_search: bool = False,
    api_key: str | None = None,
    api_key_alias: str | None = None,
) -> None:
    """Initialize the LLM instance.

    Args:
        provider: The LLM provider name.
        model: The model identifier.
        input_cost: Cost per million input tokens in USD.
        output_cost: Cost per million output tokens in USD.
        supports_temperature_top_p: Whether temperature/top_p are supported.
        use_web_search: Enable web search capability (Anthropic only).
        api_key: The API key (used to compute hash for logging).
        api_key_alias: Optional human-readable name for the API key.
    """
    self.provider = provider
    self.model = model
    self.input_cost = input_cost
    self.output_cost = output_cost
    self.supports_temperature_top_p = supports_temperature_top_p
    self.use_web_search = use_web_search
    self.api_key_hash = _hash_api_key(api_key) if api_key else None
    self.api_key_alias = api_key_alias

get_full_model_name

get_full_model_name()

Get the fully qualified model name.

Returns:

    str: Model name in the format "provider:model" (e.g., "anthropic:claude-sonnet-4-20250514").

Source code in src/majordomo_llm/base.py
def get_full_model_name(self) -> str:
    """Get the fully qualified model name.

    Returns:
        Model name in the format "provider:model" (e.g., "anthropic:claude-sonnet-4-20250514").
    """
    return f"{self.provider}:{self.model}"

get_json_response async

get_json_response(
    user_prompt,
    system_prompt=None,
    temperature=0.3,
    top_p=1.0,
)

Get a JSON response from the LLM.

Automatically parses the LLM's text response as JSON.

Parameters:

    user_prompt (str): The user's input prompt. Required.
    system_prompt (str | None): Optional system prompt to set context/behavior. Default: None.
    temperature (float): Sampling temperature (0.0-2.0). Lower is more deterministic. Default: 0.3.
    top_p (float): Nucleus sampling parameter (0.0-1.0). Default: 1.0.

Returns:

    LLMJSONResponse: The parsed JSON dict and usage metrics.

Raises:

    ResponseParsingError: If the response cannot be parsed as JSON.
    Exception: If the API request fails after retries.

Source code in src/majordomo_llm/base.py
@retry(wait=wait_random_exponential(min=0.2, max=1), stop=stop_after_attempt(3))
async def get_json_response(
    self,
    user_prompt: str,
    system_prompt: str | None = None,
    temperature: float = 0.3,
    top_p: float = 1.0,
) -> LLMJSONResponse:
    """Get a JSON response from the LLM.

    Automatically parses the LLM's text response as JSON.

    Args:
        user_prompt: The user's input prompt.
        system_prompt: Optional system prompt to set context/behavior.
        temperature: Sampling temperature (0.0-2.0). Lower is more deterministic.
        top_p: Nucleus sampling parameter (0.0-1.0).

    Returns:
        LLMJSONResponse containing the parsed JSON dict and usage metrics.

    Raises:
        ResponseParsingError: If the response cannot be parsed as JSON.
        Exception: If the API request fails after retries.
    """
    response = await self.get_response(user_prompt, system_prompt, temperature, top_p)
    # Strip markdown code fencing if present
    content = response.content.replace("```json", "").replace("```", "").strip()
    try:
        parsed_content = json.loads(content)
    except json.JSONDecodeError as e:
        raise ResponseParsingError(
            f"Failed to parse JSON response: {e}",
            raw_content=response.content,
        ) from e
    return LLMJSONResponse(
        content=parsed_content,
        input_tokens=response.input_tokens,
        output_tokens=response.output_tokens,
        cached_tokens=response.cached_tokens,
        input_cost=response.input_cost,
        output_cost=response.output_cost,
        total_cost=response.total_cost,
        response_time=response.response_time,
    )
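The parse path above first strips any markdown code fencing before calling json.loads. That step can be exercised in isolation; this is a minimal sketch of the same logic with an illustrative function name (the library raises ResponseParsingError where this sketch raises ValueError):

```python
import json


def parse_json_reply(raw: str) -> dict:
    """Strip markdown code fencing if present, then parse as JSON."""
    content = raw.replace("```json", "").replace("```", "").strip()
    try:
        return json.loads(content)
    except json.JSONDecodeError as e:
        # get_json_response raises ResponseParsingError here, preserving raw content.
        raise ValueError(f"Failed to parse JSON response: {e}") from e


reply = '```json\n{"answer": 4}\n```'
print(parse_json_reply(reply))  # {'answer': 4}
```

Both fenced and bare JSON replies parse to the same dict, which is why the default implementation tolerates models that wrap JSON in markdown.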

get_response abstractmethod async

get_response(
    user_prompt,
    system_prompt=None,
    temperature=0.3,
    top_p=1.0,
)

Get a plain text response from the LLM.

Parameters:

    user_prompt (str): The user's input prompt. Required.
    system_prompt (str | None): Optional system prompt to set context/behavior. Default: None.
    temperature (float): Sampling temperature (0.0-2.0). Lower is more deterministic. Default: 0.3.
    top_p (float): Nucleus sampling parameter (0.0-1.0). Default: 1.0.

Returns:

    LLMResponse: The text content and usage metrics.

Raises:

    Exception: If the API request fails after retries.

Source code in src/majordomo_llm/base.py
@abstractmethod
async def get_response(
    self,
    user_prompt: str,
    system_prompt: str | None = None,
    temperature: float = 0.3,
    top_p: float = 1.0,
) -> LLMResponse:
    """Get a plain text response from the LLM.

    Args:
        user_prompt: The user's input prompt.
        system_prompt: Optional system prompt to set context/behavior.
        temperature: Sampling temperature (0.0-2.0). Lower is more deterministic.
        top_p: Nucleus sampling parameter (0.0-1.0).

    Returns:
        LLMResponse containing the text content and usage metrics.

    Raises:
        Exception: If the API request fails after retries.
    """
    raise NotImplementedError()

get_structured_json_response async

get_structured_json_response(
    response_model,
    user_prompt,
    system_prompt=None,
    temperature=0.3,
    top_p=1.0,
)

Get a structured response validated against a Pydantic model.

Uses provider-specific mechanisms (tool calling, response schemas) to ensure the response conforms to the specified Pydantic model schema.

Parameters:

    response_model (type[T]): Pydantic model class defining the expected structure. Required.
    user_prompt (str): The user's input prompt. Required.
    system_prompt (str | None): Optional system prompt to set context/behavior. Default: None.
    temperature (float): Sampling temperature (0.0-2.0). Lower is more deterministic. Default: 0.3.
    top_p (float): Nucleus sampling parameter (0.0-1.0). Default: 1.0.

Returns:

    LLMStructuredResponse: The validated Pydantic model instance plus usage metrics.

Raises:

    ValidationError: If the response doesn't match the model schema.
    Exception: If the API request fails after retries.

Example:

    >>> from pydantic import BaseModel
    >>> class Person(BaseModel):
    ...     name: str
    ...     age: int
    >>> response = await llm.get_structured_json_response(
    ...     response_model=Person,
    ...     user_prompt="Extract: John is 30 years old",
    ... )
    >>> print(response.content.name)
    John

Source code in src/majordomo_llm/base.py
async def get_structured_json_response(
    self,
    response_model: type[T],
    user_prompt: str,
    system_prompt: str | None = None,
    temperature: float = 0.3,
    top_p: float = 1.0,
) -> LLMStructuredResponse:
    """Get a structured response validated against a Pydantic model.

    Uses provider-specific mechanisms (tool calling, response schemas) to
    ensure the response conforms to the specified Pydantic model schema.

    Args:
        response_model: Pydantic model class defining the expected structure.
        user_prompt: The user's input prompt.
        system_prompt: Optional system prompt to set context/behavior.
        temperature: Sampling temperature (0.0-2.0). Lower is more deterministic.
        top_p: Nucleus sampling parameter (0.0-1.0).

    Returns:
        LLMStructuredResponse containing the validated Pydantic model instance.

    Raises:
        pydantic.ValidationError: If the response doesn't match the model schema.
        Exception: If the API request fails after retries.

    Example:
        >>> from pydantic import BaseModel
        >>> class Person(BaseModel):
        ...     name: str
        ...     age: int
        >>> response = await llm.get_structured_json_response(
        ...     response_model=Person,
        ...     user_prompt="Extract: John is 30 years old",
        ... )
        >>> print(response.content.name)
        John
    """
    response = await self._get_structured_response(
        response_model=response_model,
        user_prompt=user_prompt,
        system_prompt=system_prompt,
        temperature=temperature,
        top_p=top_p,
    )
    parsed_content = response_model.model_validate(response.content)

    return LLMStructuredResponse(
        content=parsed_content,
        input_tokens=response.input_tokens,
        output_tokens=response.output_tokens,
        cached_tokens=response.cached_tokens,
        input_cost=response.input_cost,
        output_cost=response.output_cost,
        total_cost=response.total_cost,
        response_time=response.response_time,
    )

Bases: Usage

Response from an LLM containing plain text content.

Inherits all usage metrics from Usage.

Attributes:

    content (str): The text content of the LLM response.

Source code in src/majordomo_llm/base.py
@dataclass
class LLMResponse(Usage):
    """Response from an LLM containing plain text content.

    Inherits all usage metrics from :class:`Usage`.

    Attributes:
        content: The text content of the LLM response.
    """

    content: str

Bases: Usage

Response from an LLM containing parsed JSON content.

Inherits all usage metrics from Usage.

Attributes:

    content (dict[str, Any]): The parsed JSON content as a Python dict.

Source code in src/majordomo_llm/base.py
@dataclass
class LLMJSONResponse(Usage):
    """Response from an LLM containing parsed JSON content.

    Inherits all usage metrics from :class:`Usage`.

    Attributes:
        content: The parsed JSON content as a Python dict.
    """

    content: dict[str, Any]

Bases: Usage

Response from an LLM containing a validated Pydantic model.

Inherits all usage metrics from Usage.

Attributes:

    content (BaseModel): The validated Pydantic model instance.

Source code in src/majordomo_llm/base.py
@dataclass
class LLMStructuredResponse(Usage):
    """Response from an LLM containing a validated Pydantic model.

    Inherits all usage metrics from :class:`Usage`.

    Attributes:
        content: The validated Pydantic model instance.
    """

    content: BaseModel