LLMOps Product Handbook

# Variants

A Variant is a specific model configuration that can be attached to configs. Variants contain the actual LLM settings: provider, model, parameters, and system prompt.

## Overview

Each variant:

- Has a name for easy identification
- Contains model configuration (provider, model, parameters)
- Automatically creates versions when edited
- Can be attached to multiple configs
- Supports Nunjucks/Jinja2 template syntax in system prompts

## Creating a Variant

When creating or editing a variant, you configure:

| Property | Description |
| --- | --- |
| Name | Human-readable name (e.g., "GPT-4 Turbo - Helpful") |
| Provider | The LLM provider (e.g., OpenAI, Anthropic) |
| Model | The model to use (e.g., `gpt-4-turbo`, `claude-3-sonnet`) |
| System Prompt | Instructions that define the assistant's behavior |
| Parameters | Model parameters like temperature, max tokens, etc. |

## Model Parameters

| Parameter | Description | Range |
| --- | --- | --- |
| Temperature | Controls randomness. Lower = more focused, higher = more creative | 0 to 2 |
| Max Tokens | Maximum length of the response | Varies by model |
| Top P | Nucleus sampling; an alternative to temperature | 0 to 1 |
| Frequency Penalty | Reduces repetition of tokens | -2 to 2 |
| Presence Penalty | Encourages new topics | -2 to 2 |

## Template Variables

System prompts support Nunjucks/Jinja2 template syntax, allowing you to create dynamic prompts with variables that are filled in at runtime.

### Supported Syntax

| Syntax | Description | Example |
| --- | --- | --- |
| `{{ variable }}` | Variable interpolation | `{{ user_name }}` |
| `{% for item in items %}` | Loop through arrays | `{% for tag in tags %}{{ tag }}{% endfor %}` |
| `{% if condition %}` | Conditional blocks | `{% if premium %}Premium user{% endif %}` |
| `{# comment #}` | Comments (not rendered) | `{# TODO: improve this #}` |

### Example System Prompt with Templates

```
You are a {{ role }} assistant for {{ company_name }}.

User Profile:
- Name: {{ user.name }}
- Plan: {{ user.plan }}

{% if user.preferences %}
User Preferences:
{% for pref in user.preferences %}
- {{ pref }}
{% endfor %}
{% endif %}

Please respond in a {{ tone }} tone.
```
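You can preview how a template like this renders before deploying it. Here is a minimal sketch using Python's `jinja2` library (the platform renders with Nunjucks, but the syntax shown here is compatible with Jinja2); the prompt is an abbreviated version of the example above and the variable values are illustrative:

```python
from jinja2 import Template

# Abbreviated version of the example system prompt above, covering
# interpolation, a conditional block, and a loop.
prompt = (
    "You are a {{ role }} assistant for {{ company_name }}.\n"
    "{% if user.preferences %}"
    "User Preferences:\n"
    "{% for pref in user.preferences %}- {{ pref }}\n{% endfor %}"
    "{% endif %}"
    "Please respond in a {{ tone }} tone."
)

rendered = Template(prompt).render(
    role="customer support",
    company_name="Acme Inc",
    user={"preferences": ["concise responses", "technical details"]},
    tone="friendly",
)
print(rendered)
```

The rendered string begins "You are a customer support assistant for Acme Inc." and lists each preference on its own line.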

### Passing Input Variables

When calling the API, pass template variables in the `input_variables` field of your request body. This field is used for template rendering and is stripped before the request is forwarded to the LLM provider.

```bash
curl -X POST https://your-api.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-llmops-prompt: your-config-id" \
  -H "Authorization: Bearer your-env-secret" \
  -d '{
    "messages": [{"role": "user", "content": "Hello!"}],
    "input_variables": {
      "role": "customer support",
      "company_name": "Acme Inc",
      "user": {
        "name": "John Doe",
        "plan": "premium",
        "preferences": ["concise responses", "technical details"]
      },
      "tone": "friendly"
    }
  }'
```

The `input_variables` field works with all OpenAI-compatible endpoints, including `/chat/completions` and `/responses`.
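The same request can be built from Python using only the standard library. The endpoint URL, config ID, and secret below mirror the placeholders in the curl example and are not real values:

```python
import json
import urllib.request

# Placeholder values, mirroring the curl example above.
API_URL = "https://your-api.com/v1/chat/completions"
CONFIG_ID = "your-config-id"
ENV_SECRET = "your-env-secret"

# input_variables is used for template rendering and is stripped
# before the request is forwarded to the LLM provider.
payload = {
    "messages": [{"role": "user", "content": "Hello!"}],
    "input_variables": {
        "role": "customer support",
        "company_name": "Acme Inc",
        "user": {
            "name": "John Doe",
            "plan": "premium",
            "preferences": ["concise responses", "technical details"],
        },
        "tone": "friendly",
    },
}

request = urllib.request.Request(
    API_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "x-llmops-prompt": CONFIG_ID,
        "Authorization": f"Bearer {ENV_SECRET}",
    },
    method="POST",
)
# response = urllib.request.urlopen(request)  # send once real credentials are in place
```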

### Behavior Notes

- **Missing variables**: If a variable is not provided in `input_variables`, it is rendered as an empty string (no error is thrown)
- **Nested objects**: Access nested properties with dot notation: `{{ user.name }}`
- **Arrays**: Use `{% for %}` loops to iterate over arrays
- **Filters**: Nunjucks filters are supported: `{{ name | upper }}`, `{{ items | join(", ") }}`
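These behaviors can be checked in isolation with Jinja2, whose defaults match the notes above (Nunjucks filter names like `upper` and `join` are shared with Jinja2):

```python
from jinja2 import Template

# Missing variables render as empty strings rather than raising an error.
empty = Template("Hello {{ missing }}!").render()
print(repr(empty))  # 'Hello !'

# Dot notation reaches into nested objects, and filters transform values.
upper = Template("{{ user.name | upper }}").render(user={"name": "Ada"})
joined = Template("{{ items | join(', ') }}").render(items=["a", "b", "c"])
print(upper)   # ADA
print(joined)  # a, b, c
```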

## Versioning

Every time you edit a variant, a new version is created. This provides:

- **Audit Trail**: See exactly what configuration was used at any point in time
- **Safe Rollbacks**: Revert to a previous version if issues arise
- **Version Pinning**: Production can use a specific version while staging tests newer versions

### Version Pinning vs Latest

When targeting a variant to an environment:

| Mode | Behavior |
| --- | --- |
| Pinned Version | Always serves the specified version (recommended for production) |
| Latest | Automatically serves the newest version (useful for staging/development) |
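Version resolution happens server-side; conceptually it works like the following sketch, where `VariantVersion` and `resolve_version` are hypothetical names used only for illustration:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class VariantVersion:
    number: int
    system_prompt: str

def resolve_version(versions: list[VariantVersion],
                    pinned: Optional[int] = None) -> VariantVersion:
    """Pinned mode serves the exact version; Latest serves the newest."""
    if pinned is not None:
        return next(v for v in versions if v.number == pinned)
    return max(versions, key=lambda v: v.number)

versions = [
    VariantVersion(1, "You are a helpful assistant..."),
    VariantVersion(2, "You are a concise assistant..."),
]
print(resolve_version(versions, pinned=1).number)  # production (pinned): 1
print(resolve_version(versions).number)            # staging (Latest): 2
```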

### Example Workflow

1. Create a variant with your initial configuration
   - Provider: OpenAI
   - Model: `gpt-4-turbo`
   - System Prompt: "You are a helpful assistant..."
   - Temperature: 0.7
   - Creates Version 1
2. Attach to a config and set up targeting rules
   - Production → Version 1 (pinned)
   - Staging → Latest
3. Iterate on the variant
   - Update the system prompt
   - Creates Version 2
   - Staging automatically uses Version 2
   - Production still uses Version 1
4. Promote to production
   - After testing in staging, update production targeting to Version 2

## Multiple Variants per Config

A single config can have multiple variants attached. This enables:

- **A/B Testing**: Compare different model configurations
- **Gradual Rollouts**: Slowly shift traffic to new configurations
- **Fallback Options**: Use a backup variant if the primary fails
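A gradual rollout amounts to weighted random selection between variants. The platform applies targeting rules server-side; the sketch below only illustrates the idea, and the variant names and weights are invented for the example:

```python
import random

def pick_variant(weights: dict[str, float], rng: random.Random) -> str:
    """Pick a variant name with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]

# 90% of traffic stays on the current variant; 10% shifts to the new one.
rollout = {"gpt4-turbo-helpful": 0.9, "claude3-concise": 0.1}
rng = random.Random(42)  # seeded so the demo is reproducible
picks = [pick_variant(rollout, rng) for _ in range(1000)]
print(picks.count("claude3-concise"))  # roughly 100 of 1000
```

To widen the rollout, you would only adjust the weights; no variant configuration changes.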

### Example Variants

| Variant Name | Provider | Model | Use Case |
| --- | --- | --- | --- |
| GPT-4 Turbo - Helpful | OpenAI | `gpt-4-turbo` | General assistance |
| Claude 3 - Concise | Anthropic | `claude-3-sonnet` | Brief responses |
| Llama 3 - Local | Ollama | `llama3` | Development/testing |
| GPT-4o Mini - Fast | OpenAI | `gpt-4o-mini` | Quick, cost-effective responses |
