Live Documentation

ModelReins Guide

A lightweight model dispatch layer — route prompts to Claude, local LLMs, or any CLI tool, schedule tasks via cron, and wire Claude Code in as a background worker over MCP.

⚡

Fast Setup

Install the npm package, point it at a server, and dispatch your first request in under two minutes.

🔌

Any Backend

Cloud APIs, local models, or shell scripts — all behind one uniform interface.

🕐

Scheduled Agents

Cron-driven tasks that run prompts on a schedule and store results for later retrieval.

📈

Web Dashboard

Inspect dispatch history, live provider status, and scheduled job runs in one place.

Quick Start

ModelReins exposes a small HTTP API. The easiest way to interact with it is the modelreins npm package, which wraps every endpoint and handles auth. You can also hit the REST API directly from any language.

Install the client

shell

$ npm install -g modelreins
+ modelreins@1.0.0
added 42 packages in 3.2s

$ modelreins --version
1.0.0

Connect to a server

Point the client at your ModelReins instance. The base URL and token are stored in ~/.modelreins/config.json and can be overridden with the MODELREINS_URL and MODELREINS_TOKEN environment variables.

shell

$ modelreins config set url http://192.168.0.246:8484
$ modelreins config set token YOUR_API_TOKEN

$ modelreins ping
✓ connected to ModelReins at http://192.168.0.246:8484
  version  : 1.0.0
  providers: 3 active
  uptime   : 4d 12h 7m
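
The override order between the config file and the environment can be pictured as a small pure function. This is a sketch, not the client's actual code: `resolveConfig`, the `ClientConfig` shape, and the rule that an empty variable is ignored are assumptions. Only the file path and the two variable names come from this guide.

```typescript
// Hypothetical shape of ~/.modelreins/config.json (assumed for illustration).
interface ClientConfig {
  url?: string;
  token?: string;
}

type Env = Record<string, string | undefined>;

// Environment variables win over the stored file values.
// Assumption: an unset or empty variable falls back to the file value.
function resolveConfig(file: ClientConfig, env: Env): ClientConfig {
  return {
    url: env['MODELREINS_URL'] || file.url,
    token: env['MODELREINS_TOKEN'] || file.token,
  };
}

const stored: ClientConfig = { url: 'http://192.168.0.246:8484', token: 'file-token' };
const resolved = resolveConfig(stored, { MODELREINS_TOKEN: 'env-token' });
// resolved.url comes from the file, resolved.token from the environment.
```

In a real shell session the second argument would be the process environment; it is passed in explicitly here so the behaviour is easy to check.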

Dispatch your first prompt

modelreins dispatch sends a prompt to the default provider and streams the response to stdout. Use --provider to target a specific backend.

shell

$ modelreins dispatch "Summarise the last 5 git commits"

Dispatching to provider: claude (default)
────────────────────────────────────────
The last 5 commits focused on...

From JavaScript / TypeScript

typescript

import ModelReins from 'modelreins';

const mr = new ModelReins({
  url:   'http://192.168.0.246:8484',
  token: process.env.MODELREINS_TOKEN,
});

const result = await mr.dispatch({
  prompt:   'Explain quantum entanglement simply.',
  provider: 'claude',          // optional, uses default if omitted
  model:    'claude-sonnet-4-6', // optional, uses provider default
});

console.log(result.text);
console.log('tokens used:', result.usage);

Raw HTTP

bash

curl -s -X POST http://192.168.0.246:8484/api/dispatch \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer YOUR_TOKEN' \
  -d '{"prompt":"Hello world","provider":"claude"}' \
  | jq '.text'

💡

Authentication tokens are managed in the Dashboard under Settings → API Tokens. Tokens can be scoped to specific providers or operations.

Providers

A provider is a named backend that ModelReins can route prompts to. Each provider has its own config block in config.yaml on the server side. Multiple providers of the same type can coexist with different names.

✨

Set default: true on one provider to make it the target when no provider field is supplied in a dispatch request.
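
A minimal sketch of that selection rule, assuming the server resolves the target roughly like this. `ProviderEntry` and `pickProvider` are hypothetical names, and the error behaviour for an unknown name or a missing default is an assumption.

```typescript
// Minimal view of a provider entry; only the `default` flag is taken from this guide.
interface ProviderEntry {
  name: string;
  default?: boolean;
}

// Returns the provider named in the request, or the one flagged as default.
// Assumption: an unknown name or a missing default is reported as an error.
function pickProvider(providers: ProviderEntry[], requested?: string): ProviderEntry {
  if (requested !== undefined) {
    const named = providers.find((p) => p.name === requested);
    if (!named) throw new Error(`unknown provider: ${requested}`);
    return named;
  }
  const flagged = providers.find((p) => p.default === true);
  if (!flagged) throw new Error('no default provider configured');
  return flagged;
}

const configured: ProviderEntry[] = [
  { name: 'claude', default: true },
  { name: 'lmstudio' },
];
const chosenDefault = pickProvider(configured);            // no provider field supplied
const chosenNamed = pickProvider(configured, 'lmstudio');  // explicit provider field
```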

🤖

Claude Anthropic API

Hosted Claude models via the Anthropic Messages API

Requires an Anthropic API key. Supports streaming, system prompts, multi-turn conversation context, and tool use pass-through.

yaml — config.yaml

providers:
  claude:
    type:     anthropic
    default: true
    apiKey:  ${ANTHROPIC_API_KEY}
    model:   claude-sonnet-4-6   # default model
    maxTokens: 8192
    systemPrompt: |
      You are a helpful assistant running in ModelReins.

Supported models

Model ID	Context	Notes
`claude-opus-4-6`	200k	Most capable, highest cost
`claude-sonnet-4-6`	200k	Best balance — recommended default
`claude-haiku-4-5-20251001`	200k	Fast & cheap, great for classification

📀

LM Studio Local

OpenAI-compatible API served by LM Studio on the local network

LM Studio exposes an OpenAI-compatible endpoint on localhost:1234 by default. ModelReins uses the openai-compat type for any server speaking the OpenAI Chat Completions API.

yaml — config.yaml

providers:
  lmstudio:
    type:    openai-compat
    baseUrl: http://192.168.0.109:1234/v1
    apiKey:  lm-studio          # any string works
    model:   lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF
    stream:  true

⚠️

Enable the Local Server toggle in LM Studio and load a model before dispatching. LM Studio must remain running for the provider to be reachable.

🌞

Ollama Local

Native Ollama API or its OpenAI-compat shim

Ollama runs models locally and exposes both a native API on port 11434 and an OpenAI-compatible shim. ModelReins supports both; the native type gives access to Ollama-specific options like num_ctx.

yaml — config.yaml

providers:
  ollama:
    type:    ollama
    baseUrl: http://192.168.0.109:11434
    model:   llama3.2
    options:
      num_ctx:  32768
      num_gpu:  99
      temperature: 0.7

Pull a model

shell

$ ollama pull llama3.2
$ ollama pull qwen2.5-coder:7b
$ ollama list
NAME                        ID            SIZE
llama3.2:latest             ...           2.0 GB
qwen2.5-coder:7b            ...           4.7 GB

🔸

1minAI Cloud

Unified API hub — access 100+ models with one key

1minAI provides a single API key that unlocks access to GPT-4o, Gemini, Mistral, and many others via an OpenAI-compatible surface. Useful for fallback or cost comparison without managing multiple keys.

yaml — config.yaml

providers:
  1minai:
    type:    openai-compat
    baseUrl: https://api.1min.ai/v1
    apiKey:  ${ONEMINAI_API_KEY}
    model:   gpt-4o
    fallback: true  # used when primary provider errors
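
The fallback flag suggests a try-primary-then-fallback flow. The sketch below shows one way that flow could look from the outside; it is not the server's implementation. `dispatchWithFallback` and the injected `Send` function are hypothetical, and treating any thrown error as a primary failure is an assumption.

```typescript
// Sends a prompt to one named provider. Injected so the sketch does not
// depend on any real client or network.
type Send = (provider: string, prompt: string) => Promise<string>;

// Tries the primary provider first, then each fallback in order.
// Assumption: any thrown error counts as "the primary provider errored".
async function dispatchWithFallback(
  primary: string,
  fallbacks: string[],
  prompt: string,
  send: Send,
): Promise<{ provider: string; text: string }> {
  let lastError: unknown = new Error('no providers given');
  for (const provider of [primary, ...fallbacks]) {
    try {
      return { provider, text: await send(provider, prompt) };
    } catch (err) {
      lastError = err; // remember the failure and move on to the next provider
    }
  }
  throw lastError;
}

// Fake sender for illustration: 'claude' always fails, everything else echoes.
const fakeSend: Send = async (provider, prompt) => {
  if (provider === 'claude') throw new Error('529 Overloaded');
  return `${provider}: ${prompt}`;
};
```

Calling `dispatchWithFallback('claude', ['1minai'], 'hi', fakeSend)` resolves with the answer from `1minai`, because the fake primary always throws.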

🛠

Custom CLI Bring Your Own

Pipe prompts through any shell command

The cli provider type shells out to an arbitrary command. The prompt is piped to stdin; the response is read from stdout. This unlocks any tool that accepts text input — scripts, local Claude Code instances, custom wrappers, etc.

yaml — config.yaml

providers:
  local-claude-code:
    type:    cli
    command: claude --print --dangerously-skip-permissions
    timeout: 120000  # ms
    env:
      ANTHROPIC_API_KEY: ${ANTHROPIC_API_KEY}

  my-script:
    type:    cli
    command: /opt/modelreins/scripts/my_llm.sh
    timeout: 30000

💡

CLI providers inherit the server process's environment unless you override it with the env block. The working directory is the ModelReins server root.

Silicon Workers

A silicon worker is a bot registered as a first-class identity in your tenant. It declares what it can do at signup, carries a token of its own, heartbeats its presence, and every action it takes is attributed back to it in the audit log. Not an anonymous API-key holder — a durable employee.

Why this shape

A worker minted as worker:my-triager shows up in your review queue, your presence dashboard, and every audit entry by that name. Revocation is a single click. The row stays for audit even after revoke; the token stops authenticating immediately. If you ever need to ask "which bot did that?", the question has an answer.

Register a worker

Two paths — both produce the same result.

From the dashboard: go to /workers, fill in the form, and open the one-time-view URL it returns to reveal your token. Save the token, because it is shown only once.

From the API:

bash

curl -X POST https://app.modelreins.com/api/v1/workers/register \
  -H "Authorization: Bearer $MODELREINS_ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "my-triager",
    "description": "Triages incoming support email",
    "capabilities": [
      {"name": "email.triage", "risk_tier": "auto"},
      {"name": "email.reply",  "risk_tier": "audit"}
    ]
  }'

The response carries a view_url. Open it once, copy the token, save it as MODELREINS_TOKEN.

Capabilities & risk tiers

Every capability has a name and a risk tier:

Tier	Behavior
`auto`	Runs freely.
`audit`	Runs, but every action is logged for later review.
`approve`	Pauses for a human to release it.
`session`	Approved once per session, not per action.

A worker can have different tiers for different capabilities — for example, it can read a feed on auto, reply on audit, and publish on approve. The Matriarch reads these at dispatch time and routes the friction accordingly.
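
The tier table can be read as a small decision function. The following is a sketch of that reading, not the dispatcher's real logic. `gate` and `Outcome` are hypothetical names, and the assumption is that `session` holds until a human has approved once in the current session and then runs freely.

```typescript
type RiskTier = 'auto' | 'audit' | 'approve' | 'session';
type Outcome = 'run' | 'run-and-review' | 'hold-for-human';

// Maps a capability's tier to what happens when the worker uses it.
// Assumption: `sessionApproved` records whether a human has already
// approved this capability once in the current session.
function gate(tier: RiskTier, sessionApproved: boolean): Outcome {
  switch (tier) {
    case 'auto':
      return 'run';
    case 'audit':
      return 'run-and-review';
    case 'approve':
      return 'hold-for-human';
    case 'session':
      return sessionApproved ? 'run' : 'hold-for-human';
  }
}

// The example from the text: read on auto, reply on audit, publish on approve.
const tiers: Record<string, RiskTier> = {
  'feed.read': 'auto',
  'email.reply': 'audit',
  'post.publish': 'approve',
};
const replyOutcome = gate(tiers['email.reply'] as RiskTier, false);
```

The capability names `feed.read` and `post.publish` are made up for the example; `email.reply` is the one used in the registration request above.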

The Python SDK

The fastest path to a running silicon worker is the official Python SDK, published to PyPI. It uses only the standard library, with no external dependencies.

shell

$ pip install modelreins-worker

Then a few lines of Python:

python

from modelreins_worker import Worker

def handle_job(job):
    prompt = job["prompt"]
    # ... do the work ...
    return "Triage complete: priority=high, category=billing"

Worker(name="my-triager").run(handler=handle_job)

The worker heartbeats every 5 seconds, polls its inbox for jobs assigned to its name, and calls your handle_job for each one. An exception raised by the handler fails the job automatically, with the traceback as its output. The full API surface and capabilities reference is at /docs/sdk/python.

Secrets: bring your own vault

Workers that need credentials (webhook URLs, API tokens, CRM keys) declare them as vault:// references in their requires_secrets. ModelReins resolves those references against your vault (Vaultwarden, Bitwarden, HashiCorp Vault, Keeper) at dispatch time. The credential flows straight from your vault into the worker runtime. ModelReins never holds your secrets. Configure the connector at /settings/vault.

API Keys

Self-serve key management lives at /settings/api-keys. Mint new keys, revoke old ones, see when each was last used. Raw tokens are delivered through a one-time-view URL with a short TTL — the raw value is shown exactly once, then the link dies. ModelReins stores only the hash, so losing a key is never a security incident; it's a speed bump. Mint a new one, revoke the old prefix, move on.

Why the one-time view

Raw credentials don't belong in email, Slack, or chat transcripts. The one-time-view URL collapses the surface area to exactly one HTTPS page-view. If someone forwards the URL to an attacker after you've already opened it, they see "already viewed" — the raw value is off our disks. If it hasn't been opened yet and the link expires, the raw value is deleted by the same sweep. Either way, the only window for disclosure is the one you explicitly opened.
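
The one-and-done behaviour can be modelled in a few lines. This is an in-memory sketch under stated assumptions, not the production design: `OneTimeViewStore` is a hypothetical name, the clock is injected so expiry can be exercised without waiting, and an expired link gives the same answer as an already-viewed one.

```typescript
interface PendingSecret {
  secret: string;
  expiresAt: number; // epoch milliseconds
}

type RevealResult =
  | { status: 'ok'; secret: string }
  | { status: 'already-viewed-or-expired' };

class OneTimeViewStore {
  private entries = new Map<string, PendingSecret>();
  private now: () => number;

  constructor(now: () => number) {
    this.now = now;
  }

  put(id: string, secret: string, ttlMs: number): void {
    this.entries.set(id, { secret, expiresAt: this.now() + ttlMs });
  }

  // Returns the secret at most once; the entry is deleted on the first view.
  reveal(id: string): RevealResult {
    const entry = this.entries.get(id);
    this.entries.delete(id);
    if (!entry || entry.expiresAt <= this.now()) {
      return { status: 'already-viewed-or-expired' };
    }
    return { status: 'ok', secret: entry.secret };
  }

  // Periodic sweep: drops links that expired without ever being opened.
  sweep(): void {
    this.entries.forEach((entry, id) => {
      if (entry.expiresAt <= this.now()) this.entries.delete(id);
    });
  }
}

let clock = 0;
const store = new OneTimeViewStore(() => clock);
store.put('abc', 'example-token', 60000);
const first = store.reveal('abc');  // first view returns the secret
const second = store.reveal('abc'); // a forwarded link sees nothing
```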

Worker tokens use the same pipe

When you register a silicon worker, the response carries a view_url pointing to the same delivery page. Same TTL, same one-and-done semantics. There is no separate channel for credential delivery anywhere in the product.

Scheduled Tasks

Scheduled tasks dispatch a prompt on a cron schedule and store the result. Results are retrievable via the API or visible in the Dashboard. Tasks can target any configured provider and optionally POST their result to a webhook.

Cron API

Tasks are managed through the /api/tasks endpoint family.

typescript

import ModelReins from 'modelreins';
const mr = new ModelReins({ url: 'http://192.168.0.246:8484', token: process.env.MODELREINS_TOKEN });

// Create a scheduled task
const task = await mr.tasks.create({
  name:     'daily-standup-summary',
  schedule: '0 9 * * 1-5',        // 9 AM Mon–Fri
  provider: 'claude',
  prompt:   'Summarise open GitHub PRs and suggest priorities.',
  webhook:  'https://hooks.slack.com/T.../B.../...',  // optional
  enabled:  true,
});
console.log(task.id);  // task_abc123

// List tasks
const all = await mr.tasks.list();

// Get latest result
const result = await mr.tasks.latestResult(task.id);

// Trigger immediately (one-off run)
await mr.tasks.run(task.id);

// Disable / enable
await mr.tasks.update(task.id, { enabled: false });

// Delete
await mr.tasks.delete(task.id);

Schedule presets

Common cron expressions:

Expression	Meaning
`0 * * * *`	Every hour
`*/15 * * * *`	Every 15 min
`0 9 * * 1-5`	9 AM weekdays
`0 8 * * 1`	Monday 8 AM
`0 0 * * *`	Midnight daily
`0 6 1 * *`	1st of month
`*/5 * * * *`	Every 5 min
`0 18 * * 5`	Friday 6 PM

Examples

Log monitor

typescript

await mr.tasks.create({
  name:     'error-log-digest',
  schedule: '0 */4 * * *',  // every 4 hours
  provider: 'claude',
  prompt: `
    Read /var/log/app/error.log (last 500 lines).
    Summarise recurring errors, count by type,
    and flag anything that looks critical.
    Be concise — 10 lines max.
  `,
  context: { readFile: '/var/log/app/error.log' },
});

Weekly report via webhook

typescript

await mr.tasks.create({
  name:     'weekly-metrics',
  schedule: '0 8 * * 1',  // Monday 8 AM
  provider: 'claude',
  prompt:   'Generate a weekly metrics summary for the team.',
  webhook:  'https://hooks.slack.com/services/...',
  webhookFormat: 'slack',  // wraps text in Slack payload
});
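
For orientation, Slack incoming webhooks accept a JSON body with a `text` field. The sketch below shows one plausible wrapping. `toSlackPayload` is a hypothetical helper, and the bold task-name prefix is an assumption, since the exact layout ModelReins produces is not described here.

```typescript
// Slack incoming webhooks accept a JSON body of the form { "text": "..." }.
interface SlackPayload {
  text: string;
}

// Assumption: the task name is placed in bold on the first line.
function toSlackPayload(taskName: string, resultText: string): SlackPayload {
  return { text: `*${taskName}*\n${resultText}` };
}

const slackBody = JSON.stringify(toSlackPayload('weekly-metrics', 'All green.'));
```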

Cron fields reference

Field	Range	Special
Minute	0–59	`*/n` every n minutes
Hour	0–23	`*/n` every n hours
Day of month	1–31	`?` unspecified
Month	1–12 or JAN–DEC	`*` every month
Day of week	0–6 (Sun=0) or SUN–SAT	`1-5` weekdays
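
To make the field rules concrete, here is a small matcher that checks whether a five-field expression fires at a given instant. It is a sketch with deliberate limits: it evaluates in UTC (the scheduler's time zone is not stated in this guide), it handles only numeric fields with `*`, `*/n`, ranges, and comma lists, and it does not support names such as JAN or SUN or the `?` token. `toInt`, `expandField`, and `matchesCron` are hypothetical names.

```typescript
// Parses a non-negative integer or throws; rejects empty strings and names like "MON".
function toInt(text: string): number {
  if (!/^\d+$/.test(text)) throw new Error(`unsupported cron token: "${text}"`);
  return Number(text);
}

// Expands one field ("*", "*/n", "a", "a-b", "a-b/n", or a comma list of those)
// into the set of values it allows within [min, max].
function expandField(field: string, min: number, max: number): Set<number> {
  const allowed = new Set<number>();
  for (const part of field.split(',')) {
    const pieces = part.split('/');
    const range = pieces[0] ?? '';
    const step = pieces.length > 1 ? toInt(pieces[1] ?? '') : 1;
    let lo = min;
    let hi = max;
    if (range !== '*') {
      const bounds = range.split('-');
      lo = toInt(bounds[0] ?? '');
      hi = bounds.length > 1 ? toInt(bounds[1] ?? '') : lo;
    }
    if (step < 1 || lo < min || hi > max || lo > hi) {
      throw new Error(`cron field out of range: "${field}"`);
    }
    for (let value = lo; value <= hi; value += step) allowed.add(value);
  }
  return allowed;
}

// True when `expr` (minute hour day-of-month month day-of-week) matches `date` in UTC.
function matchesCron(expr: string, date: Date): boolean {
  const fields = expr.trim().split(/\s+/);
  if (fields.length !== 5) throw new Error('expected exactly 5 cron fields');
  const [minute, hour, dayOfMonth, month, dayOfWeek] =
    fields as [string, string, string, string, string];
  const domOk = expandField(dayOfMonth, 1, 31).has(date.getUTCDate());
  const dowOk = expandField(dayOfWeek, 0, 6).has(date.getUTCDay());
  // Classic cron rule: when both day fields are restricted, either one may match.
  const dayOk = dayOfMonth !== '*' && dayOfWeek !== '*' ? domOk || dowOk : domOk && dowOk;
  return (
    expandField(minute, 0, 59).has(date.getUTCMinutes()) &&
    expandField(hour, 0, 23).has(date.getUTCHours()) &&
    expandField(month, 1, 12).has(date.getUTCMonth() + 1) &&
    dayOk
  );
}

// 2026-04-06 is a Monday, so the weekday preset matches at 09:00 UTC.
const firesMonday = matchesCron('0 9 * * 1-5', new Date(Date.UTC(2026, 3, 6, 9, 0)));
```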

MCP Channel

ModelReins can register itself as an MCP (Model Context Protocol) server. This lets any MCP-aware client — including Claude Code — call ModelReins tools directly from within a conversation, turning Claude Code into a background worker that dispatches sub-tasks through ModelReins.

🔗

MCP is Anthropic's open protocol for connecting AI models to external tools and data. Claude Code reads .mcp.json in the project root (or the user-level ~/.claude.json) to discover available MCP servers.

Configure `.mcp.json`

Add a ModelReins entry to your project's .mcp.json. The MCP server is exposed by ModelReins itself over stdio or SSE — choose whichever your client supports.

stdio transport (recommended for Claude Code)

json — .mcp.json

{
  "mcpServers": {
    "modelreins": {
      "type": "stdio",
      "command": "npx",
      "args": ["modelreins", "mcp"],
      "env": {
        "MODELREINS_URL":   "http://192.168.0.246:8484",
        "MODELREINS_TOKEN": "YOUR_TOKEN_HERE"
      }
    }
  }
}

SSE transport (for web clients)

json — .mcp.json

{
  "mcpServers": {
    "modelreins": {
      "type": "sse",
      "url":  "http://192.168.0.246:8484/mcp/sse",
      "headers": {
        "Authorization": "Bearer YOUR_TOKEN_HERE"
      }
    }
  }
}

Available MCP tools

Once connected, these tools appear inside Claude Code (and other MCP clients):

Tool name	Description	Key params
`mr_dispatch`	Send a prompt to a provider and return the result	`prompt`, `provider`, `model`
`mr_providers`	List all configured providers and their status	—
`mr_task_create`	Create a new scheduled task	`name`, `schedule`, `prompt`
`mr_task_list`	List scheduled tasks with last-run status	—
`mr_task_result`	Fetch the latest result for a task	`taskId`
`mr_task_run`	Trigger a task immediately	`taskId`

Example: Claude Code as a worker

With ModelReins as an MCP server and a cli provider pointing at Claude Code, the outer Claude instance can spin up inner Claude Code sessions for long-running tasks:

user prompt inside Claude Code

Use the mr_dispatch tool to have the local-claude-code provider
refactor the auth module at src/auth/* and return a summary of changes.

Claude Code will call mr_dispatch with provider: "local-claude-code", ModelReins shells out to claude --print, and the inner session performs the refactor. The outer session receives the result as the tool response.
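
On the wire, an MCP tool invocation is a JSON-RPC 2.0 request with the method `tools/call`. The sketch below builds the message the outer session would send for this example. `buildToolCall` is a hypothetical helper. The argument names follow the key params listed for `mr_dispatch`, and the prompt text is illustrative.

```typescript
// Shape of an MCP tool invocation (JSON-RPC 2.0, method "tools/call").
interface ToolCallRequest {
  jsonrpc: '2.0';
  id: number;
  method: 'tools/call';
  params: { name: string; arguments: Record<string, unknown> };
}

function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>,
): ToolCallRequest {
  return { jsonrpc: '2.0', id, method: 'tools/call', params: { name, arguments: args } };
}

const toolCall = buildToolCall(1, 'mr_dispatch', {
  provider: 'local-claude-code',
  prompt: 'Refactor the auth module at src/auth/* and return a summary of changes.',
});
```

The MCP client normally builds this message itself; the sketch only shows what crosses the transport.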

⚠️

When using --dangerously-skip-permissions on the inner CLI provider, ensure it runs in an isolated working directory or container. Grant only the permissions the task actually needs.

Verify the connection

shell — inside claude code project

$ claude mcp list
modelreins   stdio   connected
  Tools: mr_dispatch, mr_providers, mr_task_create,
         mr_task_list, mr_task_result, mr_task_run

Dashboard

The web UI is served at the root of your ModelReins instance (http://192.168.0.246:8484). It provides a real-time view of the system without requiring any CLI setup.

🏠

Overview

Live provider status, total dispatches today, error rate, and average latency. Updates every 10 seconds.

📄

Dispatch History

Full log of every dispatch with prompt, response, provider, model, token count, latency, and status.

🕐

Scheduled Tasks

Create, edit, enable/disable, and manually trigger tasks. View run history and last output inline.

🔌

Provider Health

Per-provider uptime, last-error details, and a live "ping" button to test reachability.

📈

Usage Graphs

Token consumption and request volume over time, broken out by provider and model.

🔒

Settings

Manage API tokens, configure webhooks, edit server config (with live reload), and view server logs.

Keyboard shortcuts

Key	Action
`G D`	Go to Dispatch History
`G T`	Go to Scheduled Tasks
`G P`	Go to Providers
`G S`	Go to Settings
`/`	Focus search bar
`R`	Refresh current view
`?`	Show keyboard help

API Reference

All endpoints are under http://<host>:8484/api. Requests require an Authorization: Bearer <token> header unless the server is configured with auth: none. All request/response bodies are JSON.

📄

An OpenAPI 3.1 spec is available at /api/openapi.json and can be imported directly into Postman, Insomnia, or any compatible client.

POST /api/dispatch Send a prompt, get a response

Request body

json

{
  "prompt":   "Explain binary search trees.",
  "provider": "claude",          // optional
  "model":    "claude-sonnet-4-6", // optional
  "system":   "You are a CS tutor.", // optional
  "stream":   false,               // set true for SSE stream
  "options":  { "temperature": 0.5 }  // optional provider opts
}

Response

json

{
  "id":       "dsp_01jx...",
  "text":     "A binary search tree is...",
  "provider": "claude",
  "model":    "claude-sonnet-4-6",
  "usage":    { "input_tokens": 12, "output_tokens": 284 },
  "latencyMs": 1423,
  "createdAt": "2026-04-04T09:12:33Z"
}

GET /api/providers List all providers

json — response

[
  {
    "name":      "claude",
    "type":      "anthropic",
    "default":   true,
    "status":    "healthy",
    "lastCheck": "2026-04-04T09:10:00Z"
  }
]

GET /api/tasks List scheduled tasks

Returns an array of task objects including id, name, schedule, enabled, lastRunAt, and lastStatus.

POST /api/tasks Create a scheduled task

json — request body

{
  "name":      "my-task",
  "schedule":  "0 9 * * 1-5",
  "provider":  "claude",
  "prompt":    "Your prompt here.",
  "enabled":   true,
  "webhook":   "https://..."   // optional
}

POST /api/tasks/:id/run Trigger a task immediately

Runs the task once regardless of its schedule. Returns a runId that can be polled at /api/tasks/:id/results/:runId.
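
A polling loop for that endpoint might look like the sketch below. Several details are assumptions, because the result schema is not given here: the `status` field, the in-progress value `'running'`, and the names `waitForRun` and `GetJson`. The HTTP call is injected so the loop can be tried without a server.

```typescript
// Fetches a path and returns the parsed JSON body. Injected by the caller.
type GetJson = (path: string) => Promise<unknown>;

// Assumed result shape: only `status` is inspected by the loop.
interface RunResult {
  status: string;
  text?: string;
}

// Polls /api/tasks/:id/results/:runId until the run is no longer 'running'.
async function waitForRun(
  getJson: GetJson,
  taskId: string,
  runId: string,
  maxAttempts: number,
  delayMs: number,
): Promise<RunResult> {
  const path = `/api/tasks/${taskId}/results/${runId}`;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const result = (await getJson(path)) as RunResult;
    if (result.status !== 'running') return result;
    await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
  }
  throw new Error(`run ${runId} still running after ${maxAttempts} attempts`);
}

// Fake endpoint for illustration: reports 'running' twice, then a final state.
let pollCalls = 0;
const fakeGet: GetJson = async () =>
  ++pollCalls < 3 ? { status: 'running' } : { status: 'done', text: 'ok' };
```

With a real server, `getJson` would wrap an authenticated GET against the base URL. The final state `'done'` is a placeholder for whatever value the server actually reports.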

PATCH /api/tasks/:id Update a task

Accepts any subset of the creation fields. Changes take effect at the next scheduled run.

DELETE /api/tasks/:id Delete a task

Permanently removes the task and its run history. Returns 204 No Content.

GET /api/tasks/:id/results Task run history

Returns paginated results. Query params: limit (default 20), offset.

GET /api/health Server health check

json — response

{
  "status":    "ok",
  "version":   "1.0.0",
  "uptime":    391627,
  "providers": { "healthy": 3, "degraded": 0 }
}
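
If `uptime` is a count of seconds (an assumption, as the unit is not stated), it can be rendered in the same days, hours, and minutes style that `modelreins ping` prints. `formatUptime` is a hypothetical helper.

```typescript
// Assumption: `uptime` from /api/health is a whole number of seconds.
function formatUptime(totalSeconds: number): string {
  const days = Math.floor(totalSeconds / 86400);
  const hours = Math.floor((totalSeconds % 86400) / 3600);
  const minutes = Math.floor((totalSeconds % 3600) / 60);
  return `${days}d ${hours}h ${minutes}m`;
}

const shownUptime = formatUptime(391627); // "4d 12h 47m"
```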

Error responses

Status	Code	Meaning
`400`	`VALIDATION_ERROR`	Missing or invalid request fields
`401`	`UNAUTHORIZED`	Missing or invalid token
`404`	`NOT_FOUND`	Task or resource does not exist
`429`	`RATE_LIMITED`	Provider rate limit hit; the Retry-After header is set
`502`	`PROVIDER_ERROR`	Upstream provider returned an error
`504`	`PROVIDER_TIMEOUT`	Provider did not respond within configured timeout

json — error envelope

{
  "error": {
    "code":    "PROVIDER_ERROR",
    "message": "Anthropic API returned 529 Overloaded",
    "provider":"claude",
    "retryAfter": 30
  }
}
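
A client can use the envelope to decide whether and when to retry. The sketch below is one possible policy, not a documented one. Treating `RATE_LIMITED`, `PROVIDER_ERROR`, and `PROVIDER_TIMEOUT` as retryable is a client-side choice, `retryAfter` is assumed to be in seconds, and `retryDelayMs` is a hypothetical helper.

```typescript
interface ErrorEnvelope {
  error: { code: string; message: string; provider?: string; retryAfter?: number };
}

// Codes from the table above that may clear up on their own (assumed policy).
const RETRYABLE_CODES = ['RATE_LIMITED', 'PROVIDER_ERROR', 'PROVIDER_TIMEOUT'];

// Returns how long to wait before retrying, or null when a retry will not help.
// Assumption: `retryAfter` is in seconds. Without it, back off exponentially
// from 1 second, capped at 30 seconds.
function retryDelayMs(envelope: ErrorEnvelope, attempt: number): number | null {
  if (RETRYABLE_CODES.indexOf(envelope.error.code) === -1) return null;
  if (envelope.error.retryAfter !== undefined) return envelope.error.retryAfter * 1000;
  return Math.min(1000 * Math.pow(2, attempt), 30000);
}

const overloaded: ErrorEnvelope = {
  error: {
    code: 'PROVIDER_ERROR',
    message: 'Anthropic API returned 529 Overloaded',
    provider: 'claude',
    retryAfter: 30,
  },
};
const waitMs = retryDelayMs(overloaded, 0); // 30000, taken from retryAfter
```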
