ModelReins Guide
A lightweight model dispatch layer — route prompts to Claude, local LLMs, or any CLI tool, schedule tasks via cron, and wire Claude Code in as a background worker over MCP.
Fast Setup
Install the npm package, point it at a server, and dispatch your first request in under two minutes.
Any Backend
Cloud APIs, local models, or shell scripts — all behind one uniform interface.
Scheduled Agents
Cron-driven tasks that run prompts on a schedule and store results for later retrieval.
Web Dashboard
Inspect dispatch history, live provider status, and scheduled job runs in one place.
Quick Start
ModelReins exposes a small HTTP API. The easiest way to interact with it is the
modelreins npm package, which wraps every endpoint and handles auth.
You can also hit the REST API directly from any language.
Install the client
$ npm install -g modelreins
+ modelreins@1.0.0
added 42 packages in 3.2s
$ modelreins --version
1.0.0
Connect to a server
Point the client at your ModelReins instance. The base URL is stored in
~/.modelreins/config.json and can be overridden with
MODELREINS_URL and MODELREINS_TOKEN environment variables.
$ modelreins config set url http://192.168.0.246:8484
$ modelreins config set token YOUR_API_TOKEN
$ modelreins ping
✓ connected to ModelReins at http://192.168.0.246:8484
  version  : 1.0.0
  providers: 3 active
  uptime   : 4d 12h 7m
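The precedence between the stored config file and the environment variables can be sketched in Python. This is a minimal illustration, not the client's actual implementation: the `url` and `token` key names inside config.json are assumed from the `config set` commands, and `resolve_settings` is a hypothetical helper.

```python
import json
import os
from pathlib import Path


def resolve_settings(config_path, env=None):
    """Return (url, token), letting environment variables win over the file."""
    env = os.environ if env is None else env
    path = Path(config_path).expanduser()
    # Assumed file layout: {"url": "...", "token": "..."}
    stored = json.loads(path.read_text()) if path.exists() else {}
    url = env.get("MODELREINS_URL") or stored.get("url")
    token = env.get("MODELREINS_TOKEN") or stored.get("token")
    return url, token
```

Calling `resolve_settings("~/.modelreins/config.json")` would return the file values unless MODELREINS_URL or MODELREINS_TOKEN is set in the environment.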
Dispatch your first prompt
modelreins dispatch sends a prompt to the default provider and streams the
response to stdout. Use --provider to target a specific backend.
$ modelreins dispatch "Summarise the last 5 git commits"
Dispatching to provider: claude (default)
────────────────────────────────────────
The last 5 commits focused on...
From JavaScript / TypeScript
import ModelReins from 'modelreins';

const mr = new ModelReins({
  url: 'http://192.168.0.246:8484',
  token: process.env.MODELREINS_TOKEN,
});

const result = await mr.dispatch({
  prompt: 'Explain quantum entanglement simply.',
  provider: 'claude', // optional, uses default if omitted
  model: 'claude-sonnet-4-6', // optional, uses provider default
});

console.log(result.text);
console.log('tokens used:', result.usage);
Raw HTTP
curl -s -X POST http://192.168.0.246:8484/api/dispatch \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer YOUR_TOKEN' \
-d '{"prompt":"Hello world","provider":"claude"}' \
| jq '.text'
Authentication tokens are managed in the Dashboard under Settings → API Tokens. Tokens can be scoped to specific providers or operations.
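For languages without a client package, the same request can be assembled by hand. The sketch below uses only the Python standard library and mirrors the curl call above; `build_dispatch_request` and `dispatch` are illustrative helper names, not part of any ModelReins SDK.

```python
import json
import urllib.request


def build_dispatch_request(base_url, token, prompt, provider=None):
    """Build (but do not send) a POST /api/dispatch request."""
    body = {"prompt": prompt}
    if provider is not None:
        body["provider"] = provider  # omitted: the server picks its default
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/dispatch",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


def dispatch(base_url, token, prompt, provider=None, timeout=60):
    """Send the request and return the decoded JSON response."""
    req = build_dispatch_request(base_url, token, prompt, provider)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

Splitting request construction from sending keeps the header and body logic easy to inspect without a running server.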
Providers
A provider is a named backend that ModelReins can route prompts to.
Each provider has its own config block in config.yaml on the server side.
Multiple providers of the same type can coexist with different names.
Set default: true on one provider to make it the target when no
provider field is supplied in a dispatch request.
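The selection rule can be pictured as a small lookup. The sketch below is an assumption about how such routing might work, not the server's code: `pick_provider` is a hypothetical name, and rejecting a config with zero or several defaults is a choice made for this example only.

```python
def pick_provider(providers, requested=None):
    """Return the provider name a dispatch should target."""
    if requested is not None:
        if requested not in providers:
            raise KeyError(f"unknown provider: {requested}")
        return requested
    # No provider field in the request: fall back to the one marked default.
    defaults = [name for name, cfg in providers.items() if cfg.get("default")]
    if len(defaults) != 1:
        raise ValueError("exactly one provider must set default: true")
    return defaults[0]
```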
Claude (Anthropic API)
Hosted Claude models via the Anthropic Messages API
Requires an Anthropic API key. Supports streaming, system prompts, multi-turn conversation context, and tool use pass-through.
providers:
  claude:
    type: anthropic
    default: true
    apiKey: ${ANTHROPIC_API_KEY}
    model: claude-sonnet-4-6 # default model
    maxTokens: 8192
    systemPrompt: |
      You are a helpful assistant running in ModelReins.
Supported models
| Model ID | Context | Notes |
|---|---|---|
| claude-opus-4-6 | 200k | Most capable, highest cost |
| claude-sonnet-4-6 | 200k | Best balance; recommended default |
| claude-haiku-4-5-20251001 | 200k | Fast & cheap, great for classification |
LM Studio (Local)
OpenAI-compatible API served by LM Studio on the local network
LM Studio exposes an OpenAI-compatible endpoint on localhost:1234
by default. ModelReins uses the openai-compat type for any
server speaking the OpenAI Chat Completions API.
providers:
  lmstudio:
    type: openai-compat
    baseUrl: http://192.168.0.109:1234/v1
    apiKey: lm-studio # any string works
    model: lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF
    stream: true
Enable the Local Server toggle in LM Studio and load a model before dispatching. LM Studio must remain running for the provider to be reachable.
Ollama (Local)
Native Ollama API or its OpenAI-compat shim
Ollama runs models locally and exposes both a native API on port 11434
and an OpenAI-compatible shim. ModelReins supports both; the native type gives
access to Ollama-specific options like num_ctx.
providers:
  ollama:
    type: ollama
    baseUrl: http://192.168.0.109:11434
    model: llama3.2
    options:
      num_ctx: 32768
      num_gpu: 99
      temperature: 0.7
Pull a model
$ ollama pull llama3.2
$ ollama pull qwen2.5-coder:7b
$ ollama list
NAME                ID     SIZE
llama3.2:latest     ...    2.0 GB
qwen2.5-coder:7b    ...    4.7 GB
1minAI (Cloud)
Unified API hub — access 100+ models with one key
1minAI provides a single API key that unlocks access to GPT-4o, Gemini, Mistral, and many others via an OpenAI-compatible surface. Useful for fallback or cost comparison without managing multiple keys.
providers:
  1minai:
    type: openai-compat
    baseUrl: https://api.1min.ai/v1
    apiKey: ${ONEMINAI_API_KEY}
    model: gpt-4o
    fallback: true # used when primary provider errors
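One way to picture the `fallback: true` flag is as a retry against a second provider when the first one raises. The sketch below is an assumption about those semantics rather than the server's routing code; `dispatch_with_fallback` and the `send` callable are hypothetical.

```python
def dispatch_with_fallback(providers, primary, send):
    """Try `primary`; on failure, try providers flagged fallback: true in order."""
    order = [primary] + [
        name for name, cfg in providers.items()
        if cfg.get("fallback") and name != primary
    ]
    last_error = None
    for name in order:
        try:
            return name, send(name)
        except Exception as exc:  # a real router would catch narrower errors
            last_error = exc
    raise RuntimeError("all providers failed") from last_error
```

Returning the provider name alongside the result lets the caller record which backend actually answered.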
Custom CLI (Bring Your Own)
Pipe prompts through any shell command
The cli provider type shells out to an arbitrary command.
The prompt is piped to stdin; the response is read from stdout.
This unlocks any tool that accepts text input — scripts, local Claude Code
instances, custom wrappers, etc.
providers:
  local-claude-code:
    type: cli
    command: claude --print --dangerously-skip-permissions
    timeout: 120000 # ms
    env:
      ANTHROPIC_API_KEY: ${ANTHROPIC_API_KEY}
  my-script:
    type: cli
    command: /opt/modelreins/scripts/my_llm.sh
    timeout: 30000
CLI providers inherit the server process's environment unless you override it with the
env block. The working directory is the ModelReins server root.
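The stdin/stdout contract of a cli provider can be reproduced with the Python standard library. This is a rough sketch under stated assumptions: `run_cli_provider` is a hypothetical helper, and entries from the env block are assumed to be merged over the inherited environment rather than replacing it.

```python
import os
import shlex
import subprocess


def run_cli_provider(command, prompt, timeout_ms=30000, env=None):
    """Pipe `prompt` to the command's stdin and return its stdout."""
    argv = shlex.split(command) if isinstance(command, str) else list(command)
    # Assumption: env block entries override individual inherited variables.
    merged_env = {**os.environ, **(env or {})}
    result = subprocess.run(
        argv,
        input=prompt,
        capture_output=True,
        text=True,
        timeout=timeout_ms / 1000,  # the config value is in milliseconds
        env=merged_env,
        check=True,  # a non-zero exit raises CalledProcessError
    )
    return result.stdout
```

A command that exceeds the timeout raises `subprocess.TimeoutExpired`, which a caller could map to a timeout error.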
Silicon Workers
A silicon worker is a bot registered as a first-class identity in your tenant. It declares what it can do at signup, carries a token of its own, heartbeats its presence, and every action it takes is attributed back to it in the audit log. Not an anonymous API-key holder — a durable employee.
Why this shape
A worker minted as worker:my-triager shows up in your review queue,
your presence dashboard, and every audit entry by that name. Revocation is a single
click. The row stays for audit even after revoke; the token stops authenticating
immediately. If you ever need to ask "which bot did that?", the question
has an answer.
Register a worker
Two paths — both produce the same result.
From the dashboard: go to /workers, fill the form, get a one-time-view URL, click to reveal your token. Save it.
From the API:
curl -X POST https://app.modelreins.com/api/v1/workers/register \
-H "Authorization: Bearer $MODELREINS_ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"name": "my-triager",
"description": "Triages incoming support email",
"capabilities": [
{"name": "email.triage", "risk_tier": "auto"},
{"name": "email.reply", "risk_tier": "audit"}
]
}'
The response carries a view_url. Open it once, copy the token, save it as MODELREINS_TOKEN.
Capabilities & risk tiers
Every capability has a name and a risk tier:
| Tier | Behavior |
|---|---|
| auto | Runs freely. |
| audit | Runs, but every action is logged for later review. |
| approve | Pauses for a human to release it. |
| session | Approved once per session, not per action. |
A worker can have different tiers for different capabilities: for example, it can read a feed on auto, reply on audit, and publish on approve. The Matriarch reads these tiers at dispatch time and applies the matching level of review to each action.
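The tier table translates naturally into a small decision function. The following is only a sketch of that mapping: the return values (`run`, `run_and_log`, `hold_for_human`) and the function names are invented for illustration and are not ModelReins identifiers.

```python
def gate(tier, session_approved=False):
    """Map a risk tier to what should happen before the action runs."""
    if tier == "auto":
        return "run"
    if tier == "audit":
        return "run_and_log"
    if tier == "approve":
        return "hold_for_human"
    if tier == "session":
        # Approved once per session: later actions in that session run freely.
        return "run" if session_approved else "hold_for_human"
    raise ValueError(f"unknown risk tier: {tier}")


def plan(capabilities, requested, session_approved=False):
    """Look up the tier declared for `requested` and return the gate decision."""
    tiers = {c["name"]: c["risk_tier"] for c in capabilities}
    return gate(tiers[requested], session_approved)
```

The `capabilities` argument uses the same list-of-objects shape as the registration payload.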
The Python SDK
The fastest path to a running silicon worker is the official Python SDK, published to PyPI. It is stdlib-only, with no external dependencies.
pip install modelreins-worker
Then a few lines of Python:
from modelreins_worker import Worker

def handle_job(job):
    prompt = job["prompt"]
    # ... do the work ...
    return "Triage complete: priority=high, category=billing"

Worker(name="my-triager").run(handler=handle_job)
The worker heartbeats every 5 seconds, polls its inbox for jobs assigned to its name, and calls your handle_job for each one. Exceptions auto-fail the job with the traceback as output. The full API surface and capabilities reference is at /docs/sdk/python.
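The "exceptions auto-fail the job" behavior can be approximated in a few lines. This sketch does not reproduce the SDK's internals; `run_job` and the `status` and `output` field names are assumptions chosen for the example.

```python
import traceback


def run_job(handler, job):
    """Call the handler; turn an exception into a failed result with the traceback."""
    try:
        return {"status": "done", "output": handler(job)}
    except Exception:
        # The formatted traceback becomes the job output, as described above.
        return {"status": "failed", "output": traceback.format_exc()}
```

Because the wrapper catches the exception, one failing job would not stop a polling loop from picking up the next one.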
Secrets: bring your own vault
Workers that need credentials (webhook URLs, API tokens, CRM keys) declare them as vault:// references in their requires_secrets. ModelReins resolves those references against your vault (Vaultwarden, Bitwarden, HashiCorp Vault, Keeper) at dispatch time. The credential flows straight from your vault into the worker runtime. ModelReins never holds your secrets. Configure the connector at /settings/vault.
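To make the reference-resolution idea concrete, here is a minimal sketch. Everything about the reference format is hypothetical: the `vault://<item>/<field>` shape, the dict form of `requires_secrets`, and the `lookup` callable standing in for a vault connector are all assumptions, since the actual format is not specified here.

```python
from urllib.parse import urlsplit


def resolve_secrets(requires_secrets, lookup):
    """Resolve vault:// references via `lookup(item, field)`; nothing is cached."""
    resolved = {}
    for name, ref in requires_secrets.items():
        parts = urlsplit(ref)
        if parts.scheme != "vault":
            raise ValueError(f"{name}: not a vault:// reference: {ref}")
        # Hypothetical shape: vault://<item>/<field>
        item, field = parts.netloc, parts.path.lstrip("/")
        resolved[name] = lookup(item, field)
    return resolved
```

The resolved values exist only in the returned dict, which matches the idea that the credential passes through at dispatch time without being stored.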
API Keys
Self-serve key management lives at /settings/api-keys. Mint new keys, revoke old ones, see when each was last used. Raw tokens are delivered through a one-time-view URL with a short TTL — the raw value is shown exactly once, then the link dies. ModelReins stores only the hash, so losing a key is never a security incident; it's a speed bump. Mint a new one, revoke the old prefix, move on.
Why the one-time view
Raw credentials don't belong in email, Slack, or chat transcripts. The one-time-view URL collapses the surface area to exactly one HTTPS page-view. If someone forwards the URL to an attacker after you've already opened it, they see "already viewed" — the raw value is off our disks. If it hasn't been opened yet and the link expires, the raw value is deleted by the same sweep. Either way, the only window for disclosure is the one you explicitly opened.
Worker tokens use the same pipe
When you register a silicon worker, the response carries a view_url pointing to the same delivery page. Same TTL, same one-and-done semantics. There is no separate channel for credential delivery anywhere in the product.
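The one-time-view mechanics can be modeled with an in-memory store. The sketch below is an illustration under assumptions, not the production design: SHA-256 as the hash, a 600-second TTL, and the `OneTimeViewStore` class are all choices made for this example.

```python
import hashlib
import secrets
import time


class OneTimeViewStore:
    """In-memory stand-in for one-time-view delivery; keeps only a hash long-term."""

    def __init__(self, ttl_seconds=600, clock=time.time):
        self.ttl = ttl_seconds
        self.clock = clock
        self.pending = {}    # view_id -> (raw_token, expires_at)
        self.hashes = set()  # digests used to authenticate later requests

    def mint(self):
        """Create a token and return the id of its one-time view."""
        raw = secrets.token_urlsafe(32)
        view_id = secrets.token_urlsafe(16)
        self.hashes.add(hashlib.sha256(raw.encode()).hexdigest())
        self.pending[view_id] = (raw, self.clock() + self.ttl)
        return view_id

    def view(self, view_id):
        """Return the raw token once; afterwards (or after expiry) return None."""
        raw, expires_at = self.pending.pop(view_id, (None, 0))
        if raw is None or self.clock() > expires_at:
            return None
        return raw

    def authenticate(self, raw):
        """Check a presented token against the stored hashes."""
        return hashlib.sha256(raw.encode()).hexdigest() in self.hashes
```

The raw value is removed from `pending` on the first view or on the first access after expiry, so only the digest remains in the store.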
Scheduled Tasks
Scheduled tasks dispatch a prompt on a cron schedule and store the result. Results are retrievable via the API or visible in the Dashboard. Tasks can target any configured provider and optionally POST their result to a webhook.
Cron API
Tasks are managed through the /api/tasks endpoint family.
import ModelReins from 'modelreins';

const mr = new ModelReins({
  url: 'http://192.168.0.246:8484',
  token: process.env.MODELREINS_TOKEN,
});

// Create a scheduled task
const task = await mr.tasks.create({
  name: 'daily-standup-summary',
  schedule: '0 9 * * 1-5', // 9 AM Mon–Fri
  provider: 'claude',
  prompt: 'Summarise open GitHub PRs and suggest priorities.',
  webhook: 'https://hooks.slack.com/T.../B.../...', // optional
  enabled: true,
});
console.log(task.id); // task_abc123

// List tasks
const all = await mr.tasks.list();

// Get latest result
const result = await mr.tasks.latestResult(task.id);

// Trigger immediately (one-off run)
await mr.tasks.run(task.id);

// Disable / enable
await mr.tasks.update(task.id, { enabled: false });

// Delete
await mr.tasks.delete(task.id);
Examples
Log monitor
await mr.tasks.create({
  name: 'error-log-digest',
  schedule: '0 */4 * * *', // every 4 hours
  provider: 'claude',
  prompt: `
    Read /var/log/app/error.log (last 500 lines).
    Summarise recurring errors, count by type,
    and flag anything that looks critical.
    Be concise: 10 lines max.
  `,
  context: { readFile: '/var/log/app/error.log' },
});
Weekly report via webhook
await mr.tasks.create({
  name: 'weekly-metrics',
  schedule: '0 8 * * 1', // Monday 8 AM
  provider: 'claude',
  prompt: 'Generate a weekly metrics summary for the team.',
  webhook: 'https://hooks.slack.com/services/...',
  webhookFormat: 'slack', // wraps text in Slack payload
});
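As a rough picture of what "wraps text in Slack payload" could mean, the sketch below builds the webhook POST for a task result. Slack incoming webhooks accept a JSON body with a `text` field; the non-Slack payload shape (`result`) and the helper name are hypothetical.

```python
import json
import urllib.request


def build_webhook_request(url, text, webhook_format=None):
    """Build the POST that would deliver a task result to a webhook."""
    if webhook_format == "slack":
        payload = {"text": text}  # Slack incoming webhooks read the "text" field
    else:
        payload = {"result": text}  # hypothetical default shape
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )
```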
Cron fields reference
| Field | Range | Special |
|---|---|---|
| Minute | 0–59 | */n every n minutes |
| Hour | 0–23 | */n every n hours |
| Day of month | 1–31 | ? unspecified |
| Month | 1–12 or JAN–DEC | * every month |
| Day of week | 0–6 (Sun=0) or SUN–SAT | 1-5 weekdays |
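To check an expression against the field table, a small matcher is enough. This is a simplified sketch, not the scheduler ModelReins uses: it handles `*`, `?`, `*/n`, ranges, and comma lists, does not accept names such as JAN or SUN, and requires every field to match.

```python
from datetime import datetime


def _field_matches(expr, value, low):
    """True if `value` satisfies one cron field; `low` is the field's minimum."""
    for part in expr.split(","):
        rng, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if rng in ("*", "?"):
            if (value - low) % step == 0:
                return True
        elif "-" in rng:
            start, end = (int(x) for x in rng.split("-"))
            if start <= value <= end and (value - start) % step == 0:
                return True
        elif int(rng) == value:
            return True
    return False


def cron_matches(expression, when):
    """True if the five-field expression fires in the minute `when` falls in."""
    minute, hour, dom, month, dow = expression.split()
    weekday = (when.weekday() + 1) % 7  # Python uses Mon=0; the table uses Sun=0
    return (
        _field_matches(minute, when.minute, 0)
        and _field_matches(hour, when.hour, 0)
        and _field_matches(dom, when.day, 1)
        and _field_matches(month, when.month, 1)
        and _field_matches(dow, weekday, 0)
    )
```

For example, `cron_matches("0 9 * * 1-5", datetime(2026, 4, 6, 9, 0))` evaluates the weekday schedule against a Monday morning.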
MCP Channel
ModelReins can register itself as an MCP (Model Context Protocol) server. This lets any MCP-aware client — including Claude Code — call ModelReins tools directly from within a conversation, turning Claude Code into a background worker that dispatches sub-tasks through ModelReins.
MCP is Anthropic's open protocol for connecting AI models to external tools and data.
Claude Code reads .mcp.json in the project root (or the user-level
~/.claude.json) to discover available MCP servers.
Configure .mcp.json
Add a ModelReins entry to your project's .mcp.json. The MCP server
is exposed by ModelReins itself over stdio or SSE — choose whichever your client supports.
stdio transport (recommended for Claude Code)
{
  "mcpServers": {
    "modelreins": {
      "type": "stdio",
      "command": "npx",
      "args": ["modelreins", "mcp"],
      "env": {
        "MODELREINS_URL": "http://192.168.0.246:8484",
        "MODELREINS_TOKEN": "YOUR_TOKEN_HERE"
      }
    }
  }
}
SSE transport (for web clients)
{
  "mcpServers": {
    "modelreins": {
      "type": "sse",
      "url": "http://192.168.0.246:8484/mcp/sse",
      "headers": {
        "Authorization": "Bearer YOUR_TOKEN_HERE"
      }
    }
  }
}
Available MCP tools
Once connected, these tools appear inside Claude Code (and other MCP clients):
| Tool name | Description | Key params |
|---|---|---|
| mr_dispatch | Send a prompt to a provider and return the result | prompt, provider, model |
| mr_providers | List all configured providers and their status | none |
| mr_task_create | Create a new scheduled task | name, schedule, prompt |
| mr_task_list | List scheduled tasks with last-run status | none |
| mr_task_result | Fetch the latest result for a task | taskId |
| mr_task_run | Trigger a task immediately | taskId |
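On the wire, an MCP client invokes a tool with a JSON-RPC 2.0 `tools/call` request. The sketch below shows what such a message might look like for one of the tools in the table; the argument values are illustrative, and `tool_call_message` is a hypothetical helper rather than part of any SDK.

```python
import json


def tool_call_message(request_id, tool, arguments):
    """Build an MCP tools/call request (JSON-RPC 2.0) as a JSON string."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })
```

A client such as Claude Code builds this message itself; the helper is only useful for testing a server by hand, for instance with `tool_call_message(1, "mr_dispatch", {"prompt": "Hello", "provider": "claude"})`.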
Example: Claude Code as a worker
With ModelReins as an MCP server and a cli provider pointing at Claude Code,
the outer Claude instance can spin up inner Claude Code sessions for long-running tasks:
Use the mr_dispatch tool to have the local-claude-code provider
refactor the auth module at src/auth/* and return a summary of changes.
Claude Code will call mr_dispatch with provider: "local-claude-code",
ModelReins shells out to claude --print, and the inner session performs
the refactor. The outer session receives the result as the tool response.
When using --dangerously-skip-permissions on the inner CLI provider,
ensure it runs in an isolated working directory or container. Grant only the
permissions the task actually needs.
Verify the connection
$ claude mcp list
modelreins  stdio  connected
  Tools: mr_dispatch, mr_providers, mr_task_create, mr_task_list, mr_task_result, mr_task_run
Dashboard
The web UI is served at the root of your ModelReins instance
(http://192.168.0.246:8484). It provides a real-time view
of the system without requiring any CLI setup.
Overview
Live provider status, total dispatches today, error rate, and average latency. Updates every 10 seconds.
Dispatch History
Full log of every dispatch with prompt, response, provider, model, token count, latency, and status.
Scheduled Tasks
Create, edit, enable/disable, and manually trigger tasks. View run history and last output inline.
Provider Health
Per-provider uptime, last-error details, and a live "ping" button to test reachability.
Usage Graphs
Token consumption and request volume over time, broken out by provider and model.
Settings
Manage API tokens, configure webhooks, edit server config (with live reload), and view server logs.
Keyboard shortcuts
| Key | Action |
|---|---|
| G D | Go to Dispatch History |
| G T | Go to Scheduled Tasks |
| G P | Go to Providers |
| G S | Go to Settings |
| / | Focus search bar |
| R | Refresh current view |
| ? | Show keyboard help |
API Reference
All endpoints are under http://<host>:8484/api.
Requests require an Authorization: Bearer <token> header unless
the server is configured with auth: none.
All request/response bodies are JSON.
An OpenAPI 3.1 spec is available at /api/openapi.json and can be
imported directly into Postman, Insomnia, or any compatible client.
POST /api/dispatch
Request body
{
"prompt": "Explain binary search trees.",
"provider": "claude", // optional
"model": "claude-sonnet-4-6", // optional
"system": "You are a CS tutor.", // optional
"stream": false, // set true for SSE stream
"options": { "temperature": 0.5 } // optional provider opts
}
Response
{
"id": "dsp_01jx...",
"text": "A binary search tree is...",
"provider": "claude",
"model": "claude-sonnet-4-6",
"usage": { "input_tokens": 12, "output_tokens": 284 },
"latencyMs": 1423,
"createdAt": "2026-04-04T09:12:33Z"
}
Provider list response
[
{
"name": "claude",
"type": "anthropic",
"default": true,
"status": "healthy",
"lastCheck": "2026-04-04T09:10:00Z"
}
]
Listing tasks returns an array of task objects including id, name,
schedule, enabled, lastRunAt, and lastStatus.
Task creation body
{
"name": "my-task",
"schedule": "0 9 * * 1-5",
"provider": "claude",
"prompt": "Your prompt here.",
"enabled": true,
"webhook": "https://..." // optional
}
Triggering a task runs it once regardless of its schedule. The call returns a runId that can be
polled at /api/tasks/:id/results/:runId.
Updating a task accepts any subset of the creation fields. Changes take effect at the next scheduled run.
Deleting a task permanently removes the task and its run history and returns 204 No Content.
Listing a task's results returns paginated results. Query params: limit (default 20), offset.
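The limit and offset query params lend themselves to a simple paging loop. The sketch below assumes each page comes back as a plain list and that a short page marks the end; `fetch_page` is a hypothetical callable that wraps the actual HTTP request.

```python
def iter_results(fetch_page, limit=20):
    """Yield every result by walking limit/offset pages until a short page arrives."""
    offset = 0
    while True:
        page = fetch_page(limit=limit, offset=offset)
        yield from page
        if len(page) < limit:
            return
        offset += limit
```

Passing the fetcher in as a callable keeps the paging logic independent of the HTTP client and of the exact results path.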
Server status response
{
"status": "ok",
"version": "1.0.0",
"uptime": 391627,
"providers": { "healthy": 3, "degraded": 0 }
}
Error responses
| Status | Code | Meaning |
|---|---|---|
| 400 | VALIDATION_ERROR | Missing or invalid request fields |
| 401 | UNAUTHORIZED | Missing or invalid token |
| 404 | NOT_FOUND | Task or resource does not exist |
| 429 | RATE_LIMITED | Provider rate limit hit; the Retry-After header is set |
| 502 | PROVIDER_ERROR | Upstream provider returned an error |
| 504 | PROVIDER_TIMEOUT | Provider did not respond within configured timeout |
{
  "error": {
    "code": "PROVIDER_ERROR",
    "message": "Anthropic API returned 529 Overloaded",
    "provider": "claude",
    "retryAfter": 30
  }
}
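A client can use the `code` and `retryAfter` fields of the error body to decide whether to retry. The following is a sketch of one possible policy, not documented behavior: treating RATE_LIMITED, PROVIDER_ERROR, and PROVIDER_TIMEOUT as retryable is an assumption, and `send` is a hypothetical callable returning the decoded JSON body.

```python
import time


def call_with_retry(send, max_attempts=3, sleep=time.sleep):
    """Call `send()`; on a retryable error body, wait `retryAfter` seconds and retry."""
    retryable = {"RATE_LIMITED", "PROVIDER_ERROR", "PROVIDER_TIMEOUT"}
    for attempt in range(1, max_attempts + 1):
        body = send()
        error = body.get("error")
        if error is None:
            return body
        if error.get("code") not in retryable or attempt == max_attempts:
            raise RuntimeError(f"{error.get('code')}: {error.get('message')}")
        # Fall back to a 1-second pause when the body carries no retryAfter.
        sleep(error.get("retryAfter", 1))
```

Injecting `sleep` makes the policy testable without real delays.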