AI & Voice
Muxit AI lets you control lab hardware using natural language — via chat or voice in the browser UI. It uses a managed AI proxy service with a safety gate that requires user approval before executing device commands.
In one line
MCP (bring your own AI client) is free on every tier. The built-in Chat Panel, voice, and the script ai() function (including the ai(prompt, image) vision overload) require Maker. The Vision AI tool surface (Chat Panel + MCP take_snapshot, object detection, spatial mapping), Local LLM, and Autonomous Agents require Pro. See AI features by tier below for the full picture.
AI features by tier
| Capability | Free | Maker | Pro+ |
|---|---|---|---|
| MCP server — Claude Desktop / Claude Code / ChatGPT* drives your hardware via your own AI provider | ✓ | ✓ | ✓ |
| Connector ai: { instructions } — your safety rules and device context appear in every AI prompt | ✓ | ✓ | ✓ |
| JS scripts — write automations (3 on Free, unlimited on Maker+) | ✓ | ✓ | ✓ |
| Hand-authored connectors — write .js connector configs yourself | ✓ | ✓ | ✓ |
| say(), ask.confirm/choose/text — TTS and dashboard prompts | ✓ | ✓ | ✓ |
| Chat Panel — built-in chat with tool calling, in the browser UI | — | ✓ | ✓ |
| Voice — speech-to-text input, spoken responses | — | ✓ | ✓ |
| ai(prompt) and ai(prompt, image) script globals — single-shot LLM calls from automations, including base64-JPEG vision input | — | ✓ | ✓ |
| AI-assisted SCPI authoring — "Set up with AI" button drafts a connector from a programming manual | — | ✓ | ✓ |
| AI credits / month | — | 500 | 1500 (Pro) |
| Vision AI tools — agentic camera workflows in Chat Panel and MCP (take_snapshot, identify_objects, locate_object, OpenCV trackers, spatial calibration) | — | — | ✓ (Pro) |
| Local LLM — point Muxit at Ollama or LM Studio for offline operation | — | — | ✓ (Pro) |
| Autonomous Agents — long-running goal-driven AI | — | — | ✓ (Pro) |
* ChatGPT's connector UI rejects loopback URLs, so it needs a public HTTPS tunnel and the security caveats in the MCP Server section. Claude Desktop and Claude Code work locally with no extra setup.
What you get on Free
A lot. Point Claude Desktop, Claude Code, or ChatGPT (with a tunnel — see below) at Muxit's MCP server and your AI client gets the same 56+ tools the built-in chat uses — list connectors, read and write properties, call actions, take camera snapshots, run and write scripts, write connector configs. You drive your bench in natural language using your own AI subscription, and Muxit charges nothing for it.
You can write JS scripts (up to 3), author connectors by hand, and embed ai: { instructions } blocks so any AI that touches your hardware sees your safety rules and device context. say(), ask.confirm, and dashboard prompts all work. For a one-person lab already using Claude Code, this is a complete workflow.
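A connector-level instructions block is just another field in the config file. A minimal sketch, assuming a hypothetical bench PSU connector (the driver choice and SCPI address are placeholders; see the Connector Guide for the full schema):

```js
// workspace/connectors/psu.js — illustrative sketch, not a complete connector
export default {
  driver: "GenericScpi",                                // placeholder driver choice
  config: { address: "TCPIP0::192.168.1.50::INSTR" },   // placeholder address
  ai: {
    instructions: "Bench PSU. Never exceed 30V / 5A. Ramp voltage gradually; never jump to the target."
  }
};
```

Any AI client that reaches this device, whether through the Chat Panel or your own MCP client, sees that note alongside the device schema.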
What Maker unlocks
The things MCP can't give you because they live inside the Muxit window:
- Chat Panel and voice — talk to your bench while your hands are full, see tool calls inline, no external client to juggle.
- ai() in scripts — if (ai("Is this reading abnormal?", reading) === "yes") …. Single-shot inference inside your automations, no API key plumbing. Pass a base64 JPEG as the second argument (e.g. ai("Is the part aligned?", connector('webcam').snapshot)) for explicit vision input — no extra tier required.
- AI-assisted SCPI authoring — point the AI at a programming manual; it probes the device, drafts a connector, validates it, and brings it online. Hours of vendor-PDF spelunking become a chat turn.
- 500 AI credits / month included — managed proxy, no per-request billing to set up.
Beyond Maker
Pro adds the Vision AI tool surface — the Chat Panel and MCP gain take_snapshot, OpenCV-backed object detection, spatial mapping, and visual-servoing tools so the AI assistant can decide on its own when to look at the bench. (Maker users can already drive vision explicitly from scripts via ai(prompt, image); Pro is what lets the agent request a frame.) Pro also unlocks Local LLM support (Ollama / LM Studio for fully offline operation) and Autonomous Agents for long-running goal-driven work — plus a 1500-credit allowance.
Ways to Use AI
| Method | What it is | Best for |
|---|---|---|
| Chat Panel | Browser sidebar with natural language input | Interactive use in the web UI |
| MCP Server | Model Context Protocol over HTTP and stdio | Claude Code, Claude Desktop, MCP-compatible AI tools |
Quick Start
1. Chat Panel (Browser Web UI)
The Chat Panel provides a full AI assistant that can control your devices, read sensors, write scripts, and answer questions — all via natural language (text or voice). It uses an agentic tool-calling loop: the LLM decides which tools to call, executes them, and continues until it has a final answer.
Muxit AI is the default — no API keys needed. It uses your Muxit license to authenticate with a managed AI proxy service at api.muxit.io.
Click the gear icon to open Settings. Under AI Services, select Muxit AI.
Model selection:
- Auto (recommended) — Muxit selects the best model for each request based on task complexity and your subscription tier. This is the default.
- Manual — Pick a specific model from your favorites list to always use that model.
Credits are included with your Muxit subscription. You can purchase additional credits at muxit.lemonsqueezy.com.
Model favorites: You can curate a favorites list in Settings under Model Favorites. These appear in both the Settings model dropdown and the Chat header model picker for quick switching.
Starting the server
```bash
node start.js server   # MuxitServer (serves web UI at http://127.0.0.1:8765)
node start.js server   # Auto-open browser
```
Open AI Chat by clicking the chat icon in the Activity Bar.
2. MCP Server
The MCP server is built directly into MuxitServer (C#), exposing device control tools and resources over the Model Context Protocol. It supports both HTTP and stdio transports.
HTTP transport — Available at /mcp on the running server (e.g. http://127.0.0.1:8765/mcp). Starts automatically with the server.
Stdio transport — For Claude Code and Claude Desktop integration:
```bash
node start.js mcp                            # Via start script
dotnet run --project MuxitServer -- --mcp    # Direct
```
Claude Code
If you have an installed Muxit, register the MCP server with Claude Code from the directory you launch Claude Code in:
```bash
claude mcp add muxit -- muxit --mcp --workspace "/path/to/your/workspace"
```
Replace muxit with the full path to the binary if it's not on your PATH (e.g. C:\Program Files\Muxit\muxit.exe or /Applications/Muxit.app/Contents/MacOS/muxit). This writes the config to ~/.claude.json so it's available across all your projects.
If you're working in this repository's source tree, the project-level .mcp.json at the repo root is already wired up — Claude Code picks it up automatically when you launch from this directory. It runs dotnet run --project MuxitServer -- --mcp --workspace workspace, so it only works inside this checkout.
Claude Desktop — Add to claude_desktop_config.json (~/Library/Application Support/Claude/claude_desktop_config.json on macOS, %APPDATA%\Claude\claude_desktop_config.json on Windows):
```json
{
  "mcpServers": {
    "muxit": {
      "command": "/path/to/muxit",
      "args": ["--mcp", "--workspace", "/path/to/your/workspace"]
    }
  }
}
```
Use the absolute path to the installed muxit executable. Restart Claude Desktop after editing.
ChatGPT — ChatGPT's connector UI rejects loopback and private-IP addresses (127.0.0.1, localhost, 10.*, 192.168.*) as "unsafe url" and requires a publicly reachable HTTPS endpoint with a valid TLS certificate. This is a constraint of ChatGPT, not Muxit — there's no way to point ChatGPT at the local /mcp URL directly.
To use ChatGPT with Muxit you have to expose the local endpoint over a public HTTPS tunnel. Quickest path with Cloudflare Tunnel:
```bash
cloudflared tunnel --url http://127.0.0.1:8765
# → https://<random-name>.trycloudflare.com
```
Then in ChatGPT, Settings → Apps → Advanced settings → Create app, enter https://<random-name>.trycloudflare.com/mcp as the MCP server URL with No Auth.
Security warning — read this before running a tunnel
Muxit's HTTP API treats every loopback request as authenticated, and a tunnel daemon (cloudflared, ngrok, etc.) running on the same machine reaches Muxit over loopback. A public tunnel exposes your hardware to anyone who guesses the URL — they can read sensors, write properties, and run scripts with no credentials.
Muxit's optional password (security.remoteAccess + X-Auth-Token header) doesn't help here either: it's bypassed for loopback callers, and it uses a custom header name that ChatGPT's connector can't send.
Acceptable mitigations:
- Tailscale Funnel restricted to your tailnet — the tunnel isn't actually public, only your devices can reach it.
- Cloudflare Access in front of a Cloudflare Tunnel — adds an auth gate Muxit doesn't ship today.
- A short-lived cloudflared/ngrok URL spun up only for the duration of a session and torn down right after.
If none of those apply, prefer Claude Desktop or Claude Code, which connect to the local MCP server over stdio and don't need a tunnel at all.
TIP
Open Settings > MCP in the Muxit GUI for ready-to-copy config snippets with your actual server paths.
MCP Tools
| Tool | Description | Category |
|---|---|---|
| list_connectors | List all devices and capabilities | Device Control |
| get_connector_schema | Get full schema for a device | Device Control |
| read_property | Read a device property | Device Control |
| write_property | Set a device property | Device Control |
| call_action | Execute a device action | Device Control |
| get_device_state | Snapshot of all cached state | Device Control |
| list_scripts | List available and running scripts | Scripts |
| run_script | Start a named script (bounded wait, returns running on long scripts) | Scripts |
| run_code | Execute inline JavaScript (bounded wait, returns running on long scripts) | Scripts |
| stop_script | Stop a running script | Scripts |
| get_script_status | Check status / result / error of a running or recently-finished script | Scripts |
| get_script_output | Fetch buffered log output (with since_seq cursor) | Scripts |
| wait_for_script | Block up to 120s waiting for a running script to finish | Scripts |
| read_script_source | Read a script's source code | Files |
| write_script | Create or update a script file | Files |
| read_startup_script | Read a startup script's source code | Files |
| write_startup_script | Create or update a startup script | Files |
| read_connector_config | Read a connector config file | Files |
| write_connector_config | Create or update a connector config | Files |
| list_drivers | List available drivers | System |
| get_driver_schema | Get full schema for a specific driver | System |
| get_server_config | Get server configuration (redacted) | System |
| take_snapshot | Capture a camera image for visual analysis | Vision |
| configure_vision | Set up OpenCV trackers for real-time detection | Vision |
| read_detections | Read current tracked object positions (fast) | Vision |
| identify_objects | AI-powered object detection with names/positions | Vision |
| locate_object | Find object and return world coordinates | Vision |
| calibrate_camera | Camera-to-world coordinate calibration | Vision |
| teach_object | Teach an object for persistent real-time tracking | Vision |
| forget_object | Remove a taught object | Vision |
| list_objects | List all taught objects | Vision |
| approach_object | Visual servoing — iterative guidance to move toward an object | Vision |
| list_agents | List agent configs and running instances | Agents |
| start_agent | Start an agent with a goal | Agents |
| stop_agent | Stop a running agent | Agents |
| get_agent_status | Get current status of an agent | Agents |
| pause_agent | Pause a running agent | Agents |
| resume_agent | Resume a paused agent | Agents |
| read_instructions | Read the central lab instructions file | Instructions |
| write_instructions | Update the central lab instructions file | Instructions |
| list_processes | List available process definitions | Processes |
| read_process | Read a process definition by name | Processes |
| write_process | Create or update a process definition | Processes |
| list_dashboards | List dashboard files in workspace/dashboards/ | Dashboards |
| read_dashboard | Read a dashboard's JSON content | Dashboards |
| write_dashboard | Create or update a .dashboard.json file | Dashboards |
MCP Resources
| URI | Description |
|---|---|
| muxit://connectors | List of all connectors with schemas |
| muxit://connector/{name}/schema | Schema for a specific connector |
| muxit://state | Current device state snapshot |
Tool-Call Approvals
Tool-call approval prompts are driven by the global safety level (set via the SafetyChip in the status strip), not by a separate AI-only switch. In Observe and Assisted, the Chat Panel shows an approval dialog with the tool name, parameters, and Allow/Deny buttons. In Active and Unrestricted, tool calls execute without prompting (destructive actions still confirm per the safety policy). See the Safety Guide for the full behaviour table.
AI Chat Tools
The Chat Panel uses the same tools as the MCP server. The AI assistant decides which tools to call based on your request. Depending on the current safety level, tool calls may show an approval dialog before executing.
SCPI instrument setup
Adding a new oscilloscope, multimeter, power supply, or signal generator can be a chat-driven flow: probe the device with *IDN?, look up its command set online, write the connector config, and hot-reload — no server restart. Click Set up with AI on the Add Connector dialog (visible when you select GenericScpi), or ask the assistant directly — mentions of SCPI, an instrument class, or a well-known vendor (Rigol / Siglent / Keysight / Tektronix / Keithley / …) route the turn through four dedicated tools and enable OpenRouter's web-search plugin for that turn. See AI-assisted SCPI setup for the full walkthrough.
Guided Tour & Docs Access
The chat has direct access to every page in this documentation site, so "how do I…" questions get real guide content quoted back instead of a best-guess answer. Three tools power this:
| Tool | What it does |
|---|---|
| list_docs | Enumerates every bundled doc path (guides, getting-started, reference, examples). |
| search_docs | Full-text, case-insensitive search — returns the top hits with a short snippet. |
| read_doc | Returns the raw markdown for a given path, e.g. guides/connectors.md. |
The docs are embedded into the server binary at build time, so they're available offline and always match the running version. The chat picks them up automatically when your question includes phrases like how do I, walk me through, tour, tutorial, getting started, or what is.
The Help → Guided Tour menu in the title bar is the fastest way in: each item opens the chat and sends a pre-written starter prompt so the AI walks you through a specific task.
- Guided Tour (AI) — "Give me a tour of Muxit"
- Tour: Install a Driver — finds the current driver-marketplace procedure
- Tour: Create a Connector — walks through the first-connector guide
- Tour: Drag Properties into a Script — explains how the hardware pane's drag targets work
- Tour: Build a Dashboard — walks through dashboards and widget binding
You can also ask for a tour in your own words — the menu is a convenience, not a gate.
Vision (Camera Snapshots)
The take_snapshot tool captures a single frame from any camera connector (USB webcam, IP camera) and returns it as an image content block. The LLM can then see and describe the image — useful for checking robot positions, reading instrument displays, or verifying alignments.
Requirements:
- A camera connector must be configured and initialized
- The LLM model must support vision (Claude Sonnet/Opus, GPT-4o, Gemini)
Example prompts:
- "Take a snapshot from the webcam and tell me what you see"
- "Is the robot arm aligned with the target? Check the camera"
- "What's the reading on the oscilloscope display?"
Voice Commands
Voice is a core Muxit feature — control your devices by speaking. The Chat Panel header includes a mic button with a dropdown for voice settings.
Two Modes
| Mode | How it works |
|---|---|
| Push to Talk (PTT) | Hold the mic button to talk. Release to stop. Your speech is transcribed and sent as a chat message. |
| Hands-free | Click the mic button to toggle listening. Behavior depends on wake word setting. |
Hands-free sub-modes (configured in Settings > Voice > Wake Word):
| Wake Word Setting | Hands-free Behavior |
|---|---|
| Enabled | Say "Muxit" to start a conversation. Muxit listens, sends your command, speaks the AI response (if TTS is on), then automatically resumes listening — no need to say the wake word again. The conversation ends after an adjustable silence timeout (default 10s). |
| Disabled | Muxit listens continuously. When you pause speaking, the transcript is automatically sent. |
Conversation Loop
When both wake word and TTS are enabled in hands-free mode, Muxit supports a natural spoken conversation:
- Say "Muxit" to start the conversation
- Speak your command — it's transcribed and sent to the AI
- The AI responds and TTS speaks the response
- Muxit automatically starts listening again (no wake word needed)
- Continue the conversation naturally
- When you stop talking, the conversation ends after a configurable idle timeout
Idle timeout: Adjust how long Muxit waits for you to speak before ending the conversation. Open the mic dropdown (▾ next to the mic button) and use the Idle slider (3–30 seconds, default 10s). You can also manually end the conversation by clicking the Stop button.
Text-to-Speech
Toggle TTS on/off with the speaker icon in the chat header. When enabled, AI responses are spoken aloud using browser speech synthesis. The AI automatically keeps responses short and voice-friendly.
Configure voice, rate, and pitch in Settings > Voice > Text-to-Speech.
The chat-header toggle controls AI replies only. Script say() output has its own toggle in the top status strip (next to STOP ALL) so you can mute one source without silencing the other. Both toggles share the same voice / rate / pitch settings.
Wake Word
Say "Muxit" to start a voice conversation hands-free. Enable in Settings > Voice > Wake Word.
Voice uses browser speech recognition (Web Speech API). Works in Chrome and Edge.
Settings Apply Instantly
Changes to Voice, AI Services, and AI Behavior settings take effect without a page refresh:
- AI model and system prompt — applied on your next chat message.
- TTS voice, rate, pitch, and enabled toggle — applied on the next utterance.
- Wake word phrase, enable/disable, and microphone selection — applied immediately; the background listener restarts in place.
A small Saved — applied immediately badge appears in the top-right of the Settings pane after each change to confirm it was saved.
AI in Scripts
Scripts can make single-shot LLM calls using the ai() global:
```js
// Text-only query
const answer = ai("Classify this reading as normal or abnormal: 47.3 ohms");
log.info(answer);

// Vision: analyze a camera image
const cam = connector('webcam');
const frame = cam.snapshot;
const description = ai("Describe what you see in this image", frame);
log.info(description);

// Video recording
const file = cam.record({ seconds: 10 });
log.info(`Recorded: ${file}`);
```
ai(prompt, image?) is a single-shot call — no conversation memory, no tool access, no agentic loop.
Agent Mode
Agents are autonomous AI instances that can coordinate multiple devices to accomplish goals. Unlike the chat loop (which responds to single messages), agents are persistent, goal-oriented, and can react to device events.
Quick Start
From chat:
```
"Pick up the red part and place it in the tray"
```
From saved config (workspace/agents/*.agent.json):
```json
{
  "name": "pick-and-place",
  "description": "Pick a part and place it in the tray",
  "devices": ["robot", "camera"],
  "autonomy": "supervised",
  "safety": {
    "maxSpeed": 50,
    "workspace": { "x": [0, 600], "y": [-300, 300], "z": [50, 400] },
    "maxForce": 20,
    "requireVisionConfirm": true
  },
  "instructions": "Always approach from above. Verify grip before lifting.",
  "parameters": {
    "partColor": { "type": "string", "default": "red" }
  }
}
```
Autonomy Levels
| Level | Behavior | Best For |
|---|---|---|
| supervised | Each action shown to user in real-time, proceeds unless stopped | First-time tasks, dangerous ops |
| plan-approve | Agent creates plan, user approves, then executes freely | Repetitive tasks |
| guardrails | Runs freely within safety boundaries, pauses only on limit violation | Trusted tasks |
| full | No approval needed (safety limits still enforced) | Well-tested automation |
Safety Boundaries
Enforced regardless of autonomy level:
| Boundary | Config Key | Description |
|---|---|---|
| Workspace limits | safety.workspace | Bounding box the robot cannot leave |
| Speed limits | safety.maxSpeed | Max velocity as % of device max |
| Force limits | safety.maxForce | Max force (N) before emergency stop |
| Rate limits | safety.maxActionsPerMinute | Prevent hammering hardware |
| Vision confirm | safety.requireVisionConfirm | Camera snapshot before/after critical actions |
When requireVisionConfirm is enabled, the safety gate enforces that a vision tool (take_snapshot, read_detections, identify_objects, locate_object, or approach_object) was called within the last 30 seconds before any movement or grip action is allowed. If no recent snapshot exists, the action is blocked with an error message prompting the agent to take a snapshot first.
Operational Limits
| Limit | Default | Config |
|---|---|---|
| Max concurrent agents | 3 | agents.maxConcurrent in server.json |
| Approval timeout (supervised mode) | 5 minutes | Not configurable — action is auto-denied if user doesn't respond |
| Max iterations | 100 | maxIterations in agent config (-1 for unlimited) |
| Max runtime | 300 seconds | timeoutSeconds in agent config |
| Completed agent cleanup | 5 minutes | Completed/failed/stopped agents are removed from the running list after 5 minutes |
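A saved config that tightens several of these limits at once might look like the sketch below (the agent name and connector names are placeholders; the fields are the ones listed in the tables above):

```json
{
  "name": "overnight-soak-test",
  "devices": ["psu", "test-device"],
  "autonomy": "guardrails",
  "maxIterations": 200,
  "timeoutSeconds": 3600,
  "safety": { "maxActionsPerMinute": 10 }
}
```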
Starting Agents from Chat
You can start an agent directly from the AI chat by describing a multi-step task. The chat system creates an ad-hoc agent with default settings:
```
"Pick up the red part and place it in the tray"
```
Ad-hoc agents use supervised autonomy by default and inherit the global LLM model. For more control, create a saved config file in workspace/agents/*.agent.json.
Event Triggers
Agents can start automatically when device conditions are met:
```json
{
  "triggers": [
    {
      "event": "state:psu.temperature",
      "condition": "gt:80",
      "cooldownSeconds": 120,
      "goal": "Temperature too high — reduce power"
    }
  ]
}
```
Condition formats: gt:80, lt:10, eq:true, neq:off, changed
Process Files
Agents can reference structured procedure files that guide them through complex tasks. Process files live in workspace/processes/*.process.md and are linked via the processFile config field:
```json
{
  "name": "pick-and-place",
  "devices": ["robot", "camera"],
  "processFile": "pick-and-place",
  "instructions": "Additional agent-specific notes..."
}
```
When the agent starts, the process content is injected into its system prompt. See Process Definitions for the file format.
Agent Timeline & Transparency
Running agents stream a real-time timeline of their execution, showing what the agent is thinking, doing, and observing. Each entry has a type:
| Type | Meaning | Example |
|---|---|---|
| thought | LLM reasoning text | "I can see the red ball at (320, 240). I'll move above it first." |
| action | Tool call being executed | call_action({connector: "robot", action: "moveJ", ...}) |
| observation | Tool result received | read_property → {"x": 150, "y": 30, "z": 200} |
| error | Tool failure or safety block | Blocked: moveJ — speed=75% exceeds max 50% |
| user_input | Agent asked a question | "Which object should I pick up?" |
The Reasoning field shows the agent's latest LLM output — what it's currently "thinking" before choosing its next action.
Agent status includes:
- iteration — number of LLM calls made
- tokensUsed — approximate tokens consumed
- elapsedSeconds — time since start
- timeline — full chronological event list
- reasoning — latest LLM reasoning text
Use agent.detail WebSocket message to fetch the full timeline, or subscribe to agent.timeline broadcasts for real-time streaming. The UI's AgentMonitor component displays all of this with tabs for Timeline, Plan, and Logs.
Per-Agent Model Selection (Cost Optimization)
Each agent can override the global LLM model and token limit. Simple agents (monitoring, condition checking) can use cheap models while complex agents (vision-guided manipulation) use premium models.
```json
{
  "name": "temperature-monitor",
  "devices": ["test-device"],
  "model": "google/gemini-2.5-flash",
  "maxTokens": 1024,
  "maxIterations": 30
}
```
Model cost tiers:
| Tier | Models | ~Cost/MTok |
|---|---|---|
| $ (cheap) | google/gemini-2.5-flash, deepseek/deepseek-chat-v3-0324, openai/gpt-4o-mini | $0.10–0.30 |
| $$ (mid) | anthropic/claude-haiku-4-5, google/gemini-2.5-pro, openai/gpt-4o | $1–5 |
| $$$ (premium) | anthropic/claude-sonnet-4-5 | $5–15 |
If model is omitted, the agent uses the global model from Settings. Anthropic models automatically use prompt caching for repeated system prompts (~50% input token savings in agent loops).
Smart Tool Filtering (Automatic Cost Optimization)
Muxit automatically classifies each chat message to determine which tools are relevant, then sends only those tools to the AI model. This dramatically reduces token overhead and improves reliability with cheaper models.
How it works:
- Muxit AI: A free classifier model analyzes your message before the main AI call. This adds ~100-300ms latency but costs nothing.
- MCP providers: Classification is handled by the external AI client.
- Fallback: If classification fails, all tools are sent (same as previous behavior).
Example impact:
| Request | Tools Sent | Token Savings |
|---|---|---|
| "Move robot to 500,500,400" | 6 (core only) | ~80% fewer tool tokens |
| "Write a monitoring script" | 12 (core + scripts) | ~65% fewer tool tokens |
| "Take a photo of the workspace" | 12 (core + vision) | ~65% fewer tool tokens |
| "What's the meaning of life?" | 34 (all tools) | 0% (general fallback) |
Tool groups:
| Group | Tools | When Active |
|---|---|---|
| Core | list_connectors, read/write_property, call_action, get_device_state, get_connector_schema | Always |
| Scripts | list/run/stop/write scripts, run_code | Script/automation requests |
| Vision | take_snapshot, configure_vision, detections, identify/locate objects, calibrate, teach/forget/list objects, approach_object | Camera/vision requests |
| Agents | list/start/stop agents, agent status | Agent/autonomous requests |
| Memory | save/delete/list memories | Memory requests |
| Config | connector configs, drivers, server config | Config requests |
| Instructions | instructions, process definitions | Process/instruction requests |
The script API guide (~500 tokens) is also excluded from the system prompt when scripts are not relevant, further reducing costs.
This feature requires no configuration — it's always active and falls back gracefully.
When to Use What
| Approach | Best For | How It Works |
|---|---|---|
| Chat Panel | Interactive questions, one-off commands | You type, AI responds with tool calls |
| Scripts with ai() | Automated decisions, classification, vision | Script calls ai() for a single LLM response |
| Agents | Multi-step autonomous goals, reactive automation | Persistent AI loop: plan, act, observe, re-plan |
| MCP Tools | External AI tools (Claude Code/Desktop) | AI calls Muxit tools via MCP |
AI Memory
Muxit AI remembers facts and preferences across sessions. Tell the AI "remember that..." to save a memory, or "forget that..." to remove one.
Memories are stored locally in workspace/config/ai-memory.json. Click the M button in the Chat Panel header to view, edit, or delete memories.
| Category | Examples |
|---|---|
| preference | "User prefers metric units" |
| device | "The PSU on the left bench is named 'main-psu'" |
| procedure | "Calibration: reset PSU, set 5V, wait 10s, read" |
| note | "Don't run scripts during backup (2-3am)" |
Vision-Guided Robot Control
Muxit supports closed-loop visual servoing — using camera feedback to guide robot movements in real time. The system uses a two-speed approach:
- Fast local CV (VisionDriver + OpenCV): Runs color/contour detection at camera frame rate for real-time tracking. No LLM calls, no latency.
- Slow LLM vision (take_snapshot + AI): Captures a frame and sends it to the LLM for high-level scene understanding, object identification, and planning. Used for initial assessment and verification, not tight control loops.
The combination lets agents plan with full visual intelligence but execute with the speed of classical computer vision.
Setting Up a Vision Connector
Create a connector config that uses the Vision driver with a webcam source:
```js
// workspace/connectors/eye.js
export default {
  driver: "Vision",
  config: { source: "webcam" },
  properties: {
    frame: () => driver.frame(),
    detections: () => driver.detections()
  },
  methods: {
    detectColor: [() => driver.detectColor(), "Run HSV color detection"],
    detectContours: [() => driver.detectContours(), "Run contour detection"]
  },
  poll: ["detections"]
};
```
The source can be "webcam" (default USB camera) or a camera connector name for IP cameras.
Tracker Types
| Tracker | How It Works | Best For |
|---|---|---|
| color (HSV) | Filters pixels by hue/saturation/value range, finds centroids | Tracking brightly colored objects (red ball, green LED) |
| contour | Edge detection + contour finding, returns bounding boxes and areas | Tracking shapes regardless of color (parts, tools, containers) |
Both trackers run locally via OpenCV with no LLM calls, providing detections at frame rate.
Vision Workflow
The vision system combines two approaches:
| Layer | Tool | Speed | Capability |
|---|---|---|---|
| Fast CV | configure_vision + read_detections | ~30 fps | Track objects by color/shape (HSV, contour) |
| AI Vision | take_snapshot | ~2-5s | Identify objects by name, understand scenes, reason about spatial layout |
| AI Detection | identify_objects | ~2-5s | Find named objects with pixel coordinates and bounding boxes |
| Spatial | locate_object | ~2-5s | Find object and return world (robot) coordinates if calibrated |
| Teaching | teach_object | ~3-5s | Learn an object for persistent real-time tracking |
| Servoing | approach_object | ~0.1-5s | Iterative guidance to move robot toward an object |
Typical workflow for vision-guided tasks:
1. take_snapshot — understand what's in the scene
2. identify_objects(query: "red spoon") — find specific object with coordinates
3. teach_object — learn the object for persistent real-time tracking (saves profile + creates CV tracker)
4. read_detections — poll tracker position at high speed during approach
5. approach_object — iterative visual servoing to move toward the object (works without calibration)
Use AI tools (take_snapshot, identify_objects) for planning and verification. Use CV tools (read_detections) or approach_object for fast feedback during motion. Use teach_object to bridge the gap — AI identifies objects once, then OpenCV tracks them at 30fps.
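The same split shows up if you drive vision from a script instead of chat: one ai() call for scene understanding, then the vision connector's cached detections for fast feedback. A rough sketch, assuming the eye connector defined above returns a frame the ai() vision overload accepts, and a robot connector with a hypothetical moveRelative action:

```js
// Sketch only — connector and action names below are assumptions, adapt to your workspace.
const eye = connector('eye');
const robot = connector('robot');

// Slow AI vision: one-shot scene understanding on the current frame.
const verdict = ai("Is the red ball visible and unobstructed? Answer yes or no.", eye.frame);

if (verdict.toLowerCase().includes("yes")) {
  // Fast local CV: read the tracker output (no LLM call) and nudge the robot once.
  const detections = eye.detections;     // shape depends on the configured tracker
  log.info(`Tracked objects: ${JSON.stringify(detections)}`);
  robot.moveRelative(5, 0, 0);           // hypothetical action name — match your robot driver
}
```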
AI Object Detection (identify_objects)
Sends a camera snapshot to the LLM with a structured detection prompt. Returns named objects with pixel coordinates and an annotated verification image showing bounding boxes drawn over each detected object.
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| camera | string | No | Camera connector name (auto-detects if omitted) |
| query | string | No | Specific object to find (e.g., "red spoon"). Omit to detect all visible objects |
Response:
```json
[
  {
    "name": "red spoon",
    "center": { "x": 320, "y": 240 },
    "size": { "width": 80, "height": 30 },
    "confidence": "high"
  },
  {
    "name": "metal bowl",
    "center": { "x": 500, "y": 350 },
    "size": { "width": 120, "height": 100 },
    "confidence": "medium"
  }
]
```
Key differences from read_detections:
| | identify_objects | read_detections |
|---|---|---|
| Speed | ~2-5 seconds (LLM call) | Instant (local OpenCV) |
| Setup | None | Requires configure_vision first |
| Capabilities | Identifies objects by name | Tracks by color/shape only |
| Cost | Uses LLM tokens | Free (local processing) |
Results are cached for 5 seconds — repeated calls within that window are free.
The tool also returns an annotated camera image with green bounding boxes, red centroid dots, and name/confidence labels drawn over each detected object. This image is injected into the AI conversation for visual verification — the LLM can see what it detected and confirm accuracy before proceeding.
Locating Objects (locate_object)
Combines AI object detection with spatial mapping to return physical coordinates.
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| camera | string | Yes | Camera connector name |
| object_name | string | Yes | Object to find (e.g., "spoon", "red ball") |
Response (with calibration):
```json
{
  "found": true,
  "name": "red spoon",
  "pixel": { "x": 320, "y": 240 },
  "world": { "x": 150.5, "y": -30.2, "z": 85.0 },
  "confidence": "high",
  "calibrated": true
}
```
Response (without calibration):
```json
{
  "found": true,
  "name": "red spoon",
  "pixel": { "x": 320, "y": 240 },
  "confidence": "high",
  "calibrated": false,
  "note": "Camera not calibrated — only pixel coordinates available. Use calibrate_camera to enable world coordinate mapping."
}
```
Spatial Mapping (Camera Calibration)
The calibrate_camera tool teaches the system how pixel coordinates map to real-world (robot) coordinates using a teach-by-example approach. No camera intrinsic parameters or lens models needed.
How it works:
- Move the robot end-effector to a position visible in the camera
- Record the pixel position (from identify_objects or read_detections) and the robot's known coordinates
- Repeat for 4+ positions spread across the workspace
- The system computes a 2D affine transform (least-squares fit)
Calibration actions:
| Action | Parameters | Description |
|---|---|---|
| start | camera | Begin a new calibration session |
| point | camera, pixel_x, pixel_y, world_x, world_y, world_z | Record a calibration point |
| finish | camera | Compute transform and save |
| status | camera | Check calibration state |
Example calibration flow:
```
1. calibrate_camera(camera: "webcam", action: "start")
2. Move robot to position (100, 0, 80), see it at pixel (150, 400)
   calibrate_camera(camera: "webcam", action: "point", pixel_x: 150, pixel_y: 400, world_x: 100, world_y: 0, world_z: 80)
3. Repeat for 3-5 more positions...
4. calibrate_camera(camera: "webcam", action: "finish")
```
Tips for good calibration:
- Use at least 4 points spread across the workspace (corners + center)
- All points should be at roughly the same Z height (the system assumes a flat work plane)
- The more spread out the points, the better the accuracy
- Recalibrate if the camera or its mount moves
Storage: Calibration data persists in workspace/config/calibrations/{camera}.json.
Object Teaching (teach_object)
The teach_object tool bridges AI vision and fast CV tracking. When the LLM identifies an object via identify_objects or take_snapshot, it can "teach" the object to the vision system — sampling its color, storing a persistent profile, and creating a real-time OpenCV tracker.
How it works:
- AI vision identifies an object (e.g., "screwdriver" at pixel 320, 240)
- teach_object samples the color at that pixel location via calibrateColor
- A profile is saved to workspace/config/objects.json with HSV range, typical size, and description
- A color tracker is auto-created on the vision connector for 30fps tracking
- A verification snapshot is captured showing the new tracker's detection overlay — this image is returned to the AI conversation so both the LLM and user can confirm the tracker is working correctly
- On server restart, taught objects auto-restore their trackers
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Object name (e.g., "screwdriver") |
| camera | string | Yes | Camera connector name |
| pixel_x | number | Yes | Center X pixel coordinate |
| pixel_y | number | Yes | Center Y pixel coordinate |
| width | number | No | Bounding box width (improves color sampling) |
| height | number | No | Bounding box height |
| description | string | No | Visual description for context |
| tracker_type | string | No | "color" (default) or "contour" |
Example conversation:
```
User: "Learn what the red ball looks like"
AI: Takes a snapshot, identifies the red ball at pixel (320, 240) with size 60x60.
    Calls teach_object(name: "red ball", camera: "webcam", pixel_x: 320, pixel_y: 240, width: 60, height: 60)
    "I've taught the vision system to recognize the red ball. Here's the verification image — I can see the green bounding box tracking it correctly in the center of the frame. It's now being tracked in real-time at 30fps."
```
Related tools:
- forget_object(name) — Remove a taught object and its tracker
- list_objects() — List all taught objects with their profiles
Storage: Object profiles persist in workspace/config/objects.json.
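For orientation, a taught-object profile stores roughly this kind of information (an illustrative sketch only; the actual JSON layout of objects.json is not documented here and may differ):

```json
{
  "red ball": {
    "description": "small red rubber ball on the work surface",
    "hsv": { "low": [0, 120, 80], "high": [10, 255, 255] },
    "typicalSize": { "width": 60, "height": 60 },
    "trackerType": "color"
  }
}
```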
Direct Vision Annotation
For faster, cheaper object teaching without AI, you can draw bounding boxes directly on the camera feed in the dashboard. This bypasses the LLM entirely — you see it, box it, name it.
Setup:
- Add a Canvas widget to your dashboard with stream set to vision:annotated
- In the widget config, set Vision Connector to vision (or your vision connector name)
- An annotation toolbar appears on the canvas
Teaching objects:
- Click Draw in the toolbar — cursor becomes a crosshair
- Click and drag to draw a bounding box around the object you want to track
- Type a name in the popup input and press Enter (or click Teach)
- The system samples the color in that region, creates a tracker, and begins real-time tracking immediately
- You'll see the green tracking box appear within 1-2 frames
Managing objects:
- Click an object's label in the overlay to select it
- Click the X button to delete (forget) a tracked object
- Press Escape to cancel drawing or deselect
This uses the same vision.teach, vision.forget, and vision.list WebSocket messages — see the WebSocket API reference for details.
Visual Servoing (approach_object)
The approach_object tool provides iterative guidance for moving a robot toward an object without calibration. It works by computing how far the object is from the center of the camera frame, then suggesting a move direction and step size.
How it works:
- Finds the object using fast CV tracking (if taught) or AI detection (fallback)
- Computes error: how far the object's center is from the frame center (normalized -1 to 1)
- Returns a suggested move vector proportional to the error
- The LLM/agent moves the robot, then calls approach_object again
- Repeat until the object is centered (error < 10%)
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| camera | string | Yes | Camera connector name |
| object_name | string | Yes | Object to approach |
| step_size | number | No | Move step size in mm (default: 10) |
| vision_connector | string | No | Vision connector name (default: "vision") |
Response:
```json
{
  "found": true,
  "centered": false,
  "object_name": "screwdriver",
  "pixel": { "x": 450, "y": 180 },
  "frame": { "width": 640, "height": 480 },
  "normalized": { "x": 0.703, "y": 0.375 },
  "error": { "x": 0.406, "y": -0.25 },
  "suggestion": { "dx": -4.1, "dy": 2.5 },
  "distance": "medium",
  "quadrant": "right-top",
  "message": "Object 'screwdriver' is in right-top quadrant. Suggested move: dx=-4.1, dy=2.5 mm."
}
```
Usage pattern (agent or script):
```
loop:
  result = approach_object(camera: "webcam", object_name: "screwdriver", step_size: 5)
  if result.centered: break
  robot.moveRelative(result.suggestion.dx, result.suggestion.dy, 0)
  wait 500ms
```
Axis Mapping
Camera X/Y may not align with robot X/Y depending on camera mounting. Move the robot in one axis and observe the camera to determine the mapping. Save it to AI memory with save_memory.
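One quick way to establish the mapping, in the same pseudocode style as the usage pattern above (the move call, the detection fields, and the save_memory arguments are placeholders to adapt to your setup):

```
before = read_detections()              # note the object's pixel position
robot.moveRelative(10, 0, 0)            # move +10 mm along robot X only
after  = read_detections()              # compare pixel deltas
# larger change in pixel x → robot X maps to camera x (check the sign)
# larger change in pixel y → the axes are swapped
save_memory("camera axis map: robot +X ≈ camera -x, robot +Y ≈ camera +y")
```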
See also: workspace/processes/visual-servo.process.md — structured procedure for visual servoing tasks.
Central Instructions
Lab-wide AI instructions live in workspace/config/instructions.md. This is the "standing orders" file — both AI chat and agents read it. Edit it to customize how the AI behaves in your lab.
File Format
```markdown
# Lab Instructions

## General
You are controlling a robotics lab. Be careful with motion commands.
Always confirm before moving the robot to a new position.
Prefer slow, safe movements over fast ones.

## Devices

### robot
Collaborative robot arm (Fairino FR series). Max speed 29 when homing.
Always check motionDone before starting new moves.
Coordinates are in mm and degrees.

### webcam
USB webcam for live video streaming.
Safe to adjust quality and mirror settings at any time.

### psu
Power supply unit. Never exceed 30V / 5A.
Always ramp voltage gradually — never jump directly to target.

## Safety
- Never move robot without checking camera first
- Maximum approach speed near objects: 20%
- Always confirm before executing motion commands
- If force readings spike unexpectedly, stop all motion
```
How It Works
| Priority | Source | Scope |
|---|---|---|
| 1 (highest) | workspace/config/instructions.md — ## Devices / ### {name} | Per-device |
| 2 | workspace/config/instructions.md — ## General + ## Safety | All AI interactions |
| 3 | Connector config ai.instructions field | Per-device fallback |
| 4 (lowest) | Built-in defaults | Always present |
The central file takes priority. If a device has a ### device-name section in instructions.md, that replaces the connector's ai.instructions. If not, the connector-level instructions are used as fallback.
Hot Reload
The file is watched for changes — edit it and the new instructions take effect on the next AI interaction (no restart needed).
Editing
- Via UI: The instructions editor is available via WebSocket (instructions.get / instructions.set)
- Via AI: Ask the AI to update instructions: "Add a safety rule: never exceed 20% speed near the basket"
- Via file: Edit workspace/config/instructions.md directly in any text editor
Tools: read_instructions, write_instructions
Process Definitions
Process files are structured "recipes" for complex autonomous tasks. They live in workspace/processes/*.process.md and teach the AI how to decompose goals into steps.
Why Processes?
When a user says "pick up the spoon and drop it in the basket", the AI needs to know:
- What is a spoon? How to find it? (object detection)
- Where is the basket? (spatial awareness)
- How to approach, grasp, transport, and place? (procedure)
- What to do if the grip fails? (error recovery)
Process files capture this procedural knowledge so it can be reused.
File Format
Files are named {name}.process.md and placed in workspace/processes/. The format uses markdown with standard sections:
```markdown
# Pick and Place

## Goal
Pick up an object and place it at a target location.

## Requirements
- Devices: robot, camera (or webcam)
- The robot must have a gripper

## Steps
1. **Survey** — Take a camera snapshot. Identify the target object and destination.
2. **Locate object** — Use AI vision to determine the object's position.
   If uncertain, take snapshots from different angles.
3. **Plan approach** — Calculate an approach path. Move to a safe height first.
4. **Approach** — Move above the object at safe speed (max 20%).
   Take a snapshot to verify position.
5. **Grasp** — Lower to grasp height. Close gripper.
   Take a snapshot to verify grip.
6. **Transport** — Lift to safe height. Move to above the destination.
7. **Place** — Lower to placement height. Open gripper.
   Verify placement with camera.
8. **Retreat** — Return to home position.

## Error Handling
- If object not found: ask user to point it out
- If grip fails (object drops): retry from step 4, max 2 retries
- If position uncertain: take additional snapshots

## Safety
- Never exceed 20% speed when near objects
- Always approach from above
- Verify with camera before and after gripping
```
Using Processes
With agents: Reference a process in the agent config:
```json
{
  "name": "pick-and-place",
  "processFile": "pick-and-place",
  "devices": ["robot", "camera"]
}
```
From AI chat: The AI can search for relevant processes:
```
User: "Pick up the spoon and put it in the basket"
AI: [calls list_processes] → finds "pick-and-place"
AI: [calls read_process("pick-and-place")] → reads procedure
AI: [calls start_agent with process] → follows the steps
```
Creating processes: Ask the AI to create one:
```
"Create a process for inspecting PCBs under the microscope"
```
The AI will use write_process to save it to workspace/processes/.
Sections Reference
| Section | Required | Purpose |
|---|---|---|
| # Title | Yes | Process name (H1 heading) |
| ## Goal | Yes | One-line description of what the process achieves |
| ## Requirements | No | Devices, tools, or conditions needed |
| ## Steps | Yes | Numbered procedure steps |
| ## Error Handling | No | What to do when things go wrong |
| ## Safety | No | Safety constraints specific to this procedure |
Tools: list_processes, read_process, write_process
Running on a local LLM
Muxit AI can run against a local model server instead of the managed cloud proxy. Useful for air-gapped labs, regulated environments, or for cutting cloud spend on long-running agents.
Local providers are gated behind the Local LLM Pro feature — activate a Pro license (or trial) before switching.
Ollama
- Install Ollama and start the daemon (ollama serve).
- Pull a model with tool-use support: ollama pull llama3.2 (or qwen2.5, mistral-nemo, …).
- Open Settings → AI Services, pick Ollama, leave the base URL at http://localhost:11434/v1, set the model name to match what you pulled, and click Test connection.
LM Studio
- Install LM Studio and load a model from the "Discover" tab.
- Open the Local Server tab and start the server on port 1234.
- In Muxit, Settings → AI Services → LM Studio. Base URL defaults to http://localhost:1234/v1. Use auto as the model name to pick whatever LM Studio currently has loaded.
Any OpenAI-compatible endpoint
Pick OpenAI-compatible for vLLM, llama-server, OpenRouter direct, or any service exposing /chat/completions. Provide the full base URL (including the /v1 segment if required) and an API key.
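In server.json terms this is the openai-compatible provider entry from the Configuration Reference below. A sketch with placeholder URL, key, and model id (the exact provider id key is an assumption):

```json
{
  "ai": {
    "provider": "openai-compatible",
    "providers": {
      "openai-compatible": {
        "baseUrl": "http://localhost:8000/v1",
        "apiKey": "sk-local-placeholder",
        "model": "my-served-model"
      }
    }
  }
}
```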
What works on local providers
- Streaming chat with tool use — works on Ollama ≥ 0.4 and LM Studio (current versions). Older releases without OpenAI-style tool support will fall back to plain chat without tool calls.
- The ai() script global, the agent inference loop, and AI-powered object detection all transparently use the active provider.
- Vision works against multimodal models like llava, qwen2-vl, or llama3.2-vision. Use ai("describe this", await camera.snapshot()) from scripts.
What doesn't
- Per-call usage / credits reporting (cloud-only).
- The OpenRouter web-search plugin used for SCPI authoring (cloud-only).
Configuration Reference
AI Config (server.json)
| Field | Type | Default | Description |
|---|---|---|---|
| ai.provider | string | "muxit" | LLM backend — "muxit", "ollama", "lmstudio", or "openai-compatible" |
| ai.model | string | provider default | Active model id (cloud uses OpenRouter ids; local uses model name) |
| ai.maxTokens | number | 4096 | Max tokens per response |
| ai.instructions | string | "" | Custom AI system prompt |
| ai.promptProfile | string | "standard" | "minimal" strips device schemas + decision tree from the system prompt — fits 4–8K local-LLM context windows |
| ai.providers.<id>.baseUrl | string | per provider | Endpoint URL for local providers |
| ai.providers.<id>.apiKey | string | "" | Optional bearer token (openai-compatible only) |
| ai.providers.<id>.model | string | per provider | Per-provider default model |
| ai.access.<connector>.<item>.enabled | boolean | true | Set to false to hide a property/action from the AI |
| ai.access.<connector>.<item>.instructions | string | "" | Per-property/action AI note |
You can also embed AI instructions directly in connector config files using the ai.instructions field — see the Connector Guide for details.
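For example, to hide one action from the AI entirely and attach a note to another (a sketch; the psu connector and its property/action names are hypothetical):

```json
{
  "ai": {
    "access": {
      "psu": {
        "outputEnable": { "enabled": false },
        "voltage": { "instructions": "Ramp in 1 V steps; never jump directly to the target." }
      }
    }
  }
}
```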
Voice Config (server.json)
| Field | Type | Default | Description |
|---|---|---|---|
| voice.tts.enabled | boolean | false | Enable text-to-speech for AI chat replies |
| voice.tts.scriptEnabled | boolean | true | Speak script say() output through the status-strip toggle |
| voice.wakeWord.enabled | boolean | false | Enable "Muxit" wake word |
| voice.autoSend | boolean | true | Auto-send voice input |
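Put together, a minimal voice block that enables wake-word conversations with spoken replies looks like this (field names straight from the table above):

```json
{
  "voice": {
    "tts": { "enabled": true, "scriptEnabled": true },
    "wakeWord": { "enabled": true },
    "autoSend": true
  }
}
```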