
AI & Voice

Muxit AI lets you control lab hardware using natural language — via chat or voice in the browser UI. It uses a managed AI proxy service with a safety gate that requires user approval before executing device commands.

In one line

MCP (bring your own AI client) is free on every tier. The built-in Chat Panel, voice, and the script ai() function (including the ai(prompt, image) vision overload) require Maker. The Vision AI tool surface (Chat Panel + MCP take_snapshot, object detection, spatial mapping), Local LLM, and Autonomous Agents require Pro. See AI features by tier below for the full picture.

AI features by tier

| Capability | Free | Maker | Pro+ |
| --- | --- | --- | --- |
| MCP server — Claude Desktop / Claude Code / ChatGPT* drives your hardware via your own AI provider | ✓ | ✓ | ✓ |
| Connector ai: { instructions } — your safety rules and device context appear in every AI prompt | ✓ | ✓ | ✓ |
| JS scripts — write automations | 3 | unlimited | unlimited |
| Hand-authored connectors — write .js connector configs yourself | ✓ | ✓ | ✓ |
| say(), ask.confirm/choose/text — TTS and dashboard prompts | ✓ | ✓ | ✓ |
| Chat Panel — built-in chat with tool calling, in the browser UI | — | ✓ | ✓ |
| Voice — speech-to-text input, spoken responses | — | ✓ | ✓ |
| ai(prompt) and ai(prompt, image) script globals — single-shot LLM calls from automations, including base64-JPEG vision input | — | ✓ | ✓ |
| AI-assisted SCPI authoring — "Set up with AI" button drafts a connector from a programming manual | — | ✓ | ✓ |
| AI credits / month | — | 500 | 1500 |
| Vision AI tools — agentic camera workflows in Chat Panel and MCP (take_snapshot, identify_objects, locate_object, OpenCV trackers, spatial calibration) | — | — | ✓ |
| Local LLM — point Muxit at Ollama or LM Studio for offline operation | — | — | ✓ |
| Autonomous Agents — long-running goal-driven AI | — | — | ✓ |

* ChatGPT's connector UI rejects loopback URLs, so it needs a public HTTPS tunnel and the security caveats in the MCP Server section. Claude Desktop and Claude Code work locally with no extra setup.

What you get on Free

A lot. Point Claude Desktop, Claude Code, or ChatGPT (with a tunnel — see below) at Muxit's MCP server and your AI client gets the same 56+ tools the built-in chat uses — list connectors, read and write properties, call actions, take camera snapshots, run and write scripts, write connector configs. You drive your bench in natural language using your own AI subscription, and Muxit charges nothing for it.

You can write JS scripts (up to 3), author connectors by hand, and embed ai: { instructions } blocks so any AI that touches your hardware sees your safety rules and device context. say(), ask.confirm, and dashboard prompts all work. For a one-person lab already using Claude Code, this is a complete workflow.

What Maker unlocks

The things MCP can't give you because they live inside the Muxit window:

  • Chat Panel and voice — talk to your bench while your hands are full, see tool calls inline, no external client to juggle.
  • ai() in scripts — if (ai("Is this reading abnormal?", reading) === "yes") …. Single-shot inference inside your automations, no API key plumbing. Pass a base64 JPEG as the second argument (e.g. ai("Is the part aligned?", connector('webcam').snapshot)) for explicit vision input — no extra tier required.
  • AI-assisted SCPI authoring — point the AI at a programming manual; it probes the device, drafts a connector, validates it, and brings it online. Hours of vendor-PDF spelunking become a chat turn.
  • 500 AI credits / month included — managed proxy, no per-request billing to set up.

Beyond Maker

Pro adds the Vision AI tool surface — the Chat Panel and MCP gain take_snapshot, OpenCV-backed object detection, spatial mapping, and visual-servoing tools so the AI assistant can decide on its own when to look at the bench. (Maker users can already drive vision explicitly from scripts via ai(prompt, image); Pro is what lets the agent request a frame.) Pro also unlocks Local LLM support (Ollama / LM Studio for fully offline operation) and Autonomous Agents for long-running goal-driven work — plus a 1500-credit allowance.

Ways to Use AI

| Method | What it is | Best for |
| --- | --- | --- |
| Chat Panel | Browser sidebar with natural language input | Interactive use in the web UI |
| MCP Server | Model Context Protocol over HTTP and stdio | Claude Code, Claude Desktop, MCP-compatible AI tools |

Quick Start

1. Chat Panel (Browser Web UI)

The Chat Panel provides a full AI assistant that can control your devices, read sensors, write scripts, and answer questions — all via natural language (text or voice). It uses an agentic tool-calling loop: the LLM decides which tools to call, executes them, and continues until it has a final answer.

Muxit AI is the default — no API keys needed. It uses your Muxit license to authenticate with a managed AI proxy service at api.muxit.io.

Click the gear icon to open Settings. Under AI Services, select Muxit AI.

Model selection:

  • Auto (recommended) — Muxit selects the best model for each request based on task complexity and your subscription tier. This is the default.
  • Manual — Pick a specific model from your favorites list to always use that model.

Credits are included with your Muxit subscription. You can purchase additional credits at muxit.lemonsqueezy.com.

Model favorites: You can curate a favorites list in Settings under Model Favorites. These appear in both the Settings model dropdown and the Chat header model picker for quick switching.

Starting the server

```bash
node start.js server       # MuxitServer (serves web UI at http://127.0.0.1:8765, auto-opens browser)
```

Open AI Chat by clicking the chat icon in the Activity Bar.

2. MCP Server

The MCP server is built directly into MuxitServer (C#), exposing device control tools and resources over the Model Context Protocol. It supports both HTTP and stdio transports.

HTTP transport — Available at /mcp on the running server (e.g. http://127.0.0.1:8765/mcp). Starts automatically with the server.
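
MCP over HTTP uses JSON-RPC 2.0 framing, so any HTTP client can talk to the endpoint. A minimal sketch follows (the URL matches the default above; exact transport details such as required Accept headers and session negotiation depend on your MCP client library, so the request here is only illustrative):

```javascript
// Illustrative JSON-RPC 2.0 request for the HTTP MCP endpoint.
// The framing is standard MCP; consult your MCP client library for
// transport specifics (headers, sessions) before sending it for real.
const request = {
  jsonrpc: "2.0",
  id: 1,
  method: "tools/list",   // ask the server which tools it exposes
  params: {},
};

// Sending it would look roughly like:
// const res = await fetch("http://127.0.0.1:8765/mcp", {
//   method: "POST",
//   headers: { "Content-Type": "application/json" },
//   body: JSON.stringify(request),
// });
```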

Stdio transport — For Claude Code and Claude Desktop integration:

```bash
node start.js mcp                          # Via start script
dotnet run --project MuxitServer -- --mcp  # Direct
```

Claude Code

If you have Muxit installed, register the MCP server with Claude Code from the directory you launch Claude Code in:

```bash
claude mcp add muxit -- muxit --mcp --workspace "/path/to/your/workspace"
```

Replace muxit with the full path to the binary if it's not on your PATH (e.g. C:\Program Files\Muxit\muxit.exe or /Applications/Muxit.app/Contents/MacOS/muxit). This writes the config to ~/.claude.json so it's available across all your projects.

If you're working in this repository's source tree, the project-level .mcp.json at the repo root is already wired up — Claude Code picks it up automatically when you launch from this directory. It runs dotnet run --project MuxitServer -- --mcp --workspace workspace, so it only works inside this checkout.

Claude Desktop — Add to claude_desktop_config.json (~/Library/Application Support/Claude/claude_desktop_config.json on macOS, %APPDATA%\Claude\claude_desktop_config.json on Windows):

```json
{
  "mcpServers": {
    "muxit": {
      "command": "/path/to/muxit",
      "args": ["--mcp", "--workspace", "/path/to/your/workspace"]
    }
  }
}
```

Use the absolute path to the installed muxit executable. Restart Claude Desktop after editing.

ChatGPT — ChatGPT's connector UI rejects loopback and private-IP addresses (127.0.0.1, localhost, 10.*, 192.168.*) as "unsafe url" and requires a publicly reachable HTTPS endpoint with a valid TLS certificate. This is a constraint of ChatGPT, not Muxit — there's no way to point ChatGPT at the local /mcp URL directly.

To use ChatGPT with Muxit you have to expose the local endpoint over a public HTTPS tunnel. Quickest path with Cloudflare Tunnel:

```bash
cloudflared tunnel --url http://127.0.0.1:8765
# → https://<random-name>.trycloudflare.com
```

Then in ChatGPT, Settings → Apps → Advanced settings → Create app, enter https://<random-name>.trycloudflare.com/mcp as the MCP server URL with No Auth.

Security warning — read this before running a tunnel

Muxit's HTTP API treats every loopback request as authenticated, and a tunnel daemon (cloudflared, ngrok, etc.) running on the same machine reaches Muxit over loopback. A public tunnel exposes your hardware to anyone who guesses the URL — they can read sensors, write properties, and run scripts with no credentials.

Muxit's optional password (security.remoteAccess + X-Auth-Token header) doesn't help here either: it's bypassed for loopback callers, and it uses a custom header name that ChatGPT's connector can't send.

Acceptable mitigations:

  • Tailscale Funnel restricted to your tailnet — the tunnel isn't actually public, only your devices can reach it.
  • Cloudflare Access in front of a Cloudflare Tunnel — adds an auth gate Muxit doesn't ship today.
  • A short-lived cloudflared/ngrok URL spun up only for the duration of a session and torn down right after.

If none of those apply, prefer Claude Desktop or Claude Code, which connect to the local MCP server over stdio and don't need a tunnel at all.

TIP

Open Settings > MCP in the Muxit GUI for ready-to-copy config snippets with your actual server paths.

MCP Tools

| Tool | Description | Category |
| --- | --- | --- |
| list_connectors | List all devices and capabilities | Device Control |
| get_connector_schema | Get full schema for a device | Device Control |
| read_property | Read a device property | Device Control |
| write_property | Set a device property | Device Control |
| call_action | Execute a device action | Device Control |
| get_device_state | Snapshot of all cached state | Device Control |
| list_scripts | List available and running scripts | Scripts |
| run_script | Start a named script (bounded wait, returns running on long scripts) | Scripts |
| run_code | Execute inline JavaScript (bounded wait, returns running on long scripts) | Scripts |
| stop_script | Stop a running script | Scripts |
| get_script_status | Check status / result / error of a running or recently-finished script | Scripts |
| get_script_output | Fetch buffered log output (with since_seq cursor) | Scripts |
| wait_for_script | Block up to 120s waiting for a running script to finish | Scripts |
| read_script_source | Read a script's source code | Files |
| write_script | Create or update a script file | Files |
| read_startup_script | Read a startup script's source code | Files |
| write_startup_script | Create or update a startup script | Files |
| read_connector_config | Read a connector config file | Files |
| write_connector_config | Create or update a connector config | Files |
| list_drivers | List available drivers | System |
| get_driver_schema | Get full schema for a specific driver | System |
| get_server_config | Get server configuration (redacted) | System |
| take_snapshot | Capture a camera image for visual analysis | Vision |
| configure_vision | Set up OpenCV trackers for real-time detection | Vision |
| read_detections | Read current tracked object positions (fast) | Vision |
| identify_objects | AI-powered object detection with names/positions | Vision |
| locate_object | Find object and return world coordinates | Vision |
| calibrate_camera | Camera-to-world coordinate calibration | Vision |
| teach_object | Teach an object for persistent real-time tracking | Vision |
| forget_object | Remove a taught object | Vision |
| list_objects | List all taught objects | Vision |
| approach_object | Visual servoing — iterative guidance to move toward an object | Vision |
| list_agents | List agent configs and running instances | Agents |
| start_agent | Start an agent with a goal | Agents |
| stop_agent | Stop a running agent | Agents |
| get_agent_status | Get current status of an agent | Agents |
| pause_agent | Pause a running agent | Agents |
| resume_agent | Resume a paused agent | Agents |
| read_instructions | Read the central lab instructions file | Instructions |
| write_instructions | Update the central lab instructions file | Instructions |
| list_processes | List available process definitions | Processes |
| read_process | Read a process definition by name | Processes |
| write_process | Create or update a process definition | Processes |
| list_dashboards | List dashboard files in workspace/dashboards/ | Dashboards |
| read_dashboard | Read a dashboard's JSON content | Dashboards |
| write_dashboard | Create or update a .dashboard.json file | Dashboards |

MCP Resources

| URI | Description |
| --- | --- |
| muxit://connectors | List of all connectors with schemas |
| muxit://connector/{name}/schema | Schema for a specific connector |
| muxit://state | Current device state snapshot |

Tool-Call Approvals

Tool-call approval prompts are driven by the global safety level (set via the SafetyChip in the status strip), not by a separate AI-only switch. In Observe and Assisted, the Chat Panel shows an approval dialog with the tool name, parameters, and Allow/Deny buttons. In Active and Unrestricted, tool calls execute without prompting (destructive actions still confirm per the safety policy). See the Safety Guide for the full behaviour table.


AI Chat Tools

The Chat Panel uses the same tools as the MCP server. The AI assistant decides which tools to call based on your request. Depending on the current safety level, tool calls may show an approval dialog before executing.

SCPI instrument setup

Adding a new oscilloscope, multimeter, power supply, or signal generator can be a chat-driven flow: probe the device with *IDN?, look up its command set online, write the connector config, and hot-reload — no server restart. Click Set up with AI on the Add Connector dialog (visible when you select GenericScpi), or ask the assistant directly — mentions of SCPI, an instrument class, or a well-known vendor (Rigol / Siglent / Keysight / Tektronix / Keithley / …) route the turn through four dedicated tools and enable OpenRouter's web-search plugin for that turn. See AI-assisted SCPI setup for the full walkthrough.

Guided Tour & Docs Access

The chat has direct access to every page in this documentation site, so "how do I…" questions get real guide content quoted back instead of a best-guess answer. Three tools power this:

| Tool | What it does |
| --- | --- |
| list_docs | Enumerates every bundled doc path (guides, getting-started, reference, examples). |
| search_docs | Full-text, case-insensitive search — returns the top hits with a short snippet. |
| read_doc | Returns the raw markdown for a given path, e.g. guides/connectors.md. |

The docs are embedded into the server binary at build time, so they're available offline and always match the running version. The chat picks them up automatically when your question includes phrases like how do I, walk me through, tour, tutorial, getting started, or what is.

The Help → Guided Tour menu in the title bar is the fastest way in: each item opens the chat and sends a pre-written starter prompt so the AI walks you through a specific task.

  • Guided Tour (AI) — "Give me a tour of Muxit"
  • Tour: Install a Driver — finds the current driver-marketplace procedure
  • Tour: Create a Connector — walks through the first-connector guide
  • Tour: Drag Properties into a Script — explains how the hardware pane's drag targets work
  • Tour: Build a Dashboard — walks through dashboards and widget binding

You can also ask for a tour in your own words — the menu is a convenience, not a gate.

Vision (Camera Snapshots)

The take_snapshot tool captures a single frame from any camera connector (USB webcam, IP camera) and returns it as an image content block. The LLM can then see and describe the image — useful for checking robot positions, reading instrument displays, or verifying alignments.

Requirements:

  • A camera connector must be configured and initialized
  • The LLM model must support vision (Claude Sonnet/Opus, GPT-4o, Gemini)

Example prompts:

  • "Take a snapshot from the webcam and tell me what you see"
  • "Is the robot arm aligned with the target? Check the camera"
  • "What's the reading on the oscilloscope display?"

Voice Commands

Voice is a core Muxit feature — control your devices by speaking. The Chat Panel header includes a mic button with a dropdown for voice settings.

Two Modes

| Mode | How it works |
| --- | --- |
| Push to Talk (PTT) | Hold the mic button to talk. Release to stop. Your speech is transcribed and sent as a chat message. |
| Hands-free | Click the mic button to toggle listening. Behavior depends on wake word setting. |

Hands-free sub-modes (configured in Settings > Voice > Wake Word):

| Wake Word Setting | Hands-free Behavior |
| --- | --- |
| Enabled | Say "Muxit" to start a conversation. Muxit listens, sends your command, speaks the AI response (if TTS is on), then automatically resumes listening — no need to say the wake word again. The conversation ends after an adjustable silence timeout (default 10s). |
| Disabled | Muxit listens continuously. When you pause speaking, the transcript is automatically sent. |

Conversation Loop

When both wake word and TTS are enabled in hands-free mode, Muxit supports a natural spoken conversation:

  1. Say "Muxit" to start the conversation
  2. Speak your command — it's transcribed and sent to the AI
  3. The AI responds and TTS speaks the response
  4. Muxit automatically starts listening again (no wake word needed)
  5. Continue the conversation naturally
  6. When you stop talking, the conversation ends after a configurable idle timeout
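
The loop above can be sketched as a tiny state machine. This is illustrative only (the real implementation lives in the browser and drives the Web Speech API); the wake phrase and default timeout match the documentation:

```javascript
// Illustrative state machine for the hands-free conversation loop:
// idle -> (wake word) -> listening -> (silence >= timeout) -> idle.
function makeConversation(idleTimeoutMs = 10_000) {
  let state = "idle";
  let lastHeardAt = 0;
  return {
    hear(text, now = Date.now()) {   // called with each final transcript
      lastHeardAt = now;
      if (state === "idle" && /muxit/i.test(text)) state = "listening";
      return state;
    },
    tick(now = Date.now()) {         // poll; ends the conversation after silence
      if (state === "listening" && now - lastHeardAt >= idleTimeoutMs) state = "idle";
      return state;
    },
  };
}
```

While in the listening state, each utterance is sent to the AI without repeating the wake word, matching step 4 of the loop.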

Idle timeout: Adjust how long Muxit waits for you to speak before ending the conversation. Open the mic dropdown (▾ next to the mic button) and use the Idle slider (3–30 seconds, default 10s). You can also manually end the conversation by clicking the Stop button.

Text-to-Speech

Toggle TTS on/off with the speaker icon in the chat header. When enabled, AI responses are spoken aloud using browser speech synthesis. The AI automatically keeps responses short and voice-friendly.

Configure voice, rate, and pitch in Settings > Voice > Text-to-Speech.

The chat-header toggle controls AI replies only. Script say() output has its own toggle in the top status strip (next to STOP ALL) so you can mute one source without silencing the other. Both toggles share the same voice / rate / pitch settings.

Wake Word

Say "Muxit" to start a voice conversation hands-free. Enable in Settings > Voice > Wake Word.

Voice uses browser speech recognition (Web Speech API). Works in Chrome and Edge.

Settings Apply Instantly

Changes to Voice, AI Services, and AI Behavior settings take effect without a page refresh:

  • AI model and system prompt — applied on your next chat message.
  • TTS voice, rate, pitch, and enabled toggle — applied on the next utterance.
  • Wake word phrase, enable/disable, and microphone selection — applied immediately; the background listener restarts in place.

A small Saved — applied immediately badge appears in the top-right of the Settings pane after each change to confirm it was saved.


AI in Scripts

Scripts can make single-shot LLM calls using the ai() global:

```javascript
// Text-only query
const answer = ai("Classify this reading as normal or abnormal: 47.3 ohms");
log.info(answer);

// Vision: analyze a camera image
const cam = connector('webcam');
const frame = cam.snapshot;
const description = ai("Describe what you see in this image", frame);
log.info(description);

// Video recording
const file = cam.record({ seconds: 10 });
log.info(`Recorded: ${file}`);
```

ai(prompt, image?) is a single-shot call — no conversation memory, no tool access, no agentic loop.


Agent Mode

Agents are autonomous AI instances that can coordinate multiple devices to accomplish goals. Unlike the chat loop (which responds to single messages), agents are persistent, goal-oriented, and can react to device events.

Quick Start

From chat:

"Pick up the red part and place it in the tray"

From saved config (workspace/agents/*.agent.json):

```json
{
    "name": "pick-and-place",
    "description": "Pick a part and place it in the tray",
    "devices": ["robot", "camera"],
    "autonomy": "supervised",
    "safety": {
        "maxSpeed": 50,
        "workspace": { "x": [0, 600], "y": [-300, 300], "z": [50, 400] },
        "maxForce": 20,
        "requireVisionConfirm": true
    },
    "instructions": "Always approach from above. Verify grip before lifting.",
    "parameters": {
        "partColor": { "type": "string", "default": "red" }
    }
}
```

Autonomy Levels

| Level | Behavior | Best For |
| --- | --- | --- |
| supervised | Each action shown to user in real time, proceeds unless stopped | First-time tasks, dangerous ops |
| plan-approve | Agent creates plan, user approves, then executes freely | Repetitive tasks |
| guardrails | Runs freely within safety boundaries, pauses only on limit violation | Trusted tasks |
| full | No approval needed (safety limits still enforced) | Well-tested automation |

Safety Boundaries

Enforced regardless of autonomy level:

| Boundary | Config Key | Description |
| --- | --- | --- |
| Workspace limits | safety.workspace | Bounding box the robot cannot leave |
| Speed limits | safety.maxSpeed | Max velocity as % of device max |
| Force limits | safety.maxForce | Max force (N) before emergency stop |
| Rate limits | safety.maxActionsPerMinute | Prevent hammering hardware |
| Vision confirm | safety.requireVisionConfirm | Camera snapshot before/after critical actions |

When requireVisionConfirm is enabled, the safety gate enforces that a vision tool (take_snapshot, read_detections, identify_objects, locate_object, or approach_object) was called within the last 30 seconds before any movement or grip action is allowed. If no recent snapshot exists, the action is blocked with an error message prompting the agent to take a snapshot first.
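
The recency rule amounts to a timestamp gate. A minimal sketch (a hypothetical helper, not the server's implementation; the 30-second window matches the rule above):

```javascript
// Illustrative recency gate mirroring requireVisionConfirm:
// movement/grip actions pass only if a vision tool ran within the window.
function makeVisionGate(windowMs = 30_000) {
  let lastVisionAt = -Infinity;
  return {
    recordVisionCall(now = Date.now()) { lastVisionAt = now; },
    checkMotionAllowed(now = Date.now()) {
      if (now - lastVisionAt > windowMs) {
        throw new Error("Blocked: no vision call in the last 30s. Take a snapshot first.");
      }
    },
  };
}
```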

Operational Limits

| Limit | Default | Config |
| --- | --- | --- |
| Max concurrent agents | 3 | agents.maxConcurrent in server.json |
| Approval timeout (supervised mode) | 5 minutes | Not configurable — action is auto-denied if user doesn't respond |
| Max iterations | 100 | maxIterations in agent config (-1 for unlimited) |
| Max runtime | 300 seconds | timeoutSeconds in agent config |
| Completed agent cleanup | 5 minutes | Completed/failed/stopped agents are removed from the running list after 5 minutes |

Starting Agents from Chat

You can start an agent directly from the AI chat by describing a multi-step task. The chat system creates an ad-hoc agent with default settings:

"Pick up the red part and place it in the tray"

Ad-hoc agents use supervised autonomy by default and inherit the global LLM model. For more control, create a saved config file in workspace/agents/*.agent.json.

Event Triggers

Agents can start automatically when device conditions are met:

```json
{
    "triggers": [
        {
            "event": "state:psu.temperature",
            "condition": "gt:80",
            "cooldownSeconds": 120,
            "goal": "Temperature too high — reduce power"
        }
    ]
}
```

Condition formats: gt:80, lt:10, eq:true, neq:off, changed
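
A minimal evaluator for this grammar (an illustrative sketch of the documented formats, not the server's implementation) could look like:

```javascript
// Hypothetical evaluator for trigger conditions like "gt:80" or "changed".
// Purely illustrative; only the condition grammar itself comes from the docs.
function evalCondition(condition, value, previous) {
  const [op, rawArg] = condition.split(":");
  switch (op) {
    case "gt":  return Number(value) > Number(rawArg);
    case "lt":  return Number(value) < Number(rawArg);
    case "eq":  return String(value) === rawArg;   // eq:true matches boolean true
    case "neq": return String(value) !== rawArg;
    case "changed": return value !== previous;     // fires on any state change
    default: throw new Error(`Unknown condition: ${condition}`);
  }
}
```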

Process Files

Agents can reference structured procedure files that guide them through complex tasks. Process files live in workspace/processes/*.process.md and are linked via the processFile config field:

```json
{
    "name": "pick-and-place",
    "devices": ["robot", "camera"],
    "processFile": "pick-and-place",
    "instructions": "Additional agent-specific notes..."
}
```

When the agent starts, the process content is injected into its system prompt. See Process Definitions for the file format.

Agent Timeline & Transparency

Running agents stream a real-time timeline of their execution, showing what the agent is thinking, doing, and observing. Each entry has a type:

| Type | Meaning | Example |
| --- | --- | --- |
| thought | LLM reasoning text | "I can see the red ball at (320, 240). I'll move above it first." |
| action | Tool call being executed | call_action({connector: "robot", action: "moveJ", ...}) |
| observation | Tool result received | read_property → {"x": 150, "y": 30, "z": 200} |
| error | Tool failure or safety block | Blocked: moveJ — speed=75% exceeds max 50% |
| user_input | Agent asked a question | "Which object should I pick up?" |

The Reasoning field shows the agent's latest LLM output — what it's currently "thinking" before choosing its next action.

Agent status includes:

  • iteration — number of LLM calls made
  • tokensUsed — approximate tokens consumed
  • elapsedSeconds — time since start
  • timeline — full chronological event list
  • reasoning — latest LLM reasoning text

Use agent.detail WebSocket message to fetch the full timeline, or subscribe to agent.timeline broadcasts for real-time streaming. The UI's AgentMonitor component displays all of this with tabs for Timeline, Plan, and Logs.

Per-Agent Model Selection (Cost Optimization)

Each agent can override the global LLM model and token limit. Simple agents (monitoring, condition checking) can use cheap models while complex agents (vision-guided manipulation) use premium models.

```json
{
    "name": "temperature-monitor",
    "devices": ["test-device"],
    "model": "google/gemini-2.5-flash",
    "maxTokens": 1024,
    "maxIterations": 30
}
```

Model cost tiers:

| Tier | Models | ~Cost/MTok |
| --- | --- | --- |
| $ (cheap) | google/gemini-2.5-flash, deepseek/deepseek-chat-v3-0324, openai/gpt-4o-mini | $0.10–0.30 |
| $$ (mid) | anthropic/claude-haiku-4-5, google/gemini-2.5-pro, openai/gpt-4o | $1–5 |
| $$$ (premium) | anthropic/claude-sonnet-4-5 | $5–15 |

If model is omitted, the agent uses the global model from Settings. Anthropic models automatically use prompt caching for repeated system prompts (~50% input token savings in agent loops).

Smart Tool Filtering (Automatic Cost Optimization)

Muxit automatically classifies each chat message to determine which tools are relevant, then sends only those tools to the AI model. This dramatically reduces token overhead and improves reliability with cheaper models.

How it works:

  • Muxit AI: A free classifier model analyzes your message before the main AI call. This adds ~100-300ms latency but costs nothing.
  • MCP providers: Classification is handled by the external AI client.
  • Fallback: If classification fails, all tools are sent (same as previous behavior).

Example impact:

| Request | Tools Sent | Token Savings |
| --- | --- | --- |
| "Move robot to 500,500,400" | 6 (core only) | ~80% fewer tool tokens |
| "Write a monitoring script" | 12 (core + scripts) | ~65% fewer tool tokens |
| "Take a photo of the workspace" | 12 (core + vision) | ~65% fewer tool tokens |
| "What's the meaning of life?" | 34 (all tools) | 0% (general fallback) |

Tool groups:

| Group | Tools | When Active |
| --- | --- | --- |
| Core | list_connectors, read/write_property, call_action, get_device_state, get_connector_schema | Always |
| Scripts | list/run/stop/write scripts, run_code | Script/automation requests |
| Vision | take_snapshot, configure_vision, detections, identify/locate objects, calibrate, teach/forget/list objects, approach_object | Camera/vision requests |
| Agents | list/start/stop agents, agent status | Agent/autonomous requests |
| Memory | save/delete/list memories | Memory requests |
| Config | connector configs, drivers, server config | Config requests |
| Instructions | instructions, process definitions | Process/instruction requests |

The script API guide (~500 tokens) is also excluded from the system prompt when scripts are not relevant, further reducing costs.

This feature requires no configuration — it's always active and falls back gracefully.
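
As a rough illustration of the routing idea, a keyword fallback might look like the sketch below. This is hypothetical: the real classifier is an LLM call, and these patterns are invented; only the group names and the "core always, all tools on fallback" behavior come from the description above.

```javascript
// Illustrative keyword-based stand-in for the message classifier.
// Patterns are invented for the sketch; the real system uses an LLM.
const GROUPS = {
  core:    /\b(move|set|read|turn|robot|connector|property|device)/i,
  scripts: /\b(script|automat|monitor)/i,
  vision:  /\b(camera|photo|snapshot|see|look)/i,
  agents:  /\b(agent|autonomous|goal)/i,
};

function selectToolGroups(message) {
  const matched = Object.keys(GROUPS).filter((g) => GROUPS[g].test(message));
  if (matched.length === 0) return ["all"];  // classification found nothing: send every tool
  // Core tools are always included alongside any specialized group.
  return matched.includes("core") ? matched : ["core", ...matched];
}
```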


When to Use What

| Approach | Best For | How It Works |
| --- | --- | --- |
| Chat Panel | Interactive questions, one-off commands | You type, AI responds with tool calls |
| Scripts with ai() | Automated decisions, classification, vision | Script calls ai() for a single LLM response |
| Agents | Multi-step autonomous goals, reactive automation | Persistent AI loop: plan, act, observe, re-plan |
| MCP Tools | External AI tools (Claude Code/Desktop) | AI calls Muxit tools via MCP |

AI Memory

Muxit AI remembers facts and preferences across sessions. Tell the AI "remember that..." to save a memory, or "forget that..." to remove one.

Memories are stored locally in workspace/config/ai-memory.json. Click the M button in the Chat Panel header to view, edit, or delete memories.

| Category | Examples |
| --- | --- |
| preference | "User prefers metric units" |
| device | "The PSU on the left bench is named 'main-psu'" |
| procedure | "Calibration: reset PSU, set 5V, wait 10s, read" |
| note | "Don't run scripts during backup (2-3am)" |

Vision-Guided Robot Control

Muxit supports closed-loop visual servoing — using camera feedback to guide robot movements in real time. The system uses a two-speed approach:

  • Fast local CV (VisionDriver + OpenCV): Runs color/contour detection at camera frame rate for real-time tracking. No LLM calls, no latency.
  • Slow LLM vision (take_snapshot + AI): Captures a frame and sends it to the LLM for high-level scene understanding, object identification, and planning. Used for initial assessment and verification, not tight control loops.

The combination lets agents plan with full visual intelligence but execute with the speed of classical computer vision.

Setting Up a Vision Connector

Create a connector config that uses the Vision driver with a webcam source:

```js
// workspace/connectors/eye.js
export default {
  driver: "Vision",
  config: { source: "webcam" },
  properties: {
    frame: () => driver.frame(),
    detections: () => driver.detections()
  },
  methods: {
    detectColor: [() => driver.detectColor(), "Run HSV color detection"],
    detectContours: [() => driver.detectContours(), "Run contour detection"]
  },
  poll: ["detections"]
};
```

The source can be "webcam" (default USB camera) or a camera connector name for IP cameras.

Tracker Types

| Tracker | How It Works | Best For |
| --- | --- | --- |
| color (HSV) | Filters pixels by hue/saturation/value range, finds centroids | Tracking brightly colored objects (red ball, green LED) |
| contour | Edge detection + contour finding, returns bounding boxes and areas | Tracking shapes regardless of color (parts, tools, containers) |

Both trackers run locally via OpenCV with no LLM calls, providing detections at frame rate.

Vision Workflow

The vision system combines two approaches:

| Layer | Tool | Speed | Capability |
| --- | --- | --- | --- |
| Fast CV | configure_vision + read_detections | ~30 fps | Track objects by color/shape (HSV, contour) |
| AI Vision | take_snapshot | ~2-5s | Identify objects by name, understand scenes, reason about spatial layout |
| AI Detection | identify_objects | ~2-5s | Find named objects with pixel coordinates and bounding boxes |
| Spatial | locate_object | ~2-5s | Find object and return world (robot) coordinates if calibrated |
| Teaching | teach_object | ~3-5s | Learn an object for persistent real-time tracking |
| Servoing | approach_object | ~0.1-5s | Iterative guidance to move robot toward an object |

Typical workflow for vision-guided tasks:

  1. take_snapshot — understand what's in the scene
  2. identify_objects(query: "red spoon") — find specific object with coordinates
  3. teach_object — learn the object for persistent real-time tracking (saves profile + creates CV tracker)
  4. read_detections — poll tracker position at high speed during approach
  5. approach_object — iterative visual servoing to move toward the object (works without calibration)

Use AI tools (take_snapshot, identify_objects) for planning and verification. Use CV tools (read_detections) or approach_object for fast feedback during motion. Use teach_object to bridge the gap — AI identifies objects once, then OpenCV tracks them at 30fps.

AI Object Detection (identify_objects)

Sends a camera snapshot to the LLM with a structured detection prompt. Returns named objects with pixel coordinates and an annotated verification image showing bounding boxes drawn over each detected object.

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| camera | string | No | Camera connector name (auto-detects if omitted) |
| query | string | No | Specific object to find (e.g., "red spoon"). Omit to detect all visible objects |

Response:

```json
[
  {
    "name": "red spoon",
    "center": { "x": 320, "y": 240 },
    "size": { "width": 80, "height": 30 },
    "confidence": "high"
  },
  {
    "name": "metal bowl",
    "center": { "x": 500, "y": 350 },
    "size": { "width": 120, "height": 100 },
    "confidence": "medium"
  }
]
```

Key differences from read_detections:

|  | identify_objects | read_detections |
| --- | --- | --- |
| Speed | ~2-5 seconds (LLM call) | Instant (local OpenCV) |
| Setup | None | Requires configure_vision first |
| Capabilities | Identifies objects by name | Tracks by color/shape only |
| Cost | Uses LLM tokens | Free (local processing) |

Results are cached for 5 seconds — repeated calls within that window are free.
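
That five-second window is, in effect, a small TTL cache. A generic sketch of the idea (not the server's actual code):

```javascript
// Generic TTL cache illustrating the 5-second identify_objects result window.
function makeTtlCache(ttlMs) {
  const entries = new Map();
  return {
    get(key, now = Date.now()) {
      const hit = entries.get(key);
      if (hit && now - hit.at < ttlMs) return hit.value;  // still fresh
      entries.delete(key);                                // expired or missing
      return undefined;
    },
    set(key, value, now = Date.now()) {
      entries.set(key, { at: now, value });
    },
  };
}
```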

The tool also returns an annotated camera image with green bounding boxes, red centroid dots, and name/confidence labels drawn over each detected object. This image is injected into the AI conversation for visual verification — the LLM can see what it detected and confirm accuracy before proceeding.

Locating Objects (locate_object)

Combines AI object detection with spatial mapping to return physical coordinates.

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| camera | string | Yes | Camera connector name |
| object_name | string | Yes | Object to find (e.g., "spoon", "red ball") |

Response (with calibration):

json
{
  "found": true,
  "name": "red spoon",
  "pixel": { "x": 320, "y": 240 },
  "world": { "x": 150.5, "y": -30.2, "z": 85.0 },
  "confidence": "high",
  "calibrated": true
}

Response (without calibration):

json
{
  "found": true,
  "name": "red spoon",
  "pixel": { "x": 320, "y": 240 },
  "confidence": "high",
  "calibrated": false,
  "note": "Camera not calibrated — only pixel coordinates available. Use calibrate_camera to enable world coordinate mapping."
}

Spatial Mapping (Camera Calibration)

The calibrate_camera tool teaches the system how pixel coordinates map to real-world (robot) coordinates using a teach-by-example approach. No camera intrinsic parameters or lens models needed.

How it works:

  1. Move the robot end-effector to a position visible in the camera
  2. Record the pixel position (from identify_objects or read_detections) and the robot's known coordinates
  3. Repeat for 4+ positions spread across the workspace
  4. The system computes a 2D affine transform (least-squares fit)
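The fit in step 4 can be sketched in a few lines. This is a hypothetical illustration of the math, not Muxit's actual implementation: each world coordinate is modeled as an affine function of the pixel coordinates (world_x = a·px + b·py + c), and the coefficients are solved independently from the 3x3 normal equations.

```javascript
// Least-squares 2D affine fit from teach-by-example point pairs.
// points: [{ px, py, wx, wy }, ...] pairing pixel and known world positions.
function fitAffine(points) {
  const det3 = (m) =>
    m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1]) -
    m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0]) +
    m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]);

  // Solve the normal equations for one output coordinate ('wx' or 'wy').
  const solve = (key) => {
    let sxx = 0, sxy = 0, sx = 0, syy = 0, sy = 0;
    let vx = 0, vy = 0, v = 0;
    for (const p of points) {
      sxx += p.px * p.px; sxy += p.px * p.py; sx += p.px;
      syy += p.py * p.py; sy += p.py;
      vx += p.px * p[key]; vy += p.py * p[key]; v += p[key];
    }
    const M = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, points.length]];
    const rhs = [vx, vy, v];
    const d = det3(M);
    // Cramer's rule: replace column i with the right-hand side.
    const coef = (i) =>
      det3(M.map((row, r) => row.map((x, c) => (c === i ? rhs[r] : x)))) / d;
    return [coef(0), coef(1), coef(2)];
  };

  const [a, b, c] = solve('wx');
  const [d, e, f] = solve('wy');
  return { toWorld: (px, py) => ({ x: a * px + b * py + c, y: d * px + e * py + f }) };
}
```

With four corner points plus one near the center, toWorld then maps any tracked pixel into workspace millimetres; Z stays fixed, matching the flat-work-plane assumption below.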

Calibration actions:

| Action | Parameters | Description |
|---|---|---|
| start | camera | Begin a new calibration session |
| point | camera, pixel_x, pixel_y, world_x, world_y, world_z | Record a calibration point |
| finish | camera | Compute transform and save |
| status | camera | Check calibration state |

Example calibration flow:

1. calibrate_camera(camera: "webcam", action: "start")
2. Move robot to position (100, 0, 80), see it at pixel (150, 400)
   calibrate_camera(camera: "webcam", action: "point", pixel_x: 150, pixel_y: 400, world_x: 100, world_y: 0, world_z: 80)
3. Repeat for 3-5 more positions...
4. calibrate_camera(camera: "webcam", action: "finish")

Tips for good calibration:

  • Use at least 4 points spread across the workspace (corners + center)
  • All points should be at roughly the same Z height (the system assumes a flat work plane)
  • The more spread out the points, the better the accuracy
  • Recalibrate if the camera or its mount moves

Storage: Calibration data persists in workspace/config/calibrations/{camera}.json.

Object Teaching (teach_object)

The teach_object tool bridges AI vision and fast CV tracking. When the LLM identifies an object via identify_objects or take_snapshot, it can "teach" the object to the vision system — sampling its color, storing a persistent profile, and creating a real-time OpenCV tracker.

How it works:

  1. AI vision identifies an object (e.g., "screwdriver" at pixel 320, 240)
  2. teach_object samples the color at that pixel location via calibrateColor
  3. A profile is saved to workspace/config/objects.json with HSV range, typical size, and description
  4. A color tracker is auto-created on the vision connector for 30fps tracking
  5. A verification snapshot is captured showing the new tracker's detection overlay — this image is returned to the AI conversation so both the LLM and user can confirm the tracker is working correctly
  6. On server restart, taught objects auto-restore their trackers

Parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Object name (e.g., "screwdriver") |
| camera | string | Yes | Camera connector name |
| pixel_x | number | Yes | Center X pixel coordinate |
| pixel_y | number | Yes | Center Y pixel coordinate |
| width | number | No | Bounding box width (improves color sampling) |
| height | number | No | Bounding box height |
| description | string | No | Visual description for context |
| tracker_type | string | No | "color" (default) or "contour" |

Example conversation:

User: "Learn what the red ball looks like"

AI: Takes a snapshot, identifies the red ball at pixel (320, 240) with size 60x60. Calls teach_object(name: "red ball", camera: "webcam", pixel_x: 320, pixel_y: 240, width: 60, height: 60)

"I've taught the vision system to recognize the red ball. Here's the verification image — I can see the green bounding box tracking it correctly in the center of the frame. It's now being tracked in real-time at 30fps."

Related tools:

  • forget_object(name) — Remove a taught object and its tracker
  • list_objects() — List all taught objects with their profiles

Storage: Object profiles persist in workspace/config/objects.json.
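Based on the fields described above (HSV range, typical size, description), a taught-object entry might look like the following. Treat the exact schema as an assumption; inspect your own workspace/config/objects.json for the authoritative shape:

```json
{
  "red ball": {
    "description": "small red rubber ball",
    "hsv": { "lower": [0, 120, 80], "upper": [10, 255, 255] },
    "size": { "width": 60, "height": 60 },
    "tracker": "color"
  }
}
```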

Direct Vision Annotation

For faster, cheaper object teaching without AI, you can draw bounding boxes directly on the camera feed in the dashboard. This bypasses the LLM entirely — you see it, box it, name it.

Setup:

  1. Add a Canvas widget to your dashboard with stream set to vision:annotated
  2. In the widget config, set Vision Connector to vision (or your vision connector name)
  3. An annotation toolbar appears on the canvas

Teaching objects:

  1. Click Draw in the toolbar — cursor becomes a crosshair
  2. Click and drag to draw a bounding box around the object you want to track
  3. Type a name in the popup input and press Enter (or click Teach)
  4. The system samples the color in that region, creates a tracker, and begins real-time tracking immediately
  5. You'll see the green tracking box appear within 1-2 frames

Managing objects:

  • Click an object's label in the overlay to select it
  • Click the X button to delete (forget) a tracked object
  • Press Escape to cancel drawing or deselect

This uses the same vision.teach, vision.forget, and vision.list WebSocket messages — see the WebSocket API reference for details.

Visual Servoing (approach_object)

The approach_object tool provides iterative guidance for moving a robot toward an object without calibration. It works by computing how far the object is from the center of the camera frame, then suggesting a move direction and step size.

How it works:

  1. Finds the object using fast CV tracking (if taught) or AI detection (fallback)
  2. Computes error: how far the object's center is from the frame center (normalized -1 to 1)
  3. Returns a suggested move vector proportional to the error
  4. The LLM/agent moves the robot, then calls approach_object again
  5. Repeat until the object is centered (error < 10%)
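The error and suggestion values can be reproduced with a few lines. This is a hypothetical sketch of the math, not the tool's actual source: the error is the object's offset from the frame center normalized to [-1, 1], and the suggested move opposes the error, scaled by the step size in mm.

```javascript
// Compute one visual-servoing step from a detected pixel position.
function servoStep(pixel, frame, stepSize = 10) {
  const error = {
    x: (pixel.x - frame.width / 2) / (frame.width / 2),
    y: (pixel.y - frame.height / 2) / (frame.height / 2),
  };
  // Centered once both axes are within 10% of frame center.
  const centered = Math.abs(error.x) < 0.1 && Math.abs(error.y) < 0.1;
  return {
    centered,
    error,
    suggestion: { dx: -error.x * stepSize, dy: -error.y * stepSize },
  };
}
```

Plugging in the values from the response example (pixel 450,180 in a 640x480 frame) reproduces error ≈ {0.406, -0.25} and a suggested move of roughly dx=-4.1, dy=2.5 mm.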

Parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| camera | string | Yes | Camera connector name |
| object_name | string | Yes | Object to approach |
| step_size | number | No | Move step size in mm (default: 10) |
| vision_connector | string | No | Vision connector name (default: "vision") |

Response:

json
{
  "found": true,
  "centered": false,
  "object_name": "screwdriver",
  "pixel": { "x": 450, "y": 180 },
  "frame": { "width": 640, "height": 480 },
  "normalized": { "x": 0.703, "y": 0.375 },
  "error": { "x": 0.406, "y": -0.25 },
  "suggestion": { "dx": -4.1, "dy": 2.5 },
  "distance": "medium",
  "quadrant": "right-top",
  "message": "Object 'screwdriver' is in right-top quadrant. Suggested move: dx=-4.1, dy=2.5 mm."
}

Usage pattern (agent or script):

loop:
  result = approach_object(camera: "webcam", object_name: "screwdriver", step_size: 5)
  if result.centered: break
  robot.moveRelative(result.suggestion.dx, result.suggestion.dy, 0)
  wait 500ms

Axis Mapping

Camera X/Y may not align with robot X/Y depending on camera mounting. Move the robot in one axis and observe the camera to determine the mapping. Save it to AI memory with save_memory.

See also: workspace/processes/visual-servo.process.md — structured procedure for visual servoing tasks.


Central Instructions

Lab-wide AI instructions live in workspace/config/instructions.md. This is the "standing orders" file — both AI chat and agents read it. Edit it to customize how the AI behaves in your lab.

File Format

markdown
# Lab Instructions

## General
You are controlling a robotics lab. Be careful with motion commands.
Always confirm before moving the robot to a new position.
Prefer slow, safe movements over fast ones.

## Devices
### robot
Collaborative robot arm (Fairino FR series). Max speed 29 when homing.
Always check motionDone before starting new moves.
Coordinates are in mm and degrees.

### webcam
USB webcam for live video streaming.
Safe to adjust quality and mirror settings at any time.

### psu
Power supply unit. Never exceed 30V / 5A.
Always ramp voltage gradually — never jump directly to target.

## Safety
- Never move robot without checking camera first
- Maximum approach speed near objects: 20%
- Always confirm before executing motion commands
- If force readings spike unexpectedly, stop all motion

How It Works

| Priority | Source | Scope |
|---|---|---|
| 1 (highest) | workspace/config/instructions.md (## Devices / ### {name} sections) | Per-device |
| 2 | workspace/config/instructions.md (## General + ## Safety sections) | All AI interactions |
| 3 | Connector config ai.instructions field | Per-device fallback |
| 4 (lowest) | Built-in defaults | Always present |

The central file takes priority. If a device has a ### device-name section in instructions.md, that replaces the connector's ai.instructions. If not, the connector-level instructions are used as fallback.

Hot Reload

The file is watched for changes — edit it and the new instructions take effect on the next AI interaction (no restart needed).

Editing

  • Via UI: The instructions editor is available via WebSocket (instructions.get / instructions.set)
  • Via AI: Ask the AI to update instructions: "Add a safety rule: never exceed 20% speed near the basket"
  • Via file: Edit workspace/config/instructions.md directly in any text editor

Tools: read_instructions, write_instructions


Process Definitions

Process files are structured "recipes" for complex autonomous tasks. They live in workspace/processes/*.process.md and teach the AI how to decompose goals into steps.

Why Processes?

When a user says "pick up the spoon and drop it in the basket", the AI needs to know:

  • What is a spoon? How to find it? (object detection)
  • Where is the basket? (spatial awareness)
  • How to approach, grasp, transport, and place? (procedure)
  • What to do if the grip fails? (error recovery)

Process files capture this procedural knowledge so it can be reused.

File Format

Files are named {name}.process.md and placed in workspace/processes/. The format uses markdown with standard sections:

markdown
# Pick and Place

## Goal
Pick up an object and place it at a target location.

## Requirements
- Devices: robot, camera (or webcam)
- The robot must have a gripper

## Steps
1. **Survey** — Take a camera snapshot. Identify the target object and destination.
2. **Locate object** — Use AI vision to determine the object's position.
   If uncertain, take snapshots from different angles.
3. **Plan approach** — Calculate an approach path. Move to a safe height first.
4. **Approach** — Move above the object at safe speed (max 20%).
   Take a snapshot to verify position.
5. **Grasp** — Lower to grasp height. Close gripper.
   Take a snapshot to verify grip.
6. **Transport** — Lift to safe height. Move to above the destination.
7. **Place** — Lower to placement height. Open gripper.
   Verify placement with camera.
8. **Retreat** — Return to home position.

## Error Handling
- If object not found: ask user to point it out
- If grip fails (object drops): retry from step 4, max 2 retries
- If position uncertain: take additional snapshots

## Safety
- Never exceed 20% speed when near objects
- Always approach from above
- Verify with camera before and after gripping

Using Processes

With agents: Reference a process in the agent config:

json
{
    "name": "pick-and-place",
    "processFile": "pick-and-place",
    "devices": ["robot", "camera"]
}

From AI chat: The AI can search for relevant processes:

User: "Pick up the spoon and put it in the basket"
AI: [calls list_processes] → finds "pick-and-place"
AI: [calls read_process("pick-and-place")] → reads procedure
AI: [calls start_agent with process] → follows the steps

Creating processes: Ask the AI to create one:

"Create a process for inspecting PCBs under the microscope"

The AI will use write_process to save it to workspace/processes/.

Sections Reference

| Section | Required | Purpose |
|---|---|---|
| # Title | Yes | Process name (H1 heading) |
| ## Goal | Yes | One-line description of what the process achieves |
| ## Requirements | No | Devices, tools, or conditions needed |
| ## Steps | Yes | Numbered procedure steps |
| ## Error Handling | No | What to do when things go wrong |
| ## Safety | No | Safety constraints specific to this procedure |

Tools: list_processes, read_process, write_process


Running on a local LLM

Muxit AI can run against a local model server instead of the managed cloud proxy. This is useful for air-gapped labs, regulated environments, or cutting cloud spend on long-running agents.

Local providers are gated behind the Local LLM Pro feature — activate a Pro license (or trial) before switching.

Ollama

  1. Install Ollama and start the daemon (ollama serve).
  2. Pull a model with tool-use support: ollama pull llama3.2 (or qwen2.5, mistral-nemo, …).
  3. Open Settings → AI Services, pick Ollama, leave the base URL at http://localhost:11434/v1, set the model name to match what you pulled, and click Test connection.

LM Studio

  1. Install LM Studio and load a model from the "Discover" tab.
  2. Open the Local Server tab and start the server on port 1234.
  3. In Muxit, Settings → AI Services → LM Studio. Base URL defaults to http://localhost:1234/v1. Use auto as the model name to pick whatever LM Studio currently has loaded.

Any OpenAI-compatible endpoint

Pick OpenAI-compatible for vLLM, llama-server, OpenRouter direct, or any service exposing /chat/completions. Provide the full base URL (including the /v1 segment if required) and an API key.

What works on local providers

  • Streaming chat with tool use — works on Ollama ≥ 0.4 and LM Studio (current versions). Older releases without OpenAI-style tool support will fall back to plain chat without tool calls.
  • The ai() script global, the agent inference loop, and AI-powered object detection all transparently use the active provider.
  • Vision works against multimodal models like llava, qwen2-vl, or llama3.2-vision. Use ai("describe this", await camera.snapshot()) from scripts.

What doesn't

  • Per-call usage / credits reporting (cloud-only).
  • The OpenRouter web-search plugin used for SCPI authoring (cloud-only).

Configuration Reference

AI Config (server.json)

| Field | Type | Default | Description |
|---|---|---|---|
| ai.provider | string | "muxit" | LLM backend: "muxit", "ollama", "lmstudio", or "openai-compatible" |
| ai.model | string | provider default | Active model id (cloud uses OpenRouter ids; local uses model name) |
| ai.maxTokens | number | 4096 | Max tokens per response |
| ai.instructions | string | "" | Custom AI system prompt |
| ai.promptProfile | string | "standard" | "minimal" strips device schemas and the decision tree from the system prompt to fit 4–8K local-LLM context windows |
| ai.providers.&lt;id&gt;.baseUrl | string | per provider | Endpoint URL for local providers |
| ai.providers.&lt;id&gt;.apiKey | string | "" | Optional bearer token (openai-compatible only) |
| ai.providers.&lt;id&gt;.model | string | per provider | Per-provider default model |
| ai.access.&lt;connector&gt;.&lt;item&gt;.enabled | boolean | true | Set to false to hide a property/action from the AI |
| ai.access.&lt;connector&gt;.&lt;item&gt;.instructions | string | "" | Per-property/action AI note |

You can also embed AI instructions directly in connector config files using the ai.instructions field — see the Connector Guide for details.
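As an illustration, switching to a local Ollama model might look like this in server.json. Field names come from the table above; the model name and URL are example values, not defaults you must use:

```json
{
  "ai": {
    "provider": "ollama",
    "model": "llama3.2",
    "promptProfile": "minimal",
    "providers": {
      "ollama": { "baseUrl": "http://localhost:11434/v1" }
    }
  }
}
```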

Voice Config (server.json)

| Field | Type | Default | Description |
|---|---|---|---|
| voice.tts.enabled | boolean | false | Enable text-to-speech for AI chat replies |
| voice.tts.scriptEnabled | boolean | true | Speak script say() output through the status-strip toggle |
| voice.wakeWord.enabled | boolean | false | Enable "Muxit" wake word |
| voice.autoSend | boolean | true | Auto-send voice input |

Muxit — Hardware Orchestration Platform