AI & Voice

Muxit AI lets you control lab hardware using natural language — via chat or voice in the browser UI. It supports Anthropic (Claude), OpenAI, and Google models as LLM providers, with a safety gate that requires user approval before executing device commands.

Ways to Use AI

| Method | What it is | Best for |
|---|---|---|
| Chat Panel | Browser sidebar with natural language input | Interactive use in the web UI |
| MCP Server | Model Context Protocol over HTTP and stdio | Claude Code, Claude Desktop, MCP-compatible AI tools |

Quick Start

1. Chat Panel (Browser Web UI)

The Chat Panel provides a full AI assistant that can control your devices, read sensors, write scripts, and answer questions — all via natural language (text or voice). It uses an agentic tool-calling loop: the LLM decides which tools to call, executes them, and continues until it has a final answer.

All LLM requests route through OpenRouter's OpenAI-compatible API, so you can use any model available on OpenRouter (Claude, GPT-4, Gemini, etc.).

Step 1: Configure your provider

Click the gear icon (bottom of Activity Bar) to open Settings. Under AI Services, select "openrouter" as your provider and enter your OpenRouter API key.

Or edit workspace/config/server.json directly:

```json
{
  "ai": {
    "provider": "openrouter",
    "providers": {
      "openrouter": { "apiKey": "sk-or-v1-YOUR-KEY", "model": "anthropic/claude-sonnet-4", "maxTokens": 4096 }
    },
    "safetyMode": "confirm"
  }
}
```

Step 2: Start the server

```bash
node start.js server       # MuxitServer (serves web UI at http://127.0.0.1:8765)
node start.js server --gui # Auto-open browser
```

Step 3: Open AI Chat — Click the chat icon in the Activity Bar.

2. MCP Server

The MCP server is built directly into MuxitServer (C#), exposing device control tools and resources over the Model Context Protocol. It supports both HTTP and stdio transports.

HTTP transport — Available at /mcp on the running server (e.g. http://127.0.0.1:8765/mcp). Starts automatically with the server.

Stdio transport — For Claude Code and Claude Desktop integration:

```bash
node start.js mcp                          # Via start script
dotnet run --project MuxitServer -- --mcp  # Direct
```

Claude Code — Already configured in .claude/.mcp.json:

```json
{
  "mcpServers": {
    "muxit": {
      "type": "stdio",
      "command": "dotnet",
      "args": ["run", "--project", "MuxitServer", "--", "--mcp", "--workspace", "workspace"]
    }
  }
}
```

Claude Desktop — Add to claude_desktop_config.json:

```json
{
  "mcpServers": {
    "muxit": {
      "command": "dotnet",
      "args": ["run", "--project", "/path/to/MuxitServer", "--", "--mcp", "--workspace", "/path/to/workspace"]
    }
  }
}
```

MCP Tools

| Tool | Description | Category |
|---|---|---|
| `list_connectors` | List all devices and capabilities | Device Control |
| `get_connector_schema` | Get full schema for a device | Device Control |
| `read_property` | Read a device property | Device Control |
| `write_property` | Set a device property | Device Control |
| `call_action` | Execute a device action | Device Control |
| `get_device_state` | Snapshot of all cached state | Device Control |
| `list_scripts` | List available and running scripts | Scripts |
| `run_script` | Start a named script | Scripts |
| `run_code` | Execute inline JavaScript in sandbox | Scripts |
| `stop_script` | Stop a running script | Scripts |
| `read_script_source` | Read a script's source code | Files |
| `write_script` | Create or update a script file | Files |
| `read_connector_config` | Read a connector config file | Files |
| `write_connector_config` | Create or update a connector config | Files |
| `list_drivers` | List available drivers | System |
| `get_driver_schema` | Get full schema for a specific driver | System |
| `get_server_config` | Get server configuration (redacted) | System |
| `take_snapshot` | Capture a camera image for visual analysis | Vision |
| `list_agents` | List agent configs and running instances | Agents |
| `start_agent` | Start an agent with a goal | Agents |
| `stop_agent` | Stop a running agent | Agents |
| `get_agent_status` | Get current status of an agent | Agents |
| `pause_agent` | Pause a running agent | Agents |
| `resume_agent` | Resume a paused agent | Agents |
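Over the HTTP transport, tools are invoked with standard MCP `tools/call` JSON-RPC requests. A minimal sketch of such a request for `read_property` — the `connector` and `property` argument names here are illustrative, not taken from the tool's actual schema:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "read_property",
    "arguments": { "connector": "psu", "property": "voltage" }
  }
}
```

Use `get_connector_schema` first to discover the real property names a device exposes.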

MCP Resources

| URI | Description |
|---|---|
| `muxit://connectors` | List of all connectors with schemas |
| `muxit://connector/{name}/schema` | Schema for a specific connector |
| `muxit://state` | Current device state snapshot |

Safety Modes

| Mode | Behavior | Config |
|---|---|---|
| `confirm` (default) | Every device write/action requires user approval | `"safetyMode": "confirm"` |
| `trust` | All tool calls execute immediately | `"safetyMode": "trust"` |

In confirm mode, the Chat Panel shows an approval dialog with the tool name, parameters, and Allow/Deny buttons.


AI Chat Tools

The Chat Panel uses the same tools as the MCP server. The AI assistant decides which tools to call based on your request. In confirm mode, each tool call shows an approval dialog before executing.

Vision (Camera Snapshots)

The take_snapshot tool captures a single frame from any camera connector (USB webcam, IP camera) and returns it as an image content block. The LLM can then see and describe the image — useful for checking robot positions, reading instrument displays, or verifying alignments.

Requirements:

  • A camera connector must be configured and initialized
  • The LLM model must support vision (Claude Sonnet/Opus, GPT-4o, Gemini)

Example prompts:

  • "Take a snapshot from the webcam and tell me what you see"
  • "Is the robot arm aligned with the target? Check the camera"
  • "What's the reading on the oscilloscope display?"

Voice Commands

Voice is a core Muxit feature — control your devices by speaking. The Chat Panel header includes a mic button with a dropdown for voice settings.

Two Modes

| Mode | How it works |
|---|---|
| Push to Talk (PTT) | Hold the mic button to talk. Release to stop. Your speech is transcribed and sent as a chat message. |
| Hands-free | Click the mic button to toggle listening. Behavior depends on wake word setting. |

Hands-free sub-modes (configured in Settings > Voice > Wake Word):

| Wake Word Setting | Hands-free Behavior |
|---|---|
| Enabled | Say "Muxit" to start a conversation. Muxit listens, sends your command, speaks the AI response (if TTS is on), then automatically resumes listening — no need to say the wake word again. The conversation ends after an adjustable silence timeout (default 10s). |
| Disabled | Muxit listens continuously. When you pause speaking, the transcript is automatically sent. |

Conversation Loop

When both wake word and TTS are enabled in hands-free mode, Muxit supports a natural spoken conversation:

  1. Say "Muxit" to start the conversation
  2. Speak your command — it's transcribed and sent to the AI
  3. The AI responds and TTS speaks the response
  4. Muxit automatically starts listening again (no wake word needed)
  5. Continue the conversation naturally
  6. When you stop talking, the conversation ends after a configurable idle timeout

Idle timeout: Adjust how long Muxit waits for you to speak before ending the conversation. Open the mic dropdown (▾ next to the mic button) and use the Idle slider (3–30 seconds, default 10s). You can also manually end the conversation by clicking the Stop button.

Text-to-Speech

Toggle TTS on/off with the speaker icon in the chat header. When enabled, AI responses are spoken aloud using browser speech synthesis. The AI automatically keeps responses short and voice-friendly.

Configure voice, rate, and pitch in Settings > Voice > Text-to-Speech.

Wake Word

Say "Muxit" to start a voice conversation hands-free. Enable in Settings > Voice > Wake Word.

Voice uses browser speech recognition (Web Speech API). Works in Chrome and Edge.


AI in Scripts

Scripts can make single-shot LLM calls using the `ai()` global:

```javascript
// Text-only query
const answer = await ai("Classify this reading as normal or abnormal: 47.3 ohms");
log.info(answer);

// Vision: analyze a camera image
const cam = connector('webcam');
const frame = await cam.snapshot();
const description = await ai("Describe what you see in this image", frame);
log.info(description);
```

`ai(prompt, image?)` is a single-shot call — no conversation memory, no tool access, no agentic loop.


Agent Mode

Agents are autonomous AI instances that can coordinate multiple devices to accomplish goals. Unlike the chat loop (which responds to single messages), agents are persistent, goal-oriented, and can react to device events.

Quick Start

From chat:

"Pick up the red part and place it in the tray"

From saved config (workspace/agents/*.agent.json):

```json
{
    "name": "pick-and-place",
    "description": "Pick a part and place it in the tray",
    "devices": ["robot", "camera"],
    "autonomy": "supervised",
    "safety": {
        "maxSpeed": 50,
        "workspace": { "x": [0, 600], "y": [-300, 300], "z": [50, 400] },
        "maxForce": 20,
        "requireVisionConfirm": true
    },
    "instructions": "Always approach from above. Verify grip before lifting.",
    "parameters": {
        "partColor": { "type": "string", "default": "red" }
    }
}
```

Autonomy Levels

| Level | Behavior | Best For |
|---|---|---|
| `supervised` | Each action shown to user in real-time, proceeds unless stopped | First-time tasks, dangerous ops |
| `plan-approve` | Agent creates plan, user approves, then executes freely | Repetitive tasks |
| `guardrails` | Runs freely within safety boundaries, pauses only on limit violation | Trusted tasks |
| `full` | No approval needed (safety limits still enforced) | Well-tested automation |

Safety Boundaries

Enforced regardless of autonomy level:

| Boundary | Config Key | Description |
|---|---|---|
| Workspace limits | `safety.workspace` | Bounding box the robot cannot leave |
| Speed limits | `safety.maxSpeed` | Max velocity as % of device max |
| Force limits | `safety.maxForce` | Max force (N) before emergency stop |
| Rate limits | `safety.maxActionsPerMinute` | Prevent hammering hardware |
| Vision confirm | `safety.requireVisionConfirm` | Camera snapshot before/after critical actions |
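As an illustration of how a workspace limit constrains motion, here is a minimal sketch (not Muxit's actual enforcement code) that checks a move target against the `safety.workspace` bounding box from an agent config:

```javascript
// Hypothetical workspace check: returns true if the target point lies
// inside the safety.workspace bounding box from the agent config.
function insideWorkspace(workspace, target) {
  return ["x", "y", "z"].every(axis => {
    const [min, max] = workspace[axis];
    return target[axis] >= min && target[axis] <= max;
  });
}

const workspace = { x: [0, 600], y: [-300, 300], z: [50, 400] };
console.log(insideWorkspace(workspace, { x: 100, y: 0, z: 200 }));  // true
console.log(insideWorkspace(workspace, { x: 700, y: 0, z: 200 }));  // false: x out of range
```

A move whose target fails this check would be rejected before any command reaches the hardware, regardless of autonomy level.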

Event Triggers

Agents can start automatically when device conditions are met:

```json
{
    "triggers": [
        {
            "event": "state:psu.temperature",
            "condition": "gt:80",
            "cooldownSeconds": 120,
            "goal": "Temperature too high — reduce power"
        }
    ]
}
```

Condition formats: `gt:80`, `lt:10`, `eq:true`, `neq:off`, `changed`
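The condition strings are simple enough to sketch an evaluator for. This is an illustration of the semantics described above, not Muxit's actual parser:

```javascript
// Evaluates a trigger condition string against the new value
// (and the previous value, for "changed").
function evalCondition(condition, value, previous) {
  if (condition === "changed") return value !== previous;
  const [op, raw] = condition.split(":");
  // Compare numerically for gt/lt; eq/neq compare the string forms
  // so "eq:true" and "neq:off" work for booleans and strings alike.
  const num = Number(raw);
  switch (op) {
    case "gt":  return value > num;
    case "lt":  return value < num;
    case "eq":  return String(value) === raw;
    case "neq": return String(value) !== raw;
    default:    return false;
  }
}

console.log(evalCondition("gt:80", 85));      // true  -> trigger fires
console.log(evalCondition("eq:true", true));  // true
console.log(evalCondition("changed", 5, 5));  // false -> value did not change
```

With the config above, a `state:psu.temperature` event carrying the value 85 would satisfy `gt:80` and start the agent with the configured goal.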

Per-Agent Model Selection (Cost Optimization)

Each agent can override the global LLM model and token limit. Simple agents (monitoring, condition checking) can use cheap models while complex agents (vision-guided manipulation) use premium models.

```json
{
    "name": "temperature-monitor",
    "devices": ["test-device"],
    "model": "google/gemini-2.5-flash",
    "maxTokens": 1024,
    "maxIterations": 30
}
```

Cost tiers on OpenRouter:

| Tier | Models | ~Cost/MTok |
|---|---|---|
| $ (cheap) | `google/gemini-2.5-flash`, `deepseek/deepseek-chat-v3-0324`, `openai/gpt-4o-mini` | $0.10–0.30 |
| $$ (mid) | `anthropic/claude-haiku-4`, `google/gemini-2.5-pro`, `openai/gpt-4o` | $1–5 |
| $$$ (premium) | `anthropic/claude-sonnet-4` | $5–15 |

If model is omitted, the agent uses the global model from Settings. Anthropic models automatically use prompt caching for repeated system prompts (~50% input token savings in agent loops).
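To see why per-agent model selection matters, a rough back-of-envelope cost estimate (the prices and token counts below are illustrative figures based on the approximate tiers above, not quoted rates):

```javascript
// Rough cost of an agent run: iterations x tokens-per-iteration x $/MTok.
function runCostUSD(iterations, tokensPerIteration, dollarsPerMTok) {
  return (iterations * tokensPerIteration / 1e6) * dollarsPerMTok;
}

// A 30-iteration monitor at ~2k tokens/iteration:
console.log(runCostUSD(30, 2000, 0.3));  // 0.018 -> ~2 cents on a cheap model
console.log(runCostUSD(30, 2000, 10));   // 0.6   -> ~60 cents on a premium model
```

For a monitor that runs hourly, that difference compounds quickly, which is why simple agents should default to the cheap tier.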


When to Use What

| Approach | Best For | How It Works |
|---|---|---|
| Chat Panel | Interactive questions, one-off commands | You type, AI responds with tool calls |
| Scripts with `ai()` | Automated decisions, classification, vision | Script calls `ai()` for a single LLM response |
| Agents | Multi-step autonomous goals, reactive automation | Persistent AI loop: plan, act, observe, re-plan |
| MCP Tools | External AI tools (Claude Code/Desktop) | AI calls Muxit tools via MCP |

AI Memory

Muxit AI remembers facts and preferences across sessions. Tell the AI "remember that..." to save a memory, or "forget that..." to remove one.

Memories are stored locally in workspace/config/ai-memory.json. Click the M button in the Chat Panel header to view, edit, or delete memories.

| Category | Examples |
|---|---|
| `preference` | "User prefers metric units" |
| `device` | "The PSU on the left bench is named 'main-psu'" |
| `procedure` | "Calibration: reset PSU, set 5V, wait 10s, read" |
| `note` | "Don't run scripts during backup (2-3am)" |

Vision-Guided Robot Control

Muxit supports closed-loop visual servoing — using camera feedback to guide robot movements in real time. The system uses a two-speed approach:

  • Fast local CV (VisionDriver + OpenCV): Runs color/contour detection at camera frame rate for real-time tracking. No LLM calls, no latency.
  • Slow LLM vision (take_snapshot + AI): Captures a frame and sends it to the LLM for high-level scene understanding, object identification, and planning. Used for initial assessment and verification, not tight control loops.

The combination lets agents plan with full visual intelligence but execute with the speed of classical computer vision.

Setting Up a Vision Connector

Create a connector config that uses the Vision driver with a webcam source:

```js
// workspace/connectors/eye.js
export default {
  driver: "Vision",
  config: { source: "webcam" },
  properties: {
    frame: c => c.frame(),
    detections: c => c.detections()
  },
  methods: {
    detectColor: [c => c.detectColor(), "Run HSV color detection"],
    detectContours: [c => c.detectContours(), "Run contour detection"]
  },
  poll: ["detections"]
};
```

The source can be "webcam" (default USB camera) or a camera connector name for IP cameras.

Tracker Types

| Tracker | How It Works | Best For |
|---|---|---|
| `color` (HSV) | Filters pixels by hue/saturation/value range, finds centroids | Tracking brightly colored objects (red ball, green LED) |
| `contour` | Edge detection + contour finding, returns bounding boxes and areas | Tracking shapes regardless of color (parts, tools, containers) |

Both trackers run locally via OpenCV with no LLM calls, providing detections at frame rate.
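To make the color tracker concrete, here is a toy sketch of the idea — build an in-range mask, then take the centroid of the surviving pixels. It operates on a plain array of HSV pixels rather than a real frame; Muxit's VisionDriver does the equivalent with OpenCV:

```javascript
// Toy HSV color tracker: keep pixels inside the [min, max] HSV range,
// then return the centroid of the surviving pixel coordinates.
function colorCentroid(pixels, range) {
  const hits = pixels.filter(p =>
    p.h >= range.h[0] && p.h <= range.h[1] &&
    p.s >= range.s[0] && p.s <= range.s[1] &&
    p.v >= range.v[0] && p.v <= range.v[1]);
  if (hits.length === 0) return null;  // nothing detected
  return {
    x: hits.reduce((sum, p) => sum + p.x, 0) / hits.length,
    y: hits.reduce((sum, p) => sum + p.y, 0) / hits.length
  };
}

// Two "red" pixels (hue near 0) and one "blue" pixel:
const pixels = [
  { x: 10, y: 10, h: 2,   s: 200, v: 200 },
  { x: 12, y: 14, h: 4,   s: 210, v: 190 },
  { x: 50, y: 50, h: 120, s: 200, v: 200 }
];
const red = { h: [0, 10], s: [100, 255], v: [100, 255] };
console.log(colorCentroid(pixels, red));  // { x: 11, y: 12 }
```

Because this is pure per-pixel arithmetic, it runs at frame rate with no model inference, which is what makes it usable inside a tight servo loop.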

The visual_servo Tool

The visual_servo MCP/chat tool runs a closed-loop control cycle:

  1. Detect — VisionDriver captures a frame and runs the configured tracker
  2. Compute error — Compares detected object position to the target position
  3. Move — Sends a small incremental move command to the robot connector
  4. Repeat — Loops until the error is below threshold or max iterations reached

The tool accepts parameters for the vision connector, robot connector, tracker type, target coordinates, and tolerance. Safety boundaries from the agent config are enforced on every move.
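The cycle can be sketched as a simple proportional controller. This illustrates steps 1–4 with stand-in detect/move functions and an assumed gain; it is not the tool's implementation:

```javascript
// Proportional visual servo sketch: detect() returns the tracked object's
// pixel position, move() nudges the robot by a fraction of the error.
function visualServo(detect, move, target, tolerance, maxIterations, gain = 0.5) {
  for (let i = 0; i < maxIterations; i++) {
    const pos = detect();
    const ex = target.x - pos.x, ey = target.y - pos.y;      // pixel error
    if (Math.hypot(ex, ey) <= tolerance) return { converged: true, iterations: i };
    move(ex * gain, ey * gain);                              // small incremental step
  }
  return { converged: false, iterations: maxIterations };
}

// Stand-in "robot": moving shifts the detected position directly.
let pos = { x: 100, y: 40 };
const result = visualServo(
  () => pos,
  (dx, dy) => { pos = { x: pos.x + dx, y: pos.y + dy }; },
  { x: 320, y: 240 },  // target: image center
  5,                   // pixel tolerance (cf. servoTolerance)
  50
);
console.log(result.converged);  // true
```

With a gain below 1, each iteration shrinks the error geometrically, so convergence is fast as long as the tracker keeps a lock on the object.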

Example Flow: "Put the Red Ball in the Basket"

  1. User says: "Put the red ball in the basket"
  2. Agent plans (LLM): Takes a snapshot, identifies the red ball and basket positions
  3. Agent moves above ball (robot connector): Sends move command to approach position
  4. Visual servo engages (fast CV loop): Color tracker locks onto the red ball by HSV, servo loop centers the gripper over it with sub-millimeter corrections
  5. Agent grips: Lowers Z, closes gripper
  6. Agent moves above basket (robot connector): Navigates to basket approach position
  7. Visual servo for placement (fast CV loop): Contour tracker locks onto the basket, servo loop centers the payload
  8. Agent releases: Lowers Z, opens gripper
  9. Verification snapshot (LLM vision): Takes a final snapshot to confirm the ball is in the basket

Steps 4 and 7 run at frame rate with no LLM calls. Steps 2 and 9 use the LLM for scene understanding.

Agent Config Reference (vision-robot.agent.json)

```json
{
    "name": "vision-robot",
    "description": "Vision-guided pick and place with visual servoing",
    "devices": ["robot", "eye"],
    "autonomy": "supervised",
    "safety": {
        "maxSpeed": 30,
        "workspace": { "x": [0, 500], "y": [-250, 250], "z": [20, 350] },
        "maxForce": 15,
        "requireVisionConfirm": true
    },
    "instructions": "Use the eye connector for visual servoing. Always verify object detection before moving. Use color tracker for colored objects, contour tracker for shapes. Take a verification snapshot after placement.",
    "parameters": {
        "trackerType": { "type": "string", "default": "color", "enum": ["color", "contour"] },
        "servoTolerance": { "type": "number", "default": 5, "description": "Pixel error threshold for servo convergence" }
    }
}
```

Place this file in workspace/agents/vision-robot.agent.json. The agent can be started from chat ("start the vision-robot agent") or triggered by events.


Configuration Reference

AI Config (server.json)

| Field | Type | Default | Description |
|---|---|---|---|
| `ai.provider` | string | `"openrouter"` | Active LLM provider |
| `ai.providers.<name>.apiKey` | string | `""` | API key |
| `ai.providers.<name>.model` | string | per-provider | Model identifier |
| `ai.providers.<name>.maxTokens` | number | `4096` | Max tokens per response |
| `ai.safetyMode` | `"confirm"` / `"trust"` | `"confirm"` | Tool call approval mode |
| `ai.instructions` | string | `""` | Custom AI system prompt |
| `ai.access.<connector>.<item>.enabled` | boolean | `true` | Set to `false` to hide a property/action from the AI |
| `ai.access.<connector>.<item>.instructions` | string | `""` | Per-property/action AI note |
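For example, a sketch of the `ai.access` shape in server.json — the `psu` connector and its `output` and `calibrate` items are hypothetical names, not part of any shipped config:

```json
{
  "ai": {
    "access": {
      "psu": {
        "output": { "enabled": true, "instructions": "Never set voltage above 12 V." },
        "calibrate": { "enabled": false }
      }
    }
  }
}
```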

You can also embed AI instructions directly in connector config files using the ai.instructions field — see the Connector Guide for details.

Voice Config (server.json)

| Field | Type | Default | Description |
|---|---|---|---|
| `voice.tts.enabled` | boolean | `false` | Enable text-to-speech |
| `voice.wakeWord.enabled` | boolean | `false` | Enable "Muxit" wake word |
| `voice.autoSend` | boolean | `true` | Auto-send voice input |
