AI & Voice
Muxit AI lets you control lab hardware using natural language — via chat or voice in the browser UI. It supports models from Anthropic (Claude), OpenAI, and Google as LLM providers, with a safety gate that requires user approval before executing device commands.
Ways to Use AI
| Method | What it is | Best for |
|---|---|---|
| Chat Panel | Browser sidebar with natural language input | Interactive use in the web UI |
| MCP Server | Model Context Protocol over HTTP and stdio | Claude Code, Claude Desktop, MCP-compatible AI tools |
Quick Start
1. Chat Panel (Browser Web UI)
The Chat Panel provides a full AI assistant that can control your devices, read sensors, write scripts, and answer questions — all via natural language (text or voice). It uses an agentic tool-calling loop: the LLM decides which tools to call, executes them, and continues until it has a final answer.
All LLM requests route through OpenRouter's OpenAI-compatible API, so you can use any model available on OpenRouter (Claude, GPT-4, Gemini, etc.).
Step 1: Configure your provider
Click the gear icon (bottom of Activity Bar) to open Settings. Under AI Services, select "openrouter" as your provider and enter your OpenRouter API key.
Or edit workspace/config/server.json directly:
```json
{
  "ai": {
    "provider": "openrouter",
    "providers": {
      "openrouter": { "apiKey": "sk-or-v1-YOUR-KEY", "model": "anthropic/claude-sonnet-4", "maxTokens": 4096 }
    },
    "safetyMode": "confirm"
  }
}
```
Step 2: Start the server
```shell
node start.js server         # MuxitServer (serves web UI at http://127.0.0.1:8765)
node start.js server --gui   # Auto-open browser
```
Step 3: Open AI Chat — Click the chat icon in the Activity Bar.
2. MCP Server
The MCP server is built directly into MuxitServer (C#), exposing device control tools and resources over the Model Context Protocol. It supports both HTTP and stdio transports.
HTTP transport — Available at /mcp on the running server (e.g. http://127.0.0.1:8765/mcp). Starts automatically with the server.
Stdio transport — For Claude Code and Claude Desktop integration:
```shell
node start.js mcp                           # Via start script
dotnet run --project MuxitServer -- --mcp   # Direct
```
Claude Code — Already configured in .claude/.mcp.json:
```json
{
  "mcpServers": {
    "muxit": {
      "type": "stdio",
      "command": "dotnet",
      "args": ["run", "--project", "MuxitServer", "--", "--mcp", "--workspace", "workspace"]
    }
  }
}
```
Claude Desktop — Add to claude_desktop_config.json:
```json
{
  "mcpServers": {
    "muxit": {
      "command": "dotnet",
      "args": ["run", "--project", "/path/to/MuxitServer", "--", "--mcp", "--workspace", "/path/to/workspace"]
    }
  }
}
```
MCP Tools
| Tool | Description | Category |
|---|---|---|
| list_connectors | List all devices and capabilities | Device Control |
| get_connector_schema | Get full schema for a device | Device Control |
| read_property | Read a device property | Device Control |
| write_property | Set a device property | Device Control |
| call_action | Execute a device action | Device Control |
| get_device_state | Snapshot of all cached state | Device Control |
| list_scripts | List available and running scripts | Scripts |
| run_script | Start a named script | Scripts |
| run_code | Execute inline JavaScript in sandbox | Scripts |
| stop_script | Stop a running script | Scripts |
| read_script_source | Read a script's source code | Files |
| write_script | Create or update a script file | Files |
| read_connector_config | Read a connector config file | Files |
| write_connector_config | Create or update a connector config | Files |
| list_drivers | List available drivers | System |
| get_driver_schema | Get full schema for a specific driver | System |
| get_server_config | Get server configuration (redacted) | System |
| take_snapshot | Capture a camera image for visual analysis | Vision |
| list_agents | List agent configs and running instances | Agents |
| start_agent | Start an agent with a goal | Agents |
| stop_agent | Stop a running agent | Agents |
| get_agent_status | Get current status of an agent | Agents |
| pause_agent | Pause a running agent | Agents |
| resume_agent | Resume a paused agent | Agents |
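Over the HTTP transport, these tools are invoked as standard MCP JSON-RPC calls. A sketch of invoking read_property (the argument names shown here are assumptions; confirm them with a tools/list request or get_connector_schema):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "read_property",
    "arguments": { "connector": "psu", "property": "voltage" }
  }
}
```

POST this to http://127.0.0.1:8765/mcp; the result comes back as MCP content blocks in the JSON-RPC response.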
MCP Resources
| URI | Description |
|---|---|
| muxit://connectors | List of all connectors with schemas |
| muxit://connector/{name}/schema | Schema for a specific connector |
| muxit://state | Current device state snapshot |
Safety Modes
| Mode | Behavior | Config |
|---|---|---|
| confirm (default) | Every device write/action requires user approval | "safetyMode": "confirm" |
| trust | All tool calls execute immediately | "safetyMode": "trust" |
In confirm mode, the Chat Panel shows an approval dialog with the tool name, parameters, and Allow/Deny buttons.
AI Chat Tools
The Chat Panel uses the same tools as the MCP server. The AI assistant decides which tools to call based on your request. In confirm mode, each tool call shows an approval dialog before executing.
Vision (Camera Snapshots)
The take_snapshot tool captures a single frame from any camera connector (USB webcam, IP camera) and returns it as an image content block. The LLM can then see and describe the image — useful for checking robot positions, reading instrument displays, or verifying alignments.
Requirements:
- A camera connector must be configured and initialized
- The LLM model must support vision (Claude Sonnet/Opus, GPT-4o, Gemini)
Example prompts:
- "Take a snapshot from the webcam and tell me what you see"
- "Is the robot arm aligned with the target? Check the camera"
- "What's the reading on the oscilloscope display?"
Voice Commands
Voice is a core Muxit feature — control your devices by speaking. The Chat Panel header includes a mic button with a dropdown for voice settings.
Two Modes
| Mode | How it works |
|---|---|
| Push to Talk (PTT) | Hold the mic button to talk. Release to stop. Your speech is transcribed and sent as a chat message. |
| Hands-free | Click the mic button to toggle listening. Behavior depends on wake word setting. |
Hands-free sub-modes (configured in Settings > Voice > Wake Word):
| Wake Word Setting | Hands-free Behavior |
|---|---|
| Enabled | Say "Muxit" to start a conversation. Muxit listens, sends your command, speaks the AI response (if TTS is on), then automatically resumes listening — no need to say the wake word again. The conversation ends after an adjustable silence timeout (default 10s). |
| Disabled | Muxit listens continuously. When you pause speaking, the transcript is automatically sent. |
Conversation Loop
When both wake word and TTS are enabled in hands-free mode, Muxit supports a natural spoken conversation:
- Say "Muxit" to start the conversation
- Speak your command — it's transcribed and sent to the AI
- The AI responds and TTS speaks the response
- Muxit automatically starts listening again (no wake word needed)
- Continue the conversation naturally
- When you stop talking, the conversation ends after a configurable idle timeout
Idle timeout: Adjust how long Muxit waits for you to speak before ending the conversation. Open the mic dropdown (▾ next to the mic button) and use the Idle slider (3–30 seconds, default 10s). You can also manually end the conversation by clicking the Stop button.
Text-to-Speech
Toggle TTS on/off with the speaker icon in the chat header. When enabled, AI responses are spoken aloud using browser speech synthesis. The AI automatically keeps responses short and voice-friendly.
Configure voice, rate, and pitch in Settings > Voice > Text-to-Speech.
Wake Word
Say "Muxit" to start a voice conversation hands-free. Enable in Settings > Voice > Wake Word.
Voice uses browser speech recognition (Web Speech API). Works in Chrome and Edge.
AI in Scripts
Scripts can make single-shot LLM calls using the ai() global:
```javascript
// Text-only query
const answer = await ai("Classify this reading as normal or abnormal: 47.3 ohms");
log.info(answer);

// Vision: analyze a camera image
const cam = connector('webcam');
const frame = await cam.snapshot();
const description = await ai("Describe what you see in this image", frame);
log.info(description);
```
ai(prompt, image?) is a single-shot call — no conversation memory, no tool access, no agentic loop.
Agent Mode
Agents are autonomous AI instances that can coordinate multiple devices to accomplish goals. Unlike the chat loop (which responds to single messages), agents are persistent, goal-oriented, and can react to device events.
Quick Start
From chat:
"Pick up the red part and place it in the tray"From saved config (workspace/agents/*.agent.json):
{
"name": "pick-and-place",
"description": "Pick a part and place it in the tray",
"devices": ["robot", "camera"],
"autonomy": "supervised",
"safety": {
"maxSpeed": 50,
"workspace": { "x": [0, 600], "y": [-300, 300], "z": [50, 400] },
"maxForce": 20,
"requireVisionConfirm": true
},
"instructions": "Always approach from above. Verify grip before lifting.",
"parameters": {
"partColor": { "type": "string", "default": "red" }
}
}Autonomy Levels
| Level | Behavior | Best For |
|---|---|---|
| supervised | Each action shown to user in real-time, proceeds unless stopped | First-time tasks, dangerous ops |
| plan-approve | Agent creates plan, user approves, then executes freely | Repetitive tasks |
| guardrails | Runs freely within safety boundaries, pauses only on limit violation | Trusted tasks |
| full | No approval needed (safety limits still enforced) | Well-tested automation |
Safety Boundaries
Enforced regardless of autonomy level:
| Boundary | Config Key | Description |
|---|---|---|
| Workspace limits | safety.workspace | Bounding box the robot cannot leave |
| Speed limits | safety.maxSpeed | Max velocity as % of device max |
| Force limits | safety.maxForce | Max force (N) before emergency stop |
| Rate limits | safety.maxActionsPerMinute | Prevent hammering hardware |
| Vision confirm | safety.requireVisionConfirm | Camera snapshot before/after critical actions |
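These boundaries amount to simple numeric checks applied before any command reaches the hardware. A minimal sketch of enforcing the workspace box and speed limit (clampMove and the axis layout are illustrative, not part of the Muxit API):

```javascript
// Illustrative enforcement of safety.workspace and safety.maxSpeed.
// `safety` mirrors the agent config shown above; each axis is a [min, max] pair.
function clampMove(target, speedPct, safety) {
  const clamp = (v, [lo, hi]) => Math.min(hi, Math.max(lo, v));
  return {
    x: clamp(target.x, safety.workspace.x),
    y: clamp(target.y, safety.workspace.y),
    z: clamp(target.z, safety.workspace.z),
    // Speed is a percentage of the device maximum, capped by safety.maxSpeed.
    speed: Math.min(speedPct, safety.maxSpeed)
  };
}

const safety = { maxSpeed: 50, workspace: { x: [0, 600], y: [-300, 300], z: [50, 400] } };
clampMove({ x: 700, y: 0, z: 10 }, 80, safety);
// x pulled back to 600, z raised to 50, speed capped at 50
```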
Event Triggers
Agents can start automatically when device conditions are met:
```json
{
  "triggers": [
    {
      "event": "state:psu.temperature",
      "condition": "gt:80",
      "cooldownSeconds": 120,
      "goal": "Temperature too high — reduce power"
    }
  ]
}
```
Condition formats: gt:80, lt:10, eq:true, neq:off, changed
Per-Agent Model Selection (Cost Optimization)
Each agent can override the global LLM model and token limit. Simple agents (monitoring, condition checking) can use cheap models while complex agents (vision-guided manipulation) use premium models.
```json
{
  "name": "temperature-monitor",
  "devices": ["test-device"],
  "model": "google/gemini-2.5-flash",
  "maxTokens": 1024,
  "maxIterations": 30
}
```
Cost tiers on OpenRouter:
| Tier | Models | ~Cost/MTok |
|---|---|---|
| $ (cheap) | google/gemini-2.5-flash, deepseek/deepseek-chat-v3-0324, openai/gpt-4o-mini | $0.10–0.30 |
| $$ (mid) | anthropic/claude-haiku-4, google/gemini-2.5-pro, openai/gpt-4o | $1–5 |
| $$$ (premium) | anthropic/claude-sonnet-4 | $5–15 |
If model is omitted, the agent uses the global model from Settings. Anthropic models automatically use prompt caching for repeated system prompts (~50% input token savings in agent loops).
When to Use What
| Approach | Best For | How It Works |
|---|---|---|
| Chat Panel | Interactive questions, one-off commands | You type, AI responds with tool calls |
| Scripts with ai() | Automated decisions, classification, vision | Script calls ai() for a single LLM response |
| Agents | Multi-step autonomous goals, reactive automation | Persistent AI loop: plan, act, observe, re-plan |
| MCP Tools | External AI tools (Claude Code/Desktop) | AI calls Muxit tools via MCP |
AI Memory
Muxit AI remembers facts and preferences across sessions. Tell the AI "remember that..." to save a memory, or "forget that..." to remove one.
Memories are stored locally in workspace/config/ai-memory.json. Click the M button in the Chat Panel header to view, edit, or delete memories.
| Category | Examples |
|---|---|
| preference | "User prefers metric units" |
| device | "The PSU on the left bench is named 'main-psu'" |
| procedure | "Calibration: reset PSU, set 5V, wait 10s, read" |
| note | "Don't run scripts during backup (2-3am)" |
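The file itself is plain JSON. A hypothetical entry (the field names here are an assumption for illustration; inspect workspace/config/ai-memory.json on your own install for the exact shape):

```json
[
  {
    "category": "preference",
    "content": "User prefers metric units",
    "created": "2025-01-15T10:30:00Z"
  }
]
```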
Vision-Guided Robot Control
Muxit supports closed-loop visual servoing — using camera feedback to guide robot movements in real time. The system uses a two-speed approach:
- Fast local CV (VisionDriver + OpenCV): Runs color/contour detection at camera frame rate for real-time tracking. No LLM calls, no latency.
- Slow LLM vision (take_snapshot + AI): Captures a frame and sends it to the LLM for high-level scene understanding, object identification, and planning. Used for initial assessment and verification, not tight control loops.
The combination lets agents plan with full visual intelligence but execute with the speed of classical computer vision.
Setting Up a Vision Connector
Create a connector config that uses the Vision driver with a webcam source:
```javascript
// workspace/connectors/eye.js
export default {
  driver: "Vision",
  config: { source: "webcam" },
  properties: {
    frame: c => c.frame(),
    detections: c => c.detections()
  },
  methods: {
    detectColor: [c => c.detectColor(), "Run HSV color detection"],
    detectContours: [c => c.detectContours(), "Run contour detection"]
  },
  poll: ["detections"]
};
```
The source can be "webcam" (default USB camera) or a camera connector name for IP cameras.
Tracker Types
| Tracker | How It Works | Best For |
|---|---|---|
| color (HSV) | Filters pixels by hue/saturation/value range, finds centroids | Tracking brightly colored objects (red ball, green LED) |
| contour | Edge detection + contour finding, returns bounding boxes and areas | Tracking shapes regardless of color (parts, tools, containers) |
Both trackers run locally via OpenCV with no LLM calls, providing detections at frame rate.
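The color tracker's per-pixel test is a plain HSV range check; the one subtlety is that red hues wrap around 0°. A sketch (illustrative, not the VisionDriver's OpenCV code):

```javascript
// Per-pixel HSV range test, as used conceptually by a color tracker.
// h in [0, 360), s and v in [0, 1]. Red ranges like [350, 10] wrap past 0.
function inHsvRange({ h, s, v }, lo, hi) {
  const hueOk = lo.h <= hi.h
    ? h >= lo.h && h <= hi.h
    : h >= lo.h || h <= hi.h;  // wrapped range, e.g. 350..10
  return hueOk && s >= lo.s && s <= hi.s && v >= lo.v && v <= hi.v;
}

// A "red" range that wraps around 0°:
const lo = { h: 350, s: 0.5, v: 0.3 };
const hi = { h: 10,  s: 1.0, v: 1.0 };
inHsvRange({ h: 5, s: 0.8, v: 0.9 }, lo, hi);   // → true
inHsvRange({ h: 120, s: 0.8, v: 0.9 }, lo, hi); // → false (green)
```

Pixels passing the test are grouped and their centroid becomes the detection the servo loop tracks.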
The visual_servo Tool
The visual_servo MCP/chat tool runs a closed-loop control cycle:
- Detect — VisionDriver captures a frame and runs the configured tracker
- Compute error — Compares detected object position to the target position
- Move — Sends a small incremental move command to the robot connector
- Repeat — Loops until the error is below threshold or max iterations reached
The tool accepts parameters for the vision connector, robot connector, tracker type, target coordinates, and tolerance. Safety boundaries from the agent config are enforced on every move.
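Stripped to its core, each iteration converts a pixel error into a small, clamped robot move. A sketch of that step (the gain, step cap, and implicit pixel-to-millimeter scaling are illustrative assumptions; a real setup needs a calibrated camera-to-robot mapping):

```javascript
// One proportional visual-servo step: pixel error → small robot move.
// `detected`/`target` are pixel coordinates; returns a move in mm, or
// null once the error is within tolerance (converged).
function servoStep(detected, target, { gain = 0.1, maxStepMm = 2, tolerancePx = 5 } = {}) {
  const ex = target.x - detected.x;
  const ey = target.y - detected.y;
  if (Math.hypot(ex, ey) <= tolerancePx) return null; // converged
  const clamp = v => Math.max(-maxStepMm, Math.min(maxStepMm, v));
  // Proportional control: move a fraction of the error, capped per step.
  return { dx: clamp(ex * gain), dy: clamp(ey * gain) };
}

servoStep({ x: 300, y: 200 }, { x: 320, y: 240 });
// error (20, 40) px → { dx: 2, dy: 2 } mm (both capped at maxStepMm)
servoStep({ x: 318, y: 238 }, { x: 320, y: 240 });
// error magnitude ≈ 2.8 px, below tolerance → null
```

Capping the step size keeps each move small even when the initial error is large, which is what makes the loop safe to run against real hardware.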
Example Flow: "Put the Red Ball in the Basket"
1. User says: "Put the red ball in the basket"
2. Agent plans (LLM): Takes a snapshot, identifies the red ball and basket positions
3. Agent moves above ball (robot connector): Sends move command to approach position
4. Visual servo engages (fast CV loop): Color tracker locks onto the red ball by HSV, servo loop centers the gripper over it with sub-millimeter corrections
5. Agent grips: Lowers Z, closes gripper
6. Agent moves above basket (robot connector): Navigates to basket approach position
7. Visual servo for placement (fast CV loop): Contour tracker locks onto the basket, servo loop centers the payload
8. Agent releases: Lowers Z, opens gripper
9. Verification snapshot (LLM vision): Takes a final snapshot to confirm the ball is in the basket
Steps 4 and 7 run at frame rate with no LLM calls. Steps 2 and 9 use the LLM for scene understanding.
Agent Config Reference (vision-robot.agent.json)
```json
{
  "name": "vision-robot",
  "description": "Vision-guided pick and place with visual servoing",
  "devices": ["robot", "eye"],
  "autonomy": "supervised",
  "safety": {
    "maxSpeed": 30,
    "workspace": { "x": [0, 500], "y": [-250, 250], "z": [20, 350] },
    "maxForce": 15,
    "requireVisionConfirm": true
  },
  "instructions": "Use the eye connector for visual servoing. Always verify object detection before moving. Use color tracker for colored objects, contour tracker for shapes. Take a verification snapshot after placement.",
  "parameters": {
    "trackerType": { "type": "string", "default": "color", "enum": ["color", "contour"] },
    "servoTolerance": { "type": "number", "default": 5, "description": "Pixel error threshold for servo convergence" }
  }
}
```
Place this file in workspace/agents/vision-robot.agent.json. The agent can be started from chat ("start the vision-robot agent") or triggered by events.
Configuration Reference
AI Config (server.json)
| Field | Type | Default | Description |
|---|---|---|---|
| ai.provider | string | "openrouter" | Active LLM provider |
| ai.providers.<name>.apiKey | string | "" | API key |
| ai.providers.<name>.model | string | per-provider | Model identifier |
| ai.providers.<name>.maxTokens | number | 4096 | Max tokens per response |
| ai.safetyMode | "confirm" / "trust" | "confirm" | Tool call approval mode |
| ai.instructions | string | "" | Custom AI system prompt |
| ai.access.<connector>.<item>.enabled | boolean | true | Set to false to hide a property/action from the AI |
| ai.access.<connector>.<item>.instructions | string | "" | Per-property/action AI note |
You can also embed AI instructions directly in connector config files using the ai.instructions field — see the Connector Guide for details.
Voice Config (server.json)
| Field | Type | Default | Description |
|---|---|---|---|
| voice.tts.enabled | boolean | false | Enable text-to-speech |
| voice.wakeWord.enabled | boolean | false | Enable "Muxit" wake word |
| voice.autoSend | boolean | true | Auto-send voice input |
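Assembled into server.json, these fields nest as JSON objects. A sketch of a fragment with voice fully enabled (assuming the dotted paths in the table map directly onto nesting, as they do for the ai section):

```json
{
  "voice": {
    "tts": { "enabled": true },
    "wakeWord": { "enabled": true },
    "autoSend": true
  }
}
```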