MCP (Model Context Protocol) Interaction Flow

NOTICE: This document was AI-assisted; when implementing a backend, always cross-check the details against the code.

In this project, MCP is used between the backend API (MCP client) and the ESP32 device (MCP server) to let the backend discover and invoke the device's capabilities (tools).

Message Format

As implemented in main/protocols/protocol.cc and main/mcp_server.cc, MCP messages are carried inside messages of the underlying transport (WebSocket or MQTT). The inner payload follows the JSON-RPC 2.0 specification.

Overall message layout:

{
  "session_id": "...",   // session id
  "type": "mcp",         // fixed value "mcp"
  "payload": {           // JSON-RPC 2.0 payload
    "jsonrpc": "2.0",
    "method": "...",     // method name ("initialize", "tools/list", "tools/call", ...)
    "params": { ... },   // arguments (for requests)
    "id": ...,           // request id (for requests and responses)
    "result": { ... },   // success result (response)
    "error": { ... }     // error (response)
  }
}

The payload follows standard JSON-RPC 2.0:

  • jsonrpc: always "2.0".
  • method: the method name (requests).
  • params: structured parameters, usually an object (requests).
  • id: request identifier, echoed back in the response; notifications omit it.
  • result: success value (responses).
  • error: error information (responses).
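The envelope and payload fields above can be sketched in a few lines of Python. This is a minimal illustration from the backend's point of view, not the project's actual code: the helper names (wrap_mcp, make_request, unwrap_mcp, classify) are hypothetical, and it assumes the transport delivers each message as a JSON string.

```python
import json
from typing import Any, Dict, Optional


def make_request(method: str, params: Dict[str, Any], request_id: int) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 request payload."""
    return {"jsonrpc": "2.0", "method": method, "params": params, "id": request_id}


def wrap_mcp(session_id: str, payload: Dict[str, Any]) -> str:
    """Wrap a JSON-RPC payload in the transport envelope (type is always "mcp")."""
    return json.dumps({"session_id": session_id, "type": "mcp", "payload": payload})


def unwrap_mcp(raw: str) -> Optional[Dict[str, Any]]:
    """Return the JSON-RPC payload of an incoming message, or None if it is not MCP."""
    message = json.loads(raw)
    if message.get("type") != "mcp":
        return None
    payload = message.get("payload")
    if not isinstance(payload, dict) or payload.get("jsonrpc") != "2.0":
        return None
    return payload


def classify(payload: Dict[str, Any]) -> str:
    """Classify a payload as 'request', 'notification', 'result', or 'error'."""
    if "method" in payload:
        # A method without an id is a JSON-RPC notification.
        return "request" if "id" in payload else "notification"
    return "error" if "error" in payload else "result"
```

Classifying by the presence of method and id lets one receive loop dispatch responses to pending requests and route device notifications elsewhere.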

Interaction Flow

MCP interactions are driven by the client (backend) discovering and invoking tools on the device.

  1. Connection and capability announcement

    • When: after the device boots and connects to the backend.
    • Direction: device -> backend.
    • Message: the device sends the transport hello, advertising supported capabilities. MCP support is signaled via "mcp": true in the features map.
    • Example (transport hello, not an MCP payload):
      {
        "type": "hello",
        "version": 1,
        "features": {
          "mcp": true
        },
        "transport": "websocket",
        "audio_params": { ... },
        "session_id": "..."
      }
  2. Initialize the MCP session

    • When: after the backend sees "mcp": true in the features map of the device's hello. This is usually the first MCP request.

    • Direction: backend -> device.

    • Method: initialize

    • Message (MCP payload):

      {
        "jsonrpc": "2.0",
        "method": "initialize",
        "params": {
          "capabilities": {
            // optional client capabilities
            "vision": {
              "url": "...",   // camera image upload endpoint (must be an http URL, not a websocket URL)
              "token": "..."  // token for the upload URL
            }
            // ... other client capabilities
          }
        },
        "id": 1
      }
    • Device response:

      {
        "jsonrpc": "2.0",
        "id": 1,
        "result": {
          "protocolVersion": "2024-11-05",
          "capabilities": {
            "tools": {}
          },
          "serverInfo": {
            "name": "...",    // device name (BOARD_NAME)
            "version": "..."  // firmware version
          }
        }
      }
  3. Discover the tools

    • When: whenever the backend needs the list of callable tools and their signatures.
    • Direction: backend -> device.
    • Method: tools/list
    • Request parameters:
      • cursor (string, optional): pagination cursor. Empty on the first request.
      • withUserTools (boolean, optional, default false): if true, the device also includes "user-only" tools (see "User-only tools" below) in the listing. This is typically used by a companion app that lets the user trigger privileged actions directly.
    • Message (MCP payload):
      {
        "jsonrpc": "2.0",
        "method": "tools/list",
        "params": {
          "cursor": "",
          "withUserTools": false
        },
        "id": 2
      }
    • Device response:
      {
        "jsonrpc": "2.0",
        "id": 2,
        "result": {
          "tools": [
            {
              "name": "self.get_device_status",
              "description": "...",
              "inputSchema": { ... }
            },
            {
              "name": "self.audio_speaker.set_volume",
              "description": "...",
              "inputSchema": { ... }
            }
            // ... more tools
          ],
          "nextCursor": "..."
        }
      }
    • Pagination: when nextCursor is non-empty, the backend must send another tools/list request with that cursor to fetch the next page.
  4. Call a tool

    • When: the backend wants to execute a specific device function.
    • Direction: backend -> device.
    • Method: tools/call
    • Message (MCP payload):
      {
        "jsonrpc": "2.0",
        "method": "tools/call",
        "params": {
          "name": "self.audio_speaker.set_volume",
          "arguments": {
            "volume": 50
          }
        },
        "id": 3
      }
    • Successful response:
      {
        "jsonrpc": "2.0",
        "id": 3,
        "result": {
          "content": [
            { "type": "text", "text": "true" }
          ],
          "isError": false
        }
      }
    • Error response:
      {
        "jsonrpc": "2.0",
        "id": 3,
        "error": {
          "code": -32601,
          "message": "Unknown tool: self.non_existent_tool"
        }
      }
  5. Device-initiated notifications

    • When: the device wants to inform the backend of internal events (e.g. state transitions). Application::SendMcpMessage is the outbound entry point.
    • Direction: device -> backend.
    • Method: by convention a name starting with notifications/, though any custom method name can be used.
    • Message (MCP payload): JSON-RPC notifications have no id.
      {
        "jsonrpc": "2.0",
        "method": "notifications/state_changed",
        "params": {
          "newState": "idle",
          "oldState": "connecting"
        }
      }
    • Backend handling: process the notification without replying.
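The five steps above can be sketched as a small backend-side client. This is a hedged illustration under stated assumptions, not the project's backend: McpClient, McpError, supports_mcp, and the send_request hook are hypothetical names, and send_request is assumed to deliver one JSON-RPC payload to the device and return the matching response payload (the transport envelope is left out).

```python
from typing import Any, Callable, Dict, List

# Hypothetical transport hook: sends a request payload, returns the response payload.
SendRequest = Callable[[Dict[str, Any]], Dict[str, Any]]


class McpError(Exception):
    """Raised on a JSON-RPC error response or a result with isError = true."""


def supports_mcp(hello: Dict[str, Any]) -> bool:
    """True when the transport hello advertises features.mcp = true."""
    return hello.get("type") == "hello" and hello.get("features", {}).get("mcp") is True


class McpClient:
    def __init__(self, send_request: SendRequest) -> None:
        self._send = send_request
        self._next_id = 1

    def request(self, method: str, params: Dict[str, Any]) -> Dict[str, Any]:
        """Send one request and return its result; raise McpError on an error response."""
        payload = {"jsonrpc": "2.0", "method": method, "params": params, "id": self._next_id}
        self._next_id += 1
        response = self._send(payload)
        if response.get("id") != payload["id"]:
            raise McpError(f"response id {response.get('id')!r} does not match request")
        if "error" in response:
            error = response["error"]
            raise McpError(f"{error.get('code')}: {error.get('message')}")
        return response.get("result", {})

    def initialize(self, capabilities: Dict[str, Any]) -> Dict[str, Any]:
        return self.request("initialize", {"capabilities": capabilities})

    def list_tools(self, with_user_tools: bool = False) -> List[Dict[str, Any]]:
        """Follow nextCursor until the device returns an empty or missing cursor."""
        tools: List[Dict[str, Any]] = []
        cursor = ""
        while True:
            params = {"cursor": cursor, "withUserTools": with_user_tools}
            result = self.request("tools/list", params)
            tools.extend(result.get("tools", []))
            cursor = result.get("nextCursor") or ""
            if not cursor:
                return tools

    def call_tool(self, name: str, arguments: Dict[str, Any]) -> List[Dict[str, Any]]:
        """Call a tool and return its content list."""
        result = self.request("tools/call", {"name": name, "arguments": arguments})
        if result.get("isError"):
            raise McpError(f"tool {name} reported an error: {result.get('content')}")
        return result.get("content", [])
```

A backend would typically check supports_mcp on the hello, call initialize once, then list_tools, and only afterwards call_tool. Raising one exception type for both JSON-RPC errors and isError results keeps the caller's error handling in one place; whether the firmware ever sets isError to true should be checked against main/mcp_server.cc.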

User-only Tools

The MCP server on the device maintains two kinds of tools:

  • Regular tools - registered via McpServer::AddTool. Exposed to the backend (and hence the AI model) by default.
  • User-only tools - registered via McpServer::AddUserOnlyTool. These are hidden from standard tools/list results, because they are privileged or user-facing actions that should not be invoked autonomously by the AI. Examples include system reboot, firmware upgrade, and screen snapshot upload.

The backend opts in to user-only tools by sending tools/list with params.withUserTools = true. Typical usage: a companion app screen that exposes these actions to the end user.
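One way a backend might keep user-only tools away from the AI model is to fetch both listings and compare them. The sketch below assumes the two tools/list results have already been collected (with withUserTools false and true respectively). The function names are hypothetical, and "example.reboot" is a made-up placeholder, not a real tool name.

```python
from typing import Any, Dict, List

Tool = Dict[str, Any]


def user_only_names(default_listing: List[Tool], full_listing: List[Tool]) -> List[str]:
    """Names that appear only in the withUserTools = true listing."""
    regular = {tool["name"] for tool in default_listing}
    return [tool["name"] for tool in full_listing if tool["name"] not in regular]


def tools_for_model(full_listing: List[Tool], user_only: List[str]) -> List[Tool]:
    """Drop user-only tools before handing the listing to the AI model."""
    hidden = set(user_only)
    return [tool for tool in full_listing if tool["name"] not in hidden]


# Placeholder listings; "example.reboot" is a made-up name for illustration.
default_listing = [{"name": "self.get_device_status"}]
full_listing = [{"name": "self.get_device_status"}, {"name": "example.reboot"}]

user_only = user_only_names(default_listing, full_listing)
model_tools = tools_for_model(full_listing, user_only)
```

Here user_only would feed the companion app's action list, while model_tools is what the model sees.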

See MCP IoT control usage for how to register either kind of tool on the device side.

Sequence Diagram

A simplified diagram of the main MCP message flow:

sequenceDiagram
    participant Device as ESP32 Device
    participant BackendAPI as Backend API (Client)

    Note over Device, BackendAPI: Establish WebSocket / MQTT

    Device->>BackendAPI: Hello (features.mcp = true)

    BackendAPI->>Device: MCP Initialize request
    Note over BackendAPI: method: initialize
    Note over BackendAPI: params: { capabilities: ... }

    Device->>BackendAPI: MCP Initialize response
    Note over Device: result: { protocolVersion, serverInfo, ... }

    BackendAPI->>Device: MCP tools/list request
    Note over BackendAPI: params: { cursor: "", withUserTools: false }

    Device->>BackendAPI: MCP tools/list response
    Note over Device: result: { tools: [...], nextCursor: ... }

    loop Optional pagination
        BackendAPI->>Device: MCP tools/list request
        Note over BackendAPI: params: { cursor: "..." }
        Device->>BackendAPI: MCP tools/list response
        Note over Device: result: { tools: [...], nextCursor: "" }
    end

    BackendAPI->>Device: MCP tools/call request
    Note over BackendAPI: params: { name, arguments }

    alt Call succeeds
        Device->>BackendAPI: MCP tools/call success response
        Note over Device: result: { content, isError: false }
    else Call fails
        Device->>BackendAPI: MCP tools/call error response
        Note over Device: error: { code, message }
    end

    opt Device notification
        Device->>BackendAPI: MCP notification
        Note over Device: method: notifications/...
    end

This document summarizes the MCP interaction flow in this project. For exact parameter shapes, behavior, and available tools, refer to McpServer::AddCommonTools / AddUserOnlyTools in main/mcp_server.cc and the per-board InitializeTools implementations.