asr Plugin

How the asr plugin receives a speech transcription function through its constructor and writes voice text blocks into messages during inbound chat processing

asr is the speech-to-text plugin. It defines only the agent-side protocol and does not bind to any local model, Python dependency, or provider.

The actual transcription capability is injected through the constructor. The recommended setup is to connect it to City AI:

import { Agent } from "@downcity/agent";
import { AsrPlugin } from "@downcity/plugins";

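// `city` is assumed to be an initialized City AI client created elsewhere.
const agent = new Agent({
  id: "voice-agent",
  path: "/path/to/project",
  plugins: [
    new AsrPlugin({
      // Delegate the actual transcription work to City AI.
      asr: (input) => city.ai.asr(input),
      // Transcribe inbound chat voice attachments automatically.
      auto: true,
      // Default language hint; "auto" requests language auto-detection.
      language: "auto",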
    }),
  ],
});
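Because asr depends only on the injected function, any async function with a compatible shape can stand in for City AI, for example a stub in tests. The sketch below models that injection pattern. The types AsrInput, AsrResult, and AsrFn and the class MiniAsrPlugin are hypothetical placeholders; the real input and result shapes of AsrPlugin and city.ai.asr are not documented on this page.

```typescript
// Hypothetical shapes: the real AsrPlugin input/result types may differ.
type AsrInput = { audioPath?: string; url?: string; dataUrl?: string; language?: string };
type AsrResult = { text: string };
type AsrFn = (input: AsrInput) => Promise<AsrResult>;

// Minimal model of constructor injection: the plugin stores the function
// and never decides which provider or model sits behind it.
class MiniAsrPlugin {
  private readonly asr: AsrFn;
  private readonly language?: string;

  constructor(options: { asr: AsrFn; language?: string }) {
    this.asr = options.asr;
    this.language = options.language;
  }

  transcribe(input: AsrInput): Promise<AsrResult> {
    // Fall back to the constructor-level language hint when the call omits one.
    return this.asr({ ...input, language: input.language ?? this.language });
  }
}

// A stub provider for tests: echoes the language hint instead of calling a real service.
const stubAsr: AsrFn = async (input) => ({ text: `stub transcript (${input.language ?? "none"})` });

const plugin = new MiniAsrPlugin({ asr: stubAsr, language: "auto" });
```

In production the same slot would receive `(input) => city.ai.asr(input)`, as in the example above; swapping providers then means changing only that one function.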

Design Boundary

asr is responsible for:

  • exposing the transcribe action
  • listening for voice attachments on inbound chat messages when auto: true is set
  • appending transcription results to the user's message text
  • adding system guidance on how the agent should use asr

asr is not responsible for:

  • installing local models
  • choosing an ASR provider
  • managing Python dependencies
  • persisting project config
  • providing CLI management commands

Constructor Options

Option        Purpose                                                      Default
asr           The actual transcription function (required)                 none
auto          Whether to auto-transcribe inbound chat voice attachments    false
language      Default language hint, such as auto, zh, or en               none
name          Plugin name                                                  asr
title         Display title                                                ASR
description   Plugin description                                           built-in description
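The defaults in the options table can be read as a small resolution step: asr has no default and must be supplied, while every other field falls back to a fixed value. This is a hypothetical sketch; AsrOptionsSketch, ResolvedAsrOptions, and resolveAsrOptions are illustrative names, not exports of @downcity/plugins, and the real built-in description text is replaced by a placeholder.

```typescript
type AsrFnSketch = (input: unknown) => Promise<unknown>;

// Illustrative option shape mirroring the options table; not the library's actual type.
interface AsrOptionsSketch {
  asr: AsrFnSketch;
  auto?: boolean;
  language?: string;
  name?: string;
  title?: string;
  description?: string;
}

interface ResolvedAsrOptions {
  asr: AsrFnSketch;
  auto: boolean;
  language: string | undefined; // no default: undefined means "no hint"
  name: string;
  title: string;
  description: string;
}

// Placeholder: the real built-in description text is not documented here.
const BUILT_IN_DESCRIPTION = "<built-in description>";

function resolveAsrOptions(options: AsrOptionsSketch): ResolvedAsrOptions {
  // asr has no default, so refuse to build a plugin without it.
  if (typeof options.asr !== "function") {
    throw new TypeError("asr: a transcription function is required");
  }
  return {
    asr: options.asr,
    auto: options.auto ?? false,
    language: options.language,
    name: options.name ?? "asr",
    title: options.title ?? "ASR",
    description: options.description ?? BUILT_IN_DESCRIPTION,
  };
}
```

Failing fast on a missing asr keeps configuration errors at construction time instead of surfacing them on the first voice message.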

Explicit Call

If you only need to transcribe one audio file, call the action directly:

const result = await agent.plugins.runAction({
  plugin: "asr",
  action: "transcribe",
  payload: {
    audio_path: "/path/to/voice.ogg",
    language: "zh",
  },
});

The payload must provide at least one input source:

  • audio_path
  • url
  • data_url

Automatic Transcription

When auto: true is set, inbound chat messages that carry voice or audio attachments are processed at the CHAT_PLUGIN_POINTS.augmentInbound plugin point.

After a successful transcription, asr does not replace the original message; it appends the result to the message's text body:

<voice src="voice.ogg">Remind me about the meeting tomorrow at 3 PM</voice>
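Appending rather than replacing can be modeled as a pure string operation on the message body. In this sketch, escapeXml, formatVoiceBlock, and appendVoiceBlock are hypothetical helpers. The newline separator and the escaping of &, <, >, and quotes are assumptions added for illustration, not documented behavior of asr.

```typescript
// Assumption: special characters are escaped so a transcript cannot close the tag early.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function formatVoiceBlock(src: string, transcript: string): string {
  return `<voice src="${escapeXml(src)}">${escapeXml(transcript)}</voice>`;
}

// Keeps the original body and adds the voice block after it.
// Assumption: a newline separates existing text from the block.
function appendVoiceBlock(body: string, src: string, transcript: string): string {
  const block = formatVoiceBlock(src, transcript);
  return body.length > 0 ? `${body}\n${block}` : block;
}
```

Keeping the tag around the transcript lets the agent distinguish what the user typed from what was transcribed, and the src attribute ties the text back to its attachment.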

If automatic transcription fails, the main chat flow continues anyway, so a single problematic voice attachment or a failure in the external ASR service cannot block the agent.
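The fail-open behavior can be sketched as a per-attachment try/catch: a failed transcription is skipped and the message continues through the chat flow. The message and attachment shapes, the MIME-type check for voice attachments, and the function name augmentInboundSketch are all hypothetical; the real hook signature at CHAT_PLUGIN_POINTS.augmentInbound is not shown on this page.

```typescript
// Hypothetical shapes for an inbound chat message and its attachments.
interface Attachment { name: string; mimeType: string }
interface InboundMessage { text: string; attachments: Attachment[] }
type Transcribe = (attachment: Attachment) => Promise<string>;

// Assumption: voice and audio attachments are recognized by MIME type.
const isVoice = (a: Attachment): boolean => a.mimeType.startsWith("audio/");

async function augmentInboundSketch(
  message: InboundMessage,
  transcribe: Transcribe,
  auto: boolean,
): Promise<InboundMessage> {
  if (!auto) return message; // auto: false leaves messages untouched
  let text = message.text;
  for (const attachment of message.attachments.filter(isVoice)) {
    try {
      const transcript = await transcribe(attachment);
      const block = `<voice src="${attachment.name}">${transcript}</voice>`;
      text = text ? `${text}\n${block}` : block;
    } catch {
      // Fail open: skip this attachment and keep the chat flow going.
    }
  }
  // The original attachments are kept; only the text body grows.
  return { ...message, text };
}
```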

How the Agent Calls It

The agent calls asr the same way it calls any other plugin, through the standard plugin action path:

plugin_call({
  plugin: "asr",
  action: "transcribe",
  payload: {
    audio_path: "...",
  },
});
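On the host side, a plugin_call request can be thought of as a lookup from a (plugin, action) pair to a handler. The registry below is a hypothetical sketch of that routing, not the @downcity/agent implementation; PluginCallRequest, Handler, registerAction, and pluginCall are illustrative names, and the registered transcribe handler is a stub.

```typescript
// Hypothetical request shape mirroring the plugin_call arguments above.
interface PluginCallRequest {
  plugin: string;
  action: string;
  payload: Record<string, unknown>;
}
type Handler = (payload: Record<string, unknown>) => Promise<unknown>;

const registry = new Map<string, Handler>();

// JSON-encoding the pair gives an unambiguous key, even if names contain dots.
const actionKey = (plugin: string, action: string): string => JSON.stringify([plugin, action]);

function registerAction(plugin: string, action: string, handler: Handler): void {
  registry.set(actionKey(plugin, action), handler);
}

async function pluginCall(request: PluginCallRequest): Promise<unknown> {
  const handler = registry.get(actionKey(request.plugin, request.action));
  if (!handler) {
    throw new Error(`unknown action ${request.action} on plugin ${request.plugin}`);
  }
  return handler(request.payload);
}

// Example registration: a stub transcribe action that echoes its input path.
registerAction("asr", "transcribe", async (payload) => ({
  text: `transcribed ${String(payload.audio_path)}`,
}));
```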

When auto: true is set, many voice messages need no explicit, model-triggered action, because the transcript has already been written into the message before the agent receives it.