tts Plugin

How the tts plugin receives speech synthesis through its constructor and returns audio as assistant file parts

tts is the text-to-speech plugin. It defines the agent-side protocol only. It does not bind to a local model, Python dependency, or provider.

The actual speech synthesis capability is injected through the constructor. The recommended setup is to delegate to City AI:

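import { Agent } from "@downcity/agent";
import { TtsPlugin } from "@downcity/plugins";

// `city` is assumed to be an already-initialized City AI client in scope;
// its setup is not shown on this page.
const agent = new Agent({
  id: "speech-agent",
  path: "/path/to/project",
  plugins: [
    new TtsPlugin({
      // Required: the function that performs the actual synthesis.
      tts: (input) => city.ai.tts(input),
      language: "auto", // default language hint
      format: "wav", // default output format hint
    }),
  ],
});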
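Because the plugin only depends on the injected function, a stub can stand in for City AI during local testing. The sketch below is a minimal illustration, not the plugin's own code: `TtsInput`, `SimpleAudioResult`, `buildStubResult`, and `createStubTts` are hypothetical names, and the field shapes are inferred from this page rather than taken from the published types.

```typescript
// Hypothetical shapes inferred from this page; the real types exported by
// @downcity/plugins may differ.
interface TtsInput {
  text: string;
  language?: string;
  voice?: string;
  format?: string;
}

interface SimpleAudioResult {
  data_url: string;
  media_type: string;
  filename: string;
}

// Media types for the formats this page mentions (mp3 is registered as audio/mpeg).
const MEDIA_TYPES: Record<string, string> = {
  wav: "audio/wav",
  mp3: "audio/mpeg",
  ogg: "audio/ogg",
};

// Builds a fixed result without calling any provider.
// "AAAA" is placeholder base64, not playable audio.
function buildStubResult(input: TtsInput): SimpleAudioResult {
  const format = input.format ?? "wav";
  const mediaType = MEDIA_TYPES[format] ?? "application/octet-stream";
  return {
    data_url: `data:${mediaType};base64,AAAA`,
    media_type: mediaType,
    filename: `speech.${format}`,
  };
}

// Returns a function with the injected shape and records every call for inspection.
function createStubTts(calls: TtsInput[]) {
  return async (input: TtsInput): Promise<SimpleAudioResult> => {
    calls.push(input);
    return buildStubResult(input);
  };
}
```

Under these assumptions, `createStubTts(calls)` could be passed as the `tts` option in place of `(input) => city.ai.tts(input)`, and the `calls` array then shows exactly what the plugin forwarded.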

Design Boundary

tts is responsible for:

  • exposing the synthesize action
  • calling the constructor-injected tts function
  • normalizing the result into an AI SDK UIMessage
  • letting audio file parts enter the agent resource materialization flow
  • adding system guidance for how the agent should use tts

tts is not responsible for:

  • installing local models
  • choosing a TTS provider
  • managing Python dependencies
  • persisting project config
  • providing CLI management commands

Constructor Options

| Option | Purpose | Default |
| --- | --- | --- |
| tts | Required real speech synthesis function | none |
| language | Default language hint, such as auto, zh, or en | none |
| voice | Default voice hint | none |
| format | Default output format hint, such as wav, mp3, or ogg | none |
| name | Plugin name | tts |
| title | Display title | TTS |
| description | Plugin description | built-in description |
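The language, voice, and format options are described as default hints, which suggests that a value in the call payload takes precedence and the constructor value fills the gap otherwise. The following sketch shows that merge under this assumption; `TtsDefaults`, `SynthesizePayload`, and `resolveRequest` are hypothetical names, and the plugin's actual precedence rules may differ.

```typescript
// Hypothetical shapes inferred from the options table and payload fields on this page.
interface TtsDefaults {
  language?: string;
  voice?: string;
  format?: string;
}

interface SynthesizePayload {
  text: string;
  language?: string;
  voice?: string;
  format?: string;
  speed?: number;
}

// Assumption: per-call values win; constructor defaults only fill missing hints.
function resolveRequest(
  defaults: TtsDefaults,
  payload: SynthesizePayload,
): SynthesizePayload {
  const resolved: SynthesizePayload = { ...payload };
  for (const key of ["language", "voice", "format"] as const) {
    const value = resolved[key] ?? defaults[key];
    if (value !== undefined) {
      resolved[key] = value;
    }
  }
  return resolved;
}

// The call overrides language, while format falls back to the default.
const resolvedExample = resolveRequest(
  { language: "auto", format: "wav" },
  { text: "Hello", language: "en" },
);
```

Here `resolvedExample` carries `language: "en"` from the call and `format: "wav"` from the defaults; `voice` stays unset because neither side provides it.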

Explicit Call

const result = await agent.plugins.runAction({
  plugin: "tts",
  action: "synthesize",
  payload: {
    text: "Hello, welcome to Downcity",
    language: "en",
    format: "wav",
  },
});

text is required. Common optional fields include:

  • language
  • voice
  • format
  • speed
  • provider_options
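Since text is the only required field, an injected tts function may want to check the payload before forwarding it to a provider. The sketch below is one way to do that and is not part of the plugin: `parseSynthesizePayload` is a hypothetical helper, and the field types (string hints, a positive numeric speed, an object for provider_options) are assumptions, because this page lists the field names without their types.

```typescript
// Hypothetical payload shape; field types are assumptions, not documented here.
interface SynthesizePayload {
  text: string;
  language?: string;
  voice?: string;
  format?: string;
  speed?: number;
  provider_options?: Record<string, unknown>;
}

// Validates an untyped payload and returns a typed copy, or throws a TypeError.
function parseSynthesizePayload(raw: unknown): SynthesizePayload {
  if (typeof raw !== "object" || raw === null) {
    throw new TypeError("payload must be an object");
  }
  const input = raw as Record<string, unknown>;

  const text = input["text"];
  if (typeof text !== "string" || text.trim() === "") {
    throw new TypeError("payload.text is required");
  }
  const payload: SynthesizePayload = { text };

  // Optional string hints.
  for (const key of ["language", "voice", "format"] as const) {
    const value = input[key];
    if (value === undefined) continue;
    if (typeof value !== "string") {
      throw new TypeError(`payload.${key} must be a string`);
    }
    payload[key] = value;
  }

  // Assumed to be a positive playback-rate multiplier.
  const speed = input["speed"];
  if (speed !== undefined) {
    if (typeof speed !== "number" || !(speed > 0)) {
      throw new TypeError("payload.speed must be a positive number");
    }
    payload.speed = speed;
  }

  // Passed through untouched for the provider to interpret.
  const options = input["provider_options"];
  if (options !== undefined) {
    if (typeof options !== "object" || options === null) {
      throw new TypeError("payload.provider_options must be an object");
    }
    payload.provider_options = options as Record<string, unknown>;
  }
  return payload;
}
```

Rejecting a missing or blank text early keeps provider errors out of the agent's message flow; unknown extra fields are dropped rather than rejected, which is a design choice of this sketch.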

Result Shape

The injected tts function can return an AI SDK UIMessage directly, or a simple audio result:

return {
  data_url: "data:audio/wav;base64,...",
  media_type: "audio/wav",
  filename: "speech.wav",
};

The plugin normalizes this into a UIMessage with a file part. When the call goes through plugin_call, the agent materializes the audio under .downcity/resources, and the final assistant message keeps a path relative to the agent root, such as .downcity/resources/....
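The normalization step can be pictured roughly as follows. This is an illustrative sketch, not the plugin's source: the local `FilePart` and `UIMessageLike` types mirror the v5-style AI SDK UIMessage and file part shapes as an assumption (real code would import the types from the AI SDK), and `normalizeTtsResult` with its caller-supplied `id` is a hypothetical helper.

```typescript
// Local stand-ins for the AI SDK message shapes (assumed v5-style field names).
type FilePart = { type: "file"; mediaType: string; filename?: string; url: string };
type OtherPart = { type: string; [key: string]: unknown };

interface UIMessageLike {
  id: string;
  role: "system" | "user" | "assistant";
  parts: Array<FilePart | OtherPart>;
}

// The simple audio result shown on this page.
interface SimpleAudioResult {
  data_url: string;
  media_type: string;
  filename?: string;
}

type TtsReturn = UIMessageLike | SimpleAudioResult;

// A value that already carries a parts array is treated as a UIMessage.
function isUiMessage(value: TtsReturn): value is UIMessageLike {
  return "parts" in value && Array.isArray(value.parts);
}

// Wraps a simple audio result in an assistant message with one file part;
// an existing UIMessage is returned unchanged.
function normalizeTtsResult(result: TtsReturn, id: string): UIMessageLike {
  if (isUiMessage(result)) {
    return result;
  }
  const part: FilePart = {
    type: "file",
    mediaType: result.media_type,
    url: result.data_url,
  };
  if (result.filename !== undefined) {
    part.filename = result.filename;
  }
  return { id, role: "assistant", parts: [part] };
}
```

In this sketch the data URL stays in the file part's `url` field; replacing it with a path under .downcity/resources is the agent's materialization step and is not modeled here.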

How The Agent Calls It

plugin_call({
  plugin: "tts",
  action: "synthesize",
  payload: {
    text: "...",
  },
});

TTS is an output-generation capability. It does not automatically intercept the message flow. Call it only when the user explicitly wants voice, narration, or an audio file.