How the tts plugin receives speech synthesis through its constructor and returns audio as assistant file parts

tts Plugin

tts is the text-to-speech plugin. It defines the agent-side protocol only. It does not bind to a local model, Python dependency, or provider.

The actual speech synthesis capability is injected through the constructor. The recommended setup is to connect City AI:

import { Agent } from "@downcity/agent";
import { TtsPlugin } from "@downcity/plugins";

const agent = new Agent({
  id: "speech-agent",
  path: "/path/to/project",
  plugins: [
    new TtsPlugin({
      // `city` is assumed to be an already initialized City AI client; its setup is not shown here.
      tts: (input) => city.ai.tts(input),
      language: "auto",
      format: "wav",
    }),
  ],
});
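
Because the plugin only defines the protocol, any function that produces audio in the expected shape can be injected. As a minimal sketch, assuming a provider that hands back raw audio bytes, the adapter below wraps those bytes into the simple audio result (`data_url`, `media_type`, `filename`) described under Result Shape. The names `toBase64`, `bytesToAudioResult`, and `MEDIA_TYPES` are hypothetical helpers for illustration and are not part of `@downcity/plugins`.

```typescript
// Field names follow the simple audio result shown in this document;
// the interface itself is a hypothetical local type.
interface SimpleAudioResult {
  data_url: string;
  media_type: string;
  filename: string;
}

// Assumed mapping from the format hint to a MIME type.
const MEDIA_TYPES: Record<string, string> = {
  wav: "audio/wav",
  mp3: "audio/mpeg",
  ogg: "audio/ogg",
};

const B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Plain base64 encoder so the sketch does not depend on Buffer or btoa.
function toBase64(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const b0 = bytes[i] ?? 0;
    const b1 = bytes[i + 1] ?? 0;
    const b2 = bytes[i + 2] ?? 0;
    out += B64.charAt(b0 >> 2);
    out += B64.charAt(((b0 & 3) << 4) | (b1 >> 4));
    // Pad with "=" when the final group has fewer than three bytes.
    out += i + 1 < bytes.length ? B64.charAt(((b1 & 15) << 2) | (b2 >> 6)) : "=";
    out += i + 2 < bytes.length ? B64.charAt(b2 & 63) : "=";
  }
  return out;
}

// Wrap raw audio bytes into the simple result shape.
function bytesToAudioResult(bytes: Uint8Array, format: string): SimpleAudioResult {
  const mediaType = MEDIA_TYPES[format];
  if (!mediaType) {
    throw new Error(`unsupported format: ${format}`);
  }
  return {
    data_url: `data:${mediaType};base64,${toBase64(bytes)}`,
    media_type: mediaType,
    filename: `speech.${format}`,
  };
}
```

An injected function could then call the provider, collect the bytes, and return `bytesToAudioResult(bytes, "wav")`, leaving the rest of the normalization to the plugin.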

Design Boundary

tts is responsible for:

exposing the synthesize action
calling the constructor-injected tts function
normalizing the result into an AI SDK UIMessage
letting audio file parts enter the agent resource materialization flow
adding system guidance for how the agent should use tts

tts is not responsible for:

installing local models
choosing a TTS provider
managing Python dependencies
persisting project config
providing CLI management commands

Constructor Options

Option	Purpose	Default
`tts`	Required. Function that performs the actual speech synthesis	none
`language`	Default language hint, such as `auto`, `zh`, or `en`	none
`voice`	Default voice hint	none
`format`	Default output format hint, such as `wav`, `mp3`, or `ogg`	none
`name`	Plugin name	`tts`
`title`	Display title	`TTS`
`description`	Plugin description	built-in description
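
The table above can be read as a TypeScript shape. This is only a sketch: the option names and the `tts` / `TTS` defaults come from the table, while the types and the `resolveOptions` helper are assumptions made for illustration, not the plugin's actual declarations.

```typescript
// Hypothetical signature for the injected function; the real input and
// return types are defined by the plugin, not by this sketch.
type TtsFn = (input: { text: string; [key: string]: unknown }) => unknown;

interface TtsPluginOptions {
  tts: TtsFn;           // required, no default
  language?: string;    // default language hint, e.g. "auto", "zh", "en"
  voice?: string;       // default voice hint
  format?: string;      // default output format hint, e.g. "wav", "mp3", "ogg"
  name?: string;        // defaults to "tts"
  title?: string;       // defaults to "TTS"
  description?: string; // falls back to a built-in description (text not shown here)
}

// Apply the documented defaults and reject a missing tts function.
function resolveOptions(options: TtsPluginOptions) {
  if (typeof options.tts !== "function") {
    throw new Error("TtsPlugin requires a tts function");
  }
  return {
    ...options,
    name: options.name ?? "tts",
    title: options.title ?? "TTS",
  };
}
```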

Explicit Call

const result = await agent.plugins.runAction({
  plugin: "tts",
  action: "synthesize",
  payload: {
    text: "Hello, welcome to Downcity",
    language: "en",
    format: "wav",
  },
});

text is required. Common optional fields include:

language
voice
format
speed
provider_options
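
One way to picture how these payload fields interact with the constructor defaults is sketched below. It assumes that per-call fields take precedence over the constructor's default hints, which this page implies by calling them defaults but does not state outright. The `SynthesizePayload` and `TtsDefaults` types and the `buildTtsInput` helper are hypothetical, as are the assumed types of `speed` and `provider_options`.

```typescript
interface SynthesizePayload {
  text: string;
  language?: string;
  voice?: string;
  format?: string;
  speed?: number;                             // assumed to be numeric
  provider_options?: Record<string, unknown>; // assumed to be a free-form object
}

interface TtsDefaults {
  language?: string;
  voice?: string;
  format?: string;
}

// Validate the required field, then fill unset hints from the defaults.
function buildTtsInput(
  payload: SynthesizePayload,
  defaults: TtsDefaults,
): SynthesizePayload {
  if (typeof payload.text !== "string" || payload.text.trim() === "") {
    throw new Error("synthesize requires a non-empty text field");
  }
  return {
    ...payload,
    language: payload.language ?? defaults.language,
    voice: payload.voice ?? defaults.voice,
    format: payload.format ?? defaults.format,
  };
}
```

Under this assumption, a call with `language: "en"` against a plugin constructed with `language: "auto"` and `format: "wav"` would reach the injected function with `language: "en"` and `format: "wav"`.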

Result Shape

The injected tts function can return an AI SDK UIMessage directly, or a simple audio result:

return {
  data_url: "data:audio/wav;base64,...",
  media_type: "audio/wav",
  filename: "speech.wav",
};

The plugin normalizes this into a UIMessage with a file part. When called through plugin_call, the agent materializes the audio under .downcity/resources and keeps a path relative to the agent root, such as .downcity/resources/..., in the final assistant message.
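
The normalization step can be sketched roughly as follows. This is not the plugin's implementation: `FilePart` and `MessageLike` are local stand-ins assumed to mirror the AI SDK file part (`type`, `mediaType`, `url`, `filename`) and should be checked against the SDK version in use, and `normalizeTtsResult` with its `id` parameter is a hypothetical helper.

```typescript
// Local stand-ins for the AI SDK types; only the fields used here are modeled.
interface FilePart {
  type: "file";
  mediaType: string;
  url: string;
  filename?: string;
}

interface MessageLike {
  id: string;
  role: "assistant";
  parts: FilePart[];
}

interface SimpleAudioResult {
  data_url: string;
  media_type: string;
  filename?: string;
}

// Treat anything carrying a parts array as an already-built message.
function isMessageLike(value: unknown): value is MessageLike {
  return (
    typeof value === "object" &&
    value !== null &&
    Array.isArray((value as { parts?: unknown }).parts)
  );
}

// Pass messages through unchanged; wrap simple results in one file part.
function normalizeTtsResult(
  result: MessageLike | SimpleAudioResult,
  id = "tts-result", // placeholder id, assumed for this sketch
): MessageLike {
  if (isMessageLike(result)) {
    return result;
  }
  return {
    id,
    role: "assistant",
    parts: [
      {
        type: "file",
        mediaType: result.media_type,
        url: result.data_url,
        filename: result.filename,
      },
    ],
  };
}
```

Accepting both shapes keeps simple injected functions short while still letting a provider return a complete message when it has more to say.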

How The Agent Calls It

plugin_call({
  plugin: "tts",
  action: "synthesize",
  payload: {
    text: "...",
  },
});

TTS is an output-generation capability. It does not automatically intercept the message flow. Call it only when the user explicitly wants voice, narration, or an audio file.
