Text to Speech

Generate natural-sounding speech audio from text using AI voice synthesis, designed to work seamlessly with your chatbot applications.

What it is

The Text to Speech system provides:

AI voice synthesis that converts text to natural-sounding audio
Rate limiting protection to manage API usage and costs
Scenario-based voice selection for different chatbot personalities
Binary audio responses ready for immediate playback
Flexible TTS provider integration supporting multiple AI voice services

How to use it

API Endpoint

Send a POST request to your TTS webhook URL with a JSON body like this:

{
  "text": "Hello! Welcome to our service. How can I help you today?",
  "scenario": "customer-service"
}
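A client that is not a browser can build this request with Python's standard library. This is a minimal sketch. The webhook URL is a placeholder, because n8n assigns the real one. `build_tts_request` and `fetch_audio` are hypothetical helper names and are not part of the workflows.

```python
import json
import urllib.request

# Hypothetical placeholder: replace with the webhook URL of your TTS workflow.
TTS_WEBHOOK_URL = "https://n8n.example.com/webhook/tts"


def build_tts_request(text: str, scenario: str, url: str = TTS_WEBHOOK_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the JSON body shown above."""
    body = json.dumps({"text": text, "scenario": scenario}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_audio(req: urllib.request.Request, timeout: float = 30.0) -> tuple[bytes, str]:
    """Send the request and return the raw audio bytes and the Content-Type header."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read(), resp.headers.get("Content-Type", "")
```

For example, `audio, content_type = fetch_audio(build_tts_request("Hello!", "customer-service"))` returns the audio bytes, which you can then save to a file or pass to a player.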

Response

Receive binary audio data (typically WAV or MP3) ready for playback in your application.
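To find out which format it received, a client should check the `Content-Type` response header first. If that header is missing or generic (such as `application/octet-stream`), the leading bytes identify the common containers. The hypothetical helper below is a sketch that covers the WAV, MP3, and OGG formats mentioned in this document.

```python
def detect_audio_format(data: bytes) -> str:
    """Guess the container format of returned audio from its leading bytes."""
    # WAV: "RIFF" chunk header followed by the "WAVE" form type at offset 8.
    if len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "wav"
    # MP3 that starts with an ID3v2 metadata tag.
    if data[:3] == b"ID3":
        return "mp3"
    # Bare MPEG audio frame: 11-bit sync (0xFFE) plus layer bits == Layer III.
    # The layer check keeps AAC ADTS streams (layer bits 00) from matching.
    if len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0 and (data[1] & 0x06) == 0x02:
        return "mp3"
    # Ogg container (Vorbis or Opus audio).
    if data[:4] == b"OggS":
        return "ogg"
    return "unknown"
```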

How it works

Rate Limiting - Verifies the user hasn't exceeded TTS usage limits
Text Processing - Receives text and scenario parameters
Voice Synthesis - Generates audio using the configured TTS provider
Audio Delivery - Returns binary audio data for immediate use
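The four steps above fit into a single handler. The sketch below is not the n8n workflow itself. It assumes that the rate-limit check and the provider call are passed in as plain functions (`allow` and `synthesize` are hypothetical names). It also assumes that rejected requests get HTTP 429 or 400 and that a missing scenario falls back to `"default"`. The source specifies none of these details.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TTSResponse:
    status: int
    body: bytes
    content_type: str


def handle_tts_request(
    payload: dict,
    client_ip: str,
    allow: Callable[[str], bool],
    synthesize: Callable[[str, str], bytes],
    content_type: str = "audio/mpeg",
) -> TTSResponse:
    # 1. Rate limiting: reject before spending any provider credits.
    if not allow(client_ip):
        return TTSResponse(429, b"Too many TTS requests", "text/plain")
    # 2. Text processing: read and validate the text and scenario fields.
    text = payload.get("text")
    scenario = payload.get("scenario", "default")  # assumed fallback scenario
    if not isinstance(text, str) or not text.strip():
        return TTSResponse(400, b"'text' must be a non-empty string", "text/plain")
    if not isinstance(scenario, str):
        return TTSResponse(400, b"'scenario' must be a string", "text/plain")
    # 3. Voice synthesis: delegate to the provider-specific function.
    audio = synthesize(text.strip(), scenario)
    # 4. Audio delivery: return the raw bytes with an audio content type.
    return TTSResponse(200, audio, content_type)
```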

Workflow Architecture

The TTS system consists of three interconnected workflows:

TTS (Main API Route)

Purpose: Public API endpoint called by your chatbot applications
Function: Handles incoming requests with rate limiting and routing
Usage: Integrated with the ChatbotKit AI engine to add voice to chatbot responses

TTS-Call (Core Processing)

Purpose: Subworkflow responsible for actual voice synthesis
Function: Interfaces with TTS providers and manages audio generation
Customization: Replace OpenAI TTS with providers like ElevenLabs, Azure Speech, or others
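TTS-Call is the only component that talks to a provider, so swapping or chaining providers only affects that workflow. The sketch below shows one possible customization; the source does not say whether TTS-Call does anything like this. Each provider is modeled as a function from `(text, scenario)` to audio bytes, and the providers are tried in order until one succeeds. All names here are hypothetical.

```python
from typing import Callable, Sequence

# A provider is any callable (text, scenario) -> audio bytes. A real one would
# wrap the OpenAI, ElevenLabs, Azure, Google, or Polly API and map the
# scenario to one of that provider's own voices.
Provider = Callable[[str, str], bytes]


class AllProvidersFailed(Exception):
    """Raised when every configured provider raised an error."""


def synthesize_with_fallback(
    text: str, scenario: str, providers: Sequence[tuple[str, Provider]]
) -> tuple[str, bytes]:
    """Try each provider in order; return (provider_name, audio) from the first success."""
    errors: list[str] = []
    for name, provider in providers:
        try:
            return name, provider(text, scenario)
        except Exception as exc:  # network errors, quota errors, bad responses
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))
```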

TTS-Manual (Development & Testing)

Purpose: Manual trigger for testing and development
Function: Generates TTS audio directly in the n8n interface
Usage: Download generated audio files for testing or batch processing
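The same batch idea can also be scripted outside n8n. This hypothetical helper accepts any `synthesize(text, scenario)` function, such as one that wraps the TTS endpoint, and writes numbered audio files to a directory. It assumes that scenario names like `customer-service` are safe to use in file names.

```python
from pathlib import Path
from typing import Callable, Iterable, Union


def export_batch(
    items: Iterable[tuple[str, str]],
    synthesize: Callable[[str, str], bytes],
    out_dir: Union[str, Path],
    extension: str = "mp3",
) -> list[Path]:
    """Synthesize each (text, scenario) pair and write it to a numbered audio file."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written: list[Path] = []
    for i, (text, scenario) in enumerate(items, start=1):
        # e.g. "001-customer-service.mp3"; assumes filesystem-safe scenario names.
        path = out / f"{i:03d}-{scenario}.{extension}"
        path.write_bytes(synthesize(text, scenario))
        written.append(path)
    return written
```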

Voice Customization

TTS Provider Options

The system supports multiple voice synthesis providers:

OpenAI TTS (default) - Natural voices with good quality and speed
ElevenLabs - Premium voice cloning and custom voice creation
Azure Speech Services - Enterprise-grade with extensive language support
Google Cloud TTS - Multilingual with WaveNet neural voices
Amazon Polly - Cost-effective with neural and standard voices

Switching TTS Providers

To change from OpenAI to another provider:

Configure Credentials - Add your new provider's API keys in n8n
Update TTS-Call Workflow - Modify the API call nodes to use your preferred service
Adjust Parameters - Configure voice selection, speed, and quality settings
Test Integration - Use TTS-Manual workflow to verify voice output
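OpenAI TTS is the default provider, so it helps to see what a TTS-Call request amounts to at the HTTP level. The sketch below targets OpenAI's documented `POST /v1/audio/speech` endpoint, which takes the fields `model`, `input`, `voice`, `response_format`, and `speed`. This document does not show how the n8n node fills in those fields, so treat the defaults here as assumptions. To switch providers, replace this builder with one for the new provider's API and update the credentials.

```python
import json
import urllib.request

OPENAI_SPEECH_URL = "https://api.openai.com/v1/audio/speech"


def build_openai_speech_request(
    api_key: str,
    text: str,
    voice: str = "alloy",          # assumed default voice
    model: str = "tts-1",          # assumed default model
    response_format: str = "mp3",
    speed: float = 1.0,
) -> urllib.request.Request:
    """Build (but do not send) a request for OpenAI's text-to-speech endpoint."""
    # OpenAI documents a 4096-character input limit and a 0.25-4.0 speed range.
    if len(text) > 4096:
        raise ValueError("OpenAI TTS accepts at most 4096 characters per request")
    if not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")
    body = {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }
    return urllib.request.Request(
        OPENAI_SPEECH_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```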

Voice Configuration

Customize voice characteristics:

Voice Selection - Choose different voices per scenario
Speech Rate - Adjust speaking speed for different contexts
Pitch Control - Modify voice pitch for personality matching
Audio Format - Select output format (MP3, WAV, OGG)
Quality Settings - Balance between audio quality and file size
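These settings can be expressed as a table that maps each scenario to a voice. In this sketch, the scenario names, the voice choices (taken from OpenAI's voice list), and all defaults are hypothetical. Pitch is included because some providers support it (Google Cloud TTS, for example, measures it in semitones). OpenAI's speech endpoint has no pitch parameter.

```python
from dataclasses import dataclass, replace

# Output formats named in this document.
ALLOWED_FORMATS = {"mp3", "wav", "ogg"}


@dataclass(frozen=True)
class VoiceSettings:
    voice: str
    speed: float = 1.0        # 1.0 = normal speaking rate
    pitch: float = 0.0        # semitone offset; ignored by providers without pitch control
    audio_format: str = "mp3"


# Hypothetical scenario table; voice names depend on the provider in use.
SCENARIO_VOICES = {
    "default": VoiceSettings(voice="alloy"),
    "customer-service": VoiceSettings(voice="nova"),
    "storyteller": VoiceSettings(voice="fable", speed=0.9, audio_format="wav"),
}


def voice_for(scenario: str, **overrides) -> VoiceSettings:
    """Look up a scenario's settings, fall back to 'default', then apply overrides."""
    settings = SCENARIO_VOICES.get(scenario, SCENARIO_VOICES["default"])
    settings = replace(settings, **overrides)
    if settings.audio_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported audio format: {settings.audio_format}")
    return settings
```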

Integration with Chat System

Automatic Triggering - Chat responses automatically generate TTS audio
Scenario Matching - Voice characteristics match chatbot personality
Session Continuity - Consistent voice throughout the conversation
Performance Optimization - Cached audio for repeated responses
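Caching only works if identical requests map to the same key. Here is a minimal sketch, assuming the audio depends only on the text, the scenario, and the voice settings. It hashes those three inputs together and stores the results in a small in-memory LRU cache. A production deployment would more likely use shared storage. `AudioCache` and `cache_key` are hypothetical names.

```python
import hashlib
import json
from collections import OrderedDict
from typing import Callable


def cache_key(text: str, scenario: str, settings: dict) -> str:
    """Stable key: identical text, scenario, and voice settings hash to the same value."""
    material = json.dumps(
        {"text": text, "scenario": scenario, "settings": settings}, sort_keys=True
    )
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


class AudioCache:
    """Small in-memory LRU cache for synthesized audio."""

    def __init__(self, max_items: int = 256) -> None:
        self.max_items = max_items
        self._items: OrderedDict = OrderedDict()

    def get_or_synthesize(self, key: str, synthesize: Callable[[], bytes]) -> bytes:
        if key in self._items:
            self._items.move_to_end(key)  # mark as most recently used
            return self._items[key]
        audio = synthesize()
        self._items[key] = audio
        if len(self._items) > self.max_items:
            self._items.popitem(last=False)  # evict the least recently used entry
        return audio
```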

Rate Limiting

Usage Protection - Prevents excessive TTS API costs
Per-IP Limits - Controls individual user consumption
Error Responses - Clear feedback when limits are exceeded
Cost Management - Helps maintain predictable TTS expenses
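The source does not describe the limiter's algorithm or its limits, so the following is a sketch under stated assumptions. It uses a sliding window per client IP, with an illustrative default of 20 requests per 60 seconds. The conventional response to a rejected request is HTTP 429 Too Many Requests, with a `Retry-After` header that could be computed from `retry_after`. `PerIPRateLimiter` is a hypothetical name.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class PerIPRateLimiter:
    """Sliding-window limiter: at most `limit` requests per `window_s` seconds per IP."""

    def __init__(self, limit: int = 20, window_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, ip: str) -> bool:
        now = self.clock()
        hits = self._hits[ip]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_s:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True

    def retry_after(self, ip: str) -> float:
        """Seconds until the oldest request in the window expires (0 if not limited)."""
        hits = self._hits.get(ip)
        if not hits or len(hits) < self.limit:
            return 0.0
        return max(0.0, self.window_s - (self.clock() - hits[0]))
```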

Best Practices

Text Optimization - Clean text of special characters and formatting
Length Management - Break long texts into shorter segments
Voice Testing - Use TTS-Manual to test different voices and settings
Provider Comparison - Evaluate different TTS services for your use case
Caching Strategy - Store frequently used audio to reduce API calls
Error Handling - Implement fallbacks for TTS service failures
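The first two practices can be combined into one preprocessing step. This sketch strips common Markdown symbols and links, then splits the text at sentence boundaries into chunks under a size limit. The 500-character default is arbitrary; choose a limit below your provider's maximum (OpenAI's speech endpoint accepts up to 4096 characters). The regular expressions are deliberately simple and will also remove characters such as underscores from identifiers. The function names are hypothetical.

```python
import re

_LINKS = re.compile(r"\[([^\]]+)\]\([^)]+\)")   # [label](url)
_MARKDOWN = re.compile(r"[*_`#>~]+")              # emphasis, code, headings, quotes
_WHITESPACE = re.compile(r"\s+")
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")      # whitespace after . ! or ?


def clean_for_speech(text: str) -> str:
    """Strip common Markdown so the voice does not read symbols aloud."""
    text = _LINKS.sub(r"\1", text)   # keep the link label, drop the URL
    text = _MARKDOWN.sub("", text)
    return _WHITESPACE.sub(" ", text).strip()


def split_for_tts(text: str, max_chars: int = 500) -> list[str]:
    """Split at sentence boundaries into chunks of at most max_chars characters."""
    chunks: list[str] = []
    current = ""
    for sentence in _SENTENCE_END.split(text):
        # Hard-wrap a single sentence longer than the limit (may split mid-word).
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then go through the endpoint as a separate request, and the client plays the resulting clips in order.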

Development Workflow

Use TTS-Manual for initial voice testing and configuration
Configure TTS-Call with your preferred provider and settings
Test via TTS endpoint to verify integration with your applications
Monitor usage through rate limiting logs and provider dashboards

Together, these workflows add voice output to your chatbot infrastructure while keeping the TTS provider and voice settings configurable.
