Text to Speech
Generate natural-sounding speech audio from text using AI voice synthesis, designed to work seamlessly with your chatbot applications.
What it is
The Text to Speech system provides:
AI voice synthesis that converts text to natural-sounding audio
Rate limiting protection to manage API usage and costs
Scenario-based voice selection for different chatbot personalities
Binary audio responses ready for immediate playback
Flexible TTS provider integration supporting multiple AI voice services
How to use it
API Endpoint
Send a POST request to your TTS webhook URL with a JSON body:
{
  "text": "Hello! Welcome to our service. How can I help you today?",
  "scenario": "customer-service"
}
Response
Receive binary audio data (typically WAV or MP3) ready for playback in your application.
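A minimal client sketch for the request and response described above, assuming the webhook accepts a JSON body with the `text` and `scenario` fields and returns the raw audio bytes in the response body. The function names `build_tts_request` and `fetch_audio` and the 30-second timeout are illustrative choices, not part of the workflow itself.

```python
import json
import urllib.request


def build_tts_request(webhook_url: str, text: str, scenario: str) -> urllib.request.Request:
    """Build a POST request carrying the JSON body the endpoint expects."""
    body = json.dumps({"text": text, "scenario": scenario}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_audio(webhook_url: str, text: str, scenario: str, timeout: float = 30.0) -> bytes:
    """Send the request and return the binary audio from the response body."""
    request = build_tts_request(webhook_url, text, scenario)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

To use it, call `fetch_audio` with your own TTS webhook URL and write the returned bytes to a file opened in binary mode. The right file extension depends on the audio format your provider is configured to return.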
How it works
Rate Limiting - Verifies the caller (tracked per IP address) hasn't exceeded TTS usage limits
Text Processing - Receives text and scenario parameters
Voice Synthesis - Generates audio using the configured TTS provider
Audio Delivery - Returns binary audio data for immediate use
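The four steps above can be sketched as a single function. This is an illustration of the order of operations only, not the actual n8n workflow: `handle_tts_request`, `RateLimitExceeded`, the injected `is_allowed` and `synthesize` callables, and the `"default"` fallback scenario are all hypothetical names introduced here.

```python
from typing import Callable


class RateLimitExceeded(Exception):
    """Raised when the caller has used up its TTS allowance."""


def handle_tts_request(
    payload: dict,
    is_allowed: Callable[[], bool],
    synthesize: Callable[[str, str], bytes],
) -> bytes:
    # 1. Rate limiting: reject the request before doing any paid work.
    if not is_allowed():
        raise RateLimitExceeded("TTS usage limit exceeded")

    # 2. Text processing: read and validate the text and scenario parameters.
    text = str(payload.get("text", "")).strip()
    scenario = str(payload.get("scenario", "default"))  # fallback is an assumption
    if not text:
        raise ValueError("'text' must be a non-empty string")

    # 3. Voice synthesis: delegate to whichever provider is configured.
    audio = synthesize(text, scenario)

    # 4. Audio delivery: hand back the raw bytes for the HTTP response.
    return audio
```

Checking the rate limit first means a rejected request never reaches the provider, so it incurs no synthesis cost.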
Workflow Architecture
The TTS system consists of three interconnected workflows:
TTS (Main API Route)
Purpose: Public API endpoint called by your chatbot applications
Function: Handles incoming requests with rate limiting and routing
Usage: Integrated with ChatbotKit AI engine for seamless voice responses
TTS-Call (Core Processing)
Purpose: Subworkflow responsible for actual voice synthesis
Function: Interfaces with TTS providers and manages audio generation
Customization: Replace OpenAI TTS with providers like ElevenLabs, Azure Speech, or others
TTS-Manual (Development & Testing)
Purpose: Manual trigger for testing and development
Function: Generates TTS audio directly in the n8n interface
Usage: Download generated audio files for testing or batch processing
Voice Customization
TTS Provider Options
The system supports multiple voice synthesis providers:
OpenAI TTS (default) - Natural voices with good quality and speed
ElevenLabs - Premium voice cloning and custom voice creation
Azure Speech Services - Enterprise-grade with extensive language support
Google Cloud TTS - Multilingual with WaveNet neural voices
Amazon Polly - Cost-effective with neural and standard voices
Switching TTS Providers
To change from OpenAI to another provider:
Configure Credentials - Add your new provider's API keys in n8n
Update TTS-Call Workflow - Modify the API call nodes to use your preferred service
Adjust Parameters - Configure voice selection, speed, and quality settings
Test Integration - Use the TTS-Manual workflow to verify voice output
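Switching providers is easier to reason about when every provider sits behind the same small interface. The sketch below shows that idea in plain Python rather than n8n nodes. `TTSProvider`, `StubProvider`, `PROVIDERS`, and `synthesize_with` are hypothetical names, and the stub returns tagged bytes instead of calling any real provider API.

```python
from typing import Dict, Protocol


class TTSProvider(Protocol):
    """The one method every provider adapter is assumed to expose."""

    def synthesize(self, text: str, voice: str) -> bytes: ...


class StubProvider:
    """Stand-in that tags its output instead of calling a real API."""

    def __init__(self, name: str) -> None:
        self.name = name

    def synthesize(self, text: str, voice: str) -> bytes:
        return f"{self.name}|{voice}|{text}".encode("utf-8")


# Registry keyed by provider name; a real adapter would replace each stub.
PROVIDERS: Dict[str, TTSProvider] = {
    "openai": StubProvider("openai"),
    "elevenlabs": StubProvider("elevenlabs"),
}


def synthesize_with(provider_name: str, text: str, voice: str) -> bytes:
    """Look up the configured provider and delegate synthesis to it."""
    try:
        provider = PROVIDERS[provider_name]
    except KeyError:
        raise ValueError(f"Unknown TTS provider: {provider_name!r}") from None
    return provider.synthesize(text, voice)
```

With this shape, changing providers means changing one configuration value. The rest of the pipeline only ever sees `synthesize(text, voice)` and bytes coming back.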
Voice Configuration
Customize voice characteristics:
Voice Selection - Choose different voices per scenario
Speech Rate - Adjust speaking speed for different contexts
Pitch Control - Modify voice pitch for personality matching
Audio Format - Select output format (MP3, WAV, OGG)
Quality Settings - Balance between audio quality and file size
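One way to keep these settings manageable is a per-scenario lookup table with a default. The sketch below is illustrative: the voice names, the `storyteller` scenario, and the field names on `VoiceConfig` are made up, and which settings a given provider honours varies by provider.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceConfig:
    voice: str                 # provider-specific voice identifier (placeholder values below)
    speed: float = 1.0         # 1.0 is treated here as the normal speaking rate
    audio_format: str = "mp3"  # one of the formats listed above: mp3, wav, ogg


# Used when a request names a scenario that has no entry of its own.
DEFAULT_VOICE = VoiceConfig(voice="neutral-voice")

SCENARIO_VOICES = {
    "customer-service": VoiceConfig(voice="friendly-voice"),
    "storyteller": VoiceConfig(voice="warm-voice", speed=0.9, audio_format="wav"),
}


def voice_for(scenario: str) -> VoiceConfig:
    """Return the voice settings for a scenario, falling back to the default."""
    return SCENARIO_VOICES.get(scenario, DEFAULT_VOICE)
```

Falling back to a default instead of raising keeps an unrecognised scenario from breaking audio generation entirely.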
Integration with Chat System
Automatic Triggering - Chat responses automatically generate TTS audio
Scenario Matching - Voice characteristics match chatbot personality
Session Continuity - Consistent voice throughout conversation
Performance Optimization - Cached audio for repeated responses
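Caching repeated responses can be sketched as a lookup keyed on the text and scenario together. This is an in-memory illustration under that assumption, not a description of how the workflow actually stores audio. `AudioCache` and the injected `synthesize` callable are hypothetical.

```python
import hashlib
from typing import Callable, Dict


class AudioCache:
    """Return stored audio for a (text, scenario) pair, synthesizing only on a miss."""

    def __init__(self, synthesize: Callable[[str, str], bytes]) -> None:
        self._synthesize = synthesize
        self._store: Dict[str, bytes] = {}

    @staticmethod
    def key(text: str, scenario: str) -> str:
        # Hash both inputs so the same text in a different scenario gets its own entry.
        raw = f"{scenario}\x00{text}".encode("utf-8")
        return hashlib.sha256(raw).hexdigest()

    def get(self, text: str, scenario: str) -> bytes:
        cache_key = self.key(text, scenario)
        if cache_key not in self._store:
            self._store[cache_key] = self._synthesize(text, scenario)
        return self._store[cache_key]
```

The dictionary here grows without bound. A production cache would need an eviction policy or an external store.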
Rate Limiting
Usage Protection - Prevents excessive TTS API costs
Per-IP Limits - Controls individual user consumption
Error Responses - Clear feedback when limits are exceeded
Cost Management - Helps maintain predictable TTS expenses
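A per-IP limit can be sketched as a sliding window of request timestamps. The algorithm and limits the workflow actually uses are not stated here, so the class below is only one plausible shape. `PerIPRateLimiter`, `max_requests`, and `window_seconds` are hypothetical names.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class PerIPRateLimiter:
    """Allow at most max_requests per IP address within a sliding time window."""

    def __init__(
        self,
        max_requests: int,
        window_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock  # injectable so the limiter can be tested without sleeping
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, ip: str) -> bool:
        now = self._clock()
        hits = self._hits[ip]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

A refused caller would conventionally receive an HTTP 429 (Too Many Requests) response. Whether this workflow uses that status is an assumption.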
Best Practices
Text Optimization - Clean text of special characters and formatting
Length Management - Break long texts into shorter segments
Voice Testing - Use TTS-Manual to test different voices and settings
Provider Comparison - Evaluate different TTS services for your use case
Caching Strategy - Store frequently used audio to reduce API calls
Error Handling - Implement fallbacks for TTS service failures
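The text optimization and length management practices can be sketched as two small helpers. The character set stripped by `clean_text` and the 500-character default in `split_text` are arbitrary choices for illustration. Real limits depend on your provider.

```python
import re
from typing import List


def clean_text(text: str) -> str:
    """Remove URLs and common markdown markers, then collapse whitespace."""
    text = re.sub(r"https?://\S+", "", text)  # URLs read poorly aloud
    text = re.sub(r"[*_`#>~]", "", text)      # common markdown markers
    return re.sub(r"\s+", " ", text).strip()


def split_text(text: str, max_chars: int = 500) -> List[str]:
    """Group whole sentences into segments of at most max_chars characters.

    A single sentence longer than max_chars is kept whole rather than cut mid-word.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text)
    segments: List[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            segments.append(current)
            current = sentence
    if current:
        segments.append(current)
    return segments
```

Each segment can then be sent as its own TTS request and the resulting audio played back in order.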
Development Workflow
Use TTS-Manual for initial voice testing and configuration
Configure TTS-Call with your preferred provider and settings
Test via TTS endpoint to verify integration with your applications
Monitor usage through rate limiting logs and provider dashboards
The TTS system provides a complete voice synthesis solution that integrates seamlessly with your chatbot infrastructure while maintaining flexibility for different voice providers and customization needs.