Chatbot Kit

Text to Speech

Generate natural-sounding speech audio from text using AI voice synthesis, designed to work seamlessly with your chatbot applications.

What it is

The Text to Speech system provides:

  • AI voice synthesis that converts text to natural-sounding audio
  • Rate limiting protection to manage API usage and costs
  • Scenario-based voice selection for different chatbot personalities
  • Binary audio responses ready for immediate playback
  • Flexible TTS provider integration supporting multiple AI voice services

How to use it

API Endpoint

Send a POST request to your TTS webhook URL with a JSON body containing the text to speak and the scenario name:

{
  "text": "Hello! Welcome to our service. How can I help you today?",
  "scenario": "customer-service"
}

Response

Receive binary audio data (typically WAV or MP3) ready for playback in your application.
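
A minimal client sketch in Python, assuming the webhook accepts the JSON body shown above and answers with raw audio bytes. The URL https://example.com/webhook/tts and the helper names build_tts_request and fetch_speech are placeholders for illustration, not part of the kit.

```python
import json
import urllib.request

# Placeholder URL (assumption): replace with your own TTS webhook URL.
TTS_URL = "https://example.com/webhook/tts"


def build_tts_request(text: str, scenario: str, url: str = TTS_URL) -> urllib.request.Request:
    """Build a POST request carrying the JSON body described above."""
    body = json.dumps({"text": text, "scenario": scenario}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_speech(text: str, scenario: str, url: str = TTS_URL) -> bytes:
    """Send the request and return the binary audio from the response body."""
    request = build_tts_request(text, scenario, url)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read()
```

Calling fetch_speech("Hello!", "customer-service") returns bytes that can be written to a file opened in "wb" mode. The file extension to use depends on the format your provider returns, so check the response's Content-Type header rather than assuming MP3.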

How it works

  1. Rate Limiting - Verifies the user hasn't exceeded TTS usage limits
  2. Text Processing - Receives text and scenario parameters
  3. Voice Synthesis - Generates audio using the configured TTS provider
  4. Audio Delivery - Returns binary audio data for immediate use
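
The four steps above can be sketched as a single function. This is an illustration only: the real logic lives in n8n workflow nodes, and the status codes, the "default" scenario name, and the is_allowed and synthesize callables are assumptions made for the example.

```python
from typing import Callable, Dict, Tuple


def handle_tts_request(
    payload: Dict[str, str],
    client_id: str,
    is_allowed: Callable[[str], bool],
    synthesize: Callable[[str, str], bytes],
) -> Tuple[int, bytes]:
    """Return an (HTTP-style status, body) pair for one TTS request."""
    # 1. Rate limiting: refuse early so no provider cost is incurred.
    if not is_allowed(client_id):
        return 429, b"rate limit exceeded"

    # 2. Text processing: read the text and scenario parameters.
    text = payload.get("text", "").strip()
    scenario = payload.get("scenario", "default")
    if not text:
        return 400, b"missing text"

    # 3. Voice synthesis: delegate to whichever provider is configured.
    audio = synthesize(text, scenario)

    # 4. Audio delivery: hand back the binary audio unchanged.
    return 200, audio
```

Checking the rate limit before anything else mirrors the order of the list: a rejected request never reaches the TTS provider.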

Workflow Architecture

The TTS system consists of three interconnected workflows:

TTS (Main API Route)

  • Purpose: Public API endpoint called by your chatbot applications
  • Function: Handles incoming requests with rate limiting and routing
  • Usage: Integrated with the ChatbotKit AI engine for seamless voice responses

TTS-Call (Core Processing)

  • Purpose: Subworkflow responsible for actual voice synthesis
  • Function: Interfaces with TTS providers and manages audio generation
  • Customization: Replace OpenAI TTS with providers like ElevenLabs, Azure Speech, or others

TTS-Manual (Development & Testing)

  • Purpose: Manual trigger for testing and development
  • Function: Generate TTS audio directly in the n8n interface
  • Usage: Download generated audio files for testing or batch processing

Voice Customization

TTS Provider Options

The system supports multiple voice synthesis providers:

  • OpenAI TTS (default) - Natural voices with good quality and speed
  • ElevenLabs - Premium voice cloning and custom voice creation
  • Azure Speech Services - Enterprise-grade with extensive language support
  • Google Cloud TTS - Multilingual with WaveNet neural voices
  • Amazon Polly - Cost-effective with neural and standard voices

Switching TTS Providers

To change from OpenAI to another provider:

  1. Configure Credentials - Add your new provider's API keys in n8n
  2. Update TTS-Call Workflow - Modify the API call nodes to use your preferred service
  3. Adjust Parameters - Configure voice selection, speed, and quality settings
  4. Test Integration - Use the TTS-Manual workflow to verify voice output
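
In the kit itself, switching providers means editing nodes in the TTS-Call workflow. The same idea expressed in code is a small adapter interface, so callers never depend on one vendor. The sketch below uses two stand-in providers that return labelled bytes instead of calling a real API; every class and function name here is hypothetical.

```python
from typing import Dict, Protocol


class TTSProvider(Protocol):
    """Anything that can turn text plus a voice name into audio bytes."""

    def synthesize(self, text: str, voice: str) -> bytes: ...


class FakeProviderA:
    """Stand-in for one vendor; a real adapter would call that vendor's API."""

    def synthesize(self, text: str, voice: str) -> bytes:
        return f"A:{voice}:{text}".encode("utf-8")


class FakeProviderB:
    """Stand-in for a second vendor with the same interface."""

    def synthesize(self, text: str, voice: str) -> bytes:
        return f"B:{voice}:{text}".encode("utf-8")


# Registry of available adapters, keyed by a configuration name.
PROVIDERS: Dict[str, TTSProvider] = {
    "a": FakeProviderA(),
    "b": FakeProviderB(),
}


def synthesize_with(provider_name: str, text: str, voice: str) -> bytes:
    """Look up the configured provider and delegate synthesis to it."""
    try:
        provider = PROVIDERS[provider_name]
    except KeyError:
        raise ValueError(f"unknown TTS provider: {provider_name!r}") from None
    return provider.synthesize(text, voice)
```

With this shape, changing providers is a one-line configuration change (the provider_name value) rather than a change to every caller.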

Voice Configuration

Customize voice characteristics:

  • Voice Selection - Choose different voices per scenario
  • Speech Rate - Adjust speaking speed for different contexts
  • Pitch Control - Modify voice pitch for personality matching
  • Audio Format - Select output format (MP3, WAV, OGG)
  • Quality Settings - Balance between audio quality and file size
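
One way to keep these settings together is a small per-scenario table. The sketch below is an assumption about how such a mapping could look: only the "customer-service" scenario name comes from the request example, while the "tutorial" scenario, the voice names, and the numeric values are invented for illustration. Which fields a given provider honors (pitch in particular) varies by provider.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class VoiceConfig:
    """Voice settings for one scenario; field names are illustrative."""

    voice: str
    speed: float = 1.0        # 1.0 = normal speaking rate
    pitch: float = 0.0        # 0.0 = provider default pitch
    audio_format: str = "mp3"  # e.g. "mp3", "wav", "ogg"


# Used when a request names a scenario that has no entry below.
DEFAULT_VOICE = VoiceConfig(voice="neutral")

# Hypothetical scenario table; replace voices with ones your provider offers.
SCENARIO_VOICES: Dict[str, VoiceConfig] = {
    "customer-service": VoiceConfig(voice="warm", speed=0.95),
    "tutorial": VoiceConfig(voice="clear", speed=0.9, audio_format="wav"),
}


def voice_for(scenario: str) -> VoiceConfig:
    """Return the settings for a scenario, falling back to the default."""
    return SCENARIO_VOICES.get(scenario, DEFAULT_VOICE)
```

Falling back to a default keeps unknown scenario names from failing the request outright.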

Integration with Chat System

  • Automatic Triggering - Chat responses automatically generate TTS audio
  • Scenario Matching - Voice characteristics match chatbot personality
  • Session Continuity - Consistent voice throughout conversation
  • Performance Optimization - Cached audio for repeated responses

Rate Limiting

  • Usage Protection - Prevents excessive TTS API costs
  • Per-IP Limits - Caps consumption per client IP address
  • Error Responses - Clear feedback when limits are exceeded
  • Cost Management - Helps maintain predictable TTS expenses
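
The document does not say which limiting algorithm the workflow uses. As one possible approach, the sketch below implements a fixed-window counter keyed by IP address; the class name, the limit, and the window length are all assumptions.

```python
import time
from typing import Callable, Dict, Tuple


class FixedWindowLimiter:
    """Allow at most `limit` requests per IP in each window of fixed length."""

    def __init__(
        self,
        limit: int,
        window_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.limit = limit
        self.window = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._state: Dict[str, Tuple[float, int]] = {}  # ip -> (window start, count)

    def allow(self, ip: str) -> bool:
        """Record one request from `ip` and report whether it is within the limit."""
        now = self.clock()
        start, count = self._state.get(ip, (now, 0))
        if now - start >= self.window:
            # The previous window has expired: start a fresh one.
            start, count = now, 0
        if count >= self.limit:
            self._state[ip] = (start, count)
            return False
        self._state[ip] = (start, count + 1)
        return True
```

A caller would typically answer with an HTTP 429 status and a short message when allow returns False, which matches the "clear feedback" item above. This in-memory version resets on restart and is not shared between processes, so a production setup would need a shared store.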

Best Practices

  • Text Optimization - Clean text of special characters and formatting
  • Length Management - Break long texts into shorter segments
  • Voice Testing - Use TTS-Manual to test different voices and settings
  • Provider Comparison - Evaluate different TTS services for your use case
  • Caching Strategy - Store frequently used audio to reduce API calls
  • Error Handling - Implement fallbacks for TTS service failures
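
Several of these practices are easy to sketch in code. The example below shows one possible take on text cleanup, sentence-based splitting, an in-memory cache, and a fallback wrapper. All function and class names are hypothetical, the 200-character default is arbitrary, and the synthesize callables stand in for whatever actually produces audio.

```python
import hashlib
import re
from typing import Callable, Dict, List

Synth = Callable[[str, str], bytes]  # (text, scenario) -> audio bytes


def clean_text(text: str) -> str:
    """Remove common markdown markup characters and collapse whitespace."""
    text = re.sub(r"[*_`#>]+", "", text)
    return re.sub(r"\s+", " ", text).strip()


def split_text(text: str, max_chars: int = 200) -> List[str]:
    """Pack whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole, not cut mid-word.
    """
    chunks: List[str] = []
    current = ""
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


class AudioCache:
    """Cache audio by a hash of scenario and text to avoid repeat API calls."""

    def __init__(self, synthesize: Synth) -> None:
        self._synthesize = synthesize
        self._store: Dict[str, bytes] = {}

    def get(self, text: str, scenario: str) -> bytes:
        key = hashlib.sha256(f"{scenario}\n{text}".encode("utf-8")).hexdigest()
        if key not in self._store:
            self._store[key] = self._synthesize(text, scenario)
        return self._store[key]


def with_fallback(primary: Synth, backup: Synth) -> Synth:
    """Return a synthesizer that tries `primary` and uses `backup` on any error."""

    def synthesize(text: str, scenario: str) -> bytes:
        try:
            return primary(text, scenario)
        except Exception:
            return backup(text, scenario)

    return synthesize
```

Including the scenario in the cache key matters because the same text can be spoken with different voices. The cache here grows without bound, so a real deployment would add eviction or use an external store.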

Development Workflow

  1. Use TTS-Manual for initial voice testing and configuration
  2. Configure TTS-Call with your preferred provider and settings
  3. Test via the TTS endpoint to verify integration with your applications
  4. Monitor usage through rate limiting logs and provider dashboards

Together, the three workflows give your chatbot infrastructure a complete voice synthesis pipeline while leaving the TTS provider and voice settings open to change.