Chatbot Kit

AI Engine

The AI Engine is a unified Zustand store that manages all aspects of the AI-powered conversational experience. It combines multiple specialized slices to handle chat interactions, audio processing, 3D scene management, and speech recognition.

The main idea is to provide a single source of truth for the application's state, allowing different components to easily access and modify the state as needed.

You can create multiple scenarios or templates by reusing the same AI Engine store, making it easy to maintain and extend.

Architecture Overview

The AI Engine is built using a modular slice architecture, combining four key components:

  • Chat Slice: Manages conversation state and message handling
  • Audio Slice: Handles text-to-speech, audio playback, and lip-sync
  • Scene Slice: Controls 3D avatar animations and camera positioning
  • Speech Recognition Slice: Manages voice input capabilities

The slices live in the src/store directory:

/src
├── /store
│   ├── /aiEngine.js               # Main Zustand store combining all slices
│   ├── /audioSlice.js             # Manages audio playback and processing
│   ├── /chatSlice.js              # Manages chat messages and state
│   ├── /sceneSlice.js             # Manages 3D scene and avatar state
│   ├── /speechRecognitionSlice.js # Manages speech recognition state
│   └── /constants.js              # Shared constants (statuses, animations, moods)

Core Components

Store Structure (aiEngine.js)

The main store combines all slices into a single unified interface:

import { create } from 'zustand';
import { chatSlice } from './chatSlice';
import { audioSlice } from './audioSlice';
import { sceneSlice } from './sceneSlice';
import { speechRecognitionSlice } from './speechRecognitionSlice';

const aiEngine = create((set, get) => ({
  ...chatSlice(set, get),
  ...audioSlice(set, get),
  ...sceneSlice(set, get),
  ...speechRecognitionSlice(set, get),
}));

Initialization automatically sets up:

  • Audio player with lip-sync capabilities
  • Speech recognition (if available in browser)

This approach makes it easy to add new slices or features in the future.
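
For example, any component can read state from any slice through the same hook (a minimal sketch, assuming the store is exported from aiEngine.js):

import { aiEngine } from './store/aiEngine';

function ChatStatusBadge() {
  // Selector subscription: the component re-renders only when `status` changes
  const status = aiEngine((state) => state.status);
  return <span>{status}</span>;
}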

Chat Slice (chatSlice.js)

Manages the conversational flow between user and AI assistant.

Key State Properties

  • status: Current chat status (IDLE, LOADING, ERROR, ERROR_RATE_LIMITER)
  • messages: Array of conversation messages
  • sessionId: Unique session identifier
  • mode: Chat mode configuration
  • scenario: Current conversation scenario

Core Methods

sendMessage(message)

  • Sends user message to the AI backend
  • Updates chat status and message history
  • Triggers avatar state machine updates
  • Handles API responses and error states
  • Automatically queues assistant responses for audio playback
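
A minimal usage sketch (the argument shape is an assumption; adapt it to your backend contract):

const handleSubmit = async (text) => {
  // Hypothetical input handler: status, history, and audio queueing are handled internally
  await aiEngine.getState().sendMessage(text);
};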

addManualMessage(message)

  • Adds a message directly without API call
  • Useful for system messages or pre-scripted content
  • Automatically queues for audio processing

handleRateLimitError()

  • Displays rate limiting error to user
  • Updates chat status appropriately

Audio Slice (audioSlice.js)

Manages all audio-related functionality including text-to-speech and playback synchronization.

Key State Properties

  • audioPlayer: HTML Audio element instance
  • lipsyncManager: Wawa-lipsync integration for mouth animations
  • queue: Messages pending audio playback
  • audioCache: URL cache for generated audio
  • currentMessageId: ID of currently playing message
  • audioPlayerStatus: 'idle' or 'busy' state indicator

Core Methods

setupAudioPlayer()

  • Initializes HTML5 Audio element
  • Configures lip-sync manager
  • Sets up audio event handlers
  • Starts animation frame loop for audio analysis
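
In outline, the setup looks something like this (a sketch; the lipsyncManager calls are illustrative placeholders, not the exact wawa-lipsync API):

setupAudioPlayer: () => {
  const audioPlayer = new Audio();
  audioPlayer.onended = () => {
    set({ audioPlayerStatus: 'idle', currentMessageId: null });
    get().checkPlayback(); // advance the queue when a clip finishes
  };
  set({ audioPlayer });

  const analyze = () => {
    get().lipsyncManager?.processAudio(); // hypothetical per-frame analysis call
    requestAnimationFrame(analyze);
  };
  requestAnimationFrame(analyze); // animation frame loop for lip-sync
},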

fetchAudio(message)

  • Converts message text to speech via TTS API
  • Caches audio URLs for reuse
  • Handles rate limiting and errors
  • Supports direct audio URLs if provided
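
A hedged sketch of the flow (the /tts request shape and message fields are assumptions):

fetchAudio: async (message) => {
  if (message.audioUrl) return message.audioUrl;               // direct audio URL support
  const { audioCache } = get();
  if (audioCache[message.id]) return audioCache[message.id];   // reuse cached audio

  const res = await fetch(`/tts?text=${encodeURIComponent(message.content)}`);
  if (res.status === 429) return get().handleRateLimitError(); // rate limited
  const url = URL.createObjectURL(await res.blob());
  set({ audioCache: { ...audioCache, [message.id]: url } });
  return url;
},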

checkPlayback()

  • Manages audio queue processing
  • Ensures sequential playback
  • Updates message audio status
  • Triggers avatar FSM updates
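
In outline (a sketch; field names are assumptions):

checkPlayback: async () => {
  const { queue, audioPlayer, audioPlayerStatus } = get();
  if (audioPlayerStatus === 'busy' || queue.length === 0) return; // one clip at a time

  const [next, ...rest] = queue;
  const url = await get().fetchAudio(next);
  set({ queue: rest, currentMessageId: next.id, audioPlayerStatus: 'busy' });
  audioPlayer.src = url;
  await audioPlayer.play();
  get().avatarFSM(); // keep the avatar in sync with playback
},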

playMessage(message)

  • Plays specific message audio
  • Handles pause/resume logic
  • Manages queue priorities

Scene Slice (sceneSlice.js)

Controls 3D avatar visualization and scene management.

Key State Properties

  • avatarAnimation: Current animation state (IDLE, THINKING, TALKING)
  • avatarMood: Emotional expression (NEUTRAL, HAPPY, SAD, etc.)
  • cameraPosition: Camera zoom level
  • avatarEyesFollowPointer: Eye tracking toggle
  • avatarLookAtCamera: Head tracking toggle
  • avatarHeadWander: Head movement toggle
  • sceneMode: Scene rendering mode
  • loading: Scene loading state

Core Methods

avatarFSM()

  • Finite State Machine for avatar behavior
  • Synchronizes avatar state with:
    • Currently playing audio
    • Chat loading status
    • Queue status
  • Manages camera transitions
  • Sets appropriate animations and moods

Scene State Transitions

  • Playing audio → Talking animation + Zoom in
  • Loading response → Thinking animation + Confused mood
  • Idle → Default animation + Neutral mood
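
These transitions boil down to a decision function along these lines (a sketch derived from the list above; the exact constant values live in constants.js). Note how the unified store pays off here: the scene slice reads chat and audio state through the same get().

avatarFSM: () => {
  const { currentMessageId, status } = get();
  if (currentMessageId) {
    set({ avatarAnimation: 'TALKING', cameraPosition: 'ZOOM_IN' }); // playing audio
  } else if (status === 'LOADING') {
    set({ avatarAnimation: 'THINKING', avatarMood: 'CONFUSED' });   // waiting for the AI
  } else {
    set({ avatarAnimation: 'IDLE', avatarMood: 'NEUTRAL' });        // default
  }
},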

Speech Recognition Slice (speechRecognitionSlice.js)

Manages voice input using Web Speech API.

Key State Properties

  • speechRecognitionInstance: Browser SpeechRecognition object
  • speechRecognitionStatus: Current recognition state
  • speechRecognitionLoading: Loading indicator
  • onSpeechRecognitionEndedListeners: Event listener array

Core Methods

setupSpeechRecognition(options)

  • Initializes Web Speech API
  • Configures language and recognition settings
  • Sets up event handlers for results
  • Handles browser compatibility
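
A minimal sketch using the standard Web Speech API (the wiring into the store is illustrative):

setupSpeechRecognition: (options = {}) => {
  const SpeechRecognition =
    window.SpeechRecognition || window.webkitSpeechRecognition;
  if (!SpeechRecognition) return; // feature detection: bail out on unsupported browsers

  const recognition = new SpeechRecognition();
  recognition.lang = options.language ?? 'en-US';
  recognition.onresult = (event) => {
    const transcript = event.results[0][0].transcript;
    get().sendMessage(transcript); // forward the transcript to the chat slice
  };
  set({ speechRecognitionInstance: recognition });
},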

startSpeechRecognition()

  • Begins voice capture
  • Prevents conflicts with audio playback
  • Manages recognition state

setSpeechRecognitionLanguage(language)

  • Dynamically updates recognition language

Constants (constants.js)

To keep state values consistent and avoid typos, key constants are defined in the src/store/constants.js file, including audio statuses, avatar animations, moods, chat statuses, and more.
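
For illustration, the file might look like this (the export names are assumptions; the values appear throughout this page):

export const CHAT_STATUS = {
  IDLE: 'IDLE',
  LOADING: 'LOADING',
  ERROR: 'ERROR',
  ERROR_RATE_LIMITER: 'ERROR_RATE_LIMITER',
};

export const AVATAR_ANIMATIONS = { IDLE: 'IDLE', THINKING: 'THINKING', TALKING: 'TALKING' };
export const AVATAR_MOODS = { NEUTRAL: 'NEUTRAL', HAPPY: 'HAPPY', SAD: 'SAD' };
export const AUDIO_PLAYER_STATUS = { IDLE: 'idle', BUSY: 'busy' };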

Data Flow

Message Processing Pipeline

  1. User Input → Chat Slice
    • Via text input or speech recognition
    • Message added to conversation history
    • API call initiated

  2. AI Response → Chat Slice
    • Response received from backend
    • Messages parsed with metadata (animation, expression)
    • Added to message queue

  3. Audio Generation → Audio Slice
    • TTS API called for each message
    • Audio URLs cached
    • Playback queue managed

  4. Avatar Synchronization → Scene Slice
    • Avatar FSM triggered on state changes
    • Animation and mood updated
    • Camera position adjusted

  5. Playback → Audio & Scene
    • Audio played sequentially
    • Lip-sync data processed
    • Avatar animations synchronized

Integration Points

Backend APIs

  • /chat: Message processing endpoint
  • /tts: Text-to-speech generation
  • Scenario-based routing for context
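
A hedged sketch of the /chat call (the request and response shapes are assumptions built from the state described above):

const response = await fetch('/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ message, sessionId, scenario }), // scenario routes context server-side
});
const { messages } = await response.json(); // hypothetical response shape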

Frontend Components

  • Chat interfaces consume message state
  • 3D viewers subscribe to scene state (Vanilla Three.js or React Three Fiber); see the sketch after this list
  • Audio controls interact with playback methods
  • Voice input buttons trigger speech recognition
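
For example, a React Three Fiber viewer might subscribe like this (the Avatar component is a hypothetical placeholder):

function AvatarView() {
  const animation = aiEngine((state) => state.avatarAnimation);
  const mood = aiEngine((state) => state.avatarMood);
  return <Avatar animation={animation} mood={mood} />;
}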

External Dependencies

  • Zustand: Unified state management store
  • Wawa-lipsync: Audio analysis driving the avatar's mouth animations
  • Web Speech API: Browser-native speech recognition for voice input
  • HTML5 Audio: Playback of generated text-to-speech audio

Performance Considerations

  • Audio Caching: Generated audio URLs are cached to avoid redundant API calls
  • Queue Management: Sequential processing prevents audio overlap
  • State Batching: Zustand handles efficient re-renders
  • Animation Loops: requestAnimationFrame for smooth lip-sync

Error Handling

  • Rate Limiting: Graceful degradation with user feedback
  • Audio Failures: Fallback to text-only display
  • Speech Recognition: Feature detection and fallback
  • Network Errors: Retry logic and error states

Best Practices

  1. State Access: Use get() within actions to access current state (see the sketch after this list)
  2. Event Cleanup: Unsubscribe from listeners when components unmount
  3. Queue Management: Let the system handle audio sequencing
  4. FSM Updates: Call avatarFSM() after state changes affecting avatar
  5. Error States: Always handle API failures gracefully
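
The first two practices in sketch form (the component code is illustrative):

// Practice 1: read the latest state with get() inside an action
someAction: () => {
  const { queue } = get(); // always the current queue, never a stale closure
  if (queue.length === 0) return;
  // ...
},

// Practice 2: clean up store subscriptions when a component unmounts
useEffect(() => {
  const unsubscribe = aiEngine.subscribe((state) => {
    console.log('chat status:', state.status);
  });
  return unsubscribe; // runs on unmount
}, []);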