Chatbot Kit

AI Engine

The AI Engine is a unified Zustand store that manages all aspects of the AI-powered conversational experience. It combines multiple specialized slices to handle chat interactions, audio processing, 3D scene management, and speech recognition.

The main idea is to provide a single source of truth for the application's state, allowing different components to easily access and modify the state as needed.

You can create multiple scenarios or templates by reusing the same AI Engine store, making it easy to maintain and extend.

Architecture Overview

The AI Engine is built using a modular slice architecture, combining four key components:

  • Chat Slice: Manages conversation state and message handling
  • Audio Slice: Handles text-to-speech, audio playback, and lip-sync
  • Scene Slice: Controls 3D avatar animations and camera positioning
  • Speech Recognition Slice: Manages voice input capabilities

The slices live in the src/store directory:

/src
├── /store
│   ├── /aiEngine.js               # Main Zustand store combining all slices
│   ├── /audioSlice.js             # Manages audio playback and processing
│   ├── /chatSlice.js              # Manages chat messages and state
│   ├── /sceneSlice.js             # Manages 3D scene and avatar state
│   ├── /speechRecognitionSlice.js # Manages speech recognition state
│   └── /constants.js              # Shared constants (statuses, animations, moods)

Core Components

Store Structure (aiEngine.js)

The main store combines all slices into a single unified interface:

import { create } from 'zustand';
import { chatSlice } from './chatSlice';
import { audioSlice } from './audioSlice';
import { sceneSlice } from './sceneSlice';
import { speechRecognitionSlice } from './speechRecognitionSlice';

const aiEngine = create((set, get) => ({
  ...chatSlice(set, get),
  ...audioSlice(set, get),
  ...sceneSlice(set, get),
  ...speechRecognitionSlice(set, get),
}));

Initialization automatically sets up:

  • Audio player with lip-sync capabilities
  • Speech recognition (if available in browser)

This approach makes it easy to add new slices or features in the future.
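
For example, any component can read state from any slice through the same hook (a minimal sketch, assuming the store is exported from aiEngine.js):

import { aiEngine } from './store/aiEngine';

function ChatStatusBadge() {
  // Selector subscription: the component re-renders only when `status` changes
  const status = aiEngine((state) => state.status);
  return <span>{status}</span>;
}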

Chat Slice (chatSlice.js)

Manages the conversational flow between user and AI assistant.

Key State Properties

  • status: Current chat status (IDLE, LOADING, ERROR, ERROR_RATE_LIMITER)
  • messages: Array of conversation messages
  • sessionId: Unique session identifier
  • mode: Chat mode configuration
  • scenario: Current conversation scenario

Core Methods

sendMessage(message)

  • Sends user message to the AI backend
  • Updates chat status and message history
  • Triggers avatar state machine updates
  • Handles API responses and error states
  • Automatically queues assistant responses for audio playback
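
A minimal usage sketch (the argument shape is an assumption; adapt it to your backend contract):

const handleSubmit = async (text) => {
  // Hypothetical input handler: status, history, and audio queueing are handled internally
  await aiEngine.getState().sendMessage(text);
};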

addManualMessage(message)

  • Adds a message directly without API call
  • Useful for system messages or pre-scripted content
  • Automatically queues for audio processing

handleRateLimitError()

  • Displays rate limiting error to user
  • Updates chat status appropriately

Audio Slice (audioSlice.js)

Manages all audio-related functionality including text-to-speech and playback synchronization.

Key State Properties

  • audioPlayer: HTML Audio element instance
  • lipsyncManager: Wawa-lipsync integration for mouth animations
  • queue: Messages pending audio playback
  • audioCache: URL cache for generated audio
  • currentMessageId: ID of currently playing message
  • audioPlayerStatus: 'idle' or 'busy' state indicator

Core Methods

setupAudioPlayer()

  • Initializes HTML5 Audio element
  • Configures lip-sync manager
  • Sets up audio event handlers
  • Starts animation frame loop for audio analysis
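
In outline, the setup looks something like this (a sketch; the lipsyncManager calls are illustrative placeholders, not the exact wawa-lipsync API):

setupAudioPlayer: () => {
  const audioPlayer = new Audio();
  audioPlayer.onended = () => {
    set({ audioPlayerStatus: 'idle', currentMessageId: null });
    get().checkPlayback(); // advance the queue when a clip finishes
  };
  set({ audioPlayer });

  const analyze = () => {
    get().lipsyncManager?.processAudio(); // hypothetical per-frame analysis call
    requestAnimationFrame(analyze);
  };
  requestAnimationFrame(analyze); // animation frame loop for lip-sync
},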

fetchAudio(message)

  • Converts message text to speech via TTS API
  • Caches audio URLs for reuse
  • Handles rate limiting and errors
  • Supports direct audio URLs if provided
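
A hedged sketch of the flow (the /tts request shape and message fields are assumptions):

fetchAudio: async (message) => {
  if (message.audioUrl) return message.audioUrl;               // direct audio URL support
  const { audioCache } = get();
  if (audioCache[message.id]) return audioCache[message.id];   // reuse cached audio

  const res = await fetch(`/tts?text=${encodeURIComponent(message.content)}`);
  if (res.status === 429) return get().handleRateLimitError(); // rate limited
  const url = URL.createObjectURL(await res.blob());
  set({ audioCache: { ...audioCache, [message.id]: url } });
  return url;
},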

checkPlayback()

  • Manages audio queue processing
  • Ensures sequential playback
  • Updates message audio status
  • Triggers avatar FSM updates
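
In outline (a sketch; field names are assumptions):

checkPlayback: async () => {
  const { queue, audioPlayer, audioPlayerStatus } = get();
  if (audioPlayerStatus === 'busy' || queue.length === 0) return; // one clip at a time

  const [next, ...rest] = queue;
  const url = await get().fetchAudio(next);
  set({ queue: rest, currentMessageId: next.id, audioPlayerStatus: 'busy' });
  audioPlayer.src = url;
  await audioPlayer.play();
  get().avatarFSM(); // keep the avatar in sync with playback
},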

playMessage(message)

  • Plays specific message audio
  • Handles pause/resume logic
  • Manages queue priorities

Scene Slice (sceneSlice.js)

Controls 3D avatar visualization and scene management.

Key State Properties

  • avatarAnimation: Current animation state (IDLE, THINKING, TALKING)
  • avatarMood: Emotional expression (NEUTRAL, HAPPY, SAD, etc.)
  • cameraPosition: Camera zoom level
  • avatarEyesFollowPointer: Eye tracking toggle
  • avatarLookAtCamera: Head tracking toggle
  • avatarHeadWander: Head movement toggle
  • sceneMode: Scene rendering mode
  • loading: Scene loading state

Core Methods

avatarFSM()

  • Finite State Machine for avatar behavior
  • Synchronizes avatar state with:
    • Currently playing audio
    • Chat loading status
    • Queue status
  • Manages camera transitions
  • Sets appropriate animations and moods

Scene State Transitions

  • Playing audio → Talking animation + Zoom in
  • Loading response → Thinking animation + Confused mood
  • Idle → Default animation + Neutral mood
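
These transitions boil down to a decision function along these lines (a sketch derived from the list above; the exact constant values live in constants.js). Note how the unified store pays off here: the scene slice reads chat and audio state through the same get().

avatarFSM: () => {
  const { currentMessageId, status } = get();
  if (currentMessageId) {
    set({ avatarAnimation: 'TALKING', cameraPosition: 'ZOOM_IN' }); // playing audio
  } else if (status === 'LOADING') {
    set({ avatarAnimation: 'THINKING', avatarMood: 'CONFUSED' });   // waiting for the AI
  } else {
    set({ avatarAnimation: 'IDLE', avatarMood: 'NEUTRAL' });        // default
  }
},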

Speech Recognition Slice (speechRecognitionSlice.js)

Manages voice input using Web Speech API.

Key State Properties

  • speechRecognitionInstance: Browser SpeechRecognition object
  • speechRecognitionStatus: Current recognition state
  • speechRecognitionLoading: Loading indicator
  • onSpeechRecognitionEndedListeners: Event listener array

Core Methods

setupSpeechRecognition(options)

  • Initializes Web Speech API
  • Configures language and recognition settings
  • Sets up event handlers for results
  • Handles browser compatibility
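
A minimal sketch using the standard Web Speech API (the wiring into the store is illustrative):

setupSpeechRecognition: (options = {}) => {
  const SpeechRecognition =
    window.SpeechRecognition || window.webkitSpeechRecognition;
  if (!SpeechRecognition) return; // feature detection: bail out on unsupported browsers

  const recognition = new SpeechRecognition();
  recognition.lang = options.language ?? 'en-US';
  recognition.onresult = (event) => {
    const transcript = event.results[0][0].transcript;
    get().sendMessage(transcript); // forward the transcript to the chat slice
  };
  set({ speechRecognitionInstance: recognition });
},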

startSpeechRecognition()

  • Begins voice capture
  • Prevents conflicts with audio playback
  • Manages recognition state

setSpeechRecognitionLanguage(language)

  • Dynamically updates recognition language

Constants (constants.js)

To keep state values consistent and avoid typos, key constants are defined in the src/store/constants.js file, including audio statuses, avatar animations, moods, chat statuses, and more.
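
For illustration, the file might look like this (the export names are assumptions; the values appear throughout this page):

export const CHAT_STATUS = {
  IDLE: 'IDLE',
  LOADING: 'LOADING',
  ERROR: 'ERROR',
  ERROR_RATE_LIMITER: 'ERROR_RATE_LIMITER',
};

export const AVATAR_ANIMATIONS = { IDLE: 'IDLE', THINKING: 'THINKING', TALKING: 'TALKING' };
export const AVATAR_MOODS = { NEUTRAL: 'NEUTRAL', HAPPY: 'HAPPY', SAD: 'SAD' };
export const AUDIO_PLAYER_STATUS = { IDLE: 'idle', BUSY: 'busy' };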

Data Flow

Message Processing Pipeline

  1. User Input → Chat Slice
    • Via text input or speech recognition
    • Message added to conversation history
    • API call initiated

  2. AI Response → Chat Slice
    • Response received from backend
    • Messages parsed with metadata (animation, expression)
    • Added to message queue

  3. Audio Generation → Audio Slice
    • TTS API called for each message
    • Audio URLs cached
    • Playback queue managed

  4. Avatar Synchronization → Scene Slice
    • Avatar FSM triggered on state changes
    • Animation and mood updated
    • Camera position adjusted

  5. Playback → Audio & Scene
    • Audio played sequentially
    • Lip-sync data processed
    • Avatar animations synchronized

Integration Points

Backend APIs

  • /chat: Message processing endpoint
  • /tts: Text-to-speech generation
  • Scenario-based routing for context
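
A hedged sketch of the /chat call (the request and response shapes are assumptions built from the state described above):

const response = await fetch('/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ message, sessionId, scenario }), // scenario routes context server-side
});
const { messages } = await response.json(); // hypothetical response shape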

Frontend Components

  • Chat interfaces consume message state
  • 3D viewers subscribe to scene state (Vanilla Three.js or React Three Fiber); see the sketch after this list
  • Audio controls interact with playback methods
  • Voice input buttons trigger speech recognition
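
For example, a React Three Fiber viewer might subscribe like this (the Avatar component is a hypothetical placeholder):

function AvatarView() {
  const animation = aiEngine((state) => state.avatarAnimation);
  const mood = aiEngine((state) => state.avatarMood);
  return <Avatar animation={animation} mood={mood} />;
}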

External Dependencies

  • Zustand: Unified state management store
  • Wawa-lipsync: Audio analysis driving the avatar's mouth animations
  • Web Speech API: Browser-native speech recognition for voice input
  • HTML5 Audio: Playback of generated text-to-speech audio

Performance Considerations

  • Audio Caching: Generated audio URLs are cached to avoid redundant API calls
  • Queue Management: Sequential processing prevents audio overlap
  • State Batching: Zustand handles efficient re-renders
  • Animation Loops: requestAnimationFrame for smooth lip-sync

Error Handling

  • Rate Limiting: Graceful degradation with user feedback
  • Audio Failures: Fallback to text-only display
  • Speech Recognition: Feature detection and fallback
  • Network Errors: Retry logic and error states

Best Practices

  1. State Access: Use get() within actions to access current state (see the sketch after this list)
  2. Event Cleanup: Unsubscribe from listeners when components unmount
  3. Queue Management: Let the system handle audio sequencing
  4. FSM Updates: Call avatarFSM() after state changes affecting avatar
  5. Error States: Always handle API failures gracefully
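
The first two practices in sketch form (the component code is illustrative):

// Practice 1: read the latest state with get() inside an action
someAction: () => {
  const { queue } = get(); // always the current queue, never a stale closure
  if (queue.length === 0) return;
  // ...
},

// Practice 2: clean up store subscriptions when a component unmounts
useEffect(() => {
  const unsubscribe = aiEngine.subscribe((state) => {
    console.log('chat status:', state.status);
  });
  return unsubscribe; // runs on unmount
}, []);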