Transform your website content into intelligent, searchable knowledge bases with this automated workflow that crawls URLs, processes content with AI, and creates vector embeddings for semantic search.
What it is
The Generate Knowledge Base workflow allows you to:
Submit multiple website URLs through a secure web form
Automatically crawl and extract clean content from each page
Generate intelligent Q&A pairs using AI to improve searchability
Store everything as vector embeddings in your Qdrant knowledge base
Keep your data fresh by automatically updating existing content
How to use it
Step 1: Access the Form
Navigate to your workflow's webhook URL to access the knowledge base form. You'll see:
Knowledge Base Dropdown: Select which collection to populate (e.g., "customer-service", "wellness-center")
Website URLs Field: Enter multiple URLs, one per line
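As a rough illustration of how a one-URL-per-line field can be turned into a clean list, here is a minimal sketch. The helper name parse_url_field and its rules (skip blank lines, keep only http/https URLs, drop duplicates) are assumptions made for illustration; the workflow's own validation may differ.

```python
from urllib.parse import urlparse


def parse_url_field(raw: str) -> list[str]:
    """Split a multi-line form field into a clean, de-duplicated URL list.

    Hypothetical helper: the real workflow may validate differently.
    """
    seen: set[str] = set()
    urls: list[str] = []
    for line in raw.splitlines():
        candidate = line.strip()
        if not candidate:
            continue  # ignore blank lines
        parsed = urlparse(candidate)
        # Keep only absolute http(s) URLs that include a host.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            continue
        if candidate not in seen:
            seen.add(candidate)
            urls.append(candidate)
    return urls
```

Filtering at this stage means a stray blank line or a pasted note in the field does not reach the crawler.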
To use a different AI model, update the LLM node in your n8n workflow to use your preferred model.
Content Processing
You can customize:
Chunk size: Adjust how content is split for processing
Q&A generation: Modify the AI prompt to change the question/answer style (the prompt is loaded from the get scenario data workflow)
Content filtering: Configure which HTML elements to exclude
Metadata: Add custom fields for better content organization
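To make the chunk size setting concrete, the sketch below splits text into fixed-size character chunks with a small overlap. It is an assumption-laden illustration: the function name chunk_text, the character-based splitting, and the overlap parameter are not taken from the workflow, whose splitter may work on tokens or sentences instead.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters so that a sentence cut
    at a boundary still appears whole in one of the two chunks.
    Hypothetical helper; the workflow's actual splitter may differ.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks
```

Smaller chunks tend to give more precise search hits but carry less context per hit; larger chunks do the opposite, so the right value depends on the content.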
Best Practices
URL Quality: Use URLs with substantial, well-structured content
Batch Size: Process 1-10 URLs at a time for optimal performance
Content Updates: Re-run the workflow periodically to keep knowledge bases current
Knowledge Base Organization: Use descriptive collection names for different content types
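If you have more URLs than the recommended batch size, one option is to split the list before submitting it. The sketch below is a hypothetical helper (the name batches and the default of 10 are illustrative, based on the 1-10 guideline above) and is not part of the workflow itself.

```python
from typing import Iterator


def batches(urls: list[str], batch_size: int = 10) -> Iterator[list[str]]:
    """Yield consecutive slices of `urls`, each with at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(urls), batch_size):
        yield urls[start:start + batch_size]
```

Each yielded batch can then be pasted into the form as its own submission.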
The workflow handles all technical complexity automatically, letting you focus on building comprehensive knowledge bases for your chatbots and AI applications.
Clearing Existing Data
By default, the workflow removes any existing content in the selected knowledge base that matches the submitted URLs. This keeps your knowledge base up to date and free of duplicates.
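For readers curious what a delete-by-URL step can look like against Qdrant, here is a minimal sketch that only builds the request path and JSON body for Qdrant's delete-points-by-filter endpoint. The payload field name metadata.url and the helper name are assumptions; the workflow may store the source URL under a different key or perform the deletion through an n8n node instead.

```python
import json


def build_delete_by_url_request(
    collection: str,
    urls: list[str],
    url_field: str = "metadata.url",  # assumed payload key, not confirmed by the workflow
) -> tuple[str, dict]:
    """Return the REST path and body that delete all points whose
    payload URL matches any of the submitted URLs."""
    path = f"/collections/{collection}/points/delete"
    body = {
        "filter": {
            "must": [
                # "any" matches points whose field equals one of the listed values
                {"key": url_field, "match": {"any": urls}}
            ]
        }
    }
    return path, body


if __name__ == "__main__":
    path, body = build_delete_by_url_request(
        "customer-service", ["https://example.com/pricing"]
    )
    print("POST", path)
    print(json.dumps(body, indent=2))
```

Deleting by filter rather than by point ID means the step does not need to know how many chunks a page was split into on the previous run.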
To completely clear a knowledge base collection, use the Clear Full Scenario Knowledge Base workflow. It deletes the collection and all of its data, which is useful when you want to start fresh with entirely new content.