HTTrack MCP Setup Guide
Instructions for setting up mcp-server-webcrawl with HTTrack Website Copier. This allows your LLM (e.g. Claude Desktop) to search content and metadata from websites you’ve mirrored using HTTrack.
Follow along with the video, or the step-action guide below.
Requirements
Before you begin, ensure you have:
Claude Desktop installed
Python 3.10 or later installed
HTTrack Website Copier installed
Basic familiarity with command line interfaces
What is HTTrack?
HTTrack is a well-established open source website mirror tool that offers:
Complete website mirroring with organized project directories
User-friendly wizard-style interface for setup
Comprehensive content capture including HTML, CSS, images, and other assets
Ability to manage multiple site mirrors efficiently
Cross-platform support (Windows, macOS, Linux)
Installation Steps
1. Install mcp-server-webcrawl
Open your terminal or command line and install the package:
pip install mcp-server-webcrawl
Verify installation was successful:
mcp-server-webcrawl --help
2. Create Website Mirrors with HTTrack
Open HTTrack Website Copier application
Create a new project (e.g., “example”) and specify where to save it
Add the URL you want to mirror (e.g., https://example.com)
Use the wizard interface to configure your crawling options
Start the mirroring process and wait for completion
Repeat for additional sites as needed (e.g., create another project for pragmar.com)
HTTrack will create organized project directories under your specified location (typically “My Web Sites” on Windows or “websites” on macOS/Linux). Each project contains the complete website mirror with all HTML files, images, CSS, and other assets properly organized.
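To confirm what HTTrack wrote to disk, you can list the project directories under your chosen location. This is a minimal sketch that assumes every immediate subdirectory of the root is one HTTrack project; list_projects is a hypothetical helper, and ~/websites is only the typical default mentioned above.

```python
from pathlib import Path


def list_projects(root: str) -> list[str]:
    """Return the sorted names of the immediate subdirectories of `root`.

    Assumption: each subdirectory corresponds to one HTTrack project.
    Plain files at the top level are ignored.
    """
    base = Path(root).expanduser()
    if not base.is_dir():
        return []
    return sorted(p.name for p in base.iterdir() if p.is_dir())


if __name__ == "__main__":
    # Replace with the location you chose in the HTTrack wizard.
    for name in list_projects("~/websites"):
        print(name)
```

An empty result usually means the path is wrong or the mirror has not finished.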
3. Configure Claude Desktop
Open Claude Desktop
Go to File → Settings → Developer → Edit Config
Add the following configuration (modify paths as needed):
{
  "mcpServers": {
    "webcrawl": {
      "command": "/path/to/mcp-server-webcrawl",
      "args": ["--crawler", "httrack", "--datasrc",
        "/path/to/httrack/projects/"]
    }
  }
}
Note
On macOS/Linux, set "command" to the absolute path of the executable (the output of which mcp-server-webcrawl). The default HTTrack project location is typically ~/websites.
The datasrc path should point to your HTTrack project directory containing all your mirrored sites
Save the file and completely exit Claude Desktop (not just close the window)
Restart Claude Desktop
4. Verify and Use
In Claude Desktop, you should now see MCP tools available under Search and Tools
Ask Claude to list your crawled sites:
Can you list the crawled sites available?
Try searching content from your crawls:
Can you find information about [topic] on [crawled site]?
Conduct content audits and SEO analysis:
Can you analyze the content structure and SEO elements for [crawled site]?
Troubleshooting
If Claude doesn’t show MCP tools after restart, verify your configuration file is correctly formatted
Ensure Python and mcp-server-webcrawl are properly installed
Check that your HTTrack project directory path in the configuration is correct
Make sure HTTrack has successfully completed mirroring the websites and created the project directories
Remember that the first time you use a function, Claude will ask for permission
For large websites, initial indexing may take some time during the first search
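The first three checks above can be automated. The following is a minimal sketch, not an official diagnostic: check_config is a hypothetical helper, the config file location is passed in by you rather than assumed, and the server name "webcrawl" matches the example configuration in this guide.

```python
import json
import shutil
from pathlib import Path


def check_config(config_path: str, server: str = "webcrawl") -> list[str]:
    """Return a list of problems found in a Claude Desktop config file."""
    try:
        text = Path(config_path).expanduser().read_text(encoding="utf-8")
        config = json.loads(text)
    except (OSError, json.JSONDecodeError) as exc:
        return [f"cannot read config: {exc}"]
    if not isinstance(config, dict):
        return ["config root is not a JSON object"]

    entry = config.get("mcpServers", {}).get(server)
    if not isinstance(entry, dict):
        return [f"no mcpServers entry named {server!r}"]

    problems = []
    command = entry.get("command", "")
    # shutil.which accepts absolute paths and checks that they are executable.
    if shutil.which(command) is None:
        problems.append(f"command not found or not executable: {command!r}")

    args = entry.get("args", [])
    if "--datasrc" in args and args.index("--datasrc") + 1 < len(args):
        datasrc = Path(args[args.index("--datasrc") + 1]).expanduser()
        if not datasrc.is_dir():
            problems.append(f"datasrc is not a directory: {datasrc}")
    else:
        problems.append("no --datasrc value in args")
    return problems
```

Call it with the path of the file that Edit Config opens; an empty list means the file parsed and both paths exist.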
HTTrack’s project structure makes it easy to manage multiple site mirrors. Combined with mcp-server-webcrawl, it supports content analysis, SEO audits, and searchable archives.
For more details, including API documentation and other crawler options, visit the mcp-server-webcrawl documentation.