HTTrack MCP Setup Guide

Instructions for setting up mcp-server-webcrawl with HTTrack Website Copier. This lets Claude Desktop (or another MCP-capable LLM client) search the content and metadata of websites you’ve mirrored with HTTrack.

Follow along with the video, or use the step-by-step guide below.

Requirements

Before you begin, ensure you have:

Claude Desktop installed
Python 3.10 or later installed
HTTrack Website Copier installed
Basic familiarity with command line interfaces

What is HTTrack?

HTTrack is a well-established, open-source website mirroring tool that offers:

Complete website mirroring with organized project directories
User-friendly wizard-style interface for setup
Comprehensive content capture including HTML, CSS, images, and other assets
Ability to manage multiple site mirrors efficiently
Cross-platform support (Windows, macOS, Linux)

Installation Steps

1. Install mcp-server-webcrawl

Open your terminal or command line and install the package:

pip install mcp-server-webcrawl

Verify installation was successful:

mcp-server-webcrawl --help
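To confirm the requirements before configuring Claude Desktop, a small Python check can help. This is a sketch, not part of mcp-server-webcrawl: it assumes the pip-installed command is named mcp-server-webcrawl, and it only verifies the Python version and that the command is on PATH. The preflight function name is hypothetical.

```python
import shutil
import sys


def preflight(min_version=(3, 10), exe="mcp-server-webcrawl"):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    # The guide requires Python 3.10 or later.
    if sys.version_info[:2] < min_version:
        problems.append(
            f"Python {min_version[0]}.{min_version[1]}+ required, found "
            f"{sys.version_info.major}.{sys.version_info.minor}"
        )
    # pip installs a console script; shutil.which finds it only if it is on PATH.
    if shutil.which(exe) is None:
        problems.append(f"'{exe}' not found on PATH")
    return problems


if __name__ == "__main__":
    issues = preflight()
    for issue in issues:
        print("problem:", issue)
    if not issues:
        # This absolute path is what the macOS/Linux config below expects.
        print("ok:", shutil.which("mcp-server-webcrawl"))
```

Run it with the same Python interpreter you used for pip install, since a different interpreter may not see the same PATH or packages.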

2. Create Website Mirrors with HTTrack

Open the HTTrack Website Copier application
Create a new project (e.g., “example”) and specify where to save it
Add the URL you want to mirror (e.g., https://example.com)
Use the wizard interface to configure your crawling options
Start the mirroring process and wait for completion
Repeat for additional sites as needed (e.g., create another project for pragmar.com)

HTTrack creates a separate project directory for each mirror under the base location you chose (typically “My Web Sites” on Windows or “websites” in your home directory on macOS/Linux). Each project contains the complete website mirror, including HTML files, images, CSS, and other assets.
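Before pointing mcp-server-webcrawl at this base location, you can list which folders look like HTTrack projects. This sketch uses its own heuristic, an assumption for illustration rather than how mcp-server-webcrawl detects sites: a subfolder counts as a project if it contains HTTrack’s hts-cache directory or hts-log.txt file. The list_httrack_projects name and the ~/websites path are hypothetical.

```python
from pathlib import Path


def list_httrack_projects(datasrc):
    """List project folder names under an HTTrack base directory.

    Heuristic (assumption): a subdirectory is treated as a project if it
    contains an hts-cache directory or an hts-log.txt file.
    """
    root = Path(datasrc).expanduser()
    if not root.is_dir():
        raise NotADirectoryError(f"not a directory: {root}")
    projects = []
    for child in sorted(root.iterdir()):
        if child.is_dir() and (
            (child / "hts-cache").is_dir() or (child / "hts-log.txt").is_file()
        ):
            projects.append(child.name)
    return projects


if __name__ == "__main__":
    base = Path("~/websites").expanduser()  # hypothetical; use your own base path
    if base.is_dir():
        print(list_httrack_projects(base))
    else:
        print(f"{base} does not exist")
```

If the projects you created (e.g. example and pragmar) appear in the output, that base directory is a good candidate for the datasrc path in the next step.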

3. Configure Claude Desktop

Open Claude Desktop
Go to File → Settings → Developer → Edit Config
Add the following configuration (modify paths as needed):

{
  "mcpServers": {
    "webcrawl": {
      "command": "/path/to/mcp-server-webcrawl",
      "args": ["--crawler", "httrack", "--datasrc",
        "/path/to/httrack/projects/"]
    }
  }
}

Note

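On macOS/Linux, use the absolute path to the executable (the output of which mcp-server-webcrawl). HTTrack’s default project location there is typically "~/websites"; write it out as a full absolute path rather than relying on ~ being expanded
The datasrc path should point to the parent directory that holds all of your HTTrack projects, not to an individual project folder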
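If you prefer not to fill in the paths by hand, the configuration entry can be generated. This sketch assumes mcp-server-webcrawl is on PATH (shutil.which plays the role of which) and expands ~ in the projects path; build_config is a hypothetical helper, and it only prints JSON for you to paste into the config file rather than editing the file itself.

```python
import json
import shutil
from pathlib import Path


def build_config(datasrc, command=None):
    """Build the mcpServers entry from the example config as a dict.

    The executable is resolved to an absolute path unless one is given,
    and ~ in datasrc is expanded into an absolute path.
    """
    exe = command or shutil.which("mcp-server-webcrawl")
    if exe is None:
        raise FileNotFoundError("mcp-server-webcrawl not found on PATH")
    src = str(Path(datasrc).expanduser().resolve())
    return {
        "mcpServers": {
            "webcrawl": {
                "command": exe,
                "args": ["--crawler", "httrack", "--datasrc", src],
            }
        }
    }


if __name__ == "__main__":
    try:
        # "~/websites" is a hypothetical projects path; substitute your own.
        print(json.dumps(build_config("~/websites"), indent=2))
    except FileNotFoundError as exc:
        print(exc)
```

If your config file already defines other MCP servers, merge the "webcrawl" entry into the existing "mcpServers" object instead of replacing the whole file.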

Save the file and completely exit Claude Desktop (not just close the window)
Restart Claude Desktop

4. Verify and Use

In Claude Desktop, you should now see MCP tools available under Search and Tools

Ask Claude to list your crawled sites:

Can you list the crawled sites available?

Try searching content from your crawls:

Can you find information about [topic] on [crawled site]?

Conduct content audits and SEO analysis:

Can you analyze the content structure and SEO elements for [crawled site]?

Troubleshooting

If Claude doesn’t show MCP tools after restarting, verify that your configuration file is valid JSON
Ensure Python and mcp-server-webcrawl are properly installed
Check that your HTTrack project directory path in the configuration is correct
Make sure HTTrack has successfully completed mirroring the websites and created the project directories
Remember that the first time Claude uses one of the tools, it will ask for your permission
For large websites, initial indexing may take some time during the first search
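Several of these checks can be scripted: the sketch below parses the config file as JSON and tests the paths it references. It assumes the server entry is named webcrawl, as in the example config, and that the projects path immediately follows --datasrc; check_config is a hypothetical helper. It treats paths literally, so a ~ path is reported for you to double-check.

```python
import json
import os
import shutil
import sys
from pathlib import Path


def check_config(config_path, server="webcrawl"):
    """Return a list of problems found in a Claude Desktop config file."""
    try:
        data = json.loads(Path(config_path).read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError) as exc:
        return [f"cannot read config as JSON: {exc}"]

    servers = data.get("mcpServers") if isinstance(data, dict) else None
    entry = servers.get(server) if isinstance(servers, dict) else None
    if not isinstance(entry, dict):
        return [f"no mcpServers.{server} entry"]

    problems = []
    command = entry.get("command", "")
    # shutil.which accepts an absolute path or a bare name looked up on PATH.
    if not command or shutil.which(command) is None:
        problems.append(f"command is not an executable: {command!r}")

    args = entry.get("args", [])
    if "--datasrc" in args and args.index("--datasrc") + 1 < len(args):
        datasrc = args[args.index("--datasrc") + 1]
        # Checked literally: ~ is not expanded here.
        if not os.path.isdir(datasrc):
            problems.append(f"datasrc is not a directory: {datasrc!r}")
    else:
        problems.append("--datasrc argument missing")
    return problems


if __name__ == "__main__":
    if len(sys.argv) > 1:
        for line in check_config(sys.argv[1]) or ["config looks OK"]:
            print(line)
```

Pass the path of the file you opened with Edit Config as the first argument. An empty result does not prove Claude Desktop will load the server, but it rules out the most common path and JSON mistakes.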

HTTrack’s project structure makes it easy to manage multiple site mirrors, and combined with mcp-server-webcrawl, it supports content analysis, SEO audits, and searchable archives.

For more details, including API documentation and other crawler options, visit the mcp-server-webcrawl documentation.