WARC MCP Setup Guide
This guide covers setting up mcp-server-webcrawl with WARC files. Once configured, your LLM (e.g. Claude Desktop) can search the content and metadata of websites you’ve archived in WARC format.
Follow along with the video, or use the step-by-step guide below.
Requirements
Before you begin, ensure you have:
Claude Desktop installed
Python 3.10 or later installed
Basic familiarity with command line interfaces
wget installed (macOS users can install via Homebrew, Windows users need WSL/Ubuntu)
What are WARC Files?
WARC (Web ARChive) files are single-file web archives that store complete crawl data including:
HTTP status codes
HTTP headers
Response content
Compared to wget mirror mode:
WARC: More comprehensive (preserves status codes and headers) but slower crawling
wget mirror: Faster crawling but doesn’t preserve status codes or headers
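To make the list of stored fields concrete, here is a minimal sketch of how one WARC response record splits into its WARC headers, HTTP status code, HTTP headers, and response content. It uses only the Python standard library and handles a single uncompressed record held in memory. Real .warc.gz files are gzip-compressed and contain many records, so this is an illustration of the layout, not how mcp-server-webcrawl reads archives. The function name parse_warc_record and the sample record are hypothetical.

```python
def parse_warc_record(raw: bytes) -> dict:
    """Split one uncompressed WARC record into WARC headers and HTTP parts."""
    # The WARC header section ends at the first blank line (CRLF CRLF).
    head, _, rest = raw.partition(b"\r\n\r\n")
    lines = head.decode("utf-8").split("\r\n")
    warc_headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        warc_headers[name.strip()] = value.strip()

    # Content-Length is the size in bytes of the record block that follows.
    length = int(warc_headers["Content-Length"])
    block = rest[:length]

    record = {"version": lines[0], "warc_headers": warc_headers}
    if warc_headers.get("WARC-Type") == "response":
        # A response block is the raw HTTP response: status line, headers, body.
        http_head, _, body = block.partition(b"\r\n\r\n")
        http_lines = http_head.decode("iso-8859-1").split("\r\n")
        record["status"] = int(http_lines[0].split(" ", 2)[1])
        http_headers = {}
        for line in http_lines[1:]:
            name, _, value = line.partition(":")
            http_headers[name.strip()] = value.strip()
        record["http_headers"] = http_headers
        record["body"] = body
    return record


# Hypothetical sample record, built by hand for illustration.
http = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<h1>Hello</h1>"
sample = (
    b"WARC/1.0\r\n"
    b"WARC-Type: response\r\n"
    b"WARC-Target-URI: https://example.com/\r\n"
    b"Content-Length: " + str(len(http)).encode() + b"\r\n"
    b"\r\n" + http + b"\r\n\r\n"
)

record = parse_warc_record(sample)
print(record["status"], record["http_headers"], record["body"])
```

A plain wget mirror would keep only the equivalent of record["body"] on disk; the status code and headers are the parts that the WARC format adds.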
Installation Steps
1. Install MCP Server Web Crawl
Open your terminal or command line and install the package:
pip install mcp-server-webcrawl
Verify installation was successful:
mcp-server-webcrawl --version
2. Configure Claude Desktop
Open Claude Desktop
Go to File → Settings → Developer → Edit Config
Add the following configuration (modify paths as needed):
{
  "mcpServers": {
    "webcrawl": {
      "command": "/path/to/mcp-server-webcrawl",
      "args": ["--crawler", "warc", "--datasrc",
        "/path/to/warc/archives/"]
    }
  }
}
Note
On Windows, use "mcp-server-webcrawl" as the command.
On macOS, use the absolute path (the output of which mcp-server-webcrawl).
Change /path/to/warc/archives/ to the directory where your WARC files are stored.
Save the file and completely exit Claude Desktop (not just close the window)
Restart Claude Desktop
3. Create WARC Files with Wget
Open Terminal (macOS) or Ubuntu/WSL (Windows)
Navigate to your target directory for storing WARC files
Run wget with WARC options:
# Basic WARC capture
wget --warc-file=example --recursive https://example.com
# More comprehensive capture with page requisites (CSS, images, etc.)
wget --warc-file=example --recursive --page-requisites https://example.com
Your WARC files will be created with a .warc.gz extension in your current directory.
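Before moving on, it can help to confirm that the crawl produced usable archives. The sketch below lists the .warc.gz files in a directory and keeps only those that decompress as gzip and begin with the WARC/ version marker that opens every WARC record. The function name find_warc_archives is hypothetical, and this is a quick sanity check under those assumptions, not a full validation of the archive.

```python
import gzip
import zlib
from pathlib import Path


def find_warc_archives(directory: str) -> list[Path]:
    """Return .warc.gz files whose decompressed content starts with 'WARC/'."""
    found = []
    for path in sorted(Path(directory).glob("*.warc.gz")):
        try:
            with gzip.open(path, "rb") as fh:
                # Every WARC record starts with a version line such as WARC/1.0.
                if fh.read(5) == b"WARC/":
                    found.append(path)
        except (OSError, EOFError, zlib.error):
            # Not gzip data, or a truncated/corrupt file: skip it.
            continue
    return found


if __name__ == "__main__":
    for archive in find_warc_archives("."):
        print(archive)
```

Run it from the directory where you ran wget. An empty result suggests the crawl was interrupted or the files were written somewhere else.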
4. Verify and Use
In Claude Desktop, you should now see MCP tools available under Search and Tools
Ask Claude to list your crawled sites:
Can you list the crawled sites available?
Try searching content from your crawls:
Can you find information about [topic] on [crawled site]?
Troubleshooting
If Claude doesn’t show MCP tools after restart, verify your configuration file is correctly formatted
Ensure Python and mcp-server-webcrawl are properly installed
Check that your WARC directory path in the configuration is correct
Make sure your WARC files have the correct extension (typically .warc.gz)
Remember that the first time you use each function, Claude will ask for permission
For large WARC files, initial indexing may take some time
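The first few checks above can be partly automated. As a rough sketch, the function below takes the text of your configuration file and reports whether it parses as JSON, whether it has a webcrawl entry under mcpServers (the server name used in this guide), and whether the --datasrc directory exists and contains .warc.gz files. The function name check_config_text is hypothetical, and the location of the config file is not assumed: read it from wherever Edit Config opens it on your system.

```python
import json
from pathlib import Path


def check_config_text(text: str) -> list[str]:
    """Return a list of problems found in a Claude Desktop config string."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg} (line {exc.lineno})"]
    if not isinstance(config, dict):
        return ["top-level JSON value must be an object"]

    server = config.get("mcpServers", {}).get("webcrawl")
    if not isinstance(server, dict):
        return ["no 'webcrawl' entry under 'mcpServers'"]

    problems = []
    if not server.get("command"):
        problems.append("'command' is missing or empty")

    args = server.get("args", [])
    if "--datasrc" not in args or args.index("--datasrc") + 1 >= len(args):
        problems.append("'--datasrc' argument is missing a value")
    else:
        datasrc = Path(args[args.index("--datasrc") + 1])
        if not datasrc.is_dir():
            problems.append(f"datasrc directory not found: {datasrc}")
        elif not any(datasrc.glob("*.warc.gz")):
            problems.append(f"no .warc.gz files in {datasrc}")
    return problems
```

For example, check_config_text(Path("claude_desktop_config.json").read_text()) returns an empty list when none of these problems are found; the file name here is only a placeholder for your own config path.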
For more details, including API documentation and other crawler options, visit the mcp-server-webcrawl documentation.