Katana MCP Setup Guide
Instructions for setting up mcp-server-webcrawl with the Katana crawler. This lets an MCP client such as Claude Desktop search the content and metadata of websites you’ve crawled with Katana.
Follow along with the video, or use the step-by-step guide below.
Requirements
Before you begin, ensure you have:
Claude Desktop installed
Python 3.10 or later installed
Go programming language installed
Katana crawler installed
Basic familiarity with command line interfaces
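If you want to check these prerequisites from a script before starting, the Python sketch below reports whether Python is 3.10 or later and whether go, katana, and mcp-server-webcrawl can be found on your PATH. It is a convenience check, not part of mcp-server-webcrawl. It assumes the tools are invoked by those command names, and it cannot detect Claude Desktop. Because go install places binaries in Go's bin directory, which is not always on PATH, a missing katana may only mean your PATH needs updating.

```python
import shutil
import sys


def preflight() -> dict:
    """Report which prerequisites are visible from this shell.

    Assumption: each tool is on PATH under the command name used in this
    guide. False means "not found on PATH", not necessarily "not installed".
    """
    return {
        "python>=3.10": sys.version_info >= (3, 10),
        "go": shutil.which("go") is not None,
        "katana": shutil.which("katana") is not None,
        "mcp-server-webcrawl": shutil.which("mcp-server-webcrawl") is not None,
    }


if __name__ == "__main__":
    for name, ok in preflight().items():
        print(f"{'OK     ' if ok else 'MISSING'} {name}")
```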
What is Katana?
Katana is an open-source web crawler from ProjectDiscovery that offers:
Fast and efficient web crawling capabilities
Command-line interface for flexibility and automation
Highly configurable crawling parameters
Ability to store complete HTTP responses for analysis
Installation Steps
1. Install MCP Server Web Crawl
Open your terminal or command line and install the package:
pip install mcp-server-webcrawl
Verify installation was successful:
mcp-server-webcrawl --version
2. Install and Run Katana
Verify Go is installed and on your PATH:
go version
Install Katana using Go:
go install github.com/projectdiscovery/katana/cmd/katana@latest
Create a directory for your crawls and run Katana with storage options:
# Create a directory for storing crawls
mkdir archives
# Run Katana with storage options
katana -u https://example.com -store-response -store-response-dir archives/example.com/
Repeat for additional websites as needed:
katana -u https://pragmar.com -store-response -store-response-dir archives/pragmar.com/
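To crawl several sites in one pass, the commands above can be wrapped in a small script. This is a minimal Python sketch, assuming katana is on your PATH. It uses only the -u, -store-response, and -store-response-dir flags shown in this guide and names each site folder after the URL's host, matching archives/example.com/. The function names katana_command and crawl_all are hypothetical.

```python
import subprocess
import sys
from pathlib import Path
from urllib.parse import urlparse


def katana_command(url: str, archives: str = "archives") -> list:
    """Build the Katana command used in this guide for a single site.

    The per-site folder is named after the URL's host, mirroring the
    archives/example.com/ layout above.
    """
    host = urlparse(url).hostname
    if not host:
        raise ValueError(f"not an absolute URL: {url!r}")
    store_dir = f"{archives}/{host}/"
    return ["katana", "-u", url, "-store-response", "-store-response-dir", store_dir]


def crawl_all(urls: list, archives: str = "archives") -> None:
    """Run Katana once per site, stopping at the first failed crawl."""
    Path(archives).mkdir(parents=True, exist_ok=True)
    for url in urls:
        cmd = katana_command(url, archives)
        print("running:", " ".join(cmd))
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print(f"usage: python {sys.argv[0]} URL [URL ...]")
    else:
        crawl_all(sys.argv[1:])
```

For example, saved under any name you like, it could be run as: python crawl_sites.py https://example.com https://pragmar.com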
In these examples, the ./archives directory is the datasrc. Katana creates a separate directory for each unique host inside the specified -store-response-dir, so a path such as archives/example.com/example.com/ is expected. Sites with external dependencies branch out into additional host directories within their -store-response-dir, and remain searchable as a single site.
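To see how Katana laid out your datasrc, you can list each site folder and the host directories inside it. This Python sketch assumes the layout described above (datasrc/site/host/...) and only inspects directory names; it does not parse the stored responses. A site with external dependencies should show more than one host. The helper name describe_datasrc is hypothetical.

```python
import sys
from pathlib import Path


def describe_datasrc(datasrc: str) -> dict:
    """Map each site folder in the datasrc to the host directories inside it.

    Assumes the layout <datasrc>/<site>/<host>/...; loose files are ignored.
    """
    root = Path(datasrc)
    layout = {}
    for site in sorted(p for p in root.iterdir() if p.is_dir()):
        layout[site.name] = sorted(h.name for h in site.iterdir() if h.is_dir())
    return layout


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "archives"
    if not Path(target).is_dir():
        print(f"datasrc not found: {target}")
    else:
        for site, hosts in describe_datasrc(target).items():
            print(f"{site}: {', '.join(hosts) or '(no host directories)'}")
```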
3. Configure Claude Desktop
Open Claude Desktop
Go to File → Settings → Developer → Edit Config
Add the following configuration (modify paths as needed):
{
  "mcpServers": {
    "webcrawl": {
      "command": "/path/to/mcp-server-webcrawl",
      "args": ["--crawler", "katana", "--datasrc",
        "/path/to/katana/crawls/"]
    }
  }
}
Note
On Windows, use "mcp-server-webcrawl" as the command.
On macOS, use the absolute path (the output of which mcp-server-webcrawl).
Change /path/to/katana/crawls/ to the absolute path of your datasrc (the archives directory from step 2).
Save the file and completely exit Claude Desktop (not just close the window)
Restart Claude Desktop
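If you prefer not to edit the paths by hand, the webcrawl entry above can be generated with a short Python sketch. It assumes you run it from the directory that contains archives, and it fills in the command with the absolute path reported by shutil.which (the Python counterpart of which), falling back to the bare name used on Windows. It only prints JSON: if your config already lists other MCP servers, merge the webcrawl entry in rather than overwriting the file, then save and restart as described above. The helper name webcrawl_config is hypothetical.

```python
import json
import os
import shutil
from typing import Optional


def webcrawl_config(datasrc: str, command: Optional[str] = None) -> str:
    """Return the mcpServers JSON from this guide, filled in for this machine."""
    if command is None:
        # Absolute path if found on PATH, else the bare command name.
        command = shutil.which("mcp-server-webcrawl") or "mcp-server-webcrawl"
    config = {
        "mcpServers": {
            "webcrawl": {
                "command": command,
                "args": ["--crawler", "katana", "--datasrc", datasrc],
            }
        }
    }
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    # Assumption: the current directory is where you ran Katana.
    print(webcrawl_config(os.path.abspath("archives")))
```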
4. Verify and Use
In Claude Desktop, you should now see MCP tools available under Search and Tools
Ask Claude to list your crawled sites:
Can you list the crawled sites available?
Try searching content from your crawls:
Can you find information about [topic] on [crawled site]?
Try specialized searches that use Katana’s comprehensive data collection:
Can you find all the help pages on this site and tell me how they're different?
Troubleshooting
If Claude doesn’t show MCP tools after restarting, verify that your configuration file is valid JSON
Ensure Python and mcp-server-webcrawl are properly installed
Check that your Katana crawls directory path in the configuration is correct
Make sure the -store-response flag was used during crawling, as it is required to save content
Verify that each crawl completed successfully and files were saved to the expected location
Remember that the first time you use a tool, Claude will ask for permission
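Some of these checks can be automated. The Python sketch below is a hypothetical helper, not part of mcp-server-webcrawl. It takes the path to your Claude Desktop config file (the one opened by Edit Config), confirms that it parses as JSON and has a webcrawl entry with a --datasrc argument, and checks that the datasrc directory exists and contains stored files. It cannot tell whether Claude Desktop actually loaded the server.

```python
import json
import sys
from pathlib import Path


def check_setup(config_path: str) -> list:
    """Return the first problem found as a one-item list; [] means no obvious issue."""
    # 1. The config file must exist and be valid JSON.
    try:
        config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError) as exc:
        return [f"cannot read config: {exc}"]
    if not isinstance(config, dict):
        return ["config is not a JSON object"]

    # 2. There must be a "webcrawl" server entry with a --datasrc argument.
    server = config.get("mcpServers", {}).get("webcrawl")
    if not isinstance(server, dict):
        return ['no "webcrawl" entry under "mcpServers"']
    args = server.get("args", [])
    if "--datasrc" not in args or args.index("--datasrc") + 1 >= len(args):
        return ["--datasrc is missing from args"]
    datasrc = Path(args[args.index("--datasrc") + 1])

    # 3. The datasrc must exist and contain at least one stored file.
    if not datasrc.is_dir():
        return [f"datasrc does not exist: {datasrc}"]
    if not any(p.is_file() for p in datasrc.rglob("*")):
        return [f"no stored files under {datasrc}; was -store-response used?"]
    return []


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print(f"usage: python {sys.argv[0]} PATH_TO_CLAUDE_CONFIG")
    else:
        print("\n".join(check_setup(sys.argv[1])) or "no problems found")
```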
For more details, including API documentation and other crawler options, visit the mcp-server-webcrawl documentation.