Katana MCP Setup Guide

Instructions for setting up mcp-server-webcrawl with Katana crawler. This allows your LLM (e.g. Claude Desktop) to search content and metadata from websites you’ve crawled using Katana.

Follow along with the video, or the step-action guide below.

Requirements

Before you begin, ensure you have:

  • Python and pip installed (used to install mcp-server-webcrawl)

  • Go installed (used to install the Katana crawler)

  • Claude Desktop (or another MCP-compatible client)

What is Katana?

Katana is an open-source web crawler from Project Discovery that offers:

  • Fast and efficient web crawling capabilities

  • Command-line interface for flexibility and automation

  • Highly configurable crawling parameters

  • Ability to store complete HTTP responses for analysis

Installation Steps

1. Install MCP Server Web Crawl

Open your terminal or command line and install the package:

pip install mcp-server-webcrawl

Verify installation was successful:

mcp-server-webcrawl --version
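
To review the server's command-line options, including the --crawler and --datasrc flags used later in this guide, the standard help flag should work:

mcp-server-webcrawl --help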

2. Install and Run Katana

  1. Verify Go is installed and on your PATH:

    go version
    
  2. Install Katana using Go:

    # installs to $(go env GOPATH)/bin (usually ~/go/bin); ensure that directory is on your PATH
    go install github.com/projectdiscovery/katana/cmd/katana@latest
    
  3. Create a directory for your crawls and run Katana with storage options:

    # Create a directory for storing crawls
    mkdir crawls
    
    # Run Katana with storage options
    katana -u https://example.com -store-response -store-response-dir crawls/example.com/
    
  4. Repeat for additional websites as needed (a loop sketch for batching several sites follows this list):

    katana -u https://pragmar.com -store-response -store-response-dir crawls/pragmar.com/
    

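If you have several sites to archive under the same datasrc, a simple shell loop keeps the commands consistent; this is just a sketch, and the two hostnames are the examples used above:

    for site in example.com pragmar.com; do
        katana -u "https://$site" -store-response -store-response-dir "crawls/$site/"
    done
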
In this case, the ./crawls directory is the datasrc. Katana creates a separate directory for each unique host inside the -store-response-dir you specify, so a nested path such as example.com/example.com is expected. Sites with external dependencies will branch out by origin host within the same -store-response-dir, and remain searchable as a single site.
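
After the two crawls above, the datasrc might look roughly like this (cdn.example.net is a hypothetical external origin, shown only to illustrate branching):

    crawls/
        example.com/
            example.com/        responses for pages on example.com
        pragmar.com/
            pragmar.com/        responses for pages on pragmar.com
            cdn.example.net/    hypothetical external origin, stored alongside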

3. Configure Claude Desktop

  1. Open Claude Desktop

  2. Go to File → Settings → Developer → Edit Config

  3. Add the following configuration (modify paths as needed):

{
  "mcpServers": {
    "webcrawl": {
      "command": "/path/to/mcp-server-webcrawl",
      "args": ["--crawler", "katana", "--datasrc",
        "/path/to/katana/crawls/"]
    }
  }
}

Note

  • On Windows, use "mcp-server-webcrawl" as the command

  • On macOS, use the absolute path (output of which mcp-server-webcrawl)

  • Change /path/to/katana/crawls/ to the actual path where you stored your Katana crawls
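
For reference, an equivalent Windows configuration might look like this; the path is a placeholder, and backslashes must be escaped in JSON:

{
  "mcpServers": {
    "webcrawl": {
      "command": "mcp-server-webcrawl",
      "args": ["--crawler", "katana", "--datasrc",
        "C:\\path\\to\\katana\\crawls"]
    }
  }
}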

  4. Save the file and completely exit Claude Desktop (not just close the window)

  5. Restart Claude Desktop

4. Verify and Use

  1. In Claude Desktop, you should now see MCP tools available under Search and Tools

  2. Ask Claude to list your crawled sites:

    Can you list the crawled sites available?
    
  3. Try searching content from your crawls:

    Can you find information about [topic] on [crawled site]?
    
  4. Try specialized searches that use Katana’s comprehensive data collection:

    Can you find all the help pages on this site and tell me how they're different?
    

Troubleshooting

  • If Claude doesn’t show MCP tools after restart, verify your configuration file is correctly formatted

  • Ensure Python and mcp-server-webcrawl are properly installed

  • Check that your Katana crawls directory path in the configuration is correct

  • Make sure the -store-response flag was used during crawling, as this is required to save content

  • Verify that each crawl completed successfully and files were saved to the expected location

  • Remember that the first time you use a function, Claude will ask for permission
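
If the tools still do not appear, two quick checks can help isolate the problem; the config file locations below are the usual Claude Desktop defaults, and the datasrc path is a placeholder:

# Confirm the Claude Desktop config is valid JSON
python -m json.tool "$HOME/Library/Application Support/Claude/claude_desktop_config.json"   # macOS
python -m json.tool "%APPDATA%\Claude\claude_desktop_config.json"                           # Windows

# Run the server by hand with the same arguments as the config;
# it should start and wait for input without errors (Ctrl+C to stop)
mcp-server-webcrawl --crawler katana --datasrc /path/to/katana/crawls/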

For more details, including API documentation and other crawler options, visit the mcp-server-webcrawl documentation.