Katana MCP Setup Guide

Instructions for setting up mcp-server-webcrawl with Katana crawler. This allows your LLM (e.g. Claude Desktop) to search content and metadata from websites you’ve crawled using Katana.

Follow along with the video, or the step-action guide below.

Requirements

Before you begin, ensure you have:

  • Python and pip installed (used to install mcp-server-webcrawl)

  • Go installed (used to install the Katana crawler)

  • Claude Desktop (or another MCP-compatible client)

What is Katana?

Katana is an open-source web crawler from Project Discovery that offers:

  • Fast and efficient web crawling capabilities

  • Command-line interface for flexibility and automation

  • Highly configurable crawling parameters

  • Ability to store complete HTTP responses for analysis

Installation Steps

1. Install MCP Server Web Crawl

Open your terminal or command line and install the package:

pip install mcp-server-webcrawl

Verify installation was successful:

mcp-server-webcrawl --version
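
To review the server's command-line options, including the --crawler and --datasrc flags used later in this guide, the standard help flag should work:

mcp-server-webcrawl --help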

2. Install and Run Katana

  1. Verify Go is installed and on your PATH:

    go version
    
  2. Install Katana using Go:

    # installs to $(go env GOPATH)/bin (usually ~/go/bin); ensure that directory is on your PATH
    go install github.com/projectdiscovery/katana/cmd/katana@latest
    
  3. Create a directory for your crawls and run Katana with storage options:

    # Create a directory for storing crawls
    mkdir crawls
    
    # Run Katana with storage options
    katana -u https://example.com -store-response -store-response-dir crawls/example.com/
    
  4. Repeat for additional websites as needed (a loop sketch for batching several sites follows this list):

    katana -u https://pragmar.com -store-response -store-response-dir crawls/pragmar.com/
    

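If you have several sites to archive under the same datasrc, a simple shell loop keeps the commands consistent; this is just a sketch, and the two hostnames are the examples used above:

    for site in example.com pragmar.com; do
        katana -u "https://$site" -store-response -store-response-dir "crawls/$site/"
    done
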
In this case, the ./crawls directory is the datasrc. Katana creates a separate directory for each unique host inside the -store-response-dir you specify, so a nested path such as example.com/example.com is expected. Sites with external dependencies will branch out by origin host within the same -store-response-dir, and remain searchable as a single site.
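
After the two crawls above, the datasrc might look roughly like this (cdn.example.net is a hypothetical external origin, shown only to illustrate branching):

    crawls/
        example.com/
            example.com/        responses for pages on example.com
        pragmar.com/
            pragmar.com/        responses for pages on pragmar.com
            cdn.example.net/    hypothetical external origin, stored alongside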

3. Configure Claude Desktop

  1. Open Claude Desktop

  2. Go to File → Settings → Developer → Edit Config

  3. Add the following configuration (modify paths as needed):

{
  "mcpServers": {
    "webcrawl": {
      "command": "/path/to/mcp-server-webcrawl",
      "args": ["--crawler", "katana", "--datasrc",
        "/path/to/katana/crawls/"]
    }
  }
}

Note

  • On Windows, use "mcp-server-webcrawl" as the command

  • On macOS, use the absolute path (output of which mcp-server-webcrawl)

  • Change /path/to/katana/crawls/ to the actual path where you stored your Katana crawls
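
For reference, an equivalent Windows configuration might look like this; the path is a placeholder, and backslashes must be escaped in JSON:

{
  "mcpServers": {
    "webcrawl": {
      "command": "mcp-server-webcrawl",
      "args": ["--crawler", "katana", "--datasrc",
        "C:\\path\\to\\katana\\crawls"]
    }
  }
}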

  4. Save the file and completely exit Claude Desktop (not just close the window)

  5. Restart Claude Desktop

4. Verify and Use

  1. In Claude Desktop, you should now see MCP tools available under Search and Tools

  2. Ask Claude to list your crawled sites:

    Can you list the crawled sites available?
    
  3. Try searching content from your crawls:

    Can you find information about [topic] on [crawled site]?
    
  4. Try specialized searches that use Katana’s comprehensive data collection:

    Can you find all the help pages on this site and tell me how they're different?
    

Troubleshooting

  • If Claude doesn’t show MCP tools after restart, verify your configuration file is correctly formatted

  • Ensure Python and mcp-server-webcrawl are properly installed

  • Check that your Katana crawls directory path in the configuration is correct

  • Make sure the -store-response flag was used during crawling, as this is required to save content

  • Verify that each crawl completed successfully and files were saved to the expected location

  • Remember that the first time you use a function, Claude will ask for permission
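
If the tools still do not appear, two quick checks can help isolate the problem; the config file locations below are the usual Claude Desktop defaults, and the datasrc path is a placeholder:

# Confirm the Claude Desktop config is valid JSON
python -m json.tool "$HOME/Library/Application Support/Claude/claude_desktop_config.json"   # macOS
python -m json.tool "%APPDATA%\Claude\claude_desktop_config.json"                           # Windows

# Run the server by hand with the same arguments as the config;
# it should start and wait for input without errors (Ctrl+C to stop)
mcp-server-webcrawl --crawler katana --datasrc /path/to/katana/crawls/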

For more details, including API documentation and other crawler options, visit the mcp-server-webcrawl documentation.