Katana MCP Setup Guide
======================

Instructions for setting up `mcp-server-webcrawl `_ with the `Katana `_ crawler. This allows your LLM (e.g. Claude Desktop) to search content and metadata from websites you've crawled using Katana.

Follow along with the video, or the step-action guide below.

Requirements
------------

Before you begin, ensure you have:

- `Claude Desktop `_ installed
- `Python `_ 3.10 or later installed
- `Go programming language `_ installed
- `Katana crawler `_ installed
- Basic familiarity with command line interfaces

What is Katana?
---------------

Katana is an open-source web crawler from Project Discovery that offers:

- Fast and efficient web crawling capabilities
- Command-line interface for flexibility and automation
- Highly configurable crawling parameters
- Ability to store complete HTTP responses for analysis

Installation Steps
------------------

1. Install mcp-server-webcrawl
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Open your terminal or command line and install the package::

   pip install mcp-server-webcrawl

Verify the installation was successful::

   mcp-server-webcrawl --version

2. Install and Run Katana
~~~~~~~~~~~~~~~~~~~~~~~~~

1. Verify Go is installed and on your PATH::

      go version

2. Install Katana using Go::

      go install github.com/projectdiscovery/katana/cmd/katana@latest

3. Create a directory for your crawls and run Katana with storage options::

      # create a directory for storing crawls
      mkdir archives

      # run Katana with storage options
      katana -u https://example.com -store-response -store-response-dir archives/example.com/

4. Repeat for additional websites as needed::

      katana -u https://pragmar.com -store-response -store-response-dir archives/pragmar.com/

In this case, the ``./archives`` directory is the datasrc. The crawler creates a separate host directory for each unique host within the specified directory. This is consistent with Katana's behavior, so a nested ``example.com/example.com`` path is expected.
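Once a few crawls have finished, a quick check of the datasrc layout can confirm Katana stored responses where mcp-server-webcrawl will look for them. A minimal sketch, assuming the ``archives`` directory used above; the ``list_crawled_hosts`` helper is illustrative and not part of either tool:

.. code-block:: python

   from pathlib import Path

   def list_crawled_hosts(datasrc: str) -> list[str]:
       """Return the per-host directories Katana created under -store-response-dir."""
       root = Path(datasrc)
       if not root.is_dir():
           return []
       return sorted(p.name for p in root.iterdir() if p.is_dir())

   # After crawling example.com and pragmar.com into ./archives, this prints
   # the host directories found, one per unique host encountered.
   print(list_crawled_hosts("archives"))

If the list is empty, re-check that ``-store-response`` and ``-store-response-dir`` were passed to Katana.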
Sites with external dependencies will branch out by origin host in the ``-store-response-dir`` directory, and continue to be searchable as a single site search.

3. Configure Claude Desktop
~~~~~~~~~~~~~~~~~~~~~~~~~~~

1. Open Claude Desktop.
2. Go to **File → Settings → Developer → Edit Config**.
3. Add the following configuration (modify paths as needed):

   .. code-block:: json

      {
        "mcpServers": {
          "webcrawl": {
            "command": "/path/to/mcp-server-webcrawl",
            "args": ["--crawler", "katana", "--datasrc", "/path/to/katana/crawls/"]
          }
        }
      }

   .. note::

      - On Windows, use ``"mcp-server-webcrawl"`` as the command
      - On macOS, use the absolute path (the output of ``which mcp-server-webcrawl``)
      - Change ``/path/to/katana/crawls/`` to the actual path where you stored your Katana crawls

4. Save the file and **completely exit** Claude Desktop (not just close the window).
5. Restart Claude Desktop.

4. Verify and Use
~~~~~~~~~~~~~~~~~

1. In Claude Desktop, you should now see MCP tools available under Search and Tools.
2. Ask Claude to list your crawled sites::

      Can you list the crawled sites available?

3. Try searching content from your crawls::

      Can you find information about [topic] on [crawled site]?

4. Try specialized searches that use Katana's comprehensive data collection::

      Can you find all the help pages on this site and tell me how they're different?
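A malformed config file is one of the most common reasons the MCP tools never appear after a restart. Before relaunching Claude Desktop, the edited JSON can be checked with a few lines of Python. This is a hedged sketch: ``check_mcp_config`` is a hypothetical helper, not part of mcp-server-webcrawl, and the inline sample stands in for your actual config file contents:

.. code-block:: python

   import json

   def check_mcp_config(text: str) -> dict:
       """Parse Claude Desktop config text and return the 'webcrawl' server entry."""
       config = json.loads(text)  # raises json.JSONDecodeError if the JSON is malformed
       server = config.get("mcpServers", {}).get("webcrawl")
       if server is None:
           raise ValueError("no 'webcrawl' entry under 'mcpServers'")
       return server

   # Validate a sample config; in practice, read the text from your
   # claude_desktop_config.json (its location varies by platform).
   sample = ('{"mcpServers": {"webcrawl": {"command": "mcp-server-webcrawl", '
             '"args": ["--crawler", "katana", "--datasrc", "/path/to/katana/crawls/"]}}}')
   print(check_mcp_config(sample)["command"])

A clean parse plus a present ``webcrawl`` entry rules out the usual trailing-comma and misplaced-brace mistakes before you restart.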
Troubleshooting
---------------

- If Claude doesn't show MCP tools after a restart, verify your configuration file is correctly formatted
- Ensure Python and mcp-server-webcrawl are properly installed
- Check that the Katana crawls directory path in the configuration is correct
- Make sure the ``-store-response`` flag was used during crawling, as this is required to save content
- Verify that each crawl completed successfully and files were saved to the expected location
- Remember that the first time you use a function, Claude will ask for permission

For more details, including API documentation and other crawler options, visit the `mcp-server-webcrawl documentation `_.
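Several of the checks above can be bundled into a small preflight script run before digging into Claude Desktop itself. An illustrative sketch: the ``preflight`` helper and its messages are assumptions, not a shipped diagnostic:

.. code-block:: python

   import shutil
   import sys
   from pathlib import Path

   def preflight(datasrc: str) -> list[str]:
       """Return a list of likely setup problems; empty if everything looks sane."""
       problems = []
       if sys.version_info < (3, 10):
           problems.append("Python 3.10 or later is required")
       if shutil.which("mcp-server-webcrawl") is None:
           problems.append("mcp-server-webcrawl is not on PATH")
       root = Path(datasrc)
       if not root.is_dir():
           problems.append(f"datasrc directory does not exist: {datasrc}")
       elif not any(p.is_dir() for p in root.iterdir()):
           problems.append(f"no host directories in {datasrc} (was -store-response used?)")
       return problems

   # Point this at the same path used for --datasrc in the Claude config.
   for problem in preflight("archives"):
       print("WARNING:", problem)

An empty report doesn't guarantee the config file is correct, but it clears the installation and crawl-output checks in one pass.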