wget MCP Setup Guide

Instructions for setting up mcp-server-webcrawl with wget. This allows an LLM client such as Claude Desktop to search the content and metadata of websites you’ve crawled.

Follow along with the video, or use the step-action guide below.

Requirements

Before you begin, ensure you have:

  • Claude Desktop installed

  • Python 3.10 or later installed

  • Basic familiarity with command line interfaces

  • wget installed (macOS users can install via Homebrew; Windows users need WSL/Ubuntu; install commands are sketched after this list)
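
If you still need to install wget, the usual package-manager commands are shown below (this assumes Homebrew on macOS and apt inside Ubuntu/WSL on Windows):

brew install wget          # macOS (Homebrew)
sudo apt install wget      # Windows (inside Ubuntu/WSL)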

Installation Steps

1. Install MCP Server Web Crawl

Open your terminal or command line and install the package:

pip install mcp-server-webcrawl

Verify installation was successful by checking the version:

mcp-server-webcrawl --version
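
If the version prints, the installation succeeded. On macOS, you can also capture the absolute path to the executable now, since the Claude Desktop configuration in the next step needs it:

which mcp-server-webcrawl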

2. Configure Claude Desktop

  1. Open Claude Desktop

  2. Go to File → Settings → Developer → Edit Config

  3. Add the following configuration (modify paths as needed):

{
  "mcpServers": {
    "webcrawl": {
      "command": "/path/to/mcp-server-webcrawl",
      "args": ["--crawler", "wget", "--datasrc",
        "/path/to/wget/archives/"]
    }
  }
}

Note

  • On Windows, use "mcp-server-webcrawl" as the command (an example Windows configuration follows this list)

  • On macOS, use the absolute path (output of which mcp-server-webcrawl)

  • Change /path/to/wget/archives/ to your actual directory path
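
For reference, a Windows configuration typically looks like the sketch below; the --datasrc path is illustrative and should point at wherever your wget crawls are stored (note that backslashes must be escaped in JSON):

{
  "mcpServers": {
    "webcrawl": {
      "command": "mcp-server-webcrawl",
      "args": ["--crawler", "wget", "--datasrc",
        "C:\\path\\to\\wget\\archives\\"]
    }
  }
}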

  4. Save the file and completely exit Claude Desktop (not just close the window)

  5. Restart Claude Desktop

3. Crawl Websites with wget

  1. Open Terminal (macOS) or Ubuntu/WSL (Windows)

  2. Navigate to the directory you configured as --datasrc, where your crawls will be stored

  3. Run wget with the mirror option:

wget --mirror https://example.com
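
The --mirror flag alone produces an archive that mcp-server-webcrawl can index. If you want a politer or more self-contained crawl, standard wget options can be added; the flags below are common choices, not requirements:

wget --mirror --no-parent --adjust-extension --wait=1 https://example.com

wget writes the crawl into a directory named after the host (here, example.com/), so run it from inside the directory you configured as --datasrc.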

4. Verify and Use

  1. In Claude Desktop, you should now see an MCP tool option under Search and Tools

  2. Ask Claude to list your crawled sites:

Can you list the crawled sites available?

  3. Try searching content from your crawls:

Can you find information about [topic] on [crawled site]?

Troubleshooting

  • If Claude doesn’t show MCP tools after restart, verify that your configuration file is correctly formatted (a quick check is sketched after this list)

  • Ensure Python and mcp-server-webcrawl are properly installed and available on your PATH, or use absolute paths in the configuration

  • Check that your crawl directory path in the configuration is correct

  • Remember that the first time you use a function, Claude will ask for permission

  • Indexing for file-based archives (including wget) is built on the first search; the time required depends on the archive size
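
If the tools still don’t appear, a quick way to rule out a formatting problem is to run the configuration file through Python’s built-in JSON parser; substitute the actual path to your Claude Desktop config file (typically claude_desktop_config.json):

python -m json.tool /path/to/claude_desktop_config.json

Any syntax error, such as a missing comma or an unescaped backslash, is reported with its line number.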

For more details, including API documentation and other crawler options, visit the mcp-server-webcrawl documentation.