Installation

Install the package via pip:

pip install mcp-server-webcrawl

Requirements

To use mcp_server_webcrawl effectively, you need:

  • An MCP-capable LLM client such as Claude Desktop

  • Python installed and available on your command line

  • Basic familiarity with running Python packages

Once these prerequisites are met, run the pip install command above to add the package to your environment. The package handles its own dependencies during installation.
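
To verify the installation, you can confirm Python is available and the mcp-server-webcrawl command is on your PATH (--help is assumed here as a quick smoke test of the entry point):

# (macOS Terminal/Windows WSL)
$ python --version
$ mcp-server-webcrawl --help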

MCP Configuration

To enable your LLM client to access your web crawl data, you’ll need to add an MCP server configuration. From Claude Desktop’s developer settings, locate the MCP configuration section and add the appropriate configuration for your crawler type.
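
In Claude Desktop, this configuration lives in claude_desktop_config.json, found under ~/Library/Application Support/Claude/ on macOS and %APPDATA%\Claude\ on Windows.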

Below are configurations for each supported crawler type. Choose the one that matches your crawler and modify the --datasrc path to point to your specific data location.

wget

{
  "mcpServers": {
    "webcrawl": {
      "command": "mcp-server-webcrawl",
      "args": ["--crawler", "wget", "--datasrc",
        "/path/to/wget/archives/"]
    }
  }
}

Tested wget commands:

# (macOS Terminal/Windows WSL)
# --adjust-extension adds proper file extensions to saved pages, e.g. .html
$ wget --mirror https://example.com
$ wget --mirror https://example.com --adjust-extension
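
Because wget --mirror writes each site into a directory named after its host (e.g. example.com/), one workable approach is to run the crawl from inside the directory you pass as --datasrc; the path below is a placeholder matching the configuration above:

# run from the configured --datasrc so the mirrored
# site directory lands where the server expects it
$ cd /path/to/wget/archives/
$ wget --mirror https://example.com --adjust-extension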

WARC

{
  "mcpServers": {
    "webcrawl": {
      "command": "mcp-server-webcrawl",
      "args": ["--crawler", "warc", "--datasrc",
        "/path/to/warc/archives/"]
    }
  }
}

Tested WARC commands:

# (macOS Terminal/Windows WSL)
$ wget --warc-file=example --recursive https://example.com
$ wget --warc-file=example --recursive --page-requisites https://example.com
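
Note that wget compresses WARC output by default, so the commands above produce example.warc.gz; place the resulting archive in the directory referenced by --datasrc.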

InterroBot

{
  "mcpServers": {
    "webcrawl": {
      "command": "mcp-server-webcrawl",
      "args": ["--crawler", "interrobot", "--datasrc",
        "[homedir]/Documents/InterroBot/interrobot.v2.db"]
    }
  }
}

Notes for InterroBot:

  • Crawls must be executed in InterroBot (windowed application)

  • On Windows: replace [homedir] with C:/Users/…

  • On macOS: the path is provided on the InterroBot settings page

Katana

{
  "mcpServers": {
    "webcrawl": {
      "command": "mcp-server-webcrawl",
      "args": ["--crawler", "katana", "--datasrc",
        "/path/to/katana/crawls/"]
    }
  }
}

Tested Katana commands:

# (macOS Terminal/PowerShell/WSL)
# -store-response to save crawl contents
# -store-response-dir allows for many site crawls in one dir
$ katana -u https://example.com -store-response -store-response-dir crawls/
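
The crawls/ directory passed to -store-response-dir is the same path --datasrc should reference in the configuration above; katana keeps each site’s stored responses separate, so multiple crawls can share one directory.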

SiteOne

{
  "mcpServers": {
    "webcrawl": {
      "command": "mcp-server-webcrawl",
      "args": ["--crawler", "siteone", "--datasrc",
        "/path/to/siteone/archives/"]
    }
  }
}

Notes for SiteOne:

  • Crawls must be executed in SiteOne (windowed application)

  • "Generate offline website" must be checked

Multiple Configurations

You can set up multiple mcp-server-webcrawl connections under the mcpServers section if you want to access different crawler data sources simultaneously.

{
  "mcpServers": {
    "webcrawl_warc": {
      "command": "mcp-server-webcrawl",
      "args": ["--crawler", "warc", "--datasrc", "/path/to/warc/archives/"]
    },
    "webcrawl_wget": {
      "command": "mcp-server-webcrawl",
      "args": ["--crawler", "wget", "--datasrc", "/path/to/wget/archives/"]
    }
  }
}
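
The keys under mcpServers (webcrawl_warc, webcrawl_wget) are arbitrary labels; they only need to be unique within the file.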

After adding the configuration, save the file and restart your LLM client to apply the changes.
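
If the client does not pick up the servers after a restart, a malformed configuration file is a common cause; Python’s built-in json.tool module can validate it (macOS path shown, adjust for your platform):

$ python -m json.tool ~/Library/Application\ Support/Claude/claude_desktop_config.json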