mcp_server_webcrawl.crawlers.interrobot package

Submodules

mcp_server_webcrawl.crawlers.interrobot.adapter module

class InterroBotManager[source]

Bases: BaseManager

Manages InterroBot crawl data in in-memory SQLite databases. Provides connection pooling and caching for efficient access.

Initialize the InterroBot manager with an empty cache and statistics.

__init__()[source]

Initialize the InterroBot manager with an empty cache and statistics.

Return type:

None

get_connection(group)[source]

Get database connection for sites in the group, creating if needed.

Parameters:

group (SitesGroup) – Group of sites to connect to

Returns:

Tuple of (SQLite connection to in-memory database with data loaded, or None if still building; IndexState associated with this database)

Return type:

tuple[Connection | None, IndexState]
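The pooling and caching behavior described above can be sketched with only the standard library. This is a conceptual illustration, not the actual implementation; the class name, table schema, and group key are all hypothetical stand-ins:

```python
import sqlite3


class ConnectionPool:
    """Illustrative cache of in-memory SQLite connections, keyed by site group."""

    def __init__(self) -> None:
        self._cache: dict[str, sqlite3.Connection] = {}

    def get_connection(self, group_key: str) -> sqlite3.Connection:
        # Reuse the in-memory database already built for this group,
        # or create and load a fresh one on first access.
        if group_key not in self._cache:
            conn = sqlite3.connect(":memory:")
            conn.execute("CREATE TABLE resources (id INTEGER PRIMARY KEY, url TEXT)")
            self._cache[group_key] = conn
        return self._cache[group_key]


pool = ConnectionPool()
first = pool.get_connection("site-1")
second = pool.get_connection("site-1")
print(first is second)  # True: the connection is cached per group
```

Repeated calls with the same group key return the same connection object, which is what makes repeated searches against the same sites cheap after the first index build.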

get_sites(datasrc, ids=None, fields=None)[source]

Get sites based on the provided parameters.

Parameters:
  • datasrc (Path) – path to the database

  • ids – optional list of site IDs

  • fields – list of fields to include in response

Returns:

List of SiteResult objects

Return type:

list[SiteResult]
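A minimal sketch of the filtering behavior get_sites describes, using a plain SQLite table. The function name, schema, and tuple return here are illustrative assumptions; the real adapter returns SiteResult objects:

```python
import sqlite3


def select_sites(conn, ids=None, fields=None):
    # Illustrative only: select the requested fields, optionally
    # restricted to a list of site IDs.
    cols = ", ".join(fields) if fields else "id, url"
    sql = f"SELECT {cols} FROM sites"
    params = []
    if ids:
        placeholders = ", ".join("?" for _ in ids)
        sql += f" WHERE id IN ({placeholders})"
        params = list(ids)
    return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sites (id INTEGER PRIMARY KEY, url TEXT)")
conn.executemany("INSERT INTO sites VALUES (?, ?)",
                 [(1, "https://a.example"), (2, "https://b.example")])
rows = select_sites(conn, ids=[1], fields=["url"])
print(rows)  # [('https://a.example',)]
```

Omitting both ids and fields returns every site with the default columns, mirroring the optional parameters documented above.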

get_resources(datasrc, sites=None, query='', fields=None, sort=None, limit=20, offset=0)[source]

Get resources from the InterroBot database using in-memory SQLite.

Parameters:
  • datasrc (Path) – path to the InterroBot database

  • sites (list[int] | None) – optional list of site IDs to filter by

  • query (str) – search query string

  • fields (list[str] | None) – optional list of fields to include in response

  • sort (str | None) – sort order for results

  • limit (int) – maximum number of results to return

  • offset (int) – number of results to skip for pagination

Returns:

Tuple of (list of ResourceResult objects, total count, IndexState)

Return type:

tuple[list[ResourceResult], int, IndexState]
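The limit/offset pagination and total-count contract above can be sketched against a plain SQLite table. This is a simplified illustration under assumed names and schema; the real adapter supports fielded and boolean queries and returns ResourceResult objects plus an IndexState:

```python
import sqlite3


def select_resources(conn, query="", limit=20, offset=0):
    # Illustrative only: substring match on the URL with pagination,
    # returning the page of rows plus the total match count.
    like = f"%{query}%"
    rows = conn.execute(
        "SELECT id, url FROM resources WHERE url LIKE ? "
        "ORDER BY id LIMIT ? OFFSET ?",
        (like, limit, offset),
    ).fetchall()
    total = conn.execute(
        "SELECT COUNT(*) FROM resources WHERE url LIKE ?", (like,)
    ).fetchone()[0]
    return rows, total


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE resources (id INTEGER PRIMARY KEY, url TEXT)")
conn.executemany("INSERT INTO resources VALUES (?, ?)",
                 [(i, f"https://example.com/page{i}") for i in range(1, 6)])
page, total = select_resources(conn, query="page", limit=2, offset=2)
print(page, total)
```

Returning the total alongside the page is what lets a caller compute how many more offset steps remain, which is the point of the limit/offset parameters documented above.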

mcp_server_webcrawl.crawlers.interrobot.crawler module

class InterroBotCrawler[source]

Bases: BaseCrawler

A crawler implementation for InterroBot data sources. Provides functionality for accessing and searching web content from InterroBot.

Initialize the InterroBotCrawler with a data source path and required adapter functions.

Parameters:

datasrc – Path to the data source

__init__(datasrc)[source]

Initialize the InterroBotCrawler with a data source path and required adapter functions.

Parameters:

datasrc (Path) – Path to the data source

Return type:

None

async mcp_list_tools()[source]

List available tools for this crawler.

Returns:

List of Tool objects

Return type:

list[Tool]
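Because mcp_list_tools is a coroutine, callers outside an event loop need asyncio.run. The sketch below uses a stub class so it runs without the package installed; the tool names are illustrative placeholders, and the real method returns MCP Tool objects, not strings:

```python
import asyncio


class StubCrawler:
    # Stand-in for InterroBotCrawler so this example is self-contained;
    # the real crawler is constructed with a datasrc Path.
    async def mcp_list_tools(self):
        # Hypothetical tool names for illustration only.
        return ["webcrawl_sites", "webcrawl_search"]


tools = asyncio.run(StubCrawler().mcp_list_tools())
print(tools)
```

Inside an already-running event loop (as in an MCP server), the method would instead be awaited directly.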

mcp_server_webcrawl.crawlers.interrobot.tests module

class InterroBotTests[source]

Bases: BaseCrawlerTests

Test suite for the InterroBot crawler implementation. Uses all wrapped test methods from BaseCrawlerTests plus InterroBot-specific features.

setUp()[source]

Set up the test environment with fixture data.

test_interrobot_pulse()[source]

Test basic crawler initialization.

test_interrobot_sites()[source]

Test site retrieval API and boolean search functionality.

test_interrobot_resources()[source]

Test resource retrieval API functionality with various parameters.

test_interrobot_images()[source]

Test InterroBot-specific image handling and thumbnails.

test_interrobot_random_sort()[source]

Test random sort functionality using the '?' sort parameter.

test_interrobot_content_parsing()[source]

Test content type detection and parsing.

test_interrobot_mcp_features()[source]

Test InterroBot-specific MCP tool functionality.

test_thumbnails_sync()[source]

Test thumbnail generation functionality (InterroBot-specific).

test_interrobot_advanced_site_features()[source]

Test InterroBot-specific site features like robots field.

test_report()[source]

Test report generation functionality (InterroBot-specific).

Module contents