mcp_server_webcrawl.crawlers.interrobot package
Submodules
mcp_server_webcrawl.crawlers.interrobot.adapter module
- class InterroBotManager[source]
Bases:
BaseManager
Manages HTTP text files in in-memory SQLite databases. Provides connection pooling and caching for efficient access.
- __init__()[source]
Initialize the HTTP text manager with empty cache and statistics.
- Return type:
None
- get_connection(group)[source]
Get database connection for sites in the group, creating if needed.
- Parameters:
group (SitesGroup) – Group of sites to connect to
- Returns:
Tuple of (SQLite connection to the in-memory database with data loaded, or None if the database is still building; IndexState associated with this database)
- Return type:
tuple[Connection | None, IndexState]
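The return contract of get_connection can be illustrated with a minimal sketch. It does not reproduce the package's implementation: SketchManager, SketchIndexState, the string group key, and the resources table are all hypothetical stand-ins, used only to show a cached in-memory connection being reused and a None connection being returned while a build is in progress.

```python
import sqlite3
from enum import Enum


class SketchIndexState(Enum):
    """Hypothetical stand-in for IndexState; the real class may differ."""
    BUILDING = "building"
    COMPLETE = "complete"


class SketchManager:
    """Hypothetical manager caching one in-memory SQLite connection per group key."""

    def __init__(self):
        self._connections = {}   # group key -> sqlite3.Connection
        self._building = set()   # group keys whose database is still loading

    def get_connection(self, group_key):
        # While a build is in progress, callers get no connection, only the state.
        if group_key in self._building:
            return None, SketchIndexState.BUILDING
        # Create the in-memory database on first request, then reuse it.
        if group_key not in self._connections:
            conn = sqlite3.connect(":memory:")
            conn.execute("CREATE TABLE resources (id INTEGER PRIMARY KEY, url TEXT)")
            self._connections[group_key] = conn
        return self._connections[group_key], SketchIndexState.COMPLETE


manager = SketchManager()
conn, state = manager.get_connection("site-1")
if conn is None:
    print("index still building; retry later")
else:
    same_conn, _ = manager.get_connection("site-1")
    print(conn is same_conn, state.value)
```

Callers should check the connection for None before querying, since the state alone does not guarantee that data is available.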
- get_sites(datasrc, ids=None, fields=None)[source]
Get sites based on the provided parameters.
- Parameters:
datasrc (Path) – path to the database
ids – optional list of site IDs to filter by
fields – optional list of fields to include in the response
- Returns:
List of SiteResult objects
- Return type:
list[SiteResult]
- get_resources(datasrc, sites=None, query='', fields=None, sort=None, limit=20, offset=0)[source]
Get resources from the InterroBot data source using in-memory SQLite.
- Parameters:
datasrc (Path) – path to the database
sites (list[int] | None) – optional list of site IDs to filter by
query (str) – search query string
fields (list[str] | None) – optional list of fields to include in response
sort (str | None) – sort order for results
limit (int) – maximum number of results to return
offset (int) – number of results to skip for pagination
- Returns:
Tuple of (list of ResourceResult objects, total count)
- Return type:
tuple[list[ResourceResult], int]
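The limit and offset parameters, together with the total count in the returned tuple, support paging through results. The sketch below shows that pattern against a throwaway in-memory SQLite table. The function sketch_get_resources, the resources table, and the example.com URLs are assumptions made for illustration, not the package's actual query logic.

```python
import sqlite3


def sketch_get_resources(conn, query="", limit=20, offset=0):
    """Hypothetical stand-in: returns (rows for one page, total matching count)."""
    pattern = f"%{query}%"
    # The total ignores limit/offset so callers know when to stop paging.
    total = conn.execute(
        "SELECT COUNT(*) FROM resources WHERE url LIKE ?", (pattern,)
    ).fetchone()[0]
    rows = conn.execute(
        "SELECT id, url FROM resources WHERE url LIKE ? ORDER BY id LIMIT ? OFFSET ?",
        (pattern, limit, offset),
    ).fetchall()
    return rows, total


# Build a small in-memory table with 45 illustrative rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE resources (id INTEGER PRIMARY KEY, url TEXT)")
conn.executemany(
    "INSERT INTO resources (url) VALUES (?)",
    [(f"https://example.com/page/{n}",) for n in range(45)],
)

# Page through all matches, advancing the offset by the page size.
pages = []
offset = 0
limit = 20
while True:
    rows, total = sketch_get_resources(conn, query="page", limit=limit, offset=offset)
    pages.append(rows)
    offset += limit
    if offset >= total:
        break

print([len(page) for page in pages], total)  # prints [20, 20, 5] 45
```

Comparing the running offset against the total count avoids issuing an extra query that would return an empty page.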
mcp_server_webcrawl.crawlers.interrobot.crawler module
- class InterroBotCrawler[source]
Bases:
BaseCrawler
A crawler implementation for InterroBot data sources. Provides functionality for accessing and searching web content from InterroBot.
Initialize the InterroBotCrawler with a data source path and required adapter functions.
- Parameters:
datasrc – Path to the data source
mcp_server_webcrawl.crawlers.interrobot.tests module
- class InterroBotTests[source]
Bases:
BaseCrawlerTests
Test suite for the InterroBot crawler implementation. Uses all wrapped test methods from BaseCrawlerTests plus InterroBot-specific features.
Create an instance of the class that will use the named test method when executed. Raises a ValueError if the instance does not have a method with the specified name.
- test_interrobot_resources()[source]
Test resource retrieval API functionality with various parameters.
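The constructor behavior described for InterroBotTests (binding an instance to a named test method, and raising ValueError for an unknown name) is standard unittest.TestCase behavior. The sketch below demonstrates it with a hypothetical SketchCrawlerTests class and a placeholder assertion. It does not use BaseCrawlerTests or any real crawler data.

```python
import unittest


class SketchCrawlerTests(unittest.TestCase):
    """Hypothetical test case; stands in for a BaseCrawlerTests subclass."""

    def test_resources(self):
        # Placeholder data standing in for a real resource query result.
        results, total = [{"id": 1}], 1
        self.assertEqual(len(results), total)


# Instantiating with an existing method name binds the case to that method.
case = SketchCrawlerTests("test_resources")
result = case.run()
print(result.wasSuccessful())

# An unknown method name is rejected at construction time.
try:
    SketchCrawlerTests("test_missing")
    raised = False
except ValueError:
    raised = True
print(raised)
```

In normal use a test runner constructs these instances itself, so the ValueError mostly surfaces when a test is selected by a misspelled name.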