mcp_server_webcrawl.crawlers.interrobot package

Submodules

mcp_server_webcrawl.crawlers.interrobot.adapter module

iso_to_datetime(dt_string)[source]

Convert ISO string to datetime.

Python 3.10 and earlier struggles with Zulu ("Z") suffixes and fractional seconds and will raise an exception, so the ISO string is normalized first. Sub-second precision is not important here.

Parameters:

dt_string (str | None) – ISO date/time string to convert

Return type:

datetime
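The normalization described above can be sketched as follows. This is a minimal illustration, not the module's actual implementation: the function name iso_to_datetime_sketch is hypothetical, and returning None for empty input is an assumption. It relies on the fact that datetime.fromisoformat does not accept a trailing "Z" before Python 3.11.

```python
import re
from datetime import datetime, timezone
from typing import Optional


def iso_to_datetime_sketch(dt_string: Optional[str]) -> Optional[datetime]:
    """Hypothetical stand-in for iso_to_datetime; not the library's code."""
    if not dt_string:
        # Assumption: empty or missing input yields None.
        return None
    normalized = dt_string.strip()
    # Replace a trailing Zulu marker with an explicit UTC offset.
    if normalized.endswith(("Z", "z")):
        normalized = normalized[:-1] + "+00:00"
    # Drop fractional seconds; sub-second precision is not needed.
    normalized = re.sub(r"\.\d+", "", normalized)
    return datetime.fromisoformat(normalized)


parsed = iso_to_datetime_sketch("2024-03-01T12:34:56.789Z")
```

Here parsed is a timezone-aware datetime in UTC with the fractional part discarded.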

get_sites(datasrc, ids=None, fields=None)[source]

Get sites based on the provided parameters.

Parameters:
  • datasrc (Path) – Path to the database

  • ids – Optional list of site IDs

  • fields – List of fields to include in response

Returns:

List of SiteResult objects

Return type:

list[SiteResult]

get_resources(datasrc, ids=None, sites=None, query='', types=None, fields=None, statuses=None, sort=None, limit=20, offset=0)[source]

Get resources based on the provided parameters.

Parameters (all query/WHERE arguments are ANDed):
  • datasrc – Path to the database

  • ids – Optional list of resource IDs

  • sites – Optional list of project IDs to filter by site

  • query – Search query string for FTS5 search

  • types – Optional filter for specific resource types

  • fields – List of fields to include in response

  • statuses – List of HTTP statuses to include in response

  • sort – Sort order for results

  • limit – Maximum number of results to return

  • offset – Number of results to skip for pagination

Returns:

Tuple containing:
  • List of ResourceResult objects

  • Total count of matching resources
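To illustrate how ANDed filters combine with limit and offset, here is a minimal sketch against an in-memory SQLite database. The table name resources, its columns, and the helper names build_where and query_resources are all hypothetical; the real InterroBot schema and the adapter's SQL are not shown in this documentation. The FTS5 query argument is omitted to keep the example portable.

```python
import sqlite3


def build_where(ids=None, sites=None, statuses=None):
    """Build an ANDed WHERE clause from optional filters (hypothetical columns)."""
    clauses, params = [], []
    for column, values in (("id", ids), ("site_id", sites), ("status", statuses)):
        if values:
            placeholders = ", ".join("?" for _ in values)
            clauses.append(f"{column} IN ({placeholders})")
            params.extend(values)
    where = " WHERE " + " AND ".join(clauses) if clauses else ""
    return where, params


def query_resources(conn, ids=None, sites=None, statuses=None, limit=20, offset=0):
    """Return (rows, total), mirroring the documented tuple shape."""
    where, params = build_where(ids, sites, statuses)
    # Total count ignores limit/offset so callers can paginate.
    total = conn.execute(f"SELECT COUNT(*) FROM resources{where}", params).fetchone()[0]
    rows = conn.execute(
        f"SELECT id, site_id, status FROM resources{where} ORDER BY id LIMIT ? OFFSET ?",
        params + [limit, offset],
    ).fetchall()
    return rows, total


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE resources (id INTEGER PRIMARY KEY, site_id INTEGER, status INTEGER)")
conn.executemany(
    "INSERT INTO resources VALUES (?, ?, ?)",
    [(1, 1, 200), (2, 1, 404), (3, 2, 200), (4, 1, 200)],
)
first_page, total = query_resources(conn, sites=[1], statuses=[200], limit=1)
second_page, _ = query_resources(conn, sites=[1], statuses=[200], limit=1, offset=1)
```

Values are bound as parameters rather than interpolated, so only the fixed column names ever reach the SQL text.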

mcp_server_webcrawl.crawlers.interrobot.crawler module

class InterroBotCrawler[source]

Bases: BaseCrawler

A crawler implementation for InterroBot data sources. Provides functionality for accessing and searching web content from InterroBot.

__init__(datasrc)[source]

Initialize the InterroBotCrawler with a data source path.

Parameters:

datasrc – Path to the data source

async mcp_call_tool(name, arguments)[source]

Handle tool execution requests.

Parameters:
  • name (str) – Name of the tool to call

  • arguments (dict[str, Any] | None) – Arguments to pass to the tool

Returns:

List of content objects

Return type:

list[TextContent | ImageContent | EmbeddedResource]

async mcp_list_tools()[source]

List available tools for this crawler.

Returns:

List of Tool objects

Return type:

list[Tool]

get_sites_api(ids=None, fields=None)[source]

Retrieve site information from the InterroBot data source.

Parameters:
  • ids (list[int] | None) – Optional list of site IDs to filter

  • fields (list[str] | None) – Optional list of fields to include in the response

Returns:

API response object containing site information

Return type:

BaseJsonApi

get_resources_api(ids=None, sites=None, query='', types=None, fields=None, statuses=None, sort=None, limit=20, offset=0)[source]

Get resources in JSON format based on the provided parameters.

Parameters:
  • ids (list[int] | None) – Optional list of resource ids to retrieve specific resources directly

  • sites (list[int] | None) – Optional list of project ids to filter search results to a specific site

  • query (str) – Search query string

  • types (list[str] | None) – Optional filter for specific resource types

  • fields (list[str] | None) – List of additional fields to include in the response

  • statuses (list[int] | None) – Optional list of HTTP status codes to filter results

  • sort (str | None) – Sort order for results

  • limit (int) – Maximum number of results to return

  • offset (int) – Number of results to skip for pagination

Returns:

API response object containing the results, serializable to JSON

Return type:

BaseJsonApi
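The limit and offset parameters lend themselves to a simple paging loop. The sketch below is an assumption-laden illustration: because the attributes of BaseJsonApi are not documented here, it uses a hypothetical fetch callable that returns a (results, total) pair instead of calling get_resources_api directly. The names iter_pages and fake_fetch are likewise invented for this example.

```python
from typing import Callable, Iterator, List, Tuple


def iter_pages(
    fetch: Callable[[int, int], Tuple[List, int]], limit: int = 20
) -> Iterator[List]:
    """Yield successive pages until the reported total is exhausted."""
    offset = 0
    while True:
        results, total = fetch(limit, offset)
        if not results:
            break
        yield results
        offset += len(results)
        if offset >= total:
            break


# Stand-in data source: 45 fake resource ids.
DATA = list(range(45))


def fake_fetch(limit: int, offset: int) -> Tuple[List, int]:
    """Hypothetical fetcher; a real one would wrap get_resources_api."""
    return DATA[offset:offset + limit], len(DATA)


page_sizes = [len(page) for page in iter_pages(fake_fetch, limit=20)]
```

Advancing the offset by the number of results actually returned, rather than by limit, keeps the loop correct if a page comes back short.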

mcp_server_webcrawl.crawlers.interrobot.tests module

class InterroBotTests[source]

Bases: BaseCrawlerTests

Test suite for the InterroBot crawler implementation.

Create an instance of the class that will use the named test method when executed. Raises a ValueError if the instance does not have a method with the specified name.

setUp()[source]

Set up the test environment with fixture data.

test_interrobot_pulse()[source]

Test basic crawler initialization.

test_interrobot_mcp()[source]

Test MCP tool functionality.

test_interrobot_thumbnail()[source]

Test thumbnail generation for image resources.

test_interrobot_sites()[source]

Test site retrieval API functionality.

test_interrobot_resources()[source]

Test resource retrieval API functionality with various parameters.

test_interrobot_random_sort()[source]

Test the random sort functionality using the ‘?’ sort parameter.
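One way a '?' sort value could translate into SQL is sketched below. Only the '?' value for random ordering comes from this documentation; the "+field"/"-field" prefix convention, the allowed field names, the default ordering, and the function sort_to_order_by are assumptions made for illustration.

```python
import sqlite3

# Hypothetical whitelist; the crawler's real sortable fields are not listed here.
ALLOWED_SORT_FIELDS = {"id", "url", "status"}


def sort_to_order_by(sort):
    """Map a sort string to an ORDER BY clause (illustrative only)."""
    if not sort:
        return "ORDER BY id ASC"
    if sort == "?":
        # SQLite's RANDOM() gives a different ordering on each query.
        return "ORDER BY RANDOM()"
    direction = "DESC" if sort.startswith("-") else "ASC"
    field = sort.lstrip("+-")
    if field not in ALLOWED_SORT_FIELDS:
        raise ValueError(f"unsupported sort: {sort!r}")
    return f"ORDER BY {field} {direction}"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE resources (id INTEGER PRIMARY KEY, url TEXT, status INTEGER)")
conn.executemany(
    "INSERT INTO resources VALUES (?, ?, ?)",
    [(i, f"https://example.com/{i}", 200) for i in range(1, 6)],
)
shuffled = [row[0] for row in conn.execute(f"SELECT id FROM resources {sort_to_order_by('?')}")]
```

A test of random sorting can only assert that the same set of rows comes back, not any particular order.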

Module contents