mcp_server_webcrawl.utils package

Submodules

mcp_server_webcrawl.utils.blobs module

class ThumbnailManager

Bases: object

Manages thumbnail generation and caching for image files and URLs.

__init__()

get_thumbnails(paths)

Convert URLs or file paths to base64-encoded strings.

Parameters:

paths (list[str]) – List of URLs or file paths to convert

Returns:

Dictionary mapping each path to its base64 representation, or None if the conversion failed

Return type:

dict[str, str | None]
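The return shape described above (every input path as a key, with None marking a failure) can be illustrated with a minimal sketch. This is not the ThumbnailManager implementation: the function name `encode_paths_sketch` is hypothetical, only local files are handled (no URL fetching, resizing, or caching), and treating any OSError as a failure is an assumption.

```python
import base64
from pathlib import Path
from typing import Optional


def encode_paths_sketch(paths: list[str]) -> dict[str, Optional[str]]:
    """Map each local file path to its base64 text, or None if unreadable.

    Hypothetical stand-in for illustration only; it does not fetch URLs
    and does not generate thumbnails.
    """
    results: dict[str, Optional[str]] = {}
    for path in paths:
        try:
            data = Path(path).read_bytes()
            results[path] = base64.b64encode(data).decode("ascii")
        except OSError:
            # Missing or unreadable file: keep the key, record None.
            results[path] = None
    return results
```

Because failed paths stay in the dictionary, a caller can test `result[path] is None` per path instead of wrapping the whole batch in exception handling.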

mcp_server_webcrawl.utils.cli module

get_help_short_message(version)

Parameters:

version (str) –

Return type:

str

get_help_long_message(version)

Parameters:

version (str) –

Return type:

str

mcp_server_webcrawl.utils.logger module

get_logger_configuration()

Get the log name, path, and level, in that order.

Returns:

A tuple containing name, path, and level

Return type:

tuple[str, Path, int]

get_logger()

Get the logger, typically in order to write to it.

Returns:

A writable logging object (error/warn/info/debug)

Return type:

Logger

initialize_logger()

Validate and set up the logger for writing.

Returns:

None

Return type:

None
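As a rough illustration of how a (name, path, level) tuple can drive logger setup, the sketch below uses only the standard library logging module. The function names, the log name, the file location, and the format string are all hypothetical; the package's actual values and validation steps are not documented here.

```python
import logging
import tempfile
from pathlib import Path


def get_configuration_sketch() -> tuple[str, Path, int]:
    # Hypothetical values; the real name, path, and level come from the package.
    log_path = Path(tempfile.gettempdir()) / "example-webcrawl.log"
    return ("example-webcrawl", log_path, logging.INFO)


def initialize_sketch() -> logging.Logger:
    """Attach a file handler once and return the named logger."""
    name, path, level = get_configuration_sketch()
    logger = logging.getLogger(name)
    logger.setLevel(level)
    if not logger.handlers:
        # delay=True postpones opening the file until the first record is written.
        handler = logging.FileHandler(path, encoding="utf-8", delay=True)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

Checking `logger.handlers` before adding a handler keeps repeated initialization from producing duplicate log lines.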

mcp_server_webcrawl.utils.querycache module

class QueryCountCache

Bases: object

A cache for storing total count results from database queries. Only the integer count values are cached, as they are reusable and lightweight.

__init__(max=250, ttl=900)

Initialize the query count cache.

Parameters:
  • max (int) – Maximum number of entries to store in the cache

  • ttl (int) – Time-to-live for cache entries in seconds

get(statement, params)

Get a cached count result if available and not expired.

Parameters:
  • statement (str) – SQL statement

  • params (dict[str, Any]) – Query parameters

Returns:

Cached count value or None if not found or expired

Return type:

int | None

set(statement, params, count)

Store a count result in the cache.

Parameters:
  • statement (str) – SQL statement

  • params (dict[str, Any]) – Query parameters

  • count (int) – Count value to cache

Return type:

None

clear()

Clear all entries from the cache.

Return type:

None
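The get/set/clear contract above can be sketched as a small bounded cache with expiry. This is an illustration under stated assumptions, not the QueryCountCache source: the class name `CountCacheSketch` is hypothetical, and both the key derivation (statement plus JSON-serialized params) and the eviction order (oldest stored entry first) are choices made for the example, since the documentation does not specify them.

```python
import json
import time
from collections import OrderedDict
from typing import Any, Optional


class CountCacheSketch:
    """Hypothetical bounded cache of integer counts with a time-to-live."""

    def __init__(self, max: int = 250, ttl: int = 900) -> None:
        self._max = max
        self._ttl = ttl
        # key -> (count, monotonic timestamp of when it was stored)
        self._entries = OrderedDict()

    @staticmethod
    def _key(statement: str, params: dict[str, Any]) -> str:
        # Assumption: params are JSON-serializable, with str() as a fallback.
        return statement + "|" + json.dumps(params, sort_keys=True, default=str)

    def get(self, statement: str, params: dict[str, Any]) -> Optional[int]:
        key = self._key(statement, params)
        entry = self._entries.get(key)
        if entry is None:
            return None
        count, stored_at = entry
        if time.monotonic() - stored_at > self._ttl:
            # Expired: drop the entry and report a miss.
            del self._entries[key]
            return None
        return count

    def set(self, statement: str, params: dict[str, Any], count: int) -> None:
        key = self._key(statement, params)
        self._entries[key] = (count, time.monotonic())
        self._entries.move_to_end(key)
        while len(self._entries) > self._max:
            # Assumed policy: evict the oldest stored entry first.
            self._entries.popitem(last=False)

    def clear(self) -> None:
        self._entries.clear()
```

A typical call pattern is to try `get(statement, params)` first, run the count query only on a None result, and then store the result with `set(statement, params, count)`.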

mcp_server_webcrawl.utils.server module

initialize_mcp_server()

Return type:

None

mcp_server_webcrawl.utils.tools module

get_crawler_tools(sites=None)

Generate crawler tools based on available sites.

Parameters:

sites (list[SiteResult] | None) – Optional list of site results to include in tool descriptions

Returns:

List of Tool objects for sites and resources

Module contents