mcp_server_webcrawl.models package

Submodules

mcp_server_webcrawl.models.resources module

class ResourceResultType

Bases: Enum

Enum representing different types of web resources.

UNDEFINED = ''
PAGE = 'html'
FRAME = 'iframe'
IMAGE = 'img'
AUDIO = 'audio'
VIDEO = 'video'
FONT = 'font'
CSS = 'style'
SCRIPT = 'script'
FEED = 'rss'
TEXT = 'text'
PDF = 'pdf'
DOC = 'doc'
OTHER = 'other'
classmethod values()

Return all values of the enum as a list.

Return type:

list[str]

classmethod to_int_map()

Return a dictionary mapping each enum value to its integer position.

Returns:

a dictionary with enum values as keys and their ordinal positions as values.

Return type:

dict
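The two class methods above can be illustrated with a small stand-in enum. This is a minimal sketch, not the package's source: `ResourceType` is a hypothetical name, only four of the documented members are reproduced, and the zero-based numbering in `to_int_map` is an assumption, since the reference only says "ordinal positions".

```python
from enum import Enum


class ResourceType(Enum):
    """Hypothetical stand-in for ResourceResultType with a few of its members."""

    UNDEFINED = ""
    PAGE = "html"
    FRAME = "iframe"
    IMAGE = "img"

    @classmethod
    def values(cls) -> list[str]:
        # Collect member values in definition order.
        return [member.value for member in cls]

    @classmethod
    def to_int_map(cls) -> dict[str, int]:
        # Assumption: positions are zero-based and follow definition order.
        return {member.value: index for index, member in enumerate(cls)}


print(ResourceType.values())
print(ResourceType.to_int_map())
print(ResourceType("html"))  # look a member up by its string value
```

Because the members carry string values, a stored type string such as 'html' can be turned back into a member by calling the enum with that value.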

class ResourceResult

Bases: object

Represents a web resource result from a crawl operation.

__init__(id, url, site=None, crawl=None, type=ResourceResultType.UNDEFINED, name=None, headers=None, content=None, created=None, modified=None, status=None, size=None, time=None, metadata=None)

Initialize a ResourceResult instance.

Parameters:
  • id (int) – resource identifier

  • url (str) – resource URL

  • site (int | None) – site identifier the resource belongs to

  • crawl (int | None) – crawl identifier the resource was found in

  • type (ResourceResultType) – type of resource

  • name (str | None) – resource name

  • headers (str | None) – HTTP headers

  • content (str | None) – resource content

  • created (datetime | None) – creation timestamp

  • modified (datetime | None) – last modification timestamp

  • status (int | None) – HTTP status code

  • size (int | None) – size in bytes

  • time (int | None) – response time in milliseconds

  • thumbnail – base64 encoded thumbnail (experimental)

  • metadata (dict[str, str | int | float | bool | list[str] | list[int] | list[float] | None] | None) – additional metadata for the resource
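The signature above makes only id and url mandatory; every other argument falls back to None, or to UNDEFINED for the type. The sketch below mirrors that shape with a stand-in dataclass so it runs without the package installed. `Resource` and `ResourceType` are hypothetical names, and only a subset of the documented parameters is reproduced.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class ResourceType(Enum):
    """Hypothetical stand-in for ResourceResultType (two members only)."""

    UNDEFINED = ""
    PAGE = "html"


@dataclass
class Resource:
    """Hypothetical stand-in mirroring part of the documented signature."""

    id: int
    url: str
    site: Optional[int] = None
    crawl: Optional[int] = None
    type: ResourceType = ResourceType.UNDEFINED
    status: Optional[int] = None
    size: Optional[int] = None  # bytes
    time: Optional[int] = None  # milliseconds
    modified: Optional[datetime] = None


# Only the two required arguments: everything else takes its default.
minimal = Resource(1, "https://example.com/")

# Keyword arguments keep a longer call readable.
page = Resource(
    id=2,
    url="https://example.com/about",
    site=1,
    type=ResourceType.PAGE,
    status=200,
    size=5120,
    time=85,
    modified=datetime(2024, 1, 15, tzinfo=timezone.utc),
)
print(minimal.type, page.type.value)
```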

to_dict()

Convert the object to a dictionary suitable for JSON serialization.

Return type:

dict[str, str | int | float | bool | list[str] | list[int] | list[float] | None]

set_extra(extra_name, extra_value)
Parameters:
  • extra_name (str) –

  • extra_value (str) –

Return type:

None

to_forcefield_dict(forcefields=None)

Create a dictionary with forced fields set to None if not present in the object.

Parameters:

forcefields – list of field names that should be included in the result even if they’re not present in the object data

Returns:

Dictionary containing object data with forced fields included

Return type:

dict[str, str | int | float | bool | list[str] | list[int] | list[float] | None]
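The forced-field behaviour described above can be sketched as a standalone function over a plain dictionary. `force_fields` is a hypothetical helper written for illustration; it is not part of the package and makes no claim about how the method is implemented internally.

```python
from typing import Any, Optional


def force_fields(data: dict[str, Any], forcefields: Optional[list[str]] = None) -> dict[str, Any]:
    """Copy data and add any missing forced field with a None value."""
    result = dict(data)
    for name in forcefields or []:
        # Existing values are kept; only absent keys are filled with None.
        result.setdefault(name, None)
    return result


row = {"id": 7, "url": "https://example.com/"}
print(force_fields(row, ["url", "status", "size"]))
# {'id': 7, 'url': 'https://example.com/', 'status': None, 'size': None}
```

Forcing fields this way gives consumers a stable set of keys, so a client can read `status` without first checking whether it is present.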

mcp_server_webcrawl.models.sites module

class SiteResult

Bases: object

Represents a website or crawl directory result.

__init__(id, url=None, path=None, created=None, modified=None, robots=None, metadata=None)

Initialize a SiteResult instance.

Parameters:
  • id (int) – site identifier

  • url (str | None) – site URL

  • path (Path | None) – path to site data, different from datasrc

  • created (datetime | None) – creation timestamp

  • modified (datetime | None) – last modification timestamp

  • robots (str | None) – robots.txt content

  • metadata (dict[str, str | int | float | bool | list[str] | list[int] | list[float] | None] | None) – additional metadata for the site

to_dict()

Convert the object to a dictionary suitable for JSON serialization.

Return type:

dict[str, str | int | float | bool | list[str] | list[int] | list[float] | None]
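Path and datetime values are not JSON serializable as they stand, so a dictionary "suitable for JSON serialization" has to convert them first. The following is a minimal sketch under stated assumptions: `Site` is a hypothetical stand-in for SiteResult, and the choices to stringify paths, emit ISO 8601 timestamps, and omit None values are illustrative guesses rather than documented behaviour.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Optional


class Site:
    """Hypothetical stand-in for SiteResult; not the package's implementation."""

    def __init__(self, id: int, url: Optional[str] = None,
                 path: Optional[Path] = None,
                 modified: Optional[datetime] = None):
        self.id = id
        self.url = url
        self.path = path
        self.modified = modified

    def to_dict(self) -> dict[str, Any]:
        # Assumption: paths become strings, datetimes become ISO 8601 strings,
        # and None-valued attributes are left out of the result.
        raw = {
            "id": self.id,
            "url": self.url,
            "path": str(self.path) if self.path is not None else None,
            "modified": self.modified.isoformat() if self.modified is not None else None,
        }
        return {key: value for key, value in raw.items() if value is not None}


site = Site(3, url="https://example.com/",
            modified=datetime(2024, 1, 15, tzinfo=timezone.utc))
print(json.dumps(site.to_dict()))
```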

to_forcefield_dict(forcefields)

Convert the object to a dictionary with specified fields forced to exist.

Creates a dictionary that includes all non-None attributes of the object and ensures that every field named in the forcefields list is present, even if its value is None.

Parameters:

forcefields (list[str]) – list of field names that must appear in the output dictionary with at least a None value

Returns:

Dictionary containing all non-None object attributes, plus forced fields set to None if not already present

Return type:

dict[str, str | int | float | bool | list[str] | list[int] | list[float] | None]

Module contents