mcp_server_webcrawl.models package
Submodules
mcp_server_webcrawl.models.resources module
- class ResourceResultType
Bases: Enum
Enum representing different types of web resources.
- UNDEFINED = ''
- PAGE = 'html'
- FRAME = 'iframe'
- IMAGE = 'img'
- AUDIO = 'audio'
- VIDEO = 'video'
- FONT = 'font'
- CSS = 'style'
- SCRIPT = 'script'
- FEED = 'rss'
- TEXT = 'text'
- PDF = 'pdf'
- DOC = 'doc'
- OTHER = 'other'
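Because each member's value is a short string, a stored type string can be mapped back to a member with the standard Enum value lookup. The sketch below redeclares a subset of the documented members in a stand-in enum so that it runs without the package installed; with the package available, the real class would presumably be imported from mcp_server_webcrawl.models.resources instead. The helper parse_resource_type and its fallback to UNDEFINED are hypothetical and not part of the package.

```python
from enum import Enum


class ResourceResultType(Enum):
    """Stand-in mirroring a subset of the documented members."""
    UNDEFINED = ""
    PAGE = "html"
    IMAGE = "img"
    CSS = "style"
    OTHER = "other"


def parse_resource_type(value: str) -> ResourceResultType:
    """Hypothetical helper: map a stored string to a member.

    Unknown strings fall back to UNDEFINED (an assumption made for
    this sketch, not documented package behavior).
    """
    try:
        # Enum lookup by value raises ValueError for unknown values.
        return ResourceResultType(value)
    except ValueError:
        return ResourceResultType.UNDEFINED


print(parse_resource_type("html").name)   # PAGE
print(parse_resource_type("bogus").name)  # UNDEFINED
```

Note that the member names and values differ (for example, CSS has the value 'style'), so lookups by name and by value are not interchangeable.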
- class ResourceResult
Bases: object
Represents a web resource result from a crawl operation.
- __init__(id, url, site=None, crawl=None, type=ResourceResultType.UNDEFINED, name=None, headers=None, content=None, created=None, modified=None, status=None, size=None, time=None, metadata=None)
Initialize a ResourceResult instance.
- Parameters:
id (int) – Resource identifier
url (str) – Resource URL
site (int | None) – Site identifier the resource belongs to
crawl (int | None) – Crawl identifier the resource was found in
type (ResourceResultType) – Type of resource
name (str | None) – Resource name
headers (str | None) – HTTP headers
content (str | None) – Resource content
created (datetime | None) – Creation timestamp
modified (datetime | None) – Last modification timestamp
status (int | None) – HTTP status code
size (int | None) – Size in bytes
time (int | None) – Response time in milliseconds
thumbnail – Base64 encoded thumbnail (experimental)
metadata (dict[str, str | int | float | bool | list[str] | list[int] | list[float] | None] | None) – Additional metadata for the resource
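As a rough illustration of the signature above, the following sketch builds a result using keyword arguments. ResourceResultSketch is a hypothetical stand-in (a dataclass with the documented parameter names, types, and defaults), not the package's class, and the two-member enum is likewise reduced for brevity. Only id and url are required; everything else defaults to None or UNDEFINED.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class ResourceResultType(Enum):
    """Reduced stand-in for the documented enum."""
    UNDEFINED = ""
    PAGE = "html"


@dataclass
class ResourceResultSketch:
    """Hypothetical stand-in with the documented __init__ parameters."""
    id: int
    url: str
    site: Optional[int] = None
    crawl: Optional[int] = None
    type: ResourceResultType = ResourceResultType.UNDEFINED
    name: Optional[str] = None
    headers: Optional[str] = None
    content: Optional[str] = None
    created: Optional[datetime] = None
    modified: Optional[datetime] = None
    status: Optional[int] = None
    size: Optional[int] = None   # bytes
    time: Optional[int] = None   # response time in milliseconds
    metadata: Optional[dict] = None


# Example values are invented for illustration.
page = ResourceResultSketch(
    id=1,
    url="https://example.com/",
    site=1,
    type=ResourceResultType.PAGE,
    status=200,
    size=5120,
    time=87,
    created=datetime(2025, 1, 1, tzinfo=timezone.utc),
)
```

Passing the optional values by keyword keeps call sites readable given the number of parameters, and unset fields stay None.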
mcp_server_webcrawl.models.sites module
- class SiteResult
Bases: object
Represents a website or crawl directory result.
- __init__(id, url=None, created=None, modified=None, robots=None, metadata=None)
Initialize a SiteResult instance.
- Parameters:
id (int) – Site identifier
url (str | None) – Site URL
created (datetime | None) – Creation timestamp
modified (datetime | None) – Last modification timestamp
robots (str | None) – Robots.txt content
metadata (dict[str, str | int | float | bool | list[str] | list[int] | list[float] | None] | None) – Additional metadata for the site
- to_forcefield_dict(forcefields)
Convert the object to a dictionary with specified fields forced to exist.
Creates a dictionary that includes every non-None attribute of the object and ensures that each field named in forcefields is present, with a value of None if it is not otherwise set.
- Parameters:
forcefields (list[str]) – List of field names that must appear in the output dictionary with at least a None value
- Returns:
Dictionary containing all non-None object attributes, plus forced fields set to None if not already present
- Return type:
dict[str, str | int | float | bool | list[str] | list[int] | list[float] | None]
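The behavior described for to_forcefield_dict can be sketched as follows. SiteResultSketch is a hypothetical stand-in written from the description above, not the package's implementation; in particular, it leaves attribute values unconverted, whereas the documented return type suggests the real method may serialize values such as datetimes.

```python
from datetime import datetime
from typing import Any, Optional


class SiteResultSketch:
    """Hypothetical stand-in with the documented SiteResult parameters."""

    def __init__(
        self,
        id: int,
        url: Optional[str] = None,
        created: Optional[datetime] = None,
        modified: Optional[datetime] = None,
        robots: Optional[str] = None,
        metadata: Optional[dict] = None,
    ):
        self.id = id
        self.url = url
        self.created = created
        self.modified = modified
        self.robots = robots
        self.metadata = metadata

    def to_forcefield_dict(self, forcefields: list[str]) -> dict[str, Any]:
        # Start with every attribute that has a non-None value.
        result = {k: v for k, v in vars(self).items() if v is not None}
        # Ensure each forced field exists, defaulting to None.
        for field in forcefields:
            result.setdefault(field, None)
        return result


site = SiteResultSketch(id=1, url="https://example.com/")
print(site.to_forcefield_dict(["url", "robots"]))
# {'id': 1, 'url': 'https://example.com/', 'robots': None}
```

Forcing a field is useful when a consumer expects a stable set of keys: here robots appears with None even though it was never set, while created, modified, and metadata are omitted.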