Optional
allow_Whether to allow the crawl to follow links to external websites.
Optional
body_Text strings to remove from body when creating chunks for each page
Optional
boost_Boost titles such that keyword matches in titles are prioritized in search results. Strongly recommended to leave this on. Defaults to true.
Optional
exclude_URL Patterns to exclude from the crawl
Optional
exclude_Specify the HTML tags, classes and ids to exclude from the response.
Optional
heading_Text strings to remove from headings when creating chunks for each page
Optional
ignore_Ignore the website sitemap when crawling. Defaults to true.
Optional
include_URL Patterns to include in the crawl
Optional
include_Specify the HTML tags, classes and ids to include in the response.
Optional
interval?: CrawlInterval | null
Optional
limit?: number | null
How many pages to crawl. Defaults to 1000.
Optional
scrape_
Optional
site_The URL to crawl
Optional
webhook_Metadata to send back with the webhook call for each successful page scrape
Optional
webhook_Host to call back on the webhook for each successful page scrape
Options for setting up the crawl which will populate the dataset.
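
As a rough illustration of the `limit` default described above, the following sketch resolves the page limit for an options object. It is not the library's own code. `CrawlOptionsSketch`, `DEFAULT_PAGE_LIMIT`, and `resolvePageLimit` are hypothetical names, and `CrawlInterval` is modeled as a plain string because its members are not listed in this section. Only the two fields whose full names appear above are included, and whether the default is actually applied on the client or the server is not stated here.

```typescript
// Assumption: CrawlInterval's real definition is not shown in this section,
// so it is treated as an opaque string for the purpose of this sketch.
type CrawlInterval = string;

// Hypothetical local mirror of the two fields whose full names are documented.
interface CrawlOptionsSketch {
  interval?: CrawlInterval | null;
  limit?: number | null;
}

// Documented default for `limit`.
const DEFAULT_PAGE_LIMIT = 1000;

// Returns the page limit to use when `limit` is omitted or null.
function resolvePageLimit(options: CrawlOptionsSketch): number {
  return options.limit ?? DEFAULT_PAGE_LIMIT;
}
```

With this helper, both `resolvePageLimit({})` and `resolvePageLimit({ limit: null })` fall back to the documented default, while an explicit `limit` is returned unchanged.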