Crawl API Reference

Recursively crawl a site, honor robots.txt, and return clean markdown for every page. An open Firecrawl alternative.

Base URL

```
https://api.stack0.dev/v1
```
POST /webdata/crawl

Start a crawl job.

```shell
curl -X POST https://api.stack0.dev/v1/webdata/crawl \
  -H "Authorization: Bearer sk_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://docs.example.com",
    "maxDepth": 3,
    "maxPages": 200,
    "formats": ["markdown", "links"],
    "respectRobotsTxt": true,
    "stealth": true
  }'
```
Response:

```json
{
  "id": "c5f9…",
  "status": "pending"
}
```
GET /webdata/crawl/{id}?includePages=true

Get crawl status and, optionally, every scraped page.
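A status check might look like this, assuming the same bearer-token auth as the start call; `{id}` is the id returned when the crawl was created, and `includePages=true` inlines the scraped pages in the response:

```shell
# Replace {id} with the id returned by POST /webdata/crawl.
curl "https://api.stack0.dev/v1/webdata/crawl/{id}?includePages=true" \
  -H "Authorization: Bearer sk_live_..."
```

Poll this endpoint until `status` is no longer `pending`.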

GET /webdata/crawl/{crawlId}/pages

List the scraped pages for a crawl in paginated batches.
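For large crawls, fetching pages in batches keeps responses small. A sketch, assuming the same auth header as above and a hypothetical `limit` query parameter for batch size (not documented here):

```shell
# "limit" is illustrative only; check the API for the actual pagination parameters.
curl "https://api.stack0.dev/v1/webdata/crawl/{crawlId}/pages?limit=50" \
  -H "Authorization: Bearer sk_live_..."
```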

POST /webdata/crawl/{id}/cancel

Cancel a pending or running crawl.
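Cancellation is a plain POST with no body; a sketch, assuming the same bearer-token auth:

```shell
# Replace {id} with the id of the crawl to cancel.
curl -X POST "https://api.stack0.dev/v1/webdata/crawl/{id}/cancel" \
  -H "Authorization: Bearer sk_live_..."
```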

SDK

```ts
// crawl-sdk.ts
import { Stack0 } from "@stack0/sdk";

const stack0 = new Stack0({ apiKey: process.env.STACK0_API_KEY! });

// Start the crawl and block until it completes.
const crawl = await stack0.crawl.startAndWait({
  url: "https://docs.example.com",
  maxDepth: 3,
  maxPages: 200,
  formats: ["markdown", "links"],
});

// Print each page's URL and the first 120 characters of its markdown.
for (const page of crawl.pages ?? []) {
  console.log(page.url, page.markdown?.slice(0, 120));
}
```