Extraction API Reference

Complete REST API reference for AI-powered data extraction.

Base URL

base-url
https://api.stack0.dev/v1

Authentication

All API requests require authentication using your API key:

authentication
curl -X POST https://api.stack0.dev/v1/webdata/extractions \
-H "Authorization: Bearer sk_live_your_api_key" \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com", "mode": "markdown"}'
POST/webdata/extractions

Create a new AI extraction request.

Request Body

FieldTypeDescription
urlstringURL to extract from (required)
modestringauto, schema, markdown, or raw
schemaobjectJSON schema for structured extraction
promptstringNatural language guidance
includeLinksbooleanInclude page links
includeImagesbooleanInclude image URLs
includeMetadatabooleanInclude page metadata
waitForSelectorstringWait for element
waitForTimeoutnumberWait time in ms
create-extraction
curl -X POST https://api.stack0.dev/v1/webdata/extractions \
-H "Authorization: Bearer sk_live_xxx" \
-H "Content-Type: application/json" \
-d '{
"url": "https://example.com/product",
"mode": "schema",
"schema": {
"type": "object",
"properties": {
"name": { "type": "string" },
"price": { "type": "number" },
"description": { "type": "string" }
}
},
"includeMetadata": true
}'
# Response
{
"id": "ext_abc123",
"status": "pending"
}
GET/webdata/extractions/:id

Get an extraction by ID.

get-extraction
curl https://api.stack0.dev/v1/webdata/extractions/ext_abc123 \
-H "Authorization: Bearer sk_live_xxx"
# Response
{
"id": "ext_abc123",
"url": "https://example.com/product",
"mode": "schema",
"status": "completed",
"extractedData": {
"name": "Product Name",
"price": 99.99,
"description": "Product description..."
},
"markdown": null,
"pageMetadata": {
"title": "Product Name - Example Store",
"description": "Buy Product Name...",
"ogImage": "https://example.com/product.jpg"
},
"tokensUsed": 1250,
"processingTimeMs": 3500,
"createdAt": "2024-01-15T10:30:00Z",
"completedAt": "2024-01-15T10:30:03Z"
}
GET/webdata/extractions

List extractions with optional filters.

Query Parameters

ParameterTypeDescription
statusstringFilter by status
urlstringFilter by URL
limitnumberMax results (default 20)
cursorstringPagination cursor
list-extractions
curl "https://api.stack0.dev/v1/webdata/extractions?status=completed&limit=10" \
-H "Authorization: Bearer sk_live_xxx"
# Response
{
"items": [...],
"nextCursor": "cursor_xyz"
}
DELETE/webdata/extractions/:id

Delete an extraction.

delete-extraction
curl -X DELETE https://api.stack0.dev/v1/webdata/extractions/ext_abc123 \
-H "Authorization: Bearer sk_live_xxx"
# Response
{ "success": true }

Batch Extractions

POST/webdata/batch/extractions

Create a batch extraction job for multiple URLs.

batch-extractions
curl -X POST https://api.stack0.dev/v1/webdata/batch/extractions \
-H "Authorization: Bearer sk_live_xxx" \
-H "Content-Type: application/json" \
-d '{
"urls": [
"https://example.com/article/1",
"https://example.com/article/2"
],
"config": {
"mode": "markdown",
"includeMetadata": true
}
}'
# Response
{
"id": "batch_abc123",
"status": "pending",
"totalUrls": 2
}
GET/webdata/batch/:id

Get a batch job by ID.

get-batch-job
curl https://api.stack0.dev/v1/webdata/batch/batch_abc123 \
-H "Authorization: Bearer sk_live_xxx"
# Response
{
"id": "batch_abc123",
"type": "extraction",
"status": "completed",
"totalUrls": 2,
"processedUrls": 2,
"successfulUrls": 2,
"failedUrls": 0,
"createdAt": "2024-01-15T10:30:00Z",
"completedAt": "2024-01-15T10:31:00Z"
}
POST/webdata/batch/:id/cancel

Cancel a running batch job.

cancel-batch-job
curl -X POST https://api.stack0.dev/v1/webdata/batch/batch_abc123/cancel \
-H "Authorization: Bearer sk_live_xxx"
# Response
{ "success": true }

Scheduled Extractions

POST/webdata/schedules

Create a recurring extraction schedule.

create-schedule
curl -X POST https://api.stack0.dev/v1/webdata/schedules \
-H "Authorization: Bearer sk_live_xxx" \
-H "Content-Type: application/json" \
-d '{
"name": "Daily price monitoring",
"url": "https://competitor.com/pricing",
"type": "extraction",
"frequency": "daily",
"config": {
"mode": "schema",
"schema": {
"type": "object",
"properties": {
"price": { "type": "number" },
"features": { "type": "array" }
}
}
},
"detectChanges": true
}'
# Response
{ "id": "sch_abc123" }
POST/webdata/schedules/:id/toggle

Toggle a schedule on or off.

toggle-schedule
curl -X POST https://api.stack0.dev/v1/webdata/schedules/sch_abc123/toggle \
-H "Authorization: Bearer sk_live_xxx"
# Response
{ "isActive": false }

Extraction Modes

ModeDescription
schemaExtract data matching your JSON schema
markdownConvert content to clean markdown
autoAI determines best extraction approach
rawReturn raw HTML content

Status Values

StatusDescription
pendingJob queued, waiting to start
processingJob is currently running
completedJob finished successfully
failedJob encountered an error