AI Extraction

Extract structured data, markdown content, and metadata from any webpage using AI-powered extraction.

Basic Extraction

Extract content from any URL in markdown format:

basic-extraction.ts
import { Stack0 } from '@stack0/sdk'
const stack0 = new Stack0({
apiKey: process.env.STACK0_API_KEY!
})
// Start extraction (async)
const { id, status } = await stack0.extraction.extract({
url: 'https://example.com/article',
mode: 'markdown',
includeMetadata: true,
})
// Poll for completion
const extraction = await stack0.extraction.get({ id })
console.log('Markdown:', extraction.markdown)
console.log('Metadata:', extraction.pageMetadata)

Extract and Wait

Extract content and automatically wait for completion:

extract-and-wait.ts
// This method handles polling automatically
const extraction = await stack0.extraction.extractAndWait({
url: 'https://example.com/article',
mode: 'markdown',
includeMetadata: true,
includeLinks: true,
includeImages: true,
}, {
pollInterval: 1000, // Check every 1 second (default)
timeout: 60000, // Timeout after 60 seconds (default)
})
console.log('Title:', extraction.pageMetadata?.title)
console.log('Description:', extraction.pageMetadata?.description)
console.log('Links found:', extraction.pageMetadata?.links?.length)
console.log('Markdown content:', extraction.markdown)

Extraction Modes

Choose from different extraction modes based on your needs:

ModeDescriptionBest For
markdownAI-cleaned markdown contentArticles, blog posts, documentation
schemaStructured data matching your schemaProduct data, listings, structured info
autoAI determines best extractionUnknown page types
rawRaw HTML contentCustom processing

Schema-Based Extraction

Extract structured data using a JSON schema:

schema-extraction.ts
// Define a schema for the data you want to extract
const extraction = await stack0.extraction.extractAndWait({
url: 'https://example.com/product/iphone-15',
mode: 'schema',
schema: {
type: 'object',
properties: {
name: { type: 'string', description: 'Product name' },
price: { type: 'number', description: 'Price in USD' },
currency: { type: 'string' },
description: { type: 'string' },
rating: { type: 'number', description: 'Average rating out of 5' },
reviewCount: { type: 'number' },
availability: {
type: 'string',
enum: ['in_stock', 'out_of_stock', 'preorder']
},
features: {
type: 'array',
items: { type: 'string' }
},
images: {
type: 'array',
items: { type: 'string', description: 'Image URL' }
}
},
required: ['name', 'price']
}
})
// extractedData matches your schema
const product = extraction.extractedData as {
name: string
price: number
currency: string
description: string
rating: number
reviewCount: number
availability: string
features: string[]
images: string[]
}
console.log(`${product.name}: $${product.price}`)
console.log(`Rating: ${product.rating}/5 (${product.reviewCount} reviews)`)

Prompt-Guided Extraction

Use natural language prompts to guide the AI extraction:

prompt-extraction.ts
const extraction = await stack0.extraction.extractAndWait({
url: 'https://news.example.com/article/tech-layoffs',
mode: 'schema',
// Natural language prompt to guide extraction
prompt: `Extract the main article content. Focus on:
- The main headline and any subheadings
- Key facts and figures mentioned
- Quotes from sources with attribution
- The publication date and author
Ignore ads, navigation, and related articles.`,
schema: {
type: 'object',
properties: {
headline: { type: 'string' },
author: { type: 'string' },
publishDate: { type: 'string' },
summary: { type: 'string', description: '2-3 sentence summary' },
keyFacts: {
type: 'array',
items: { type: 'string' }
},
quotes: {
type: 'array',
items: {
type: 'object',
properties: {
text: { type: 'string' },
source: { type: 'string' }
}
}
}
}
}
})

Extraction Options

Full list of options for extracting content:

OptionTypeDescription
urlstringURL to extract from (required)
mode'auto' | 'schema' | 'markdown' | 'raw'Extraction mode (default: auto)
schemaobjectJSON schema for structured extraction
promptstringNatural language extraction guidance
includeLinksbooleanInclude all links found on page
includeImagesbooleanInclude all image URLs
includeMetadatabooleanInclude page metadata (title, description, og:image)
waitForSelectorstringWait for element before extracting
waitForTimeoutnumberWait time in ms before extracting

Authenticated Extraction

Extract content from pages requiring authentication:

authenticated-extraction.ts
const extraction = await stack0.extraction.extractAndWait({
url: 'https://app.example.com/dashboard',
mode: 'markdown',
// Set authentication headers
headers: {
'Authorization': 'Bearer your-token',
},
// Or use cookies
cookies: [
{ name: 'session', value: 'abc123', domain: 'app.example.com' },
],
// Wait for dynamic content to load
waitForSelector: '.dashboard-content',
})

Batch Extractions

Extract content from multiple URLs in a single batch job:

batch-extractions.ts
// Start a batch extraction job
const batch = await stack0.extraction.batch({
urls: [
'https://example.com/article/1',
'https://example.com/article/2',
'https://example.com/article/3',
],
config: {
mode: 'markdown',
includeMetadata: true,
},
})
console.log('Batch ID:', batch.id)
console.log('Total URLs:', batch.totalUrls)
// Wait for completion
const completedJob = await stack0.extraction.batchAndWait({
urls: [
'https://example.com/product/1',
'https://example.com/product/2',
],
config: {
mode: 'schema',
schema: {
type: 'object',
properties: {
name: { type: 'string' },
price: { type: 'number' }
}
}
}
}, {
pollInterval: 2000,
timeout: 300000, // 5 minutes
})
console.log(`Completed: ${completedJob.successfulUrls} success, ${completedJob.failedUrls} failed`)

Scheduled Extractions

Create recurring extraction schedules:

scheduled-extractions.ts
// Create a daily extraction schedule
const schedule = await stack0.extraction.createSchedule({
name: 'Daily price monitoring',
url: 'https://competitor.com/pricing',
frequency: 'daily',
config: {
mode: 'schema',
schema: {
type: 'object',
properties: {
plans: {
type: 'array',
items: {
type: 'object',
properties: {
name: { type: 'string' },
price: { type: 'number' },
features: { type: 'array', items: { type: 'string' } }
}
}
}
}
}
},
// Get notified when content changes
detectChanges: true,
webhookUrl: 'https://your-app.com/webhook',
})
// List extraction schedules
const { items } = await stack0.extraction.listSchedules({
isActive: true,
})

List & Manage Extractions

Query and manage your extractions:

manage-extractions.ts
// List extractions with filters
const { items, nextCursor } = await stack0.extraction.list({
status: 'completed',
limit: 20,
})
for (const extraction of items) {
console.log(`[${extraction.status}] ${extraction.url}`)
console.log(` Mode: ${extraction.mode}`)
console.log(` Tokens used: ${extraction.tokensUsed}`)
}
// Get a specific extraction
const extraction = await stack0.extraction.get({ id: 'extraction-id' })
// Delete an extraction
await stack0.extraction.delete({ id: 'extraction-id' })

Usage Tracking

Monitor your extraction usage:

usage-tracking.ts
const usage = await stack0.extraction.getUsage({
periodStart: '2024-01-01T00:00:00Z',
periodEnd: '2024-01-31T23:59:59Z',
})
console.log('Extractions:', usage.extractionsTotal)
console.log(' Successful:', usage.extractionsSuccessful)
console.log(' Failed:', usage.extractionsFailed)
console.log(' Tokens used:', usage.extractionTokensUsed)
console.log(' Credits used:', usage.extractionCreditsUsed)