AI Extraction

Turn any webpage into structured data

AI-powered extraction that understands content, not just HTML. No brittle CSS selectors. Define your schema, get consistent output.

$2.00 / 1,000 extractions • Plans start at $5/month

<div class="prod-item">
  <span>Widget...</span>
  <b>$49.99</b>
</div>
AI
{
  "name": "Widget Pro",
  "price": 49.99,
  "rating": 4.8
}
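import { Stack0 } from '@stack0/sdk'

const stack0 = new Stack0()
// One product, described as JSON Schema
const product = {
  type: 'object',
  properties: { name: { type: 'string' }, price: { type: 'number' }, rating: { type: 'number' } },
}
const data = await stack0.extraction.extractAndWait({
  url: 'https://store.example.com',
  mode: 'schema',
  schema: {
    type: 'object',
    properties: { products: { type: 'array', items: product } },
  },
})
// data.extractedData.products → [{ name, price, rating }, ...]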

The problem

CSS selectors break. AI doesn't.

Traditional web scraping is brittle. Sites update their HTML, your selectors break, and your pipeline fails at 3am.

Traditional Scraping
document.querySelector('.product-price')
Site updates class to .price-value
Error: Cannot read property 'textContent' of null
AI Extraction
schema: { price: { type: 'number' } }
Site updates HTML structure
{ "price": 29.99 } — AI understands context

AI extraction understands content semantically. It doesn't matter if the price is in a <span>, <div>, or <p>—the AI finds it. Your schema defines what you want, not where to find it.

Extraction modes

Four ways to extract data

Choose the extraction mode that fits your use case, from fully automatic to schema-driven.

Auto Mode

mode: 'auto'

AI automatically identifies and extracts the most relevant content. Great for articles, blog posts, and product pages.

typescript
const auto = await stack0.extraction.extractAndWait({
  url: 'https://example.com/blog/article',
  mode: 'auto',
})
// AI determines what's important
console.log(auto.extractedData)
// { title: '...', content: '...', author: '...', date: '...' }

Markdown Mode

mode: 'markdown'

Converts page content to clean, formatted markdown. Preserves headings, lists, code blocks, and links.

typescript
const markdown = await stack0.extraction.extractAndWait({
  url: 'https://example.com/documentation',
  mode: 'markdown',
  includeLinks: true,
  includeImages: true,
})
// Clean markdown output
console.log(markdown.extractedData)
// # Documentation Title\n\nContent in markdown...
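
Markdown output is often split into sections before it is indexed or stored. The sketch below is one way to do that downstream; `splitMarkdownByHeading` and `MarkdownSection` are hypothetical names, not part of the Stack0 SDK, and the helper assumes ATX-style headings (`#` to `######`) and backtick code fences.

```typescript
// Hypothetical downstream helper, not part of the Stack0 SDK.
interface MarkdownSection {
  heading: string // heading text without the leading '#' characters
  level: number   // 1-6 for headings, 0 for text before the first heading
  body: string    // section text with surrounding whitespace trimmed
}

// Built with repeat() so this example contains no literal fence marker.
const FENCE = '`'.repeat(3)

function splitMarkdownByHeading(markdown: string): MarkdownSection[] {
  const sections: MarkdownSection[] = []
  let heading = ''
  let level = 0
  let body: string[] = []
  let inFence = false

  const flush = (): void => {
    const text = body.join('\n').trim()
    // Drop an empty preamble, but keep headings even when their body is empty.
    if (level > 0 || text.length > 0) {
      sections.push({ heading, level, body: text })
    }
  }

  for (const line of markdown.split('\n')) {
    // Track fenced code so a '# comment' inside code is not read as a heading.
    if (line.trim().startsWith(FENCE)) {
      inFence = !inFence
    }
    const match = inFence ? null : /^(#{1,6})\s+(.*)$/.exec(line)
    if (match) {
      flush()
      level = (match[1] ?? '').length
      heading = (match[2] ?? '').trim()
      body = []
    } else {
      body.push(line)
    }
  }
  flush()
  return sections
}

const exampleSections = splitMarkdownByHeading(
  '# Documentation Title\n\nContent in markdown...\n\n## Install\n\nRun it.'
)
```

Each section keeps its heading, which makes it easy to attach a title to every chunk you store.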

Schema Mode

mode: 'schema'

Define your data structure with JSON Schema. Get strongly typed, consistent output every time.

typescript
const product = await stack0.extraction.extractAndWait({
  url: 'https://store.example.com/product/123',
  mode: 'schema',
  schema: {
    type: 'object',
    properties: {
      name: { type: 'string' },
      price: { type: 'number' },
      inStock: { type: 'boolean' },
      rating: { type: 'number' },
    },
  },
})
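
To carry that structure into your own TypeScript code, you can mirror the schema with an interface and narrow the result at runtime. This is a minimal sketch: `Product` and `isProduct` are hypothetical names, and it assumes the extracted data arrives untyped and that you want all four fields present (the schema above declares no `required` list, so relax the checks if missing fields are acceptable).

```typescript
// Hypothetical TypeScript shape mirroring the schema above.
interface Product {
  name: string
  price: number
  inStock: boolean
  rating: number
}

// Runtime type guard: narrows an unknown value to Product.
function isProduct(value: unknown): value is Product {
  if (typeof value !== 'object' || value === null) return false
  const v = value as { [key: string]: unknown }
  return (
    typeof v['name'] === 'string' &&
    typeof v['price'] === 'number' &&
    Number.isFinite(v['price']) &&
    typeof v['inStock'] === 'boolean' &&
    typeof v['rating'] === 'number' &&
    Number.isFinite(v['rating'])
  )
}

// Sample value standing in for an extraction result.
const candidate: unknown = { name: 'Widget Pro', price: 49.99, inStock: true, rating: 4.8 }
if (isProduct(candidate)) {
  // Inside this branch, candidate.price is typed as number.
  console.log(candidate.price.toFixed(2))
}
```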

HTML Mode

mode: 'html'

Returns raw HTML content for custom parsing. Useful when you need full control over extraction logic.

typescript
const html = await stack0.extraction.extractAndWait({
  url: 'https://example.com',
  mode: 'html',
})
// Raw HTML for custom processing
console.log(html.extractedData)

Schema extraction

Define your structure. Get consistent output.

JSON Schema support with nested objects, arrays, and all primitive types. Add custom prompts to guide extraction.

Product Data

E-commerce product with specs

typescript
{
  type: 'object',
  properties: {
    name: { type: 'string' },
    price: { type: 'number' },
    currency: { type: 'string' },
    description: { type: 'string' },
    inStock: { type: 'boolean' },
    rating: { type: 'number' },
    reviewCount: { type: 'number' },
    images: {
      type: 'array',
      items: { type: 'string' },
    },
    specifications: {
      type: 'object',
      properties: {
        brand: { type: 'string' },
        model: { type: 'string' },
        weight: { type: 'string' },
      },
    },
  },
}

News/Articles

List of stories from a news page

typescript
{
  type: 'object',
  properties: {
    stories: {
      type: 'array',
      items: {
        type: 'object',
        properties: {
          title: { type: 'string' },
          url: { type: 'string' },
          points: { type: 'number' },
          comments: { type: 'number' },
          author: { type: 'string' },
        },
      },
    },
  },
}

Team/People

Team members with roles and bios

typescript
{
  type: 'object',
  properties: {
    teamMembers: {
      type: 'array',
      items: {
        type: 'object',
        properties: {
          name: { type: 'string' },
          role: { type: 'string' },
          bio: { type: 'string' },
          linkedIn: { type: 'string' },
        },
      },
    },
  },
}

Job Listings

Open positions with requirements

typescript
{
  type: 'object',
  properties: {
    jobs: {
      type: 'array',
      items: {
        type: 'object',
        properties: {
          title: { type: 'string' },
          department: { type: 'string' },
          location: { type: 'string' },
          salary: { type: 'string' },
          requirements: {
            type: 'array',
            items: { type: 'string' },
          },
        },
      },
    },
  },
}
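
The schemas above use only a small subset of JSON Schema: `string`, `number`, `boolean`, `array` with `items`, and `object` with `properties`. If you want to double-check results on your side, a validator for that subset fits in a few lines. This is an illustrative sketch, not how Stack0 validates output: `Schema` and `validate` are hypothetical names, absent properties are treated as allowed, and a full JSON Schema library is the better choice for anything beyond this subset.

```typescript
// Hypothetical validator for the JSON Schema subset used in the examples above.
type Schema =
  | { type: 'string' | 'number' | 'boolean' }
  | { type: 'array'; items: Schema }
  | { type: 'object'; properties: { [key: string]: Schema } }

// Returns a list of human-readable errors; an empty list means the value matches.
function validate(schema: Schema, value: unknown, path: string = '$'): string[] {
  if (schema.type === 'array') {
    if (!Array.isArray(value)) return [path + ': expected array']
    const errors: string[] = []
    for (let i = 0; i < value.length; i++) {
      errors.push(...validate(schema.items, value[i], path + '[' + i + ']'))
    }
    return errors
  }
  if (schema.type === 'object') {
    if (typeof value !== 'object' || value === null || Array.isArray(value)) {
      return [path + ': expected object']
    }
    const record = value as { [key: string]: unknown }
    const errors: string[] = []
    for (const key of Object.keys(schema.properties)) {
      const child = schema.properties[key]
      const field = record[key]
      // Absent properties are allowed: the schemas above declare no `required` list.
      if (child === undefined || field === undefined) continue
      errors.push(...validate(child, field, path + '.' + key))
    }
    return errors
  }
  // Remaining case: a primitive type checked with typeof.
  return typeof value === schema.type ? [] : [path + ': expected ' + schema.type]
}

// A trimmed version of the job listings schema.
const jobsSchema: Schema = {
  type: 'object',
  properties: {
    jobs: {
      type: 'array',
      items: {
        type: 'object',
        properties: {
          title: { type: 'string' },
          requirements: { type: 'array', items: { type: 'string' } },
        },
      },
    },
  },
}

const jobErrors = validate(jobsSchema, { jobs: [{ title: 42, requirements: ['a', 7] }] })
// jobErrors: ['$.jobs[0].title: expected string', '$.jobs[0].requirements[1]: expected string']
```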

Guide extraction with prompts

Add natural language instructions to help the AI focus on what matters.

typescript
const guided = await stack0.extraction.extractAndWait({
  url: 'https://example.com/team',
  mode: 'schema',
  prompt: 'Extract information about team members, focusing on their roles and technical expertise. Ignore marketing staff.',
  schema: {
    type: 'object',
    properties: {
      engineers: {
        type: 'array',
        items: {
          type: 'object',
          properties: {
            name: { type: 'string' },
            role: { type: 'string' },
            expertise: { type: 'array', items: { type: 'string' } },
          },
        },
      },
    },
  },
})

Use cases

Built for the AI era

From AI agents that need structured world knowledge to research automation and lead enrichment.

For AI Agents

RAG Data Pipelines

Feed structured web content into your retrieval-augmented generation systems

Agent World Knowledge

Give AI agents structured understanding of web pages they visit

Tool Use

Let agents extract data from URLs as part of multi-step workflows

For Research

Price Monitoring

Track competitor pricing across hundreds of products automatically

Market Research

Extract structured data from industry reports and directories

Trend Analysis

Monitor news and social content for emerging patterns
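
For price monitoring, most of the work after extraction is comparing one snapshot with the next. The sketch below assumes you store each run as a list of `{ name, price }` records; `PricePoint`, `PriceChange`, and `diffPrices` are hypothetical names, and matching products by name is a simplification (a stable product URL or SKU would be more robust).

```typescript
// Hypothetical snapshot shapes, not part of the Stack0 SDK.
interface PricePoint {
  name: string
  price: number
}

interface PriceChange {
  name: string
  previous: number
  current: number
  delta: number // current minus previous, rounded to cents
}

// Reports products present in both snapshots whose price changed.
function diffPrices(previous: PricePoint[], current: PricePoint[]): PriceChange[] {
  const before = new Map<string, number>()
  for (const point of previous) before.set(point.name, point.price)

  const changes: PriceChange[] = []
  for (const point of current) {
    const old = before.get(point.name)
    if (old !== undefined && old !== point.price) {
      // Round to cents to avoid floating-point noise in the difference.
      const delta = Math.round((point.price - old) * 100) / 100
      changes.push({ name: point.name, previous: old, current: point.price, delta })
    }
  }
  return changes
}

const priceChanges = diffPrices(
  [{ name: 'Widget Pro', price: 49.99 }, { name: 'Gadget', price: 20 }],
  [{ name: 'Widget Pro', price: 44.99 }, { name: 'Gadget', price: 20 }, { name: 'New Item', price: 5 }]
)
// priceChanges contains one entry: Widget Pro, 49.99 to 44.99, delta -5
```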

For Lead Gen

Company Enrichment

Extract company details, team info, and tech stack from websites

Contact Discovery

Find team members and their roles from about/team pages

Job Board Parsing

Monitor competitors' hiring to understand their growth areas

For Content

News Aggregation

Build custom feeds from multiple sources with consistent structure

Content Migration

Convert web content to markdown for CMS imports

Documentation Sync

Keep external docs in sync with your knowledge base

Advanced features

Handle dynamic content. Process in batch.

Wait for Dynamic Content

Handle SPAs and lazy-loaded content. Wait for elements or timeouts before extracting.

typescript
const dynamic = await stack0.extraction.extractAndWait({
  url: 'https://example.com/spa',
  mode: 'schema',
  waitForSelector: '.content-loaded',
  waitForTimeout: 3000,
  schema: { ... },
})

Batch Processing

Extract from multiple URLs with a shared schema. Process in parallel with webhook notifications.

typescript
const batch = await stack0.extraction.batchAndWait({
  urls: [
    'https://store.example.com/product/1',
    'https://store.example.com/product/2',
    'https://store.example.com/product/3',
  ],
  config: {
    mode: 'schema',
    schema: {
      type: 'object',
      properties: { name: { type: 'string' }, price: { type: 'number' } },
    },
  },
})
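
If your URL list is longer than you want to submit in one call, you can split it into smaller groups first. This is a generic sketch: `chunk` is a hypothetical helper, and the group size of 2 is arbitrary, since this page does not state a maximum batch size.

```typescript
// Hypothetical helper: splits a list into consecutive groups of at most `size` items.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer')
  }
  const groups: T[][] = []
  for (let i = 0; i < items.length; i += size) {
    groups.push(items.slice(i, i + size))
  }
  return groups
}

const productUrls = [
  'https://store.example.com/product/1',
  'https://store.example.com/product/2',
  'https://store.example.com/product/3',
  'https://store.example.com/product/4',
  'https://store.example.com/product/5',
]

// Three groups: two URLs, two URLs, one URL. Each group could be passed as `urls`.
const urlGroups = chunk(productUrls, 2)
```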

Async with Webhooks

Start extractions and receive results via webhook. Perfect for background processing pipelines.

typescript
// Start extraction (returns immediately)
const { id } = await stack0.extraction.extract({
  url: 'https://example.com',
  mode: 'schema',
  schema: { ... },
  webhookUrl: 'https://yourapp.com/webhook',
  webhookSecret: 'your-secret',
})
// Webhook receives:
{
  event: 'extraction.completed',
  data: {
    id: 'ext_abc123',
    status: 'completed',
    extractedData: { ... },
    processingTimeMs: 1840,
  }
}
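
On the receiving side, it helps to check the payload shape before you use it. The sketch below covers only the fields shown above and assumes the webhook body arrives as a JSON string. `ExtractionCompletedEvent`, `isExtractionCompleted`, and `parseWebhookBody` are hypothetical names. Verifying the request against `webhookSecret` is left out because this page does not describe the signing scheme.

```typescript
// Hypothetical type covering only the payload fields shown above.
interface ExtractionCompletedEvent {
  event: 'extraction.completed'
  data: {
    id: string
    status: string
    extractedData: unknown
    processingTimeMs: number
  }
}

// Runtime type guard for the payload shape.
function isExtractionCompleted(payload: unknown): payload is ExtractionCompletedEvent {
  if (typeof payload !== 'object' || payload === null) return false
  const p = payload as { [key: string]: unknown }
  if (p['event'] !== 'extraction.completed') return false
  const data = p['data']
  if (typeof data !== 'object' || data === null) return false
  const d = data as { [key: string]: unknown }
  return (
    typeof d['id'] === 'string' &&
    typeof d['status'] === 'string' &&
    typeof d['processingTimeMs'] === 'number'
  )
}

// Assumption: the webhook body is JSON. Returns null for anything unexpected.
function parseWebhookBody(rawBody: string): ExtractionCompletedEvent | null {
  let parsed: unknown
  try {
    parsed = JSON.parse(rawBody)
  } catch {
    return null
  }
  return isExtractionCompleted(parsed) ? parsed : null
}

const sampleBody = JSON.stringify({
  event: 'extraction.completed',
  data: { id: 'ext_abc123', status: 'completed', extractedData: {}, processingTimeMs: 1840 },
})
const parsedEvent = parseWebhookBody(sampleBody)
```

Returning `null` instead of throwing lets your handler answer malformed requests with a 4xx response without a try/catch at every call site.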

Reliability

Works when sites change

Semantic Understanding

AI understands content meaning, not just HTML structure

Consistent Output

Schema validation ensures you always get the structure you expect

No Maintenance

No selectors to update when sites change their HTML

Pricing

Simple, usage-based pricing

AI tokens included. No hidden costs for complex pages.

AI Extractions
$2.00 / 1,000 extractions

AI-powered content extraction and parsing.

All extraction modes included
AI tokens included in price
Schema validation
Custom prompts
Page metadata included
Batch processing & webhooks

Plans start at $5/month. No long-term contracts.
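
As a rough budgeting aid, the usage rate works out to $0.002 per extraction. The sketch below assumes extractions are billed pro rata at $2.00 per 1,000 and leaves the monthly plan fee out, since this page does not say how the two combine; `estimateUsageCostUsd` is a hypothetical helper.

```typescript
// Usage rate from this page: $2.00 per 1,000 extractions, held in cents.
const CENTS_PER_THOUSAND = 200

// Hypothetical helper: usage cost in USD, assuming pro-rata billing.
// The monthly plan fee is not included.
function estimateUsageCostUsd(extractions: number): number {
  if (!Number.isInteger(extractions) || extractions < 0) {
    throw new RangeError('extractions must be a non-negative integer')
  }
  return (extractions * CENTS_PER_THOUSAND) / 1000 / 100
}

const monthlyUsageCost = estimateUsageCostUsd(25000)
// 25,000 extractions at $2.00 per 1,000 comes to $50
```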

Stop writing brittle scrapers

Define your schema once, extract structured data from any page. AI-powered extraction that works when sites change.