AI Extraction

Turn any webpage into structured data

Name: Stack0
Author: Stack0

AI-powered extraction that understands content, not just HTML. No brittle CSS selectors. Define your schema, get consistent output.


$2.00 / 1,000 extractions • Plans start at $5/month

await stack0.extraction.extractAndWait({
  url: "https://store.example.com",
  schema: { products: [{ name, price, rating }] }
})

{
  "name": "Widget Pro",
  "price": 49.99,
  "rating": 4.8
}

import { Stack0 } from '@stack0/sdk'

const stack0 = new Stack0()

const data = await stack0.extraction.extractAndWait({
  url: 'https://store.example.com',
  // Shorthand for readability; Schema Mode below shows the full JSON Schema form
  schema: {
    products: [{ name, price, rating }],
  },
})

// data.products → [{ name, price, rating }, ...]

The problem

CSS selectors break. AI doesn't.

Traditional web scraping is brittle. Sites update their HTML, your selectors break, and your pipeline fails at 3am.

Traditional Scraping

document.querySelector('.product-price')

Site updates class to .price-value

Error: Cannot read property 'textContent' of null

AI Extraction

schema: { price: { type: 'number' } }

Site updates HTML structure

{ "price": 29.99 } — AI understands context

AI extraction understands content semantically. It doesn't matter if the price is in a <span>, <div>, or <p>—the AI finds it. Your schema defines what you want, not where to find it.
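The selector failure described above can be reproduced without a browser. This is a minimal sketch: `textByClass` is a hypothetical stand-in for `document.querySelector(...).textContent`, built on a naive regex for illustration only. It is not a real HTML parser and not part of the Stack0 SDK, and both HTML strings are made up.

```typescript
// Hypothetical stand-in for a class-based selector lookup.
// Naive regex matching, for illustration only; not a real HTML parser.
function textByClass(html: string, className: string): string | null {
  const pattern = new RegExp(
    `<(\\w+)[^>]*class="[^"]*\\b${className}\\b[^"]*"[^>]*>([^<]*)</\\1>`,
  )
  const match = html.match(pattern)
  return match?.[2]?.trim() ?? null
}

// The same price, before and after a hypothetical site redesign.
const before = '<span class="product-price">$29.99</span>'
const after = '<div class="price-value">$29.99</div>'

const priceBefore = textByClass(before, 'product-price') // found
const priceAfter = textByClass(after, 'product-price') // null: the class was renamed
```

The price never changed, only the markup around it, which is exactly the case where a selector-based pipeline stops returning data.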

Extraction modes

Four ways to extract data

Choose the extraction mode that fits your use case. From fully automatic to schema-driven.

Auto Mode

mode: 'auto'

AI automatically identifies and extracts the most relevant content. Great for articles, blog posts, and product pages.

typescript

const auto = await stack0.extraction.extractAndWait({
  url: 'https://example.com/blog/article',
  mode: 'auto',
})
 
// AI determines what's important
console.log(auto.extractedData)
// { title: '...', content: '...', author: '...', date: '...' }

Markdown Mode

mode: 'markdown'

Converts page content to clean, formatted markdown. Preserves headings, lists, code blocks, and links.

typescript

const markdown = await stack0.extraction.extractAndWait({
  url: 'https://example.com/documentation',
  mode: 'markdown',
  includeLinks: true,
  includeImages: true,
})
 
// Clean markdown output
console.log(markdown.extractedData)
// # Documentation Title\n\nContent in markdown...
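Markdown output is straightforward to post-process. As a minimal sketch, assuming the returned markdown uses `#`-style headings, the hypothetical helper `outlineFromMarkdown` (not an SDK function) builds a table of contents from it. The sample string is made up.

```typescript
interface Heading {
  level: number
  text: string
}

// Three backticks, built at runtime so this example stays easy to embed.
const FENCE = '`'.repeat(3)

// Collect "#"-style headings while skipping anything inside fenced code blocks.
function outlineFromMarkdown(markdown: string): Heading[] {
  const headings: Heading[] = []
  let inFence = false
  for (const line of markdown.split('\n')) {
    if (line.trim().startsWith(FENCE)) {
      inFence = !inFence
      continue
    }
    if (inFence) continue
    const match = /^(#{1,6})\s+(.+?)\s*$/.exec(line)
    if (match) {
      const [, hashes = '', text = ''] = match
      headings.push({ level: hashes.length, text })
    }
  }
  return headings
}

const sample = [
  '# Documentation Title',
  '',
  'Intro text',
  '## Install',
  FENCE,
  '# a comment inside a code block, not a heading',
  FENCE,
  '## Usage',
].join('\n')

const outline = outlineFromMarkdown(sample)
```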

Schema Mode

mode: 'schema'

Define your data structure with JSON Schema. Get strongly typed, consistent output every time.

typescript

const product = await stack0.extraction.extractAndWait({
  url: 'https://store.example.com/product/123',
  mode: 'schema',
  schema: {
    type: 'object',
    properties: {
      name: { type: 'string' },
      price: { type: 'number' },
      inStock: { type: 'boolean' },
      rating: { type: 'number' },
    },
  },
})
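A schema like the one above can also drive a local sanity check before extracted data enters the rest of a pipeline. This is a minimal sketch: `FlatSchema` and `mismatches` are hypothetical names, the sample values are made up, and the check covers only primitive `type` keywords on a flat object, not full JSON Schema validation.

```typescript
type Primitive = 'string' | 'number' | 'boolean'

interface FlatSchema {
  type: 'object'
  properties: Record<string, { type: Primitive }>
}

const productSchema: FlatSchema = {
  type: 'object',
  properties: {
    name: { type: 'string' },
    price: { type: 'number' },
    inStock: { type: 'boolean' },
    rating: { type: 'number' },
  },
}

// Returns the names of properties that are missing or have the wrong primitive type.
function mismatches(schema: FlatSchema, value: Record<string, unknown>): string[] {
  return Object.entries(schema.properties)
    .filter(([key, spec]) => typeof value[key] !== spec.type)
    .map(([key]) => key)
}

const clean = mismatches(productSchema, {
  name: 'Widget Pro',
  price: 49.99,
  inStock: true,
  rating: 4.8,
})

// price arrives as a string and rating is missing.
const flagged = mismatches(productSchema, {
  name: 'Widget Pro',
  price: '49.99',
  inStock: true,
})
```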

HTML Mode

mode: 'html'

Returns raw HTML content for custom parsing. Useful when you need full control over extraction logic.

typescript

const html = await stack0.extraction.extractAndWait({
  url: 'https://example.com',
  mode: 'html',
})
 
// Raw HTML for custom processing
console.log(html.extractedData)
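Custom processing of raw HTML can be as small as a link collector. This is a minimal sketch, assuming the HTML arrives as a string: `extractLinks` is a hypothetical helper, the regex handles only double-quoted `href` attributes, and the sample markup is made up. A real HTML parser is a better fit for production use.

```typescript
// Collect unique absolute URLs from double-quoted href attributes,
// resolving relative links against a base URL.
function extractLinks(html: string, baseUrl: string): string[] {
  const links = new Set<string>()
  const hrefPattern = /<a\b[^>]*\bhref="([^"]*)"/gi
  let match: RegExpExecArray | null
  while ((match = hrefPattern.exec(html)) !== null) {
    const href = match[1] ?? ''
    try {
      links.add(new URL(href, baseUrl).toString())
    } catch {
      // Skip hrefs that cannot be resolved into a valid URL.
    }
  }
  return Array.from(links)
}

const rawHtml =
  '<a href="/docs">Docs</a> ' +
  '<a class="nav" href="https://example.com/pricing">Pricing</a> ' +
  '<a href="/docs">Docs again</a>'

const links = extractLinks(rawHtml, 'https://example.com')
```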

Schema extraction

Define your structure. Get consistent output.

JSON Schema support with nested objects, arrays, and all primitive types. Add custom prompts to guide extraction.

Product Data

E-commerce product with specs

typescript

{
  type: 'object',
  properties: {
    name: { type: 'string' },
    price: { type: 'number' },
    currency: { type: 'string' },
    description: { type: 'string' },
    inStock: { type: 'boolean' },
    rating: { type: 'number' },
    reviewCount: { type: 'number' },
    images: {
      type: 'array',
      items: { type: 'string' },
    },
    specifications: {
      type: 'object',
      properties: {
        brand: { type: 'string' },
        model: { type: 'string' },
        weight: { type: 'string' },
      },
    },
  },
}

News/Articles

List of stories from a news page

typescript

{
  type: 'object',
  properties: {
    stories: {
      type: 'array',
      items: {
        type: 'object',
        properties: {
          title: { type: 'string' },
          url: { type: 'string' },
          points: { type: 'number' },
          comments: { type: 'number' },
          author: { type: 'string' },
        },
      },
    },
  },
}

Team/People

Team members with roles and bios

typescript

{
  type: 'object',
  properties: {
    teamMembers: {
      type: 'array',
      items: {
        type: 'object',
        properties: {
          name: { type: 'string' },
          role: { type: 'string' },
          bio: { type: 'string' },
          linkedIn: { type: 'string' },
        },
      },
    },
  },
}

Job Listings

Open positions with requirements

typescript

{
  type: 'object',
  properties: {
    jobs: {
      type: 'array',
      items: {
        type: 'object',
        properties: {
          title: { type: 'string' },
          department: { type: 'string' },
          location: { type: 'string' },
          salary: { type: 'string' },
          requirements: {
            type: 'array',
            items: { type: 'string' },
          },
        },
      },
    },
  },
}
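The four schemas above share the same nesting pattern, so small builder functions can cut the repetition. This is a minimal sketch: `str`, `arrayOf`, and `objectOf` are hypothetical helpers written for this example, not SDK exports. They only produce plain objects in the same JSON Schema shape as the examples above.

```typescript
type JsonSchema =
  | { type: 'string' | 'number' | 'boolean' }
  | { type: 'array'; items: JsonSchema }
  | { type: 'object'; properties: Record<string, JsonSchema> }

const str: JsonSchema = { type: 'string' }

const arrayOf = (items: JsonSchema): JsonSchema => ({ type: 'array', items })

const objectOf = (properties: Record<string, JsonSchema>): JsonSchema => ({
  type: 'object',
  properties,
})

// Same structure as the Job Listings schema.
const jobsSchema = objectOf({
  jobs: arrayOf(
    objectOf({
      title: str,
      department: str,
      location: str,
      salary: str,
      requirements: arrayOf(str),
    }),
  ),
})
```

Because the builders return ordinary object literals, the result can be passed anywhere a hand-written schema would go.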

Guide extraction with prompts

Add natural language instructions to help the AI focus on what matters.

typescript

const guided = await stack0.extraction.extractAndWait({
  url: 'https://example.com/team',
  mode: 'schema',
  prompt: 'Extract information about team members, focusing on their roles and technical expertise. Ignore marketing staff.',
  schema: {
    type: 'object',
    properties: {
      engineers: {
        type: 'array',
        items: {
          type: 'object',
          properties: {
            name: { type: 'string' },
            role: { type: 'string' },
            expertise: { type: 'array', items: { type: 'string' } },
          },
        },
      },
    },
  },
})

Use cases

Built for the AI era

From AI agents that need structured world knowledge to research automation and lead enrichment.

For AI Agents

RAG Data Pipelines

Feed structured web content into your retrieval-augmented generation systems

Agent World Knowledge

Give AI agents structured understanding of web pages they visit

Tool Use

Let agents extract data from URLs as part of multi-step workflows

For Research

Price Monitoring

Track competitor pricing across hundreds of products automatically

Market Research

Extract structured data from industry reports and directories

Trend Analysis

Monitor news and social content for emerging patterns

For Lead Gen

Company Enrichment

Extract company details, team info, and tech stack from websites

Contact Discovery

Find team members and their roles from about/team pages

Job Board Parsing

Monitor competitors' hiring to understand their growth areas

For Content

News Aggregation

Build custom feeds from multiple sources with consistent structure

Content Migration

Convert web content to markdown for CMS imports

Documentation Sync

Keep external docs in sync with your knowledge base

Advanced features

Handle dynamic content. Process in batch.

Wait for Dynamic Content

Handle SPAs and lazy-loaded content. Wait for elements or timeouts before extracting.

typescript

const dynamic = await stack0.extraction.extractAndWait({
  url: 'https://example.com/spa',
  mode: 'schema',
  waitForSelector: '.content-loaded',
  waitForTimeout: 3000,
  schema: { ... },
})

Batch Processing

Extract from multiple URLs with a shared schema. Process in parallel with webhook notifications.

typescript

const batch = await stack0.extraction.batchAndWait({
  urls: [
    'https://store.example.com/product/1',
    'https://store.example.com/product/2',
    'https://store.example.com/product/3',
  ],
  // Shared config applied to every URL in the batch
  config: {
    mode: 'schema',
    schema: {
      type: 'object',
      properties: {
        name: { type: 'string' },
        price: { type: 'number' },
      },
    },
  },
})
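For long URL lists, it can help to split the work into several smaller batches before submitting them. This is a minimal sketch: `chunk` is a hypothetical helper, and the chunk size of 3 is arbitrary, since no batch size limit is stated here.

```typescript
// Split a list into consecutive chunks of at most `size` items.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer')
  }
  const chunks: T[][] = []
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size))
  }
  return chunks
}

// Seven made-up product URLs, grouped into batches of up to three.
const productUrls = Array.from(
  { length: 7 },
  (_, i) => `https://store.example.com/product/${i + 1}`,
)
const batches = chunk(productUrls, 3)
```

Each inner array could then be passed as the `urls` value of its own batch call.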

Async with Webhooks

Start extractions and receive results via webhook. Perfect for background processing pipelines.

typescript

// Start extraction (returns immediately)
const { id } = await stack0.extraction.extract({
  url: 'https://example.com',
  mode: 'schema',
  schema: { ... },
  webhookUrl: 'https://yourapp.com/webhook',
  webhookSecret: 'your-secret',
})
 
// Webhook receives:
{
  event: 'extraction.completed',
  data: {
    id: 'ext_abc123',
    status: 'completed',
    extractedData: { ... },
    processingTimeMs: 1840,
  }
}
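On the receiving side, the webhook body is untrusted input, so it is worth narrowing it to the expected shape before use. This is a minimal sketch based only on the payload fields shown above: `ExtractionCompletedEvent` and `isExtractionCompleted` are hypothetical names, and the sample body is made up. It does not verify `webhookSecret`, because the signing scheme is not described here.

```typescript
interface ExtractionCompletedEvent {
  event: 'extraction.completed'
  data: {
    id: string
    status: string
    extractedData: unknown
    processingTimeMs: number
  }
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null
}

// Narrow an untrusted webhook body to the completed-event shape.
function isExtractionCompleted(body: unknown): body is ExtractionCompletedEvent {
  if (!isRecord(body) || body['event'] !== 'extraction.completed') return false
  const data = body['data']
  return (
    isRecord(data) &&
    typeof data['id'] === 'string' &&
    typeof data['status'] === 'string' &&
    typeof data['processingTimeMs'] === 'number'
  )
}

const incoming: unknown = JSON.parse(
  '{"event":"extraction.completed","data":{"id":"ext_abc123","status":"completed","extractedData":{"price":29.99},"processingTimeMs":1840}}',
)
const accepted = isExtractionCompleted(incoming)

// Wrong id type and missing fields: rejected.
const rejected = isExtractionCompleted({ event: 'extraction.completed', data: { id: 42 } })
```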

Reliability

Works when sites change

Semantic Understanding

AI understands content meaning, not just HTML structure

Consistent Output

Schema validation ensures you always get the structure you expect

No Maintenance

No selectors to update when sites change their HTML

Pricing

Simple, usage-based pricing

AI tokens included. No hidden costs for complex pages.

AI Extractions

$2.00 / 1,000 extractions

AI-powered content extraction and parsing.

All extraction modes included

AI tokens included in price

Schema validation

Custom prompts

Page metadata included

Batch processing & webhooks


Plans start at $5/month. No long-term contracts.
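The per-extraction rate makes usage cost simple to estimate. This is a minimal sketch covering usage cost only: `usageCostUsd` is a hypothetical helper, the result is rounded to the nearest cent for display, and it makes no assumption about how the monthly plan fee combines with usage, since that is not stated here.

```typescript
// Rate taken from the pricing above: $2.00 per 1,000 extractions.
const PRICE_PER_THOUSAND_USD = 2.0

// Estimated usage cost in USD, rounded to the nearest cent.
// The monthly plan fee is deliberately not included.
function usageCostUsd(extractions: number): number {
  if (!Number.isInteger(extractions) || extractions < 0) {
    throw new RangeError('extractions must be a non-negative integer')
  }
  // Work in cents to avoid floating-point drift in the final figure.
  const cents = Math.round((extractions * PRICE_PER_THOUSAND_USD * 100) / 1000)
  return cents / 100
}

const smallJob = usageCostUsd(1_500) // 1,500 extractions
const largeJob = usageCostUsd(25_000) // 25,000 extractions
```

At this rate, 1,500 extractions come to $3.00 of usage and 25,000 come to $50.00.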

Stop writing brittle scrapers

Define your schema once, extract structured data from any page. AI-powered extraction that works when sites change.

