AI Extraction

Extract structured data, markdown content, and metadata from any webpage using AI-powered extraction.

Basic Extraction

Extract content from any URL in markdown format:

basic-extraction.ts

import { Stack0 } from '@stack0/sdk'
 
const stack0 = new Stack0({
  apiKey: process.env.STACK0_API_KEY!
})
 
// Start extraction (async)
const { id, status } = await stack0.extraction.extract({
  url: 'https://example.com/article',
  mode: 'markdown',
  includeMetadata: true,
})
 
// Fetch the current state (a single check; repeat until the extraction has completed)
const extraction = await stack0.extraction.get({ id })
console.log('Markdown:', extraction.markdown)
console.log('Metadata:', extraction.pageMetadata)
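
The single get call above only returns the extraction's current state, so a caller that needs the finished result has to repeat it. A minimal sketch of such a polling loop follows. The helper name pollUntilDone is hypothetical, and the 'failed' status string is an assumption ('completed' appears as a list filter later on this page); check the values your SDK version returns.

```typescript
// Hypothetical minimal shape: the loop only reads `status`.
type Job = { status: string }

// Calls `fetchJob` repeatedly until the job completes, fails, or times out.
async function pollUntilDone<T extends Job>(
  fetchJob: () => Promise<T>,
  { pollInterval = 1000, timeout = 60000 } = {},
): Promise<T> {
  const deadline = Date.now() + timeout
  for (;;) {
    const job = await fetchJob()
    if (job.status === 'completed') return job
    // Assumption: a terminal failure is reported as status 'failed'.
    if (job.status === 'failed') throw new Error('Extraction failed')
    // Give up if the next check would land after the deadline.
    if (Date.now() + pollInterval > deadline) {
      throw new Error(`Extraction still ${job.status} after ${timeout} ms`)
    }
    await new Promise<void>((resolve) => setTimeout(resolve, pollInterval))
  }
}

// Usage with the client from the example above:
// const extraction = await pollUntilDone(() => stack0.extraction.get({ id }))
```

The fetch function is passed in rather than hard-coded, so the same loop can be exercised with a stub in tests.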

Extract and Wait

Extract content and automatically wait for completion:

extract-and-wait.ts

// This method handles polling automatically
const extraction = await stack0.extraction.extractAndWait({
  url: 'https://example.com/article',
  mode: 'markdown',
  includeMetadata: true,
  includeLinks: true,
  includeImages: true,
}, {
  pollInterval: 1000, // Check every 1 second (default)
  timeout: 60000,     // Timeout after 60 seconds (default)
})
 
console.log('Title:', extraction.pageMetadata?.title)
console.log('Description:', extraction.pageMetadata?.description)
console.log('Links found:', extraction.pageMetadata?.links?.length)
console.log('Markdown content:', extraction.markdown)

Extraction Modes

Choose from different extraction modes based on your needs:

Mode	Description	Best For
markdown	AI-cleaned markdown content	Articles, blog posts, documentation
schema	Structured data matching your schema	Product data, listings, structured info
auto	AI determines best extraction	Unknown page types
raw	Raw HTML content	Custom processing

Schema-Based Extraction

Extract structured data using a JSON schema:

schema-extraction.ts

// Define a schema for the data you want to extract
const extraction = await stack0.extraction.extractAndWait({
  url: 'https://example.com/product/iphone-15',
  mode: 'schema',
  schema: {
    type: 'object',
    properties: {
      name: { type: 'string', description: 'Product name' },
      price: { type: 'number', description: 'Price in USD' },
      currency: { type: 'string' },
      description: { type: 'string' },
      rating: { type: 'number', description: 'Average rating out of 5' },
      reviewCount: { type: 'number' },
      availability: {
        type: 'string',
        enum: ['in_stock', 'out_of_stock', 'preorder']
      },
      features: {
        type: 'array',
        items: { type: 'string' }
      },
      images: {
        type: 'array',
        items: { type: 'string', description: 'Image URL' }
      }
    },
    required: ['name', 'price']
  }
})
 
// extractedData follows your schema; only name and price are listed as
// required, so the remaining fields are typed as optional
const product = extraction.extractedData as {
  name: string
  price: number
  currency?: string
  description?: string
  rating?: number
  reviewCount?: number
  availability?: string
  features?: string[]
  images?: string[]
}
 
console.log(`${product.name}: $${product.price}`)
console.log(`Rating: ${product.rating}/5 (${product.reviewCount} reviews)`)
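
A type assertion such as the one above is not checked at runtime, so a page that yields unexpected data would only surface later as undefined or mistyped values. As a hedged sketch, a small type guard can verify the fields before they are used. The Product interface and isProduct helper are hypothetical names, and accepting null for optional fields is an assumption about how missing values might be returned.

```typescript
// Hypothetical subset of the product schema: two required fields, two optional.
interface Product {
  name: string
  price: number
  rating?: number | null
  reviewCount?: number | null
}

// Narrows an unknown value to Product by checking each field's runtime type.
function isProduct(value: unknown): value is Product {
  if (typeof value !== 'object' || value === null) return false
  const v = value as Record<string, unknown>
  // Optional fields may be absent (or null, by assumption) but never mistyped.
  const optionalNumber = (x: unknown): boolean =>
    x === undefined || x === null || typeof x === 'number'
  return (
    typeof v.name === 'string' &&
    typeof v.price === 'number' &&
    optionalNumber(v.rating) &&
    optionalNumber(v.reviewCount)
  )
}

// Usage with the result from the example above:
// if (!isProduct(extraction.extractedData)) throw new Error('Unexpected extraction shape')
```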

Prompt-Guided Extraction

Use natural language prompts to guide the AI extraction:

prompt-extraction.ts

const extraction = await stack0.extraction.extractAndWait({
  url: 'https://news.example.com/article/tech-layoffs',
  mode: 'schema',
  // Natural language prompt to guide extraction
  prompt: `Extract the main article content. Focus on:
    - The main headline and any subheadings
    - Key facts and figures mentioned
    - Quotes from sources with attribution
    - The publication date and author
    Ignore ads, navigation, and related articles.`,
  schema: {
    type: 'object',
    properties: {
      headline: { type: 'string' },
      author: { type: 'string' },
      publishDate: { type: 'string' },
      summary: { type: 'string', description: '2-3 sentence summary' },
      keyFacts: {
        type: 'array',
        items: { type: 'string' }
      },
      quotes: {
        type: 'array',
        items: {
          type: 'object',
          properties: {
            text: { type: 'string' },
            source: { type: 'string' }
          }
        }
      }
    }
  }
})

Extraction Options

Options for extracting content (the Authenticated Extraction example further down also passes headers and cookies):

Option	Type	Description
url	string	URL to extract from (required)
mode	'auto' | 'schema' | 'markdown' | 'raw'	Extraction mode (default: auto)
schema	object	JSON schema for structured extraction
prompt	string	Natural language extraction guidance
includeLinks	boolean	Include all links found on page
includeImages	boolean	Include all image URLs
includeMetadata	boolean	Include page metadata (title, description, og:image)
waitForSelector	string	Wait for element before extracting
waitForTimeout	number	Wait time in ms before extracting

Authenticated Extraction

Extract content from pages requiring authentication:

authenticated-extraction.ts

const extraction = await stack0.extraction.extractAndWait({
  url: 'https://app.example.com/dashboard',
  mode: 'markdown',
  // Set authentication headers
  headers: {
    'Authorization': 'Bearer your-token',
  },
  // Or use cookies
  cookies: [
    { name: 'session', value: 'abc123', domain: 'app.example.com' },
  ],
  // Wait for dynamic content to load
  waitForSelector: '.dashboard-content',
})
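
Session cookies are often copied from a browser as a single Cookie header string. The sketch below converts such a string into the name, value, and domain objects used in the example above. The helper name cookiesFromHeader is hypothetical, and any cookie fields beyond those three are outside what this page shows.

```typescript
// Mirrors the cookie objects used in the example above.
type CookieInput = { name: string; value: string; domain: string }

// Splits a "a=1; b=2" style Cookie header into one object per cookie.
function cookiesFromHeader(header: string, domain: string): CookieInput[] {
  return header
    .split(';')
    .map((pair) => pair.trim())
    // Keep only pairs with a non-empty name before the first '='.
    .filter((pair) => pair.indexOf('=') > 0)
    .map((pair) => {
      // Split on the first '=' only, since values may contain '=' themselves.
      const eq = pair.indexOf('=')
      return {
        name: pair.slice(0, eq).trim(),
        value: pair.slice(eq + 1).trim(),
        domain,
      }
    })
}

// Usage:
// cookies: cookiesFromHeader('session=abc123; theme=dark', 'app.example.com')
```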

Batch Extractions

Extract content from multiple URLs in a single batch job:

batch-extractions.ts

// Start a batch extraction job
const batch = await stack0.extraction.batch({
  urls: [
    'https://example.com/article/1',
    'https://example.com/article/2',
    'https://example.com/article/3',
  ],
  config: {
    mode: 'markdown',
    includeMetadata: true,
  },
})
 
console.log('Batch ID:', batch.id)
console.log('Total URLs:', batch.totalUrls)
 
// Alternatively, start a batch and wait for it to finish in one call
const completedJob = await stack0.extraction.batchAndWait({
  urls: [
    'https://example.com/product/1',
    'https://example.com/product/2',
  ],
  config: {
    mode: 'schema',
    schema: {
      type: 'object',
      properties: {
        name: { type: 'string' },
        price: { type: 'number' }
      }
    }
  }
}, {
  pollInterval: 2000,
  timeout: 300000, // 5 minutes
})
 
console.log(`Completed: ${completedJob.successfulUrls} success, ${completedJob.failedUrls} failed`)

Scheduled Extractions

Create recurring extraction schedules:

scheduled-extractions.ts

// Create a daily extraction schedule
const schedule = await stack0.extraction.createSchedule({
  name: 'Daily price monitoring',
  url: 'https://competitor.com/pricing',
  frequency: 'daily',
  config: {
    mode: 'schema',
    schema: {
      type: 'object',
      properties: {
        plans: {
          type: 'array',
          items: {
            type: 'object',
            properties: {
              name: { type: 'string' },
              price: { type: 'number' },
              features: { type: 'array', items: { type: 'string' } }
            }
          }
        }
      }
    }
  },
  // Get notified when content changes
  detectChanges: true,
  webhookUrl: 'https://your-app.com/webhook',
})
 
// List extraction schedules
const { items } = await stack0.extraction.listSchedules({
  isActive: true,
})

List & Manage Extractions

Query and manage your extractions:

manage-extractions.ts

// List extractions with filters
const { items, nextCursor } = await stack0.extraction.list({
  status: 'completed',
  limit: 20,
})
 
for (const extraction of items) {
  console.log(`[${extraction.status}] ${extraction.url}`)
  console.log(`  Mode: ${extraction.mode}`)
  console.log(`  Tokens used: ${extraction.tokensUsed}`)
}
 
// Get a specific extraction
const extraction = await stack0.extraction.get({ id: 'extraction-id' })
 
// Delete an extraction
await stack0.extraction.delete({ id: 'extraction-id' })
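
The list call above returns a nextCursor alongside items, which suggests cursor-based pagination. A minimal sketch of walking every page follows. The helper name collectAll is hypothetical, and passing the cursor back as a cursor option is an assumption that this page does not confirm.

```typescript
// Hypothetical page shape mirroring the `{ items, nextCursor }` result above.
type Page<T> = { items: T[]; nextCursor?: string | null }

// Follows nextCursor until it is empty, with a page cap as a safety net.
async function collectAll<T>(
  fetchPage: (cursor?: string) => Promise<Page<T>>,
  maxPages = 50,
): Promise<T[]> {
  const all: T[] = []
  let cursor: string | undefined
  for (let page = 0; page < maxPages; page++) {
    const { items, nextCursor } = await fetchPage(cursor)
    all.push(...items)
    if (!nextCursor) break // no further pages
    cursor = nextCursor
  }
  return all
}

// Usage (the `cursor` option name is an assumption):
// const all = await collectAll((cursor) =>
//   stack0.extraction.list({ status: 'completed', limit: 20, cursor }))
```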

Usage Tracking

Monitor your extraction usage:

usage-tracking.ts

const usage = await stack0.extraction.getUsage({
  periodStart: '2024-01-01T00:00:00Z',
  periodEnd: '2024-01-31T23:59:59Z',
})
 
console.log('Extractions:', usage.extractionsTotal)
console.log('  Successful:', usage.extractionsSuccessful)
console.log('  Failed:', usage.extractionsFailed)
console.log('  Tokens used:', usage.extractionTokensUsed)
console.log('  Credits used:', usage.extractionCreditsUsed)
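
Rather than hard-coding the period bounds, they can be derived for any calendar month. The sketch below produces timestamps in the same format as the example above and adds a success-rate calculation. Both helper names, monthPeriod and successRate, are hypothetical.

```typescript
// Returns UTC bounds for a calendar month; `month` is 1-based (1 = January).
function monthPeriod(year: number, month: number): { periodStart: string; periodEnd: string } {
  const start = new Date(Date.UTC(year, month - 1, 1))
  // Date.UTC takes a 0-based month, so `month` is the first day of the next
  // month; one second earlier is the last second of the requested month.
  const end = new Date(Date.UTC(year, month, 1) - 1000)
  // Drop the zero milliseconds to match the format used above.
  const iso = (d: Date): string => d.toISOString().replace('.000Z', 'Z')
  return { periodStart: iso(start), periodEnd: iso(end) }
}

// Fraction of successful extractions; 0 when nothing ran in the period.
function successRate(successful: number, total: number): number {
  return total === 0 ? 0 : successful / total
}

// Usage:
// const usage = await stack0.extraction.getUsage(monthPeriod(2024, 1))
// const rate = successRate(usage.extractionsSuccessful, usage.extractionsTotal)
```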
