AI Extraction
Extract structured data, markdown content, and metadata from any webpage using AI-powered extraction.
Basic Extraction
Extract content from any URL in markdown format:
basic-extraction.ts
import { Stack0 } from '@stack0/sdk'const stack0 = new Stack0({apiKey: process.env.STACK0_API_KEY!})// Start extraction (async)const { id, status } = await stack0.extraction.extract({url: 'https://example.com/article',mode: 'markdown',includeMetadata: true,})// Poll for completionconst extraction = await stack0.extraction.get({ id })console.log('Markdown:', extraction.markdown)console.log('Metadata:', extraction.pageMetadata)
Extract and Wait
Extract content and automatically wait for completion:
extract-and-wait.ts
// This method handles polling automaticallyconst extraction = await stack0.extraction.extractAndWait({url: 'https://example.com/article',mode: 'markdown',includeMetadata: true,includeLinks: true,includeImages: true,}, {pollInterval: 1000, // Check every 1 second (default)timeout: 60000, // Timeout after 60 seconds (default)})console.log('Title:', extraction.pageMetadata?.title)console.log('Description:', extraction.pageMetadata?.description)console.log('Links found:', extraction.pageMetadata?.links?.length)console.log('Markdown content:', extraction.markdown)
Extraction Modes
Choose from different extraction modes based on your needs:
| Mode | Description | Best For |
|---|---|---|
| markdown | AI-cleaned markdown content | Articles, blog posts, documentation |
| schema | Structured data matching your schema | Product data, listings, structured info |
| auto | AI determines best extraction | Unknown page types |
| raw | Raw HTML content | Custom processing |
Schema-Based Extraction
Extract structured data using a JSON schema:
schema-extraction.ts
// Define a schema for the data you want to extractconst extraction = await stack0.extraction.extractAndWait({url: 'https://example.com/product/iphone-15',mode: 'schema',schema: {type: 'object',properties: {name: { type: 'string', description: 'Product name' },price: { type: 'number', description: 'Price in USD' },currency: { type: 'string' },description: { type: 'string' },rating: { type: 'number', description: 'Average rating out of 5' },reviewCount: { type: 'number' },availability: {type: 'string',enum: ['in_stock', 'out_of_stock', 'preorder']},features: {type: 'array',items: { type: 'string' }},images: {type: 'array',items: { type: 'string', description: 'Image URL' }}},required: ['name', 'price']}})// extractedData matches your schemaconst product = extraction.extractedData as {name: stringprice: numbercurrency: stringdescription: stringrating: numberreviewCount: numberavailability: stringfeatures: string[]images: string[]}console.log(`${product.name}: $${product.price}`)console.log(`Rating: ${product.rating}/5 (${product.reviewCount} reviews)`)
Prompt-Guided Extraction
Use natural language prompts to guide the AI extraction:
prompt-extraction.ts
const extraction = await stack0.extraction.extractAndWait({url: 'https://news.example.com/article/tech-layoffs',mode: 'schema',// Natural language prompt to guide extractionprompt: `Extract the main article content. Focus on:- The main headline and any subheadings- Key facts and figures mentioned- Quotes from sources with attribution- The publication date and authorIgnore ads, navigation, and related articles.`,schema: {type: 'object',properties: {headline: { type: 'string' },author: { type: 'string' },publishDate: { type: 'string' },summary: { type: 'string', description: '2-3 sentence summary' },keyFacts: {type: 'array',items: { type: 'string' }},quotes: {type: 'array',items: {type: 'object',properties: {text: { type: 'string' },source: { type: 'string' }}}}}}})
Extraction Options
Full list of options for extracting content:
| Option | Type | Description |
|---|---|---|
| url | string | URL to extract from (required) |
| mode | 'auto' | 'schema' | 'markdown' | 'raw' | Extraction mode (default: auto) |
| schema | object | JSON schema for structured extraction |
| prompt | string | Natural language extraction guidance |
| includeLinks | boolean | Include all links found on page |
| includeImages | boolean | Include all image URLs |
| includeMetadata | boolean | Include page metadata (title, description, og:image) |
| waitForSelector | string | Wait for element before extracting |
| waitForTimeout | number | Wait time in ms before extracting |
Authenticated Extraction
Extract content from pages requiring authentication:
authenticated-extraction.ts
const extraction = await stack0.extraction.extractAndWait({url: 'https://app.example.com/dashboard',mode: 'markdown',// Set authentication headersheaders: {'Authorization': 'Bearer your-token',},// Or use cookiescookies: [{ name: 'session', value: 'abc123', domain: 'app.example.com' },],// Wait for dynamic content to loadwaitForSelector: '.dashboard-content',})
Batch Extractions
Extract content from multiple URLs in a single batch job:
batch-extractions.ts
// Start a batch extraction jobconst batch = await stack0.extraction.batch({urls: ['https://example.com/article/1','https://example.com/article/2','https://example.com/article/3',],config: {mode: 'markdown',includeMetadata: true,},})console.log('Batch ID:', batch.id)console.log('Total URLs:', batch.totalUrls)// Wait for completionconst completedJob = await stack0.extraction.batchAndWait({urls: ['https://example.com/product/1','https://example.com/product/2',],config: {mode: 'schema',schema: {type: 'object',properties: {name: { type: 'string' },price: { type: 'number' }}}}}, {pollInterval: 2000,timeout: 300000, // 5 minutes})console.log(`Completed: ${completedJob.successfulUrls} success, ${completedJob.failedUrls} failed`)
Scheduled Extractions
Create recurring extraction schedules:
scheduled-extractions.ts
// Create a daily extraction scheduleconst schedule = await stack0.extraction.createSchedule({name: 'Daily price monitoring',url: 'https://competitor.com/pricing',frequency: 'daily',config: {mode: 'schema',schema: {type: 'object',properties: {plans: {type: 'array',items: {type: 'object',properties: {name: { type: 'string' },price: { type: 'number' },features: { type: 'array', items: { type: 'string' } }}}}}}},// Get notified when content changesdetectChanges: true,webhookUrl: 'https://your-app.com/webhook',})// List extraction schedulesconst { items } = await stack0.extraction.listSchedules({isActive: true,})
List & Manage Extractions
Query and manage your extractions:
manage-extractions.ts
// List extractions with filtersconst { items, nextCursor } = await stack0.extraction.list({status: 'completed',limit: 20,})for (const extraction of items) {console.log(`[${extraction.status}] ${extraction.url}`)console.log(` Mode: ${extraction.mode}`)console.log(` Tokens used: ${extraction.tokensUsed}`)}// Get a specific extractionconst extraction = await stack0.extraction.get({ id: 'extraction-id' })// Delete an extractionawait stack0.extraction.delete({ id: 'extraction-id' })
Usage Tracking
Monitor your extraction usage:
usage-tracking.ts
const usage = await stack0.extraction.getUsage({periodStart: '2024-01-01T00:00:00Z',periodEnd: '2024-01-31T23:59:59Z',})console.log('Extractions:', usage.extractionsTotal)console.log(' Successful:', usage.extractionsSuccessful)console.log(' Failed:', usage.extractionsFailed)console.log(' Tokens used:', usage.extractionTokensUsed)console.log(' Credits used:', usage.extractionCreditsUsed)