# SDK Auto-Extraction: Zero-Config Usage Guide

## Overview

The RunForge SDK now supports zero-configuration automatic extraction of tokens, costs, and metadata from LLM API responses. Simply wrap your existing LLM calls with `runforge.track()` and all metrics flow automatically to the dashboard.

## Quick Start

### TypeScript/JavaScript
```typescript
import OpenAI from 'openai'
import { RunForge } from '@runforge/sdk-ts'

const openai = new OpenAI()

// Initialize with your API key
const runforge = new RunForge({
  apiKey: process.env.RUNFORGE_API_KEY,
  projectId: 'your-project-id'
})

// Wrap any LLM call - everything else is automatic!
const result = await runforge.track({ experiment: 'chat-v2' }, () =>
  openai.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: [{ role: 'user', content: 'Hello!' }]
  })
)
// ✅ Tokens, costs, latency automatically tracked
```
### Python

```python
import os

import openai
from runforge import RunForge

# Initialize with your API key
runforge = RunForge(
    api_key=os.environ['RUNFORGE_API_KEY'],
    project_id='your-project-id'
)

# Wrap any LLM call - everything else is automatic!
result = runforge.track(
    {"experiment": "chat-v2"},
    lambda: openai.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Hello!"}]
    )
)
# ✅ Tokens, costs, latency automatically tracked
```
## Supported Providers

### 🥇 OpenRouter (Highest Accuracy)

- Direct cost extraction from `usage.total_cost`
- Real-time pricing from the provider
- No estimation required

```typescript
// OpenRouter calls automatically extract exact costs
// ('openrouter' is an OpenAI-compatible client pointed at OpenRouter's base URL)
const result = await runforge.track({ model: 'openai/gpt-4o-mini' }, () =>
  openrouter.chat.completions.create({
    model: 'openai/gpt-4o-mini',
    messages
  })
)
// Cost comes directly from OpenRouter - 100% accurate
```
### 🤖 OpenAI Direct

- Token extraction from `usage.prompt_tokens` / `usage.completion_tokens`
- Server-side cost calculation using the pricing registry
- Streaming support with `stream_options.include_usage`

```typescript
// OpenAI calls extract tokens and calculate costs
const result = await runforge.track({}, () =>
  openai.chat.completions.create({
    model: 'gpt-4o-mini',
    messages,
    stream_options: { include_usage: true } // requires stream: true (see Streaming Support)
  })
)
// Tokens extracted, cost calculated automatically
```
### 🧠 Anthropic Direct

- Token extraction from `usage.input_tokens` / `usage.output_tokens`
- Server-side cost calculation using the pricing registry
- Model-specific pricing for Claude variants

```typescript
// Anthropic calls extract tokens and calculate costs
const result = await runforge.track({}, () =>
  anthropic.messages.create({
    model: 'claude-3-5-sonnet-20241022',
    max_tokens: 1024, // required by the Anthropic API
    messages
  })
)
// Input/output tokens extracted, cost calculated automatically
```
## How It Works

### 1. Automatic Provider Detection

The provider is detected from the model name:

```text
'openai/gpt-4o-mini' → 'openai'
'gpt-4o-mini'        → 'openai'
'claude-3-opus'      → 'anthropic'
```
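For illustration, the detection rule can be expressed as a small function. This is a minimal sketch, not the SDK's actual source; `detectProvider` is a hypothetical name, and the real implementation likely covers more providers and edge cases.

```typescript
// Hypothetical sketch of provider detection (not the SDK's actual implementation)
function detectProvider(model: string): 'openai' | 'anthropic' | 'unknown' {
  // OpenRouter-style ids carry an explicit vendor prefix, e.g. 'openai/gpt-4o-mini'
  if (model.includes('/')) {
    const prefix = model.split('/')[0]
    if (prefix === 'openai' || prefix === 'anthropic') return prefix
  }
  // Bare model names fall back to well-known prefixes
  if (model.startsWith('gpt-')) return 'openai'
  if (model.startsWith('claude')) return 'anthropic'
  return 'unknown'
}
```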
### 2. Usage Data Extraction

- OpenRouter: `usage.total_cost` (exact, from the provider)
- OpenAI: `usage.prompt_tokens` + `usage.completion_tokens`
- Anthropic: `usage.input_tokens` + `usage.output_tokens`
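These three shapes can be normalized with a few field fallbacks. A minimal sketch, assuming plain response objects; `NormalizedUsage` and `extractUsage` are illustrative names, not SDK exports.

```typescript
// Illustrative normalization of the three usage shapes (names are hypothetical)
interface NormalizedUsage {
  inputTokens: number
  outputTokens: number
  totalCostUsd?: number // only present when the provider reports it (OpenRouter)
}

function extractUsage(response: { usage?: Record<string, number> }): NormalizedUsage {
  const usage = response.usage ?? {}
  return {
    // OpenAI uses prompt/completion_tokens; Anthropic uses input/output_tokens
    inputTokens: usage.prompt_tokens ?? usage.input_tokens ?? 0,
    outputTokens: usage.completion_tokens ?? usage.output_tokens ?? 0,
    // OpenRouter additionally includes an exact dollar cost
    totalCostUsd: usage.total_cost,
  }
}
```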
### 3. Server-Side Cost Verification

- OpenRouter costs are trusted as-is (`costSource: "provider"`)
- Other providers are recalculated server-side (`costSource: "catalog"`)
- Unknown models are marked as estimated (`costEstimated: true`)
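The decision order can be sketched as below, reusing the `NormalizedUsage` shape from the sketch above. Only the `costSource` and `costEstimated` fields come from the documented event format; `resolveCost` and the catalog parameter are illustrative assumptions.

```typescript
// Hedged sketch of the cost-source decision; only costSource/costEstimated
// are documented fields, the rest is illustrative
type CostResult = { cost: number; costSource: 'provider' | 'catalog'; costEstimated?: boolean }

function resolveCost(
  usage: NormalizedUsage,
  catalog?: { usdPerMTokIn: number; usdPerMTokOut: number }
): CostResult {
  // 1. OpenRouter reports an exact cost: trust it as-is
  if (usage.totalCostUsd !== undefined) {
    return { cost: usage.totalCostUsd, costSource: 'provider' }
  }
  // 2. Known model: recalculate server-side from the pricing registry
  if (catalog) {
    const cost =
      (usage.inputTokens / 1_000_000) * catalog.usdPerMTokIn +
      (usage.outputTokens / 1_000_000) * catalog.usdPerMTokOut
    return { cost, costSource: 'catalog' }
  }
  // 3. Unknown model: flag the figure as an estimate
  return { cost: 0, costSource: 'catalog', costEstimated: true }
}
```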
### 4. Privacy-First Design
- Never stores prompts or responses
- Only extracts usage metadata
- Safe for sensitive workloads
## Advanced Usage

### Custom Metadata
```typescript
const result = await runforge.track({
  experiment: 'chat-v2',
  user_id: 'user123',
  temperature: 0.7,
  custom_field: 'value'
}, () => llmCall())
```
### Error Tracking

```typescript
try {
  const result = await runforge.track({ experiment: 'test' }, () => {
    throw new Error('Rate limited')
  })
} catch (error) {
  // Error automatically tracked with latency and status
}
```
### Streaming Support

```typescript
// For OpenAI streaming with usage
const stream = await runforge.track({}, () =>
  openai.chat.completions.create({
    model: 'gpt-4o-mini',
    messages,
    stream: true,
    stream_options: { include_usage: true }
  })
)
// Usage data extracted from the final chunk
```
### Async Functions (Python)

```python
from openai import AsyncOpenAI

client = AsyncOpenAI()

# Supports both sync and async functions
async def async_llm_call():
    return await client.chat.completions.create(model="gpt-4o", messages=messages)

result = await runforge.track({"experiment": "async"}, async_llm_call)
```
## Configuration Options

### SDK Initialization

#### TypeScript

```typescript
const runforge = new RunForge({
  apiKey: 'your-api-key',                     // Required
  endpoint: 'https://your-domain/api/ingest', // Optional
  projectId: 'project-id'                     // Optional
})
```

#### Python

```python
runforge = RunForge(
    api_key='your-api-key',                     # Required
    endpoint='https://your-domain/api/ingest',  # Optional
    project_id='project-id'                     # Optional
)
```
## Cost Accuracy
| Provider | Token Accuracy | Cost Accuracy | Source |
|---|---|---|---|
| OpenRouter | ✅ Exact | ✅ Exact | Provider API |
| OpenAI | ✅ Exact | 🟡 Calculated | Pricing Registry |
| Anthropic | ✅ Exact | 🟡 Calculated | Pricing Registry |
| Others | 🟡 Estimated | 🟡 Estimated | Fallback |
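To make "Calculated" concrete, here is a worked example using the `resolveCost` sketch above with illustrative rates (not the live registry): $0.15 per 1M input tokens and $0.60 per 1M output tokens.

```typescript
// 1,200 prompt tokens + 300 completion tokens at illustrative rates:
// (1200 / 1e6) * 0.15 + (300 / 1e6) * 0.60 = $0.00036
resolveCost(
  { inputTokens: 1200, outputTokens: 300 },
  { usdPerMTokIn: 0.15, usdPerMTokOut: 0.60 }
)
// => { cost: ~0.00036, costSource: 'catalog' }
```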
## Migration from Manual Configuration

### Before (Manual)

```typescript
// Old way - manual configuration required
const call = withLLM(
  openai.chat.completions.create.bind(openai.chat.completions),
  {
    provider: 'openai',
    model: 'gpt-4o',
    price: { inUsdPerMTokIn: 5, inUsdPerMTokOut: 15 }
  },
  { apiKey: process.env.RUNFORGE_API_KEY }
)
```

### After (Zero-Config)

```typescript
// New way - completely automatic
const result = await runforge.track({ experiment: 'test' }, () =>
  openai.chat.completions.create({ model: 'gpt-4o', messages })
)
```
## Troubleshooting

### No Usage Data

If the SDK can't extract usage data:

- Latency and status are still tracked
- Tokens and cost are recorded as zero
- Check the provider's response format

### Incorrect Costs

- OpenRouter costs are always exact
- Other providers use the server-side pricing registry
- Check the model name's spelling and casing

### Network Failures

- The SDK silently handles network failures
- It never breaks your LLM calls
- Metrics are lost, but your application continues
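Conceptually, ingestion follows a fail-open pattern like the sketch below. This is hypothetical (`trackSafely` and the `ingest` callback are illustrative names); the real SDK internals may differ.

```typescript
// Hypothetical fail-open wrapper: metric delivery can fail, the LLM call cannot be affected
async function trackSafely<T>(
  meta: Record<string, unknown>,
  call: () => Promise<T>,
  ingest: (event: Record<string, unknown>) => Promise<void>
): Promise<T> {
  const start = Date.now()
  let status = 'ok'
  try {
    return await call() // errors from the LLM call itself still propagate to the caller
  } catch (err) {
    status = 'error'
    throw err
  } finally {
    try {
      await ingest({ ...meta, status, latencyMs: Date.now() - start })
    } catch {
      // Network failure: drop the metric, never break the caller
    }
  }
}
```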
### Debug Mode
## Examples

See the complete examples:

- TypeScript: `examples/auto-extraction-demo.ts`
- Python: `examples/auto-extraction-demo.py`

Run them locally with your `RUNFORGE_API_KEY` set.