# SDK Auto-Extraction: Zero-Config Usage Guide

## Overview

The RunForge SDK now supports zero-configuration automatic extraction of tokens, costs, and metadata from LLM API responses. Simply wrap your existing LLM calls with `runforge.track()` and all metrics flow automatically to the dashboard.

## Quick Start

### TypeScript/JavaScript
```typescript
import OpenAI from 'openai'
import { RunForge } from '@runforge/sdk-ts'

const openai = new OpenAI()

// Initialize with your API key
const runforge = new RunForge({
  apiKey: process.env.RUNFORGE_API_KEY,
  projectId: 'your-project-id'
})

// Wrap any LLM call - everything else is automatic!
const result = await runforge.track({ experiment: 'chat-v2' }, () =>
  openai.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: [{ role: 'user', content: 'Hello!' }]
  })
)
// ✅ Tokens, costs, latency automatically tracked
```
### Python

```python
import os

import openai
from runforge import RunForge

# Initialize with your API key
runforge = RunForge(
    api_key=os.environ['RUNFORGE_API_KEY'],
    project_id='your-project-id'
)

# Wrap any LLM call - everything else is automatic!
result = runforge.track(
    {"experiment": "chat-v2"},
    lambda: openai.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Hello!"}]
    )
)
# ✅ Tokens, costs, latency automatically tracked
```
## Supported Providers

### 🥇 OpenRouter (Highest Accuracy)

- Direct cost extraction from `usage.total_cost`
- Real-time pricing from the provider
- No estimation required

```typescript
// OpenRouter calls automatically extract exact costs
// ('openrouter' is an OpenAI-compatible client pointed at OpenRouter's base URL)
const result = await runforge.track({ model: 'openai/gpt-4o-mini' }, () =>
  openrouter.chat.completions.create({
    model: 'openai/gpt-4o-mini',
    messages
  })
)
// Cost comes directly from OpenRouter - 100% accurate
```
### 🤖 OpenAI Direct

- Token extraction from `usage.prompt_tokens` / `usage.completion_tokens`
- Server-side cost calculation using the pricing registry
- Streaming support with `stream_options.include_usage`

```typescript
// OpenAI calls extract tokens and calculate costs
const result = await runforge.track({}, () =>
  openai.chat.completions.create({
    model: 'gpt-4o-mini',
    messages,
    stream_options: { include_usage: true } // requires stream: true (see Streaming Support)
  })
)
// Tokens extracted, cost calculated automatically
```
### 🧠 Anthropic Direct

- Token extraction from `usage.input_tokens` / `usage.output_tokens`
- Server-side cost calculation using the pricing registry
- Model-specific pricing for Claude variants

```typescript
// Anthropic calls extract tokens and calculate costs
const result = await runforge.track({}, () =>
  anthropic.messages.create({
    model: 'claude-3-5-sonnet-20241022',
    max_tokens: 1024, // required by the Anthropic API
    messages
  })
)
// Input/output tokens extracted, cost calculated automatically
```
## How It Works

### 1. Automatic Provider Detection

The provider is detected from the model name:

```text
'openai/gpt-4o-mini' → 'openai'
'gpt-4o-mini'        → 'openai'
'claude-3-opus'      → 'anthropic'
```
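For illustration, the detection rule can be expressed as a small function. This is a minimal sketch, not the SDK's actual source; `detectProvider` is a hypothetical name, and the real implementation likely covers more providers and edge cases.

```typescript
// Hypothetical sketch of provider detection (not the SDK's actual implementation)
function detectProvider(model: string): 'openai' | 'anthropic' | 'unknown' {
  // OpenRouter-style ids carry an explicit vendor prefix, e.g. 'openai/gpt-4o-mini'
  if (model.includes('/')) {
    const prefix = model.split('/')[0]
    if (prefix === 'openai' || prefix === 'anthropic') return prefix
  }
  // Bare model names fall back to well-known prefixes
  if (model.startsWith('gpt-')) return 'openai'
  if (model.startsWith('claude')) return 'anthropic'
  return 'unknown'
}
```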
### 2. Usage Data Extraction

- OpenRouter: `usage.total_cost` (exact, from the provider)
- OpenAI: `usage.prompt_tokens` + `usage.completion_tokens`
- Anthropic: `usage.input_tokens` + `usage.output_tokens`
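These three shapes can be normalized with a few field fallbacks. A minimal sketch, assuming plain response objects; `NormalizedUsage` and `extractUsage` are illustrative names, not SDK exports.

```typescript
// Illustrative normalization of the three usage shapes (names are hypothetical)
interface NormalizedUsage {
  inputTokens: number
  outputTokens: number
  totalCostUsd?: number // only present when the provider reports it (OpenRouter)
}

function extractUsage(response: { usage?: Record<string, number> }): NormalizedUsage {
  const usage = response.usage ?? {}
  return {
    // OpenAI uses prompt/completion_tokens; Anthropic uses input/output_tokens
    inputTokens: usage.prompt_tokens ?? usage.input_tokens ?? 0,
    outputTokens: usage.completion_tokens ?? usage.output_tokens ?? 0,
    // OpenRouter additionally includes an exact dollar cost
    totalCostUsd: usage.total_cost,
  }
}
```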
### 3. Server-Side Cost Verification

- OpenRouter costs are trusted as-is (`costSource: "provider"`)
- Other providers are recalculated server-side (`costSource: "catalog"`)
- Unknown models are marked as estimated (`costEstimated: true`)
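The decision order can be sketched as below, reusing the `NormalizedUsage` shape from the sketch above. Only the `costSource` and `costEstimated` fields come from the documented event format; `resolveCost` and the catalog parameter are illustrative assumptions.

```typescript
// Hedged sketch of the cost-source decision; only costSource/costEstimated
// are documented fields, the rest is illustrative
type CostResult = { cost: number; costSource: 'provider' | 'catalog'; costEstimated?: boolean }

function resolveCost(
  usage: NormalizedUsage,
  catalog?: { usdPerMTokIn: number; usdPerMTokOut: number }
): CostResult {
  // 1. OpenRouter reports an exact cost: trust it as-is
  if (usage.totalCostUsd !== undefined) {
    return { cost: usage.totalCostUsd, costSource: 'provider' }
  }
  // 2. Known model: recalculate server-side from the pricing registry
  if (catalog) {
    const cost =
      (usage.inputTokens / 1_000_000) * catalog.usdPerMTokIn +
      (usage.outputTokens / 1_000_000) * catalog.usdPerMTokOut
    return { cost, costSource: 'catalog' }
  }
  // 3. Unknown model: flag the figure as an estimate
  return { cost: 0, costSource: 'catalog', costEstimated: true }
}
```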
### 4. Privacy-First Design
- Never stores prompts or responses
- Only extracts usage metadata
- Safe for sensitive workloads
## Advanced Usage

### Custom Metadata
```typescript
const result = await runforge.track({
  experiment: 'chat-v2',
  user_id: 'user123',
  temperature: 0.7,
  custom_field: 'value'
}, () => llmCall())
```
### Error Tracking

```typescript
try {
  const result = await runforge.track({ experiment: 'test' }, () => {
    throw new Error('Rate limited')
  })
} catch (error) {
  // Error automatically tracked with latency and status
}
```
### Streaming Support

```typescript
// For OpenAI streaming with usage
const stream = await runforge.track({}, () =>
  openai.chat.completions.create({
    model: 'gpt-4o-mini',
    messages,
    stream: true,
    stream_options: { include_usage: true }
  })
)
// Usage data extracted from the final chunk
```
### Async Functions (Python)

```python
from openai import AsyncOpenAI

client = AsyncOpenAI()

# Supports both sync and async functions
async def async_llm_call():
    return await client.chat.completions.create(model="gpt-4o", messages=messages)

result = await runforge.track({"experiment": "async"}, async_llm_call)
```
## Configuration Options

### SDK Initialization

#### TypeScript

```typescript
const runforge = new RunForge({
  apiKey: 'your-api-key',                     // Required
  endpoint: 'https://your-domain/api/ingest', // Optional
  projectId: 'project-id'                     // Optional
})
```

#### Python

```python
runforge = RunForge(
    api_key='your-api-key',                     # Required
    endpoint='https://your-domain/api/ingest',  # Optional
    project_id='project-id'                     # Optional
)
```
## Cost Accuracy
| Provider | Token Accuracy | Cost Accuracy | Source |
|---|---|---|---|
| OpenRouter | ✅ Exact | ✅ Exact | Provider API |
| OpenAI | ✅ Exact | 🟡 Calculated | Pricing Registry |
| Anthropic | ✅ Exact | 🟡 Calculated | Pricing Registry |
| Others | 🟡 Estimated | 🟡 Estimated | Fallback |
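To make "Calculated" concrete, here is a worked example using the `resolveCost` sketch above with illustrative rates (not the live registry): $0.15 per 1M input tokens and $0.60 per 1M output tokens.

```typescript
// 1,200 prompt tokens + 300 completion tokens at illustrative rates:
// (1200 / 1e6) * 0.15 + (300 / 1e6) * 0.60 = $0.00036
resolveCost(
  { inputTokens: 1200, outputTokens: 300 },
  { usdPerMTokIn: 0.15, usdPerMTokOut: 0.60 }
)
// => { cost: ~0.00036, costSource: 'catalog' }
```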
## Migration from Manual Configuration

### Before (Manual)

```typescript
// Old way - manual configuration required
const call = withLLM(
  openai.chat.completions.create.bind(openai.chat.completions),
  {
    provider: 'openai',
    model: 'gpt-4o',
    price: { inUsdPerMTokIn: 5, inUsdPerMTokOut: 15 }
  },
  { apiKey: process.env.RUNFORGE_API_KEY }
)
```

### After (Zero-Config)

```typescript
// New way - completely automatic
const result = await runforge.track({ experiment: 'test' }, () =>
  openai.chat.completions.create({ model: 'gpt-4o', messages })
)
```
## Troubleshooting

### No Usage Data

If the SDK can't extract usage data:

- Latency and status are still tracked
- Tokens and cost are recorded as zero
- Check the provider's response format

### Incorrect Costs

- OpenRouter costs are always exact
- Other providers use the server-side pricing registry
- Check the model name's spelling and casing

### Network Failures

- The SDK silently handles network failures
- It never breaks your LLM calls
- Metrics are lost, but your application continues
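Conceptually, ingestion follows a fail-open pattern like the sketch below. This is hypothetical (`trackSafely` and the `ingest` callback are illustrative names); the real SDK internals may differ.

```typescript
// Hypothetical fail-open wrapper: metric delivery can fail, the LLM call cannot be affected
async function trackSafely<T>(
  meta: Record<string, unknown>,
  call: () => Promise<T>,
  ingest: (event: Record<string, unknown>) => Promise<void>
): Promise<T> {
  const start = Date.now()
  let status = 'ok'
  try {
    return await call() // errors from the LLM call itself still propagate to the caller
  } catch (err) {
    status = 'error'
    throw err
  } finally {
    try {
      await ingest({ ...meta, status, latencyMs: Date.now() - start })
    } catch {
      // Network failure: drop the metric, never break the caller
    }
  }
}
```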
### Debug Mode
## Examples

See the complete examples:

- TypeScript: `examples/auto-extraction-demo.ts`
- Python: `examples/auto-extraction-demo.py`

Run them locally with your `RUNFORGE_API_KEY` set.