Observability & alerts
- Visualizations added to the dashboard:
  - KPI cards (cost, runs, tokens, avg P95 latency)
  - Cost trend line chart
  - Latency P95 area chart
  - Usage heatmap (7x24)
- /api/metrics supports ?range=24h|7d|30d and returns { points, costByProvider?, costByModel?, series }.
- Alerts system scaffolding added in convex/alerts.ts and lib/notifications.ts (no evaluation logic yet).
How to test locally:
- Start the app and navigate to /dashboard.
- Ensure GET /api/metrics?projectId=<id>&range=7d returns a JSON payload with points.
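The manual check above can also be scripted. The following is a minimal sketch, not part of the codebase: `buildMetricsUrl`, `hasPoints`, and `smokeTest` are hypothetical helper names, and the base URL you pass in (for example `http://localhost:3000`) is an assumption about where the app runs locally.

```typescript
// Hypothetical smoke-test helpers; none of these names exist in the project.
type MetricsRange = "24h" | "7d" | "30d";

const RANGES: readonly MetricsRange[] = ["24h", "7d", "30d"];

// Builds the request URL for a given project and range.
function buildMetricsUrl(baseUrl: string, projectId: string, range: MetricsRange): string {
  const url = new URL("/api/metrics", baseUrl);
  url.searchParams.set("projectId", projectId);
  url.searchParams.set("range", range);
  return url.toString();
}

// True when the payload is an object that carries a `points` array.
function hasPoints(payload: unknown): payload is { points: unknown[] } {
  return (
    typeof payload === "object" &&
    payload !== null &&
    Array.isArray((payload as { points?: unknown }).points)
  );
}

// Requests every supported range and fails fast on a bad response.
async function smokeTest(baseUrl: string, projectId: string): Promise<void> {
  for (const range of RANGES) {
    const res = await fetch(buildMetricsUrl(baseUrl, projectId, range));
    if (!res.ok) throw new Error(`range=${range}: HTTP ${res.status}`);
    const body: unknown = await res.json();
    if (!hasPoints(body)) throw new Error(`range=${range}: missing points array`);
    console.log(`range=${range}: ${body.points.length} points`);
  }
}
```

Call `smokeTest` with the local base URL and a real project id once the app is running. It relies on the global `fetch` available in Node 18+ and browsers.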
KPIs
- Cost (sum), error rate, p95 latency: computed from runs_live and Postgres runs.
- kpis_1m aggregates per project every minute.
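As an illustration of how the three KPIs could be derived from a batch of runs, here is a minimal sketch. The `Run` shape (`costUSD`, `latencyMs`, `ok`) is an assumption and may not match the actual runs_live or Postgres columns, and the nearest-rank percentile is one common choice, not necessarily the method kpis_1m uses.

```typescript
// Hypothetical run shape; the real runs_live / Postgres fields may differ.
interface Run {
  costUSD: number;
  latencyMs: number;
  ok: boolean;
}

interface Kpis {
  costUSD: number;
  errorRate: number;
  p95LatencyMs: number;
  runs: number;
}

// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
function percentile(values: number[], p: number): number {
  if (values.length === 0) return 0;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

// Aggregates one batch of runs (for example, one project over one minute).
function computeKpis(runs: Run[]): Kpis {
  const costUSD = runs.reduce((sum, r) => sum + r.costUSD, 0);
  const errors = runs.filter((r) => !r.ok).length;
  return {
    costUSD,
    errorRate: runs.length === 0 ? 0 : errors / runs.length,
    p95LatencyMs: percentile(runs.map((r) => r.latencyMs), 95),
    runs: runs.length,
  };
}
```

An empty batch yields zeros rather than NaN, which keeps charts and KPI cards from having to special-case idle minutes.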
Error spike detection (concept)
- Monitor the most recent 15-minute window; trigger when the error rate exceeds a threshold.
- Notify via webhook/Discord. TODO: implement evaluators and channels.
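Since the evaluator is still a TODO, the following is only a sketch of what the windowed check might look like. `detectErrorSpike`, the `RunEvent` shape, the 5% threshold, and the minimum-run guard are all assumptions chosen for illustration; only the 15-minute window comes from the concept above.

```typescript
// Hypothetical event shape: timestamp in epoch milliseconds plus success flag.
interface RunEvent {
  at: number;
  ok: boolean;
}

interface SpikeOptions {
  windowMs: number;  // size of the look-back window
  threshold: number; // error rate in [0, 1] that must be exceeded
  minRuns: number;   // ignore windows with too few runs to be meaningful
}

// Assumed defaults: only the 15-minute window is taken from the docs.
const DEFAULT_SPIKE_OPTIONS: SpikeOptions = {
  windowMs: 15 * 60 * 1000,
  threshold: 0.05,
  minRuns: 20,
};

interface SpikeResult {
  triggered: boolean;
  errorRate: number;
  runs: number;
}

// Computes the error rate over (now - windowMs, now] and compares it to the threshold.
function detectErrorSpike(
  events: RunEvent[],
  now: number,
  opts: SpikeOptions = DEFAULT_SPIKE_OPTIONS,
): SpikeResult {
  const recent = events.filter((e) => e.at > now - opts.windowMs && e.at <= now);
  const errors = recent.filter((e) => !e.ok).length;
  const errorRate = recent.length === 0 ? 0 : errors / recent.length;
  const triggered = recent.length >= opts.minRuns && errorRate > opts.threshold;
  return { triggered, errorRate, runs: recent.length };
}
```

The `minRuns` guard is a design choice to avoid alerting when a single failed run in a quiet window produces a 100% error rate.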
Metrics export
{
  "points": [
    { "t": "2025-01-01T01:00:00.000Z", "costUSD": 0.12345, "errorRate": 0.01, "p95LatencyMs": 850, "runs": 42, "tokensIn": 1200, "tokensOut": 800 }
  ],
  "series": [
    { "t": "2025-01-01T01:00:00.000Z", "costUSD": 0.12345, "errorRate": 0.01, "p95LatencyMs": 850 }
  ]
}
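For consumers of this payload, the shape can be captured in TypeScript types. This is a sketch inferred from the sample above, not the project's own type definitions: the interface names and `summarizePoints` are hypothetical, and `costByProvider` / `costByModel` are typed as `unknown` because their structure is not shown here.

```typescript
// Field names mirror the sample payload; the type names are hypothetical.
interface SeriesPoint {
  t: string; // ISO 8601 timestamp
  costUSD: number;
  errorRate: number;
  p95LatencyMs: number;
}

interface MetricsPoint extends SeriesPoint {
  runs: number;
  tokensIn: number;
  tokensOut: number;
}

interface MetricsResponse {
  points: MetricsPoint[];
  series: SeriesPoint[];
  costByProvider?: unknown; // structure not documented here
  costByModel?: unknown;    // structure not documented here
}

interface Totals {
  costUSD: number;
  runs: number;
  tokens: number;
  avgP95LatencyMs: number;
}

// Rolls the points up into KPI-card style totals (cost, runs, tokens, avg P95 latency).
function summarizePoints(res: MetricsResponse): Totals {
  const n = res.points.length;
  const sum = (pick: (p: MetricsPoint) => number): number =>
    res.points.reduce((acc, p) => acc + pick(p), 0);
  return {
    costUSD: sum((p) => p.costUSD),
    runs: sum((p) => p.runs),
    tokens: sum((p) => p.tokensIn + p.tokensOut),
    avgP95LatencyMs: n === 0 ? 0 : sum((p) => p.p95LatencyMs) / n,
  };
}
```

Note that averaging per-point P95 values is only an approximation of the overall P95; it is used here because the KPI card is described as "avg P95 latency".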
Integrations
- Notification channels (email/webhook/in-app): TODO.
- Slack: planned for later.