Observability & alerts

- Visualizations added to the dashboard:
  - KPI cards (cost, runs, tokens, avg P95 latency)
  - Cost trend line chart
  - Latency P95 area chart
  - Usage heatmap (7x24)
- /api/metrics supports ?range=24h|7d|30d and returns { points, costByProvider?, costByModel?, series }.
- Alerts system scaffolding has been added in convex/alerts.ts and lib/notifications.ts (no evaluation logic yet).

How to test locally:
- Start the app and navigate to /dashboard.
- Confirm that GET /api/metrics?projectId=<id>&range=7d returns a JSON payload with points.
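The local check can be scripted. The sketch below is a minimal example, not part of the codebase: `buildMetricsUrl` and `hasPoints` are hypothetical helper names, and only the path, the `projectId` and `range` parameters, and the `points` field are taken from the notes above.

```typescript
// Hypothetical helpers for a local smoke test of /api/metrics.
// Only the path, query parameters, and `points` field come from the docs.
type MetricsRange = "24h" | "7d" | "30d";

// Build the request URL for a given project and time range.
function buildMetricsUrl(baseUrl: string, projectId: string, range: MetricsRange): string {
  const root = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  return `${root}/api/metrics?projectId=${encodeURIComponent(projectId)}&range=${range}`;
}

// Return true when the parsed JSON body carries a `points` array.
function hasPoints(payload: unknown): boolean {
  if (typeof payload !== "object" || payload === null) return false;
  return Array.isArray((payload as { points?: unknown }).points);
}
```

With the app running, fetching `buildMetricsUrl(origin, "<id>", "7d")` and passing the parsed body to `hasPoints` covers the second step; `origin` is whatever address the local dev server listens on.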

KPIs

Cost (sum), error rate, and p95 latency are computed from runs_live and Postgres runs.
kpis_1m aggregates them per project every minute.
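As an illustration of how these three KPIs can be derived from a batch of runs, here is a minimal sketch. The `RunSample` shape and the nearest-rank percentile method are assumptions made for the example; the actual fields in runs_live and the aggregation used by kpis_1m are not documented here.

```typescript
// Hypothetical run shape; the real runs_live schema may differ.
interface RunSample {
  costUSD: number;
  latencyMs: number;
  ok: boolean;
}

interface Kpis {
  costUSD: number;
  errorRate: number;
  p95LatencyMs: number;
  runs: number;
}

// Nearest-rank percentile: the smallest value with at least p% of samples at or below it.
function percentile(values: number[], p: number): number {
  if (values.length === 0) return 0;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

// Aggregate one window of runs (for example, one minute for one project).
function computeKpis(runs: RunSample[]): Kpis {
  const failed = runs.filter((r) => !r.ok).length;
  return {
    costUSD: runs.reduce((sum, r) => sum + r.costUSD, 0),
    errorRate: runs.length === 0 ? 0 : failed / runs.length,
    p95LatencyMs: percentile(runs.map((r) => r.latencyMs), 95),
    runs: runs.length,
  };
}
```

Nearest-rank is used because it always returns an observed latency; an interpolating percentile would be an equally reasonable choice.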

Error spike detection (concept)

Monitor the most recent 15-minute window and trigger an alert when the error rate exceeds a threshold.
Notify via webhook or Discord. TODO: implement evaluators and channels.
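Since the evaluator is not implemented yet, the following is only a sketch of what the check described above might look like. `RunEvent`, `isErrorSpike`, and the `minRuns` guard are hypothetical and do not reflect anything in convex/alerts.ts.

```typescript
// Hypothetical event shape and evaluator; not the project's implementation.
interface RunEvent {
  timestampMs: number;
  ok: boolean;
}

const WINDOW_MS = 15 * 60 * 1000; // the 15-minute window from the concept

// True when the error rate over the last 15 minutes exceeds `threshold` (0..1).
// `minRuns` (an added assumption) avoids alerting on a handful of runs.
function isErrorSpike(
  events: RunEvent[],
  nowMs: number,
  threshold: number,
  minRuns: number = 1,
): boolean {
  const recent = events.filter(
    (e) => e.timestampMs > nowMs - WINDOW_MS && e.timestampMs <= nowMs,
  );
  if (recent.length < minRuns) return false;
  const errors = recent.filter((e) => !e.ok).length;
  return errors / recent.length > threshold;
}
```

A caller would run this on a schedule and hand a `true` result to whichever notification channel ends up being implemented.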

Metrics export

{
  "points": [
    { "t": "2025-01-01T01:00:00.000Z", "costUSD": 0.12345, "errorRate": 0.01, "p95LatencyMs": 850, "runs": 42, "tokensIn": 1200, "tokensOut": 800 }
  ],
  "series": [
    { "t": "2025-01-01T01:00:00.000Z", "costUSD": 0.12345, "errorRate": 0.01, "p95LatencyMs": 850 }
  ]
}
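A client consuming this payload may want types for it. The interfaces below are inferred from the sample above and are an assumption, not a published contract; `costByProvider` and `costByModel` are left as `unknown` because their shape is not shown, and `summarizePoints` is a hypothetical helper.

```typescript
// Types inferred from the sample payload; treat them as a sketch.
interface MetricsPoint {
  t: string; // ISO 8601 timestamp
  costUSD: number;
  errorRate: number;
  p95LatencyMs: number;
  runs: number;
  tokensIn: number;
  tokensOut: number;
}

type SeriesPoint = Pick<MetricsPoint, "t" | "costUSD" | "errorRate" | "p95LatencyMs">;

interface MetricsPayload {
  points: MetricsPoint[];
  series: SeriesPoint[];
  costByProvider?: unknown; // shape not documented
  costByModel?: unknown; // shape not documented
}

// Total cost, runs, and tokens across all points, e.g. for KPI cards.
function summarizePoints(payload: MetricsPayload): { costUSD: number; runs: number; tokens: number } {
  return payload.points.reduce(
    (acc, p) => ({
      costUSD: acc.costUSD + p.costUSD,
      runs: acc.runs + p.runs,
      tokens: acc.tokens + p.tokensIn + p.tokensOut,
    }),
    { costUSD: 0, runs: 0, tokens: 0 },
  );
}
```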

Integrations

Notification channels (email, webhook, in-app): TODO.
Slack support is planned for later.