Observability & alerts

  • Visualizations added to the dashboard:
      • KPI cards (cost, runs, tokens, avg P95 latency)
      • Cost trend line chart
      • Latency P95 area chart
      • Usage heatmap (7x24)
  • /api/metrics supports ?range=24h|7d|30d and returns { points, costByProvider?, costByModel?, series }.
  • Alerts system scaffolding added in convex/alerts.ts and lib/notifications.ts (no evaluation logic yet).

How to test locally:

  • Start the app and navigate to /dashboard.
  • Ensure GET /api/metrics?projectId=<id>&range=7d returns a JSON payload with points.
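
The local check above could be scripted roughly as follows. This is a minimal sketch, not part of the codebase: `buildMetricsUrl`, `hasPoints`, `checkMetrics`, and the `FetchLike` type are hypothetical names, and the base URL (for example `http://localhost:3000`) is an assumption about the local dev setup.

```typescript
// Hypothetical smoke check for GET /api/metrics; all names are illustrative.
type MetricsRange = "24h" | "7d" | "30d";

// Minimal fetch-like signature so the check can run against the global
// fetch or a stub in tests.
type FetchLike = (url: string) => Promise<{ ok: boolean; json(): Promise<unknown> }>;

// Build the request URL with the documented projectId and range parameters.
function buildMetricsUrl(baseUrl: string, projectId: string, range: MetricsRange): string {
  return `${baseUrl}/api/metrics?projectId=${encodeURIComponent(projectId)}&range=${range}`;
}

// True when the payload is an object carrying a `points` array.
function hasPoints(payload: unknown): boolean {
  return (
    typeof payload === "object" &&
    payload !== null &&
    Array.isArray((payload as { points?: unknown }).points)
  );
}

// Fetch the 7d range and report whether the response looks valid.
async function checkMetrics(fetchImpl: FetchLike, baseUrl: string, projectId: string): Promise<boolean> {
  const res = await fetchImpl(buildMetricsUrl(baseUrl, projectId, "7d"));
  if (!res.ok) return false;
  return hasPoints(await res.json());
}
```

With the app running, `checkMetrics(fetch, baseUrl, "<id>")` should resolve to `true` if the endpoint behaves as described.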

KPIs

  • Cost (sum), error rate, and P95 latency are computed from runs_live and Postgres runs.
  • kpis_1m aggregates per project every minute.
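
As an illustration of how these three KPIs might be derived from a batch of runs, here is a small sketch. The `Run` shape and the `computeKpis` and `p95` helpers are assumptions made for this example; the actual schemas of runs_live and kpis_1m are not shown here. The percentile uses the nearest-rank method, which is one common choice and not necessarily the one the project uses.

```typescript
// Hypothetical run record; field names are assumptions for illustration.
interface Run {
  costUSD: number;
  latencyMs: number;
  ok: boolean;
}

interface Kpis {
  costUSD: number;      // sum of run costs
  errorRate: number;    // failed runs / total runs, in [0, 1]
  p95LatencyMs: number; // nearest-rank 95th percentile
  runs: number;
}

// Nearest-rank P95: the smallest value with at least 95% of samples at or below it.
function p95(values: number[]): number {
  if (values.length === 0) return 0;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((95 * sorted.length) / 100);
  return sorted[rank - 1];
}

// Aggregate one batch of runs (for example, one project-minute) into KPIs.
function computeKpis(runs: Run[]): Kpis {
  const failed = runs.filter((r) => !r.ok).length;
  return {
    costUSD: runs.reduce((sum, r) => sum + r.costUSD, 0),
    errorRate: runs.length === 0 ? 0 : failed / runs.length,
    p95LatencyMs: p95(runs.map((r) => r.latencyMs)),
    runs: runs.length,
  };
}
```

An empty batch yields zeros rather than NaN, which keeps downstream charts simple; whether that is the right convention for kpis_1m is a design decision left open here.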

Error spike detection (concept)

  • Monitor the most recent 15-minute window; trigger when the error rate exceeds a threshold.
  • Notify via Webhook/Discord. TODO: Implement evaluators and channels.
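
Since the evaluators are still a TODO, the following is only a sketch of what the windowed check could look like. `RunEvent`, `SpikeConfig`, and `isErrorSpike` are hypothetical names, and the `minRuns` guard is an added assumption (it avoids firing on a window with only a handful of runs), not something the concept above specifies.

```typescript
// Hypothetical event and config shapes; not taken from convex/alerts.ts.
interface RunEvent {
  timestampMs: number;
  ok: boolean;
}

interface SpikeConfig {
  windowMs: number;  // e.g. 15 * 60 * 1000 for a 15-minute window
  threshold: number; // error-rate fraction that triggers an alert
  minRuns: number;   // assumed guard: ignore windows with too few runs
}

// Returns true when the error rate inside the window ending at `nowMs`
// is strictly above the threshold.
function isErrorSpike(events: RunEvent[], nowMs: number, config: SpikeConfig): boolean {
  const windowStart = nowMs - config.windowMs;
  const recent = events.filter((e) => e.timestampMs > windowStart && e.timestampMs <= nowMs);
  if (recent.length < config.minRuns) return false;
  const errors = recent.filter((e) => !e.ok).length;
  return errors / recent.length > config.threshold;
}
```

A scheduled job could call `isErrorSpike` per project and hand positive results to the notification layer once channels exist.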

Metrics export

{
  "points": [
    { "t": "2025-01-01T01:00:00.000Z", "costUSD": 0.12345, "errorRate": 0.01, "p95LatencyMs": 850, "runs": 42, "tokensIn": 1200, "tokensOut": 800 }
  ],
  "series": [
    { "t": "2025-01-01T01:00:00.000Z", "costUSD": 0.12345, "errorRate": 0.01, "p95LatencyMs": 850 }
  ]
}
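
For consumers of this payload, the example above suggests types along the following lines. This is a sketch inferred from the sample only: the interface names and `parseMetricsResponse` are hypothetical, and the shapes of `costByProvider` and `costByModel` are left as `unknown` because the example does not show them.

```typescript
// Types inferred from the sample payload; names are illustrative.
interface SeriesPoint {
  t: string; // ISO 8601 timestamp
  costUSD: number;
  errorRate: number;
  p95LatencyMs: number;
}

interface MetricsPoint extends SeriesPoint {
  runs: number;
  tokensIn: number;
  tokensOut: number;
}

interface MetricsResponse {
  points: MetricsPoint[];
  series: SeriesPoint[];
  costByProvider?: unknown; // shape not shown in the example
  costByModel?: unknown;    // shape not shown in the example
}

function isSeriesPoint(v: unknown): v is SeriesPoint {
  if (typeof v !== "object" || v === null) return false;
  const p = v as Record<string, unknown>;
  return (
    typeof p.t === "string" &&
    !Number.isNaN(Date.parse(p.t)) &&
    typeof p.costUSD === "number" &&
    typeof p.errorRate === "number" &&
    typeof p.p95LatencyMs === "number"
  );
}

function isMetricsPoint(v: unknown): v is MetricsPoint {
  if (!isSeriesPoint(v)) return false;
  const p = v as unknown as Record<string, unknown>;
  return (
    typeof p.runs === "number" &&
    typeof p.tokensIn === "number" &&
    typeof p.tokensOut === "number"
  );
}

// Parse and validate a response body; throws if required fields are missing.
function parseMetricsResponse(body: string): MetricsResponse {
  const data = JSON.parse(body) as Record<string, unknown> | null;
  if (typeof data !== "object" || data === null) throw new Error("payload is not an object");
  const { points, series } = data;
  if (!Array.isArray(points) || !points.every(isMetricsPoint)) throw new Error("invalid points");
  if (!Array.isArray(series) || !series.every(isSeriesPoint)) throw new Error("invalid series");
  return data as unknown as MetricsResponse;
}
```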

Integrations

  • Notification channels (email/webhook/in-app): TODO.
  • Slack support is planned for later.
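
One possible shape for the channel abstraction is sketched below. Nothing here reflects the current contents of lib/notifications.ts: `Alert`, `NotificationChannel`, `PostJson`, `webhookChannel`, `formatAlert`, and `notifyAll` are all hypothetical, and the webhook body format is an assumption, not a documented contract.

```typescript
// Hypothetical alert shape; field names are assumptions for illustration.
interface Alert {
  projectId: string;
  errorRate: number; // fraction in [0, 1]
  threshold: number; // fraction in [0, 1]
  windowMinutes: number;
}

// Common interface each channel (email, webhook, in-app, ...) would implement.
interface NotificationChannel {
  name: string;
  send(alert: Alert): Promise<void>;
}

// Injected transport so the channel can be exercised without a network.
type PostJson = (url: string, body: unknown) => Promise<void>;

// Human-readable one-line summary of an alert.
function formatAlert(alert: Alert): string {
  const pct = (x: number) => `${(x * 100).toFixed(1)}%`;
  return (
    `Error spike in project ${alert.projectId}: ${pct(alert.errorRate)} ` +
    `over the last ${alert.windowMinutes} min (threshold ${pct(alert.threshold)})`
  );
}

// Generic webhook channel; the `{ text, alert }` body is an assumed format.
function webhookChannel(url: string, post: PostJson): NotificationChannel {
  return {
    name: "webhook",
    send: (alert) => post(url, { text: formatAlert(alert), alert }),
  };
}

// Send to every channel in order; returns the names of channels that failed
// so one broken channel does not block the others.
async function notifyAll(channels: NotificationChannel[], alert: Alert): Promise<string[]> {
  const failed: string[] = [];
  for (const channel of channels) {
    try {
      await channel.send(alert);
    } catch {
      failed.push(channel.name);
    }
  }
  return failed;
}
```

Keeping the transport injectable means a Slack or email channel could be added later by implementing the same `NotificationChannel` interface.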