
Setting Up Monitoring and Alerts

What You'll Learn

How to set up smart alerts that tell you right away when something important happens in your AI applications, such as cost spikes, errors, or performance issues.

Why Use Alerts?

Avoid Surprises

  • Cost overruns: Get notified before your bill gets too high
  • Service issues: Know about problems before your users do
  • Performance drops: Catch slow responses early
  • Usage spikes: Understand when your app gets more traffic

Save Time

Instead of manually checking your dashboard:

  • Automatic monitoring: Alerts watch 24/7
  • Smart notifications: Only get alerts that matter
  • Quick response: Fix issues faster with immediate notifications

Types of Alerts Available

Cost Alerts

Budget Warnings:

  • 50% of monthly budget reached
  • 80% of monthly budget reached
  • 100% of monthly budget reached
  • Sudden 2x cost increase within 24 hours

Example: "Your Content Generator project has spent $45 of your $50 monthly budget"
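The budget rules above come down to a few comparisons. Here is a minimal sketch in Python, assuming you can get your month-to-date spend and daily totals; `budget_alerts` is a hypothetical helper, not part of RunForge, and "2x in 24 hours" is read here as the last 24 hours compared with the 24 hours before.

```python
def budget_alerts(spent, budget, cost_last_24h, cost_prev_24h,
                  levels=(0.5, 0.8, 1.0)):
    """Return warnings for the budget rules above (hypothetical helper)."""
    warnings = []
    # Report every budget level (50%, 80%, 100%) that has been reached.
    for level in levels:
        if spent >= budget * level:
            warnings.append(f"{level:.0%} of monthly budget reached")
    # Sudden 2x increase: last 24 hours vs. the 24 hours before that.
    if cost_prev_24h > 0 and cost_last_24h >= 2 * cost_prev_24h:
        warnings.append("Cost doubled in the last 24 hours")
    return warnings


# $45 of a $50 budget, as in the example above.
print(budget_alerts(45, 50, cost_last_24h=6, cost_prev_24h=2.5))
```

With $45 spent of $50, the 50% and 80% warnings fire but the 100% warning does not.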

Performance Alerts

Response Time Issues:

  • Average response time over 3 seconds
  • Any single request taking over 10 seconds
  • Response time 50% slower than usual

Example: "Your chatbot responses are averaging 4.2 seconds (usually 1.8 seconds)"
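A similar sketch for the response-time rules, assuming you have a non-empty list of recent response times in seconds and a baseline average to compare against. The function name is hypothetical, and "50% slower than usual" is interpreted as an average at least 1.5x the baseline.

```python
from statistics import mean


def performance_alerts(latencies, usual_avg):
    """latencies: recent response times (seconds); usual_avg: your baseline."""
    alerts = []
    avg = mean(latencies)
    if avg > 3.0:
        alerts.append(f"Average response time {avg:.1f}s is over 3s")
    slowest = max(latencies)
    if slowest > 10.0:
        alerts.append(f"A request took {slowest:.1f}s (over 10s)")
    # "50% slower than usual" = average at least 1.5x the baseline.
    if avg >= 1.5 * usual_avg:
        alerts.append(f"Responses averaging {avg:.1f}s (usually {usual_avg:.1f}s)")
    return alerts


print(performance_alerts([4.0, 4.4, 4.2], usual_avg=1.8))
```

Here the 4.2-second average trips both the over-3-seconds rule and the slowdown rule.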

Reliability Alerts

Error Rate Problems:

  • Success rate drops below 99%
  • More than 10 errors in the last hour
  • Specific error types (rate limits, authentication, etc.)

Example: "Your Customer Support Bot has failed 15 times in the last hour"
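For the reliability rules, a sketch that takes the outcomes of the last hour's requests and checks both the success-rate floor and the error count. The input format (one boolean per request, True meaning success) is an assumption; the thresholds default to the values listed above.

```python
def reliability_alerts(results, min_success_rate=0.99, max_errors=10):
    """results: one bool per request in the last hour (True = success)."""
    alerts = []
    if not results:
        return alerts  # no traffic, nothing to judge
    errors = results.count(False)
    success_rate = 1 - errors / len(results)
    if success_rate < min_success_rate:
        alerts.append(f"Success rate {success_rate:.1%} is below {min_success_rate:.0%}")
    if errors > max_errors:
        alerts.append(f"{errors} errors in the last hour")
    return alerts


# 15 failures out of 200 requests
print(reliability_alerts([True] * 185 + [False] * 15))
```

With 15 failures out of 200 requests, the success rate is 92.5% and both rules fire.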

Usage Alerts

Traffic Changes:

  • Request volume at 3x normal (300% of usual) or more
  • No activity for 6+ hours (when you expect activity)
  • New models or providers being used

Example: "Your app made 1,200 requests in the last hour (usually 400)"
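A sketch of the traffic checks, assuming you track hourly request counts, a baseline, and which models you normally use. The spike factor of 3.0 matches the example (1,200 requests against a usual 400). Deciding whether activity is expected at all is left to the caller, and the function name is hypothetical.

```python
def usage_alerts(requests_last_hour, usual_per_hour, hours_idle,
                 models_seen=(), known_models=(),
                 spike_factor=3.0, idle_hours=6):
    """Check the traffic rules above. All inputs are plain values you supply."""
    alerts = []
    # Spike: volume at spike_factor x normal or more (3.0 = 300% of normal).
    if usual_per_hour > 0 and requests_last_hour >= spike_factor * usual_per_hour:
        alerts.append(f"{requests_last_hour:,} requests in the last hour "
                      f"(usually {usual_per_hour:,})")
    # Silence: no requests for idle_hours or more.
    if hours_idle >= idle_hours:
        alerts.append(f"No activity for {hours_idle} hours")
    # Models or providers that aren't on your known list.
    new_models = sorted(set(models_seen) - set(known_models))
    if new_models:
        alerts.append("New models in use: " + ", ".join(new_models))
    return alerts


print(usage_alerts(1200, 400, hours_idle=0))
```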

Setting Up Your First Alert

  1. Go to Settings → Alerts or look for a 🔔 bell icon
  2. Select your project from the dropdown
  3. Click "Create Alert" or "Add New Alert"

Choose Alert Type

You'll see options like:

  • Cost/Budget alerts
  • Performance alerts
  • Reliability alerts
  • Usage/Traffic alerts

Let's start with a budget alert:

Configure Budget Alert

  1. Alert Name: "Monthly Budget Warning"
  2. Alert Type: Budget/Cost
  3. Threshold: 80% of monthly budget
  4. Time Period: Current month
  5. How to notify: Email, SMS, webhook, etc.

Test Your Alert

  1. Review settings: Make sure everything looks right
  2. Save the alert: Click "Create Alert" or "Save"
  3. Test notification: Most systems let you send a test alert
  4. Verify delivery: Check that you received the test notification

Essential Alerts for Every Project

1. Budget Alert (High Priority)

Alert Name: "Monthly Budget - 80% Warning"
Type: Cost
Condition: Monthly spend > 80% of budget
Notification: Email + SMS

Why this matters: Prevents unexpected bills

2. High Error Rate (High Priority)

Alert Name: "Error Rate Spike"
Type: Reliability  
Condition: Success rate < 95% for 15 minutes
Notification: Email + Slack

Why this matters: Users are experiencing failures

3. Slow Performance (Medium Priority)

Alert Name: "Response Time Warning"
Type: Performance
Condition: Average response > 3 seconds for 30 minutes
Notification: Email

Why this matters: Poor user experience

4. No Activity (Low Priority)

Alert Name: "Service Might Be Down"
Type: Usage
Condition: No requests for 4 hours during business hours
Notification: Email

Why this matters: Your application might have stopped working
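If you keep your own monitoring scripts, the four essential alerts can also live as data, which makes them easy to review and version. A minimal sketch, assuming you compute the metrics yourself; the metric names (`monthly_spend`, `success_rate_15m`, and so on) and the `AlertRule` type are hypothetical, and "for 15/30 minutes" is approximated by metrics averaged over that window rather than a sustained-duration check.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Metrics = Dict[str, Any]  # hypothetical metric names -> current values


@dataclass(frozen=True)
class AlertRule:
    name: str
    kind: str                  # Cost, Reliability, Performance, Usage
    priority: str              # high, medium, low
    channels: Tuple[str, ...]
    condition: Callable[[Metrics], bool]


ESSENTIAL_ALERTS = [
    AlertRule("Monthly Budget - 80% Warning", "Cost", "high", ("email", "sms"),
              lambda m: m["monthly_spend"] > 0.8 * m["monthly_budget"]),
    AlertRule("Error Rate Spike", "Reliability", "high", ("email", "slack"),
              lambda m: m["success_rate_15m"] < 0.95),
    AlertRule("Response Time Warning", "Performance", "medium", ("email",),
              lambda m: m["avg_response_s_30m"] > 3.0),
    AlertRule("Service Might Be Down", "Usage", "low", ("email",),
              lambda m: m["business_hours"] and m["requests_4h"] == 0),
]


def firing(rules: List[AlertRule], metrics: Metrics) -> List[str]:
    """Names of the rules whose condition is currently true."""
    return [rule.name for rule in rules if rule.condition(metrics)]
```

With $45 spent of a $50 budget and a 4.2-second average response time, `firing(ESSENTIAL_ALERTS, metrics)` returns the budget and response-time alerts.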

Advanced Alert Configuration

Smart Thresholds

Instead of fixed numbers, use dynamic thresholds:

  • "50% higher than last week" instead of "over 100 requests"
  • "Response time 2x normal" instead of "over 2 seconds"
  • "Cost increase 3x typical daily spend" instead of "$50 per day"
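A relative threshold compares the current value with a baseline built from recent history. A minimal sketch, using the mean of past values as the baseline (the median is a reasonable alternative if your history has outliers); the factor encodes rules like "50% higher" (1.5) or "2x normal" (2.0). The function name is hypothetical.

```python
from statistics import mean


def exceeds_baseline(current, history, factor):
    """True if current >= factor x the average of past values."""
    if not history:
        return False  # no baseline yet, so don't alert
    return current >= factor * mean(history)


# "Cost 3x typical daily spend": $30 today vs. a typical $10
print(exceeds_baseline(30, [8, 10, 12], 3.0))
```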

Time-Based Conditions

Make alerts smarter with time awareness:

  • Business hours only: Skip cost alerts at night if nighttime spending is normal for your app
  • Weekend patterns: Use different thresholds for weekends
  • Seasonal adjustments: Account for known busy periods
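Time awareness mostly means choosing a threshold based on when a reading happens. A sketch using only the standard library; the 9:00 to 17:00 business hours, the weekday and weekend limits, and the function names are assumptions to adjust, and times are treated as local, timezone-naive values.

```python
from datetime import datetime


def in_business_hours(ts: datetime, start_hour: int = 9, end_hour: int = 17) -> bool:
    """Monday to Friday, start_hour <= hour < end_hour (local time)."""
    return ts.weekday() < 5 and start_hour <= ts.hour < end_hour


def threshold_for(ts: datetime, weekday_limit: float, weekend_limit: float) -> float:
    """weekday() returns 5 or 6 on Saturday and Sunday."""
    return weekend_limit if ts.weekday() >= 5 else weekday_limit


def should_alert(value: float, ts: datetime, weekday_limit: float,
                 weekend_limit: float, business_hours_only: bool = False) -> bool:
    if business_hours_only and not in_business_hours(ts):
        return False  # e.g. nighttime spend is expected, so stay quiet
    return value > threshold_for(ts, weekday_limit, weekend_limit)
```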

Escalation Rules

Set up multiple notification levels:

  1. Warning (5 minutes): Email notification
  2. Critical (15 minutes): Email + SMS
  3. Emergency (30 minutes): Email + SMS + Phone call
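One way to express this escalation ladder in code, assuming the minutes mean how long an alert has gone unresolved (or unacknowledged). Levels are checked from most to least severe, so the highest applicable level wins; the names are illustrative.

```python
# (minutes unresolved, level, notification channels), most severe first
ESCALATION_LEVELS = [
    (30, "Emergency", ("email", "sms", "phone")),
    (15, "Critical", ("email", "sms")),
    (5, "Warning", ("email",)),
]


def escalation_for(minutes_unresolved):
    """Return (level, channels) for an alert open this long, or (None, ())."""
    for threshold, level, channels in ESCALATION_LEVELS:
        if minutes_unresolved >= threshold:
            return level, channels
    return None, ()
```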

Notification Channels

Email Notifications

  • Best for: Non-urgent alerts, detailed information
  • Setup: Add your email address in notification settings
  • Pros: Detailed messages, easy to search and archive
  • Cons: Might be delayed, can get buried in inbox

SMS/Text Messages

  • Best for: Urgent alerts that need immediate attention
  • Setup: Add phone number and verify
  • Pros: Immediate delivery, hard to miss
  • Cons: Character limits, costs money

Slack Integration

  • Best for: Team notifications, keeping everyone informed
  • Setup: Connect RunForge to your Slack workspace
  • Pros: Team visibility, conversation context
  • Cons: Can be noisy, might get lost in busy channels

Webhook/API Integration

  • Best for: Custom integrations, automated responses
  • Setup: Configure webhook URL in alert settings
  • Pros: Can trigger automated responses
  • Cons: Requires technical setup
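A webhook needs an endpoint, reachable from the alerting service, that accepts the alert's POST request. Below is a minimal receiver sketch built on Python's standard `http.server`. The payload fields (`alert_name`, `severity`) are hypothetical; check the payload your alert settings actually send before relying on them.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def handle_alert(payload: dict) -> str:
    """Decide what to do with one alert (field names are hypothetical)."""
    severity = str(payload.get("severity", "low")).lower()
    if severity in ("critical", "high"):
        return "page-on-call"  # e.g. hand off to your paging tool
    return "log-only"


class AlertWebhook(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except ValueError:  # invalid JSON or bad text encoding
            payload = None
        if not isinstance(payload, dict):
            self.send_response(400)  # reject anything that isn't a JSON object
            self.end_headers()
            return
        action = handle_alert(payload)
        print(f"alert {payload.get('alert_name')!r} -> {action}")
        self.send_response(204)  # acknowledge quickly, no body
        self.end_headers()


def run(port: int = 8000) -> None:
    """Serve until interrupted (Ctrl+C)."""
    HTTPServer(("0.0.0.0", port), AlertWebhook).serve_forever()
```

Call `run()` to listen on port 8000, then point the alert's webhook URL at that host. In production you would normally put this behind HTTPS and verify that requests really come from your alerting service.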

Managing Alert Noise

Avoid Alert Fatigue

  • Start conservative: Begin with fewer, more important alerts
  • Tune over time: Adjust thresholds based on experience
  • Group related alerts: Don't send 10 alerts for the same issue
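Grouping can be as simple as remembering when each kind of alert was last sent and suppressing repeats within a window. A minimal sketch; the 30-minute window, the `AlertGrouper` name, and the alert key format are assumptions.

```python
from datetime import datetime, timedelta


class AlertGrouper:
    """Suppress repeats of the same alert key within a time window."""

    def __init__(self, window: timedelta = timedelta(minutes=30)):
        self.window = window
        self.last_sent = {}  # alert key -> time the last notification went out

    def should_notify(self, key: str, now: datetime) -> bool:
        last = self.last_sent.get(key)
        if last is not None and now - last < self.window:
            return False  # same issue already notified recently
        self.last_sent[key] = now
        return True
```

Because suppressed alerts don't reset the timer, a problem that keeps firing still produces one reminder per window instead of going silent.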

Alert Prioritization

  • Critical: Service down, major security issues
  • High: Budget exceeded, high error rates
  • Medium: Performance degradation, unusual patterns
  • Low: Weekly summaries, minor threshold breaches

Quiet Hours

Set up "do not disturb" periods:

  • Nights and weekends: Unless truly critical
  • Maintenance windows: When you expect issues
  • Holiday periods: When usage patterns change
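A sketch of quiet hours that still lets critical alerts through. The 22:00 to 07:00 window, the weekend rule, and the lowercase priority names (following the Critical/High/Medium/Low levels above) are assumptions; maintenance windows and holidays would be extra checks on top.

```python
from datetime import datetime, time


def is_quiet(now: datetime, start: time = time(22, 0), end: time = time(7, 0)) -> bool:
    """True inside the quiet window; handles windows that wrap past midnight."""
    t = now.time()
    if start <= end:
        return start <= t < end
    return t >= start or t < end


def should_deliver(priority: str, now: datetime, weekends_quiet: bool = True) -> bool:
    if priority == "critical":
        return True  # critical alerts ignore quiet hours
    quiet = is_quiet(now) or (weekends_quiet and now.weekday() >= 5)
    return not quiet
```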

Responding to Alerts

Immediate Response Checklist

When you get an alert:

  1. Read the full message: Don't just glance at the subject
  2. Check the dashboard: Get current status and context
  3. Assess severity: Is this urgent or can it wait?
  4. Take action if needed: Fix the issue or escalate
  5. Follow up: Make sure the issue is resolved

Common Alert Scenarios

Cost Spike Alert

Example: "Your project spent $25 in the last hour (usually $3)"

Investigation steps:

  1. Check recent activity in your dashboard
  2. Look for unusual request patterns
  3. Verify your applications are working normally
  4. Check if you accidentally made a lot of expensive API calls

Possible actions:

  • Pause non-essential services temporarily
  • Investigate and fix any runaway processes
  • Rotate API keys if you suspect unauthorized usage
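To find where a cost spike comes from, total the last hour's spend per API key or service and look at the top entries. A sketch, assuming you can export per-request costs as (label, cost) pairs; the function name and the labels below are made up.

```python
from collections import defaultdict


def top_spenders(records, n=3):
    """records: iterable of (key_label, cost_usd). Returns the n biggest totals."""
    totals = defaultdict(float)
    for key, cost in records:
        totals[key] += cost
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]


last_hour = [("batch-job", 10.0), ("web-app", 1.5), ("batch-job", 12.5), ("web-app", 1.0)]
print(top_spenders(last_hour))
```

Here the batch job accounts for $22.50 of the $25 hour, which makes it the first place to look for a runaway process.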

High Error Rate Alert

Example: "Success rate dropped to 87% in the last 30 minutes"

Investigation steps:

  1. Check what errors are happening (rate limits, timeouts, etc.)
  2. Look at your provider's status page for outages
  3. Review recent code deployments for bugs
  4. Check if API keys are still valid

Possible actions:

  • Wait if it's a provider outage
  • Roll back recent deployments
  • Implement retry logic with backoff
  • Contact provider support if needed
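Retry logic with backoff waits progressively longer between attempts so a struggling provider isn't hammered. A minimal sketch using exponential backoff with "full jitter"; `TransientError` is a placeholder for whatever retryable errors (rate limits, timeouts) your client library raises. Don't retry authentication errors: they won't fix themselves.

```python
import random
import time


class TransientError(Exception):
    """Placeholder for retryable failures such as rate limits or timeouts."""


def call_with_backoff(fn, max_attempts=5, base_delay=0.5, max_delay=8.0,
                      sleep=time.sleep):
    """Call fn(), retrying TransientError with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller (and your alerts) see it
            # Full jitter: sleep a random time in [0, min(max_delay, base * 2^attempt)].
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))
```

The `sleep` parameter exists so tests can skip real waiting; in normal use you leave it at the default.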

Performance Degradation Alert

Example: "Average response time is 4.2 seconds (usually 1.8 seconds)"

Investigation steps:

  1. Check if specific models are slower than others
  2. Look for increased request volume
  3. Verify your internet connection and infrastructure
  4. Check provider status for performance issues

Possible actions:

  • Switch to faster models temporarily
  • Reduce request volume
  • Optimize your prompts to be shorter
  • Scale up your infrastructure if needed
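To check whether specific models are slower than others, group response times by model and compare averages. A sketch, assuming you can export (model, latency) pairs; the function name and model names are placeholders.

```python
from collections import defaultdict
from statistics import mean


def avg_latency_by_model(requests):
    """requests: iterable of (model_name, latency_seconds). Slowest model first."""
    by_model = defaultdict(list)
    for model, latency in requests:
        by_model[model].append(latency)
    averages = {model: mean(values) for model, values in by_model.items()}
    return sorted(averages.items(), key=lambda kv: kv[1], reverse=True)


sample = [("model-large", 5.0), ("model-small", 1.0), ("model-large", 6.0), ("model-small", 2.0)]
print(avg_latency_by_model(sample))
```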

Customizing Alerts for Different Use Cases

Production Applications

  • Very sensitive: Low thresholds, immediate notifications
  • 24/7 monitoring: Alerts at any time
  • Multiple channels: Email, SMS, and team chat
  • Escalation: Escalate if no one responds within 30 minutes

Development/Testing

  • Less sensitive: Higher thresholds, daily summaries
  • Business hours: No alerts on nights/weekends
  • Email only: Less urgent notification methods
  • Weekly summaries: Digest of all activities

Personal Projects

  • Budget-focused: Mainly cost alerts
  • Email notifications: No urgent SMS needed
  • Higher thresholds: Don't alert for small issues
  • Monthly summaries: Overview of usage patterns

Alert Maintenance

Regular Review (Monthly)

  • Check alert history: Which alerts fired? Were they useful?
  • Tune thresholds: Adjust based on your normal usage patterns
  • Update contacts: Make sure notification info is current
  • Review relevance: Remove alerts you no longer need

Seasonal Adjustments

  • Holiday patterns: Expect different usage during holidays
  • Business cycles: Adjust for known busy/slow periods
  • Growth: Update thresholds as your usage grows
  • Model changes: New models may have different cost/performance profiles

Troubleshooting Alerts

Not Receiving Notifications

Check these items:

  • Is your email/phone number correct?
  • Are notifications going to your spam/junk folder?
  • Is your phone carrier blocking messages?
  • Are webhooks/integrations properly configured?

Too Many False Alarms

Common fixes:

  • Raise thresholds to be less sensitive
  • Add time delays (e.g., "only alert after 15 minutes")
  • Use percentage-based thresholds instead of fixed numbers
  • Consider business hours restrictions
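A time delay means the condition must stay true for a while before anyone is notified, so brief blips are ignored. A minimal sketch of the "only alert after 15 minutes" rule; you feed it a breached/not-breached reading on each check, and the class name is hypothetical.

```python
from datetime import datetime, timedelta


class SustainedCondition:
    """Fire only after a condition has held continuously for `duration`."""

    def __init__(self, duration: timedelta = timedelta(minutes=15)):
        self.duration = duration
        self.breach_started = None  # when the current breach began

    def update(self, breached: bool, now: datetime) -> bool:
        if not breached:
            self.breach_started = None  # condition cleared: reset the clock
            return False
        if self.breach_started is None:
            self.breach_started = now
        return now - self.breach_started >= self.duration
```

A single good reading resets the clock, so a condition that flickers on and off never fires; if that hides real problems, allow a short grace period before resetting.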

Missing Important Issues

Possible solutions:

  • Lower thresholds for critical alerts
  • Add multiple alert conditions
  • Use escalation rules for persistent problems
  • Review alert history to find gaps

Best Practices Summary

Starting Out

  1. Start with budget alerts - Protect against unexpected costs
  2. Add error rate monitoring - Know when things break
  3. Test all notifications - Make sure alerts actually reach you
  4. Review after one week - Tune based on initial experience

Long-term Success

  1. Tune regularly - Adjust thresholds as you learn normal patterns
  2. Document responses - Know what to do for each type of alert
  3. Train your team - Make sure everyone knows how to respond
  4. Keep it simple - Too many alerts = ignoring all alerts

Next Steps