Setting Up Monitoring and Alerts¶

What You'll Learn¶

How to set up smart alerts so you'll know immediately when something important happens with your AI applications - like cost spikes, errors, or performance issues.

Why Use Alerts?¶

Avoid Surprises¶

Cost overruns: Get notified before your bill gets too high
Service issues: Know about problems before your users do
Performance drops: Catch slow responses early
Usage spikes: Understand when your app gets more traffic

Save Time¶

Instead of manually checking your dashboard:
- Automatic monitoring: Alerts watch 24/7
- Smart notifications: Only get alerts that matter
- Quick response: Fix issues faster with immediate notifications

Types of Alerts Available¶

Cost Alerts¶

Budget Warnings:
- 50% of monthly budget reached
- 80% of monthly budget reached
- 100% of monthly budget reached
- Sudden 2x cost increase in 24 hours

Example: "Your Content Generator project has spent $45 of your $50 monthly budget"
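
To make the budget rules concrete, here is a minimal Python sketch of the checks behind them. It is an illustration only, not RunForge's implementation: the function names and the idea of comparing the last 24 hours against the previous 24 hours are assumptions.

```python
from typing import List

# The three budget warning levels listed above, as fractions of the budget.
BUDGET_THRESHOLDS = (0.5, 0.8, 1.0)


def crossed_budget_thresholds(spent: float, budget: float) -> List[float]:
    """Return every warning level that month-to-date spend has reached."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    ratio = spent / budget
    return [t for t in BUDGET_THRESHOLDS if ratio >= t]


def is_cost_spike(last_24h: float, previous_24h: float, factor: float = 2.0) -> bool:
    """True when the last 24 hours cost at least `factor` times the prior 24 hours."""
    return previous_24h > 0 and last_24h >= factor * previous_24h


# $45 of a $50 budget is 90%, so the 50% and 80% warnings have been reached.
print(crossed_budget_thresholds(45, 50))  # prints [0.5, 0.8]
```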

Performance Alerts¶

Response Time Issues:
- Average response time over 3 seconds
- Any single request taking over 10 seconds
- Response time 50% slower than usual

Example: "Your chatbot responses are averaging 4.2 seconds (usually 1.8 seconds)"
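
The three response time rules can be sketched as a single check over a window of recent requests. This is a hedged example: the function name, the rule labels, and the assumption that you already know your usual average are all illustrative.

```python
from statistics import mean
from typing import List, Sequence


def performance_alerts(latencies: Sequence[float], baseline_avg: float) -> List[str]:
    """Check a window of response times (in seconds) against the three rules."""
    alerts: List[str] = []
    if not latencies:
        return alerts
    avg = mean(latencies)
    if avg > 3.0:
        alerts.append("average_over_3s")
    if max(latencies) > 10.0:
        alerts.append("single_request_over_10s")
    # "50% slower than usual" means more than 1.5x the baseline average.
    if baseline_avg > 0 and avg > 1.5 * baseline_avg:
        alerts.append("slower_than_usual")
    return alerts
```

With responses averaging 4.2 seconds against a usual 1.8 seconds, both the fixed 3-second rule and the relative rule would fire.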

Reliability Alerts¶

Error Rate Problems:
- Success rate drops below 99%
- More than 10 errors in the last hour
- Specific error types (rate limits, authentication, etc.)

Example: "Your Customer Support Bot has failed 15 times in the last hour"
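
A rough sketch of the first two reliability rules, assuming you can count total and failed requests for the last hour. The function name and labels are hypothetical.

```python
from typing import List


def reliability_alerts(
    total: int,
    failed: int,
    min_success_rate: float = 0.99,
    max_errors: int = 10,
) -> List[str]:
    """Evaluate one hour of request counts against the rules above."""
    alerts: List[str] = []
    if total <= 0:
        return alerts  # no traffic, so nothing to judge
    success_rate = (total - failed) / total
    if success_rate < min_success_rate:
        alerts.append("success_rate_low")
    if failed > max_errors:
        alerts.append("error_count_high")
    return alerts
```

The two rules complement each other: 15 failures out of 2,000 requests keeps the success rate above 99% but still trips the error count, while 5 failures out of 100 does the opposite.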

Usage Alerts¶

Traffic Changes:
- 300% increase in requests compared to normal
- No activity for 6+ hours (when you expect activity)
- New models or providers being used

Example: "Your app made 1,200 requests in the last hour (usually 400)"
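
Here is a small sketch of the traffic spike and inactivity checks. The function names are made up for illustration, and "normal" is assumed to be a request count you already track for a comparable hour.

```python
from datetime import datetime, timedelta


def is_traffic_spike(current: int, normal: int, increase_pct: float = 300.0) -> bool:
    """True when requests grew by at least `increase_pct` percent over normal."""
    if normal <= 0:
        return False
    return (current - normal) / normal * 100 >= increase_pct


def is_inactive(
    last_request_at: datetime,
    now: datetime,
    max_gap: timedelta = timedelta(hours=6),
) -> bool:
    """True when no request has arrived for at least `max_gap`."""
    return now - last_request_at >= max_gap
```

Be careful with percentages: 1,200 requests against a usual 400 is a 200% increase (3x normal), so it would fire with `increase_pct=200` but not with a 300% setting, which needs 1,600 requests or more.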

Setting Up Your First Alert¶

Navigate to Alerts¶

Go to Settings → Alerts or look for a 🔔 bell icon
Select your project from the dropdown
Click "Create Alert" or "Add New Alert"

Choose Alert Type¶

You'll see options like:
- Cost/Budget alerts
- Performance alerts
- Reliability alerts
- Usage/Traffic alerts

Let's start with a budget alert:

Configure Budget Alert¶

Alert Name: "Monthly Budget Warning"
Alert Type: Budget/Cost
Threshold: 80% of monthly budget
Time Period: Current month
How to notify: Email, SMS, webhook, etc.
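
If you prefer to keep alert settings in code or version control, the same fields can be modeled as a small data structure. This is only a sketch: the class, field names, and channel list are assumptions for illustration, not a RunForge API.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical channel names mirroring the "How to notify" options.
ALLOWED_CHANNELS = {"email", "sms", "webhook", "slack"}


@dataclass
class BudgetAlert:
    name: str
    threshold_pct: float   # e.g. 80 for "80% of monthly budget"
    monthly_budget: float  # in your billing currency
    channels: List[str] = field(default_factory=lambda: ["email"])

    def __post_init__(self) -> None:
        if not 0 < self.threshold_pct <= 100:
            raise ValueError("threshold_pct must be in (0, 100]")
        unknown = set(self.channels) - ALLOWED_CHANNELS
        if unknown:
            raise ValueError(f"unknown channels: {sorted(unknown)}")

    def should_fire(self, month_to_date_spend: float) -> bool:
        """True once spend for the current month reaches the threshold."""
        return month_to_date_spend >= self.monthly_budget * self.threshold_pct / 100


alert = BudgetAlert("Monthly Budget Warning", threshold_pct=80, monthly_budget=50.0)
```

Validating the settings when the alert is created catches typos such as a misspelled channel before they silently prevent a notification.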

Test Your Alert¶

Review settings: Make sure everything looks right
Save the alert: Click "Create Alert" or "Save"
Test notification: Most systems let you send a test alert
Verify delivery: Check that you received the test notification

Essential Alerts for Every Project¶

1. Budget Alert (High Priority)¶

Alert Name: "Monthly Budget - 80% Warning"
Type: Cost
Condition: Monthly spend > 80% of budget
Notification: Email + SMS

Why this matters: Prevents unexpected bills

2. High Error Rate (High Priority)¶

Alert Name: "Error Rate Spike"
Type: Reliability  
Condition: Success rate < 95% for 15 minutes
Notification: Email + Slack

Why this matters: Users are experiencing failures

3. Slow Performance (Medium Priority)¶

Alert Name: "Response Time Warning"
Type: Performance
Condition: Average response > 3 seconds for 30 minutes
Notification: Email

Why this matters: Poor user experience

4. No Activity (Low Priority)¶

Alert Name: "Service Might Be Down"
Type: Usage
Condition: No requests for 4 hours during business hours
Notification: Email

Why this matters: Your application might have stopped working
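
Several of the conditions above only fire when a problem lasts "for 15 minutes" or "for 30 minutes". One way to express that is sketched below, assuming you have timestamped samples of the metric. The function name and the sample format are illustrative.

```python
from typing import Sequence, Tuple


def sustained_below(
    samples: Sequence[Tuple[float, float]],
    threshold: float,
    duration_s: float,
    now: float,
) -> bool:
    """True when the metric stayed below `threshold` for the whole window.

    `samples` holds (unix_timestamp, value) pairs sorted oldest first.
    """
    window_start = now - duration_s
    recent = [value for ts, value in samples if ts >= window_start]
    # Require data reaching back to the window start, so a service that
    # only just began reporting cannot trigger the alert.
    has_history = bool(samples) and samples[0][0] <= window_start
    return has_history and bool(recent) and all(v < threshold for v in recent)
```

For the "Error Rate Spike" alert you would call this with `threshold=0.95` and `duration_s=900`. A single bad sample followed by recovery does not fire, which is what keeps short blips from paging you.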

Advanced Alert Configuration¶

Smart Thresholds¶

Instead of fixed numbers, use dynamic thresholds:
- "50% higher than last week" instead of "over 100 requests"
- "Response time 2x normal" instead of "over 2 seconds"
- "Cost increase 3x typical daily spend" instead of "$50 per day"
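
A dynamic threshold is just a multiplier applied to your own recent history. Here is a minimal sketch, assuming "normal" is the plain average of recent values; the function names are hypothetical, and a real system might use a median or a seasonal baseline instead.

```python
from statistics import mean
from typing import Sequence


def dynamic_threshold(history: Sequence[float], multiplier: float) -> float:
    """Threshold relative to the recent average instead of a fixed number."""
    if not history:
        raise ValueError("need at least one historical value")
    return mean(history) * multiplier


def exceeds_dynamic_threshold(
    current: float, history: Sequence[float], multiplier: float
) -> bool:
    return current > dynamic_threshold(history, multiplier)


# "50% higher than last week": multiplier 1.5 over last week's daily counts.
last_week = [100, 120, 80, 100]
print(exceeds_dynamic_threshold(160, last_week, 1.5))  # prints True
```

The advantage over a fixed number is that the threshold grows with your usage, so you do not have to retune it every time traffic doubles.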

Time-Based Conditions¶

Make alerts smarter with time awareness:
- Business hours only: Don't get cost alerts at night if that's normal
- Weekend patterns: Different thresholds for weekends
- Seasonal adjustments: Account for known busy periods
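
Time awareness usually comes down to two small decisions: which threshold applies right now, and whether the alert should be active at all. A sketch, assuming a Monday to Friday week and 9:00 to 17:00 business hours (both are assumptions you would adjust):

```python
from datetime import datetime


def threshold_for(
    moment: datetime, weekday_threshold: float, weekend_threshold: float
) -> float:
    """Pick the threshold that matches the day type (Saturday/Sunday = weekend)."""
    return weekend_threshold if moment.weekday() >= 5 else weekday_threshold


def in_business_hours(moment: datetime, start_hour: int = 9, end_hour: int = 17) -> bool:
    """True on weekdays between start_hour (inclusive) and end_hour (exclusive)."""
    return moment.weekday() < 5 and start_hour <= moment.hour < end_hour
```

Pass timestamps in the time zone your users work in; comparing UTC timestamps against local business hours is a common source of alerts that fire at the wrong time.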

Escalation Rules¶

Set up multiple notification levels:
1. Warning (5 minutes): Email notification
2. Critical (15 minutes): Email + SMS
3. Emergency (30 minutes): Email + SMS + Phone call
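
The escalation levels can be sketched as a lookup table. This example assumes the minutes mean "how long the problem has gone unresolved", which the list above does not state explicitly; the names are illustrative.

```python
from typing import List, Optional, Tuple

# (minutes unresolved, level, channels), ordered from mildest to most severe.
ESCALATION_LEVELS = [
    (5, "warning", ["email"]),
    (15, "critical", ["email", "sms"]),
    (30, "emergency", ["email", "sms", "phone"]),
]


def escalation_for(minutes_unresolved: float) -> Optional[Tuple[str, List[str]]]:
    """Return (level, channels) for the highest level reached, or None."""
    current = None
    for after_minutes, level, channels in ESCALATION_LEVELS:
        if minutes_unresolved >= after_minutes:
            current = (level, channels)
    return current


print(escalation_for(20))  # prints ('critical', ['email', 'sms'])
```

Keeping the levels in one table makes it easy to review who gets contacted and when, without reading through conditional logic.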

Notification Channels¶

Email Notifications¶

Best for: Non-urgent alerts, detailed information
Setup: Add your email address in notification settings
Pros: Detailed messages, easy to search and archive
Cons: Might be delayed, can get buried in inbox

SMS/Text Messages¶

Best for: Urgent alerts that need immediate attention
Setup: Add phone number and verify
Pros: Immediate delivery, hard to miss
Cons: Character limits, costs money

Slack Integration¶

Best for: Team notifications, keeping everyone informed
Setup: Connect RunForge to your Slack workspace
Pros: Team visibility, conversation context
Cons: Can be noisy, might get lost in busy channels

Webhook/API Integration¶

Best for: Custom integrations, automated responses
Setup: Configure webhook URL in alert settings
Pros: Can trigger automated responses
Cons: Requires technical setup
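
On the receiving side, a webhook handler parses the alert and decides what to do. The sketch below is deliberately generic: the JSON field names (`type`, `severity`) and the returned action names are hypothetical, so check the actual payload your alerts send before relying on any of them.

```python
import json


def handle_alert_webhook(body: bytes) -> str:
    """Map an incoming alert payload to an action name (all names hypothetical)."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers malformed JSON and undecodable bytes
        return "reject"
    if not isinstance(payload, dict):
        return "reject"

    alert_type = payload.get("type")
    severity = payload.get("severity", "low")
    if alert_type == "cost" and severity == "critical":
        return "pause_non_essential"
    if alert_type == "reliability":
        return "open_incident"
    return "log_only"
```

Rejecting anything that does not parse, and defaulting unknown alerts to logging only, keeps an unexpected payload from triggering an automated action by accident.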

Managing Alert Noise¶

Avoid Alert Fatigue¶

Start conservative: Begin with fewer, more important alerts
Tune over time: Adjust thresholds based on experience
Group related alerts: Don't send 10 alerts for the same issue
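
Grouping related alerts can be as simple as a cooldown per project and alert type. This is one possible approach, sketched with a hypothetical class; the 15-minute default is an assumption.

```python
from typing import Dict, Tuple


class AlertDeduplicator:
    """Suppress repeats of the same alert within a cooldown window."""

    def __init__(self, cooldown_s: float = 900.0) -> None:
        self.cooldown_s = cooldown_s
        self._last_sent: Dict[Tuple[str, str], float] = {}

    def should_send(self, project: str, alert_type: str, now: float) -> bool:
        """`now` is a timestamp in seconds, e.g. from time.time()."""
        key = (project, alert_type)
        last = self._last_sent.get(key)
        if last is not None and now - last < self.cooldown_s:
            return False  # same issue, still inside the cooldown
        self._last_sent[key] = now
        return True
```

Because the key includes the alert type, an error-rate alert does not silence a cost alert for the same project.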

Alert Prioritization¶

Critical: Service down, major security issues
High: Budget exceeded, high error rates
Medium: Performance degradation, unusual patterns
Low: Weekly summaries, minor threshold breaches

Quiet Hours¶

Set up "do not disturb" periods:
- Nights and weekends: Unless truly critical
- Maintenance windows: When you expect issues
- Holiday periods: When usage patterns change
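
The one subtle part of quiet hours is that a night window wraps past midnight. A minimal sketch, assuming a 22:00 to 07:00 window and that only alerts labeled "critical" break through (both are example choices):

```python
from datetime import time


def in_quiet_hours(now: time, start: time = time(22, 0), end: time = time(7, 0)) -> bool:
    """True when `now` falls inside the do-not-disturb window."""
    if start <= end:
        return start <= now < end
    # The window wraps past midnight, e.g. 22:00 to 07:00.
    return now >= start or now < end


def should_notify(severity: str, now: time) -> bool:
    """Critical alerts always go out; everything else waits for quiet hours to end."""
    return severity == "critical" or not in_quiet_hours(now)
```

Alerts held back during quiet hours should still be recorded, so you can review them in the morning rather than lose them.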

Responding to Alerts¶

Immediate Response Checklist¶

When you get an alert:

Read the full message: Don't just glance at the subject
Check the dashboard: Get current status and context
Assess severity: Is this urgent or can it wait?
Take action if needed: Fix the issue or escalate
Follow up: Make sure the issue is resolved

Common Alert Scenarios¶

Cost Spike Alert¶

Example: "Your project spent $25 in the last hour (usually $3)"

Investigation steps:
1. Check recent activity in your dashboard
2. Look for unusual request patterns
3. Verify your applications are working normally
4. Check if you accidentally made a lot of expensive API calls

Possible actions:
- Pause non-essential services temporarily
- Investigate and fix any runaway processes
- Rotate API keys if you suspect unauthorized usage

High Error Rate Alert¶

Example: "Success rate dropped to 87% in the last 30 minutes"

Investigation steps:
1. Check what errors are happening (rate limits, timeouts, etc.)
2. Look at your provider's status page for outages
3. Review recent code deployments for bugs
4. Check if API keys are still valid

Possible actions:
- Wait if it's a provider outage
- Roll back recent deployments
- Implement retry logic with backoff
- Contact provider support if needed
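
Retry logic with backoff means waiting longer after each failed attempt instead of retrying immediately. Here is a minimal sketch using exponential backoff with random jitter. The function name, defaults, and the choice of which exceptions count as temporary are assumptions; adapt `retry_on` to the errors your provider's client actually raises.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call()` until it succeeds, doubling the wait after each failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = base_delay * (2 ** attempt)
            # Jitter spreads retries out so many clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay))
    raise RuntimeError("unreachable")
```

Only retry errors that are likely to be temporary, such as rate limits and timeouts. Retrying an authentication failure just adds load and delays the real fix.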

Performance Degradation Alert¶

Example: "Average response time is 4.2 seconds (usually 1.8 seconds)"

Investigation steps:
1. Check if specific models are slower than others
2. Look for increased request volume
3. Verify your internet connection and infrastructure
4. Check provider status for performance issues

Possible actions:
- Switch to faster models temporarily
- Reduce request volume
- Optimize your prompts to be shorter
- Scale up your infrastructure if needed

Customizing Alerts for Different Use Cases¶

Production Applications¶

Very sensitive: Low thresholds, immediate notifications
24/7 monitoring: Alerts at any time
Multiple channels: Email, SMS, and team chat
Escalation: If not responded to in 30 minutes

Development/Testing¶

Less sensitive: Higher thresholds, daily summaries
Business hours: No alerts on nights/weekends
Email only: Less urgent notification methods
Weekly summaries: Digest of all activities

Personal Projects¶

Budget-focused: Mainly cost alerts
Email notifications: No urgent SMS needed
Higher thresholds: Don't alert for small issues
Monthly summaries: Overview of usage patterns

Alert Maintenance¶

Regular Review (Monthly)¶

Check alert history: Which alerts fired? Were they useful?
Tune thresholds: Adjust based on your normal usage patterns
Update contacts: Make sure notification info is current
Review relevance: Remove alerts you no longer need

Seasonal Adjustments¶

Holiday patterns: Expect different usage during holidays
Business cycles: Adjust for known busy/slow periods
Growth: Update thresholds as your usage grows
Model changes: New models may have different cost/performance profiles

Troubleshooting Alerts¶

Not Receiving Notifications¶

Check these items:
- Is your email/phone number correct?
- Are notifications going to spam/junk folder?
- Is your phone carrier blocking messages?
- Are webhooks/integrations properly configured?

Too Many False Alarms¶

Common fixes:
- Raise thresholds to be less sensitive
- Add time delays (e.g., "only alert after 15 minutes")
- Use percentage-based thresholds instead of fixed numbers
- Consider business hours restrictions

Missing Important Issues¶

Possible solutions:
- Lower thresholds for critical alerts
- Add multiple alert conditions
- Use escalation rules for persistent problems
- Review alert history to find gaps

Best Practices Summary¶

Starting Out¶

Start with budget alerts - Protect against unexpected costs
Add error rate monitoring - Know when things break
Test all notifications - Make sure alerts actually reach you
Review after one week - Tune based on initial experience

Long-term Success¶

Tune regularly - Adjust thresholds as you learn normal patterns
Document responses - Know what to do for each type of alert
Train your team - Make sure everyone knows how to respond
Keep it simple - Too many alerts = ignoring all alerts

Next Steps¶

Optimize your dashboard to complement your alerts
Learn about testing to prevent issues that cause alerts
Explore use cases to understand normal usage patterns for better alert thresholds