How-to · Intermediate · admin, member
Monitoring Agent Health
Set up dashboards, alerts, and health-check endpoints to keep tabs on your AI workforce.
5 min read · 7,150 views · Updated 2026-01-28
Health dashboard
Navigate to Dashboard → Health for a real-time overview of all your agents. The dashboard shows CPU, memory, and token usage in a unified timeline view.
Agents approaching their limits are highlighted in amber. Agents that have crashed in the last 24 hours are highlighted in red.
Click any agent's row to expand detailed metrics including request latency, error rate, and queue depth.
Setting up alerts
Go to Settings → Alerts and create rules based on metrics:
| Metric | Description | Recommended Threshold |
|---|---|---|
| cpu_usage | CPU utilization % | 80% |
| memory_usage | RAM utilization % | 80% |
| error_rate | Errors per minute | > 5 |
| response_time | P95 latency in ms | > 2000 |
| token_usage | Pool consumption % | 90% |
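To make the recommended thresholds concrete, here is a minimal Python sketch that checks a snapshot of metric values against them. It is an illustration only, not the product's own alert engine: the `RECOMMENDED_THRESHOLDS` dictionary and the `breached_metrics` function are hypothetical names, and treating every row as "strictly greater than" is an assumption for the percentage rows, which the table lists without a comparison operator.

```python
# Recommended thresholds copied from the table above.
# Assumption: a metric counts as breached when its value is strictly
# greater than the threshold (the table only states ">" for two rows).
RECOMMENDED_THRESHOLDS = {
    "cpu_usage": 80.0,        # CPU utilization %
    "memory_usage": 80.0,     # RAM utilization %
    "error_rate": 5.0,        # errors per minute
    "response_time": 2000.0,  # P95 latency in ms
    "token_usage": 90.0,      # pool consumption %
}


def breached_metrics(snapshot: dict[str, float]) -> list[str]:
    """Return the names of metrics in `snapshot` that exceed their threshold.

    Metrics missing from the snapshot are skipped rather than treated
    as breached.
    """
    return [
        name
        for name, limit in RECOMMENDED_THRESHOLDS.items()
        if name in snapshot and snapshot[name] > limit
    ]


if __name__ == "__main__":
    # Hypothetical readings for a single agent.
    sample = {
        "cpu_usage": 72.5,
        "memory_usage": 91.0,
        "error_rate": 7,
        "response_time": 850,
    }
    print(breached_metrics(sample))  # ['memory_usage', 'error_rate']
```

A check like this can be handy for sanity-testing threshold values against historical readings before turning them into alert rules.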
Alerts can be sent via email, Slack, or webhook.
# Example alert configuration
alerts:
  - name: "High Memory Usage"
    metric: memory_usage
    threshold: 80
    window: 5m
    channels:
      - slack:#ops-alerts
      - email:team@company.com
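The following Python sketch shows one way a rule like the one above could be interpreted. It is a hedged illustration, not the platform's documented behavior: the helper names (`parse_window`, `parse_channel`, `should_fire`) are hypothetical, the `s` and `h` window units are assumed (only `5m` appears in the example), and firing when the mean value inside the window exceeds the threshold is an assumed evaluation rule.

```python
import re
from statistics import mean

# Assumed window units; the example configuration only shows "5m".
_WINDOW_UNITS = {"s": 1, "m": 60, "h": 3600}


def parse_window(window: str) -> int:
    """Convert a window string such as '5m' into a number of seconds."""
    match = re.fullmatch(r"(\d+)([smh])", window)
    if match is None:
        raise ValueError(f"unsupported window: {window!r}")
    return int(match.group(1)) * _WINDOW_UNITS[match.group(2)]


def parse_channel(channel: str) -> tuple[str, str]:
    """Split 'slack:#ops-alerts' into ('slack', '#ops-alerts')."""
    kind, _, target = channel.partition(":")
    return kind, target


def should_fire(rule: dict, samples: list[tuple[float, float]], now: float) -> bool:
    """Decide whether `rule` fires at time `now`.

    `samples` holds (unix_time, value) pairs. Assumed semantics: the rule
    fires when the mean of the samples inside the window is above the
    threshold, and never fires when the window contains no samples.
    """
    cutoff = now - parse_window(rule["window"])
    recent = [value for ts, value in samples if cutoff <= ts <= now]
    return bool(recent) and mean(recent) > rule["threshold"]


# The example rule from the configuration above, as a plain dictionary.
rule = {
    "name": "High Memory Usage",
    "metric": "memory_usage",
    "threshold": 80,
    "window": "5m",
    "channels": ["slack:#ops-alerts", "email:team@company.com"],
}

if __name__ == "__main__":
    readings = [(0, 95), (800, 85), (900, 90)]  # hypothetical memory_usage samples
    if should_fire(rule, readings, now=1000):
        for kind, target in map(parse_channel, rule["channels"]):
            print(f"notify via {kind}: {target}")
```

Averaging over the window is only one plausible choice; an engine could equally require every sample in the window to exceed the threshold, which would make alerts less sensitive to short spikes.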