Bot Crashed — What Happens Now?
Learn how Botonom handles unexpected agent crashes, auto-recovery, and what you should check first.
This article covers what happens when an agent crashes, how the auto-recovery process works, what you should check first, and how to prevent future crashes.
What happens when an agent crashes?
When an agent encounters an unrecoverable error, Botonom's watchdog service detects the failure within 5 seconds. The system immediately marks the agent as RECOVERING and begins the auto-restart sequence.
During this window (typically 10–30 seconds), any in-flight tasks are queued and retried once the agent is back online. No data is lost unless recovery escalates to a cold restart, in which case in-flight tasks may be lost.
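The queue-and-retry behavior described above can be pictured with a small Python sketch. It is only an illustration under stated assumptions: the `Agent` class, its method names, and the `RUNNING` state are hypothetical and do not reflect Botonom's internal implementation. Only the `RECOVERING` state name comes from this article.

```python
from collections import deque
from enum import Enum


class AgentState(Enum):
    RUNNING = "RUNNING"        # hypothetical name for the healthy state
    RECOVERING = "RECOVERING"  # state name taken from the article


class Agent:
    """Hypothetical stand-in for an agent, used only to illustrate queuing."""

    def __init__(self):
        self.state = AgentState.RUNNING
        self.pending = deque()  # tasks held while the agent is recovering
        self.completed = []     # tasks that have been processed

    def submit(self, task):
        # While recovering, hold the task instead of dropping it.
        if self.state is AgentState.RECOVERING:
            self.pending.append(task)
        else:
            self.completed.append(task)

    def mark_crashed(self, in_flight):
        # The watchdog marks the agent RECOVERING and re-queues in-flight work.
        self.state = AgentState.RECOVERING
        self.pending.extend(in_flight)

    def mark_recovered(self):
        # Once back online, retry queued tasks in their original order.
        self.state = AgentState.RUNNING
        while self.pending:
            self.completed.append(self.pending.popleft())


agent = Agent()
agent.submit("task-1")
agent.mark_crashed(in_flight=["task-2"])
agent.submit("task-3")  # arrives during the recovery window
agent.mark_recovered()
print(agent.completed)  # ['task-1', 'task-2', 'task-3']
```

A first-in, first-out queue is used here so that retried tasks keep their original order. Whether Botonom preserves ordering is not stated in this article.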
Auto-recovery process
Botonom uses a three-stage recovery pipeline:
1. Health check
The platform pings the agent's internal endpoints to determine whether the crash was transient.
2. Warm restart
If the health check fails, a fresh instance is spun up from the last checkpoint.
3. Cold restart
As a last resort, the agent is rebuilt from its base image. This takes up to 60 seconds but guarantees a clean state.
| Stage | Duration | Data Loss |
|---|---|---|
| Health check | 1–5 s | None |
| Warm restart | 10–30 s | None |
| Cold restart | 30–60 s | Possible (in-flight only) |
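The escalation through the three stages can be sketched as a simple "try each stage in order" loop. This is a minimal Python illustration, not the platform's actual code: the `recover` function and the three stage callables are hypothetical, and each callable simply returns whether that stage brought the agent back.

```python
from typing import Callable, List, Tuple


def recover(stages: List[Tuple[str, Callable[[], bool]]]) -> str:
    """Try each stage in order and return the name of the first that succeeds."""
    for name, attempt in stages:
        if attempt():
            return name
    raise RuntimeError("all recovery stages failed")


# Hypothetical stage implementations for the example.
def health_check() -> bool:
    return False  # simulate a crash that was not transient


def warm_restart() -> bool:
    return True   # restoring from the last checkpoint succeeds


def cold_restart() -> bool:
    return True   # rebuild from the base image (last resort)


stages = [
    ("health check", health_check),
    ("warm restart", warm_restart),
    ("cold restart", cold_restart),
]
print(recover(stages))  # warm restart
```

Because the stages are ordered from cheapest to most expensive, the loop stops as soon as one succeeds, which matches the durations in the table above.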
What should you check first?
Open the Agent Dashboard → Logs tab. Look for any red entries marked FATAL or PANIC. The most common causes are:
- Token exhaustion — your shared pool ran out mid-task.
- Skill conflict — two skills tried to access the same resource.
- External API timeout — a third-party service didn't respond.
If you see OOM (Out of Memory), consider upgrading to a higher tier for more resources.
# Quick way to check the last 50 log entries
botonom logs --agent <agent-id> --tail 50 --level error
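If you save that output to a file or pipe it into a script, the guidance in this section can be turned into a small triage helper. The Python sketch below is an assumption-heavy illustration: the `triage` function is hypothetical, the sample log lines are invented for the example, and the real log format may differ. It only looks for the `PANIC`, `OOM`, and `FATAL` markers mentioned in this article.

```python
def triage(log_lines):
    """Suggest a next step based on the most severe marker found in the logs."""
    text = "\n".join(log_lines)
    if "PANIC" in text:
        # PANIC takes priority: the article says not to restart manually.
        return "contact support; do not restart the agent manually"
    if "OOM" in text:
        return "consider upgrading to a higher tier"
    if "FATAL" in text:
        return "check for token exhaustion, skill conflict, or API timeout"
    return "no fatal entries found"


# Invented sample lines; the real log format is not documented here.
sample = [
    "2024-01-01T00:00:00Z ERROR external API timeout",
    "2024-01-01T00:00:01Z FATAL OOM while running skill",
]
print(triage(sample))  # consider upgrading to a higher tier
```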
If you see PANIC entries, do not restart the agent manually. Contact support so we can investigate the root cause.
Preventing future crashes
Enable Proactive Monitoring in Settings → Alerts. This sends you a notification when an agent's memory or CPU usage exceeds 80%.
You can also set up automatic scaling rules:
{
  "rule": "auto-scale",
  "trigger": {
    "metric": "memory_usage",
    "threshold": 80,
    "unit": "percent"
  },
  "action": {
    "type": "increase_buffer",
    "amount": "25%"
  }
}
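To make the rule's effect concrete, here is a minimal Python sketch of how such a rule might be evaluated. It rests on assumptions not confirmed by this article: the `apply_rule` function is hypothetical, the buffer is measured in megabytes purely for illustration, and `"amount": "25%"` is interpreted as a relative increase of the current buffer.

```python
import json

RULE = json.loads("""
{
  "rule": "auto-scale",
  "trigger": {"metric": "memory_usage", "threshold": 80, "unit": "percent"},
  "action": {"type": "increase_buffer", "amount": "25%"}
}
""")


def apply_rule(rule, metrics, buffer_mb):
    """Return the new buffer size after evaluating the rule against metrics."""
    trigger = rule["trigger"]
    value = metrics[trigger["metric"]]
    # The rule fires only when usage exceeds the threshold.
    if value <= trigger["threshold"]:
        return buffer_mb
    action = rule["action"]
    if action["type"] == "increase_buffer":
        # Assumption: "25%" means "grow the current buffer by 25 percent".
        percent = float(action["amount"].rstrip("%"))
        return buffer_mb * (1 + percent / 100)
    return buffer_mb


print(apply_rule(RULE, {"memory_usage": 85}, 512))  # 640.0
print(apply_rule(RULE, {"memory_usage": 60}, 512))  # 512
```

With memory usage at 85%, the rule fires and a 512 MB buffer grows to 640 MB. At 60% the buffer is left unchanged.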
- Enable proactive monitoring
- Set CPU alert at 80%
- Set memory alert at 80%
- Configure auto-scaling rules
- Set up Slack notifications
