Alarm Graph Fallback — 2026-07-01T19:45Z #78
Alert Batch: 37 Active Alerts
K8s Platform — REAL Incidents
1. DeploymentReplicasUnavailable (3 deployments): started 19:40Z
2. CrashLooping pods (2): started 19:35Z
3. CiliumCanaryProbeFailing (2): started 19:22Z
4. NixosHostDeployFailed (3): all three fired together at 19:30Z
5. DNSMasterDown (1): started 18:42Z
Chronic Noise (No Intervention Needed)
6. LonghornMaintenanceJobFailed (3): PITFALL-252 (snapshot-purge-watchdog job failed with BackoffLimitExceeded)
7. KubernetesAgentBackupControlJobFailed (~13): backup-audit and backup-label-reconciler jobs failed with BackoffLimitExceeded
8. SmokepingInterSiteLatencyHigh (3): cc-fr-lau-store-01 chronic-latency targets (89.167.50.230, 204.168.217.156, 49.13.125.237)
9. HeadscaleOperatorMetricsDown (1): likely persistent, since the namespace/service was removed
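The real-versus-noise split above can be expressed as a small suppression table. The following is a minimal Python sketch, not part of the alerting stack described here: it assumes alerts arrive as flat dicts of labels with an `alertname` key, and the `target` label name for the Smokeping alert is a hypothetical choice. Only the Smokeping rule is narrowed to the three listed targets, so latency alerts for any other target would still count as real.

```python
from collections import Counter

# Hypothetical suppression table built from the "Chronic Noise" list.
# None means "any alert with this alertname is noise"; a (label, values) pair
# restricts the rule to alerts whose label value is in the set.
# The "target" label name is an assumption, not taken from the real alert rules.
CHRONIC_NOISE_RULES = {
    "LonghornMaintenanceJobFailed": None,
    "KubernetesAgentBackupControlJobFailed": None,
    "SmokepingInterSiteLatencyHigh": (
        "target",
        {"89.167.50.230", "204.168.217.156", "49.13.125.237"},
    ),
    "HeadscaleOperatorMetricsDown": None,
}


def is_chronic_noise(alert):
    """Return True if an alert (a dict of labels) matches a chronic-noise rule."""
    name = alert.get("alertname")
    if name not in CHRONIC_NOISE_RULES:
        return False
    constraint = CHRONIC_NOISE_RULES[name]
    if constraint is None:
        return True
    label, allowed = constraint
    return alert.get(label) in allowed


def classify(alerts):
    """Count alerts per alertname, split into (real, noise) Counters."""
    real, noise = Counter(), Counter()
    for alert in alerts:
        bucket = noise if is_chronic_noise(alert) else real
        bucket[alert.get("alertname")] += 1
    return real, noise


# Usage: one real alert, one Smokeping alert on a listed target,
# and one on an unlisted (made-up) target.
alerts = [
    {"alertname": "CiliumCanaryProbeFailing"},
    {"alertname": "SmokepingInterSiteLatencyHigh", "target": "49.13.125.237"},
    {"alertname": "SmokepingInterSiteLatencyHigh", "target": "10.0.0.1"},
]
real, noise = classify(alerts)
print(dict(real))   # {'CiliumCanaryProbeFailing': 1, 'SmokepingInterSiteLatencyHigh': 1}
print(dict(noise))  # {'SmokepingInterSiteLatencyHigh': 1}
```

Keeping the rules in data rather than in code makes it easy to drop an entry once a chronic alert (for example PITFALL-252) is actually fixed.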
Cascade Timeline
19:22Z → Cilium canary fails (cc-se-sto-core-01 unreachable)
19:30Z → 3 NixosHostDeployFailed alerts (cc-se-sto-core-01 among them, likely the same host behind the canary failure)
19:35Z → external-dns and repowise pods start crashlooping
19:40Z → 3 deployments report replicas unavailable
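The timeline reads as one cascade because the alerts start within minutes of each other. A minimal Python sketch of that grouping, using the start times listed under the real incidents: sort by start time and open a new group whenever the gap to the previous alert exceeds a threshold. The 15-minute threshold and the function names are hypothetical, and time proximity alone does not establish causation.

```python
from datetime import datetime, timedelta, timezone

# Start times of the real incidents as listed in this report (2026-07-01, UTC).
INCIDENTS = [
    ("DeploymentReplicasUnavailable", "19:40"),
    ("CrashLooping pods", "19:35"),
    ("CiliumCanaryProbeFailing", "19:22"),
    ("NixosHostDeployFailed", "19:30"),
    ("DNSMasterDown", "18:42"),
]


def parse_utc(hhmm, day="2026-07-01"):
    """Turn an HH:MM string from the report into a timezone-aware UTC datetime."""
    naive = datetime.strptime(f"{day}T{hhmm}", "%Y-%m-%dT%H:%M")
    return naive.replace(tzinfo=timezone.utc)


def group_by_gap(incidents, max_gap=timedelta(minutes=15)):
    """Sort incidents by start time; start a new group when the gap to the
    previous incident exceeds max_gap (a hypothetical threshold)."""
    groups = []
    previous = None
    for name, hhmm in sorted(incidents, key=lambda item: parse_utc(item[1])):
        started = parse_utc(hhmm)
        if previous is None or started - previous > max_gap:
            groups.append([])
        groups[-1].append((name, hhmm))
        previous = started
    return groups


# Print each group as a chain of "HH:MMZ alertname" steps.
for group in group_by_gap(INCIDENTS):
    print(" -> ".join(f"{hhmm}Z {name}" for name, hhmm in group))
```

With these inputs the 19:22Z to 19:40Z alerts form one group (gaps of 8, 5 and 5 minutes), while DNSMasterDown at 18:42Z lands in its own group 40 minutes earlier, consistent with the timeline above, which leaves it out.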
Classification
Notes