Incidents & SLA

This page covers what happens during a Driftstack outage: how we detect, communicate, and resolve incidents, and the credit policy if we miss our SLA.

Status page

status.driftstack.dev is the single source of truth during an incident. The same data the status page renders is also published via GET /v1/status (overall + per-component status) and GET /v1/status/incidents (recent / live incidents). To get notified by email when an incident is filed or resolved, subscribe via POST /v1/status/subscribe — see /docs/status-subscriptions.
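A minimal sketch of reading the two status endpoints from a script. The base URL `https://api.driftstack.dev` is an assumption (this page names the paths, not the host), and the helper names `status_url` and `fetch_json` are hypothetical. The response shapes of these two endpoints are not documented here, so the sketch returns the parsed JSON untouched rather than guessing at field names.

```python
import json
import urllib.request
from urllib.parse import urljoin

# Assumption: the API host. This page only gives the paths.
BASE_URL = "https://api.driftstack.dev"


def status_url(path: str, base_url: str = BASE_URL) -> str:
    """Build the full URL for a status path such as /v1/status."""
    return urljoin(base_url, path)


def fetch_json(path: str, base_url: str = BASE_URL, timeout: float = 10.0):
    """GET a public status endpoint and return the decoded JSON body."""
    with urllib.request.urlopen(status_url(path, base_url), timeout=timeout) as resp:
        return json.load(resp)


# Example calls (these hit the network, so they are left commented out):
# overall = fetch_json("/v1/status")
# incidents = fetch_json("/v1/status/incidents")
```

Polling like this suits dashboards and scripts; for notifications, the email subscription described above avoids polling altogether.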

Severity ladder

| Severity | Definition | First update | Update cadence |
|---|---|---|---|
| Critical | Core API down across all customers, or data-loss risk. | ≤ 15 min | Every 30 min until resolved. |
| Major | API degraded (>5% error rate) OR a critical surface (auth, sessions) unavailable for a subset of customers. | ≤ 30 min | Every 60 min. |
| Minor | Single non-critical surface (dashboard, an SDK build pipeline) degraded. | ≤ 60 min | At resolution. |
| Maintenance | Planned change with potential impact. Always announced ≥48h in advance. | Pre-announced | Start + end. |
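The ladder above can be read as a small lookup table. The sketch below is illustrative only: the `LADDER` dictionary and `next_update_due` function are hypothetical names, not part of any Driftstack SDK, and it assumes the first-update clock starts when the incident is filed. Maintenance is left out because its communication is pre-announced rather than timed from filing.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Minutes, transcribed from the severity ladder.
# cadence_min is None where no interim update is promised (Minor: at resolution).
LADDER = {
    "Critical": {"first_update_min": 15, "cadence_min": 30},
    "Major": {"first_update_min": 30, "cadence_min": 60},
    "Minor": {"first_update_min": 60, "cadence_min": None},
}


def next_update_due(
    severity: str,
    filed_at: datetime,
    last_update_at: Optional[datetime] = None,
) -> Optional[datetime]:
    """Return the latest time the next update is expected, or None if
    the next communication is the resolution notice itself."""
    rule = LADDER[severity]
    if last_update_at is None:
        # No update posted yet: the first-update deadline applies.
        return filed_at + timedelta(minutes=rule["first_update_min"])
    if rule["cadence_min"] is None:
        return None
    return last_update_at + timedelta(minutes=rule["cadence_min"])


filed = datetime(2026, 5, 9, 14, 23, tzinfo=timezone.utc)
print(next_update_due("Critical", filed))
```

A customer-side monitor could compare this deadline against the incident timeline to flag an update that is running late.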

Detection

Three signals trigger an incident:

Customer communications during an incident

  1. Status page entry filed with severity + title + affected components.
  2. Email fan-out to confirmed /v1/status/subscribe subscribers.
  3. Progress updates on the cadence above. Each update appends to the incident's timeline (visible via GET /v1/status/incidents).
  4. Resolution marks the incident resolved and triggers a final email to subscribers with a root-cause summary + remediation steps.
  5. Postmortem for Critical / Major incidents published within 7 business days on the public status page as a permanent entry under the resolved incident. Minor incidents get an inline summary on the resolved status entry.
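To make the 7-business-day postmortem window in step 5 concrete, here is a rough sketch of the date arithmetic. It rests on assumptions this page does not state: business days are taken as Monday to Friday with no holiday calendar, and counting starts on the day after resolution. The function name `add_business_days` is hypothetical.

```python
from datetime import date, timedelta


def add_business_days(start: date, days: int) -> date:
    """Return the date `days` business days after `start`.

    Assumption: business days are Mon-Fri; public holidays are ignored.
    The start date itself is not counted.
    """
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0 = Monday ... 4 = Friday
            remaining -= 1
    return current


# An incident resolved on Monday 2026-05-11 would, under these
# assumptions, have its postmortem due by Wednesday 2026-05-20.
print(add_business_days(date(2026, 5, 11), 7))
```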

Note: incident.created / incident.updated / incident.resolved are admin-audit / internal SSE event types — they are not yet in SubscribableWebhookEventTypeSchema, so they can't be the target of a POST /v1/webhooks subscription. Email subscription is the customer-facing notification path today.

SLA + credit policy

Driftstack's SLA covers the API control plane (session create + lifecycle endpoints). The dashboard, SDK distribution, and any features marked roadmap in /docs/api-versioning are best-effort and fall outside the SLA.

Tier-by-tier SLA targets, the windowing methodology, the credit bands, and the dispute process all live in /docs/sla-policy — that is the authoritative reference. Tier identifiers used there match the AccountTier enum exactly.

Reading SLA data from the API

GET /v1/status/sla

→ {
  "data": [
    {
      "target": "api.driftstack.dev",
      "uptimePct": 99.99,
      "totalProbes": 43200,
      "okCount": 43196,
      "failCount": 4,
      "lastProbeAt": "2026-05-11T13:00:00Z",
      "lastFailureAt": "2026-05-09T14:23:00Z",
      "windowStart": "2026-04-11T13:00:00Z",
      "windowEnd": "2026-05-11T13:00:00Z"
    }
  ]
}

No authentication is required; the status surface is public. The window is fixed at a rolling 30 days. Field names are camelCase (the SLA report serialises its internal model directly).
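As a sanity check on the sample response, the reported `uptimePct` can be recomputed from the probe counts. This is a sketch, not an official client: the `summarize` helper is a hypothetical name, and it assumes `uptimePct` is `okCount / totalProbes` rounded to two decimals, which matches the sample but is not stated on this page.

```python
from datetime import datetime

# One entry copied from the sample GET /v1/status/sla response.
SAMPLE_ENTRY = {
    "target": "api.driftstack.dev",
    "uptimePct": 99.99,
    "totalProbes": 43200,
    "okCount": 43196,
    "failCount": 4,
    "lastProbeAt": "2026-05-11T13:00:00Z",
    "lastFailureAt": "2026-05-09T14:23:00Z",
    "windowStart": "2026-04-11T13:00:00Z",
    "windowEnd": "2026-05-11T13:00:00Z",
}


def parse_ts(value: str) -> datetime:
    """Parse the Z-suffixed ISO 8601 timestamps used in the response."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def summarize(entry: dict) -> dict:
    """Recompute uptime and window length from the raw counters."""
    ok, total = entry["okCount"], entry["totalProbes"]
    window = parse_ts(entry["windowEnd"]) - parse_ts(entry["windowStart"])
    return {
        "target": entry["target"],
        "window_days": window.days,
        # Assumption: uptimePct = okCount / totalProbes, rounded to 2 places.
        "computed_uptime_pct": round(100 * ok / total, 2),
        "counts_consistent": ok + entry["failCount"] == total,
    }


print(summarize(SAMPLE_ENTRY))
```

In the sample, 43,200 probes over a 30-day window works out to one probe per minute, so the 4 failed probes correspond to roughly 4 minutes of failed checks.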

Postmortems

Public postmortems for Critical + Major incidents live on the public status page, attached to the resolved incident entry. Each follows the same template: timeline, root cause, what we changed to prevent recurrence. Postmortems are blameless and detailed enough to be useful — we'd rather over-share than under-share.

Reporting a problem
