docs/releases/status.md: 36 additions & 0 deletions
@@ -16,6 +16,42 @@ Severity:
The severity is used to decide how much we invest in preventative measures, detection, mitigation plans, and rehearsals.
## 2025 December 4th: Brief DB outage (Severity: LOW)
### Timeline (GMT/UTC)
10:58 The DB was unavailable for a few seconds, affecting about 20 users (e.g. pages failed to load)

11:14 Alerts were created automatically

11:15 Developers responded

11:40 Decision made and action taken

11:45 Incident over
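
The intervals implied by the timeline can be worked out directly. The following is a minimal sketch, using only the timestamps listed above; the dictionary keys and the `minutes_between` helper are illustrative names, not part of any existing tooling.

```python
from datetime import datetime

# Timestamps copied from the timeline above (GMT/UTC, same day).
# The key names are illustrative labels, not taken from any real system.
TIMELINE = {
    "outage": "10:58",
    "alerts_created": "11:14",
    "developers_responded": "11:15",
    "decision_and_action": "11:40",
    "incident_over": "11:45",
}


def minutes_between(start: str, end: str) -> int:
    """Return the whole minutes between two same-day HH:MM timestamps."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


time_to_detect = minutes_between(TIMELINE["outage"], TIMELINE["alerts_created"])
time_to_respond = minutes_between(
    TIMELINE["alerts_created"], TIMELINE["developers_responded"]
)
time_to_resolve = minutes_between(TIMELINE["outage"], TIMELINE["incident_over"])

print(time_to_detect, time_to_respond, time_to_resolve)  # 16 1 47
```

On these figures, detection (16 minutes) accounts for roughly a third of the 47 minutes between the outage and the end of the incident.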
### Analysis
The incident was triggered by a developer's mistake in the DB configuration, which caused a DB restart. The restart was successful, so issues arose only during the brief restart period.

The analysis concluded that the configuration needed to be reverted and the DB restarted again.

The DB connections to the app remained open during the configuration change, avoiding any need for users to re-authenticate. This minimised the impact of the incident, but meant the quickest and safest response required a second restart.
### Actions
We have implemented protections against destructive actions on the DB, raising the barrier to this type of event.

We have increased the user security requirements for configuring the DB (this incident was not security related, but it was a useful prompt).

Second-developer reviews are now required before any DB configuration changes are made.

Developers should only make configuration changes when fully aware of the consequences and able to handle the process.

We have documented the error messages that correspond to this issue, to make detection faster and more accurate in future.
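
As an illustration of how documented error messages could speed up detection, the sketch below checks log lines against a small catalogue of patterns. Everything in it is an assumption: the patterns, the sample log lines, and the function names are hypothetical placeholders, not the messages actually documented for this incident or any existing tooling.

```python
import re

# Hypothetical placeholder patterns. The real documented messages are not
# reproduced here and would replace these.
KNOWN_DB_RESTART_PATTERNS = [
    re.compile(r"connection refused", re.IGNORECASE),
    re.compile(r"database .* (shutting down|starting up)", re.IGNORECASE),
]


def matches_known_issue(log_line: str) -> bool:
    """Return True if the line matches any catalogued pattern."""
    return any(p.search(log_line) for p in KNOWN_DB_RESTART_PATTERNS)


def find_matches(log_lines):
    """Return the log lines that match a catalogued pattern, in order."""
    return [line for line in log_lines if matches_known_issue(line)]


# Made-up log lines for illustration only.
SAMPLE_LOG = [
    "10:58:02 ERROR could not connect: Connection refused",
    "10:58:03 INFO request served in 12 ms",
    "10:58:04 FATAL the database system is shutting down",
]

for line in find_matches(SAMPLE_LOG):
    print("possible DB restart:", line)
```

Keeping the patterns in one list means the catalogue can be extended after future incidents without changing the matching logic.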