Impact
gitlab.git.nrw was completely unavailable from approximately 01:30 to 07:15 CEST on May 9, 2025.
Root Cause
Database replication had silently stopped as a knock-on effect of the May 7, 2025 incident. Overnight, the remaining database node's disk filled up with WAL (write-ahead log) files, which took the service down.
Resolution
Disk space was increased and replication was restored. Improved monitoring will be implemented.
Follow-up Actions
Implement WAL size monitoring and alerting
Add replication health dashboard
Document incident response procedures
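
To illustrate the first follow-up action, here is a minimal sketch of a WAL size check. It is not the monitoring that will actually be deployed. It assumes the WAL files live in a single directory on the database node. The function names, the thresholds, and the directory path in the usage note are hypothetical.

```python
import os
import shutil
from dataclasses import dataclass


@dataclass
class WalStatus:
    wal_bytes: int              # total size of files under the WAL directory
    disk_used_fraction: float   # used / total for the filesystem holding it
    alert: bool                 # True if either threshold is exceeded


def directory_size(path: str) -> int:
    """Sum the sizes of all regular files below path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                # Files may be recycled or removed while we scan; skip them.
                continue
    return total


def check_wal(wal_dir: str, max_wal_bytes: int, max_disk_fraction: float) -> WalStatus:
    """Compare WAL size and disk usage against (assumed) alert thresholds."""
    wal_bytes = directory_size(wal_dir)
    usage = shutil.disk_usage(wal_dir)
    used_fraction = usage.used / usage.total
    alert = wal_bytes > max_wal_bytes or used_fraction > max_disk_fraction
    return WalStatus(wal_bytes, used_fraction, alert)
```

A scheduled job could call, for example, check_wal("/path/to/wal", 20 * 1024**3, 0.8) and raise an alert when the returned status has alert set. Checking both values covers the failure mode in this incident: WAL growth on a node whose replica has stopped consuming it, and the disk filling up regardless of cause.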
Additional Details
During the night of May 9, 2025, gitlab.git.nrw was unavailable from approximately 01:30 to 07:15 CEST.
As a knock-on effect of Wednesday's incident (May 7, 2025), database replication had silently stopped. Overnight, the remaining database node's disk filled up with WAL (write-ahead log) files, which took the service down.
Disk capacity was increased and replication was restored; the service has been fully available again since 07:15 CEST. As the platform is still in its pilot phase, we continue to harden it and improve monitoring.