Incident Report

Incident: gitlab.git.nrw outage

Christian Schild
Status: Resolved (major)

Incident Details

gitlab.git.nrw pilot

Started

May 9, 2025 01:30 +0200

Resolved

May 9, 2025 07:15 +0200

Duration

5h 45m

Affected Components

database gitlab
incident outage pilot

Impact

Service was completely unavailable during the outage window.

Root Cause

Database replication had silently stopped. Overnight, the remaining DB node's disk filled up with WAL transaction logs.

Resolution

Disk space was increased and replication was restored. Improved monitoring will be implemented.

Follow-up Actions

Implement WAL size monitoring and alerting

Add replication health dashboard

Document incident response procedures
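
The first follow-up action could start from a check like the one below. It is only a minimal sketch, not the monitoring actually deployed: it assumes the WAL files live in a single directory on the database node (for PostgreSQL this is typically `pg_wal` inside the data directory), and the function name, thresholds and example path are hypothetical. It sums the size of the files in that directory, reads the fill level of the filesystem holding it, and raises an alert flag when either exceeds its limit.

```python
import os
import shutil
from dataclasses import dataclass


@dataclass
class WalStatus:
    wal_bytes: int          # total size of regular files directly inside the WAL directory
    disk_used_ratio: float  # used / total for the filesystem that holds the directory
    alert: bool             # True if either threshold is exceeded


def check_wal(wal_dir: str, max_wal_bytes: int, max_disk_used_ratio: float = 0.8) -> WalStatus:
    """Hypothetical helper: report WAL directory size and disk fill level."""
    wal_bytes = 0
    # Count only top-level regular files; subdirectories are ignored.
    with os.scandir(wal_dir) as entries:
        for entry in entries:
            if entry.is_file(follow_symlinks=False):
                wal_bytes += entry.stat(follow_symlinks=False).st_size

    usage = shutil.disk_usage(wal_dir)
    ratio = usage.used / usage.total

    # Assumed policy: alert on either too much WAL or a nearly full disk.
    alert = wal_bytes > max_wal_bytes or ratio > max_disk_used_ratio
    return WalStatus(wal_bytes, ratio, alert)
```

Run periodically (for example from cron or a monitoring agent) with something like `check_wal("/var/lib/postgresql/data/pg_wal", 10 * 1024**3)`, where both the path and the 10 GiB limit are placeholders. A steadily growing WAL size is also an indirect sign that replication or archiving has stalled, which is the condition that went unnoticed in this incident.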

Additional Details

During the night of May 9, 2025, gitlab.git.nrw was unavailable from approximately 01:30 to 07:15 CEST.

As a knock-on effect of the incident on Wednesday, May 7, 2025, database replication had silently stopped. Overnight, WAL transaction logs filled the disk of the remaining database node, which took the service down.

Disk capacity was increased and replication was restored; the service has been fully available again since 07:15. Because the service is still in its pilot phase, we continue to harden the platform and improve monitoring.

Related Updates