BTC Degraded Performance
Incident Report for BitGo
Postmortem

Overview

At 7:00 AM PST on December 15th, 2020, an internal BTC indexer hung while processing a network transaction, which caused it to fall behind chainhead. This temporarily prevented the BitGo platform from processing BTC transactions.

Timeline

(All times stated here are Pacific Standard Time)

7:00 AM - Initial BTC indexer disruption.

7:27 AM - Status page incident announced.

8:58 AM - Platform service restored.

9:21 AM - Finished reprocessing transaction backlog.

9:26 AM - Status page incident resolution announced.

Impact

The outage impacted our ability to index and process new BTC network transactions, but did not result in any data loss or corruption. This outage did not interact with or impact any systems that handle funds or currency.

Root Cause Analysis

An internal BTC indexer failed to process a network transaction. The failure caused an internal retry to back off exponentially, and the indexer also deadlocked. The indexer eventually ended up in a hung state and was unable to process additional network transactions.
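To illustrate the retry side of this failure mode, here is a minimal sketch of a bounded retry with capped exponential back-off. It is not the indexer's actual code: `process_with_retry`, `backoff_delays`, `RetryBudgetExceeded`, and all default values are hypothetical names and numbers chosen for the example. It covers only the retry behavior, not the deadlock.

```python
import time


class RetryBudgetExceeded(Exception):
    """Raised when an operation still fails after every allowed attempt."""


def backoff_delays(base=0.5, factor=2.0, cap=30.0, attempts=5):
    """Delays (seconds) between attempts: exponential growth, capped at `cap`.

    `attempts` total tries need `attempts - 1` waits in between.
    """
    return [min(cap, base * factor ** i) for i in range(attempts - 1)]


def process_with_retry(process, tx, delays, sleep=time.sleep):
    """Try process(tx) once, then once more after each delay in `delays`.

    Raises RetryBudgetExceeded when the last attempt fails, so the caller
    can quarantine the transaction or alert instead of waiting indefinitely.
    """
    attempts = len(delays) + 1
    for attempt in range(attempts):
        try:
            return process(tx)
        except Exception as err:
            if attempt == attempts - 1:
                raise RetryBudgetExceeded(
                    f"gave up after {attempts} attempts"
                ) from err
            sleep(delays[attempt])
```

Capping both the delay and the number of attempts means a transaction that can never be processed surfaces as an explicit error within a known time, rather than as ever-longer waits.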

Mitigation

The BTC indexer was gracefully shut down, and a database snapshot was taken to expedite reindexing. The BTC indexer was then restarted and monitored while it finished reindexing, starting from where the snapshot left off, and ending at chainhead. Once the indexer arrived at chainhead, several pending transactions were then reprocessed.

Future Remediation

We have already added alerting against the internal process state that preceded this incident. Additionally, we are already migrating to a much more robust and fault-tolerant indexer implementation.
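This report does not specify which signals the new alerting watches. As one hedged illustration, a check on indexer lag behind chainhead could look like the sketch below. The function names, parameters, and thresholds (`max_lag_blocks`, `max_stall_seconds`) are assumptions for the example, not BitGo's configuration.

```python
def blocks_behind(chainhead_height, indexed_height):
    """Number of blocks the indexer trails chainhead (never negative)."""
    return max(0, chainhead_height - indexed_height)


def should_alert(chainhead_height, indexed_height, seconds_since_progress,
                 max_lag_blocks=3, max_stall_seconds=1800):
    """Alert only when the indexer is both behind and not advancing.

    Requiring both conditions avoids paging during a healthy catch-up,
    when the indexer is behind but still making progress.
    """
    lagging = blocks_behind(chainhead_height, indexed_height) > max_lag_blocks
    stalled = seconds_since_progress > max_stall_seconds
    return lagging and stalled
```

With these example defaults, an indexer that is 10 blocks behind and has not advanced for an hour triggers an alert, while one that is 10 blocks behind but advanced 10 seconds ago does not.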

Posted Dec 18, 2020 - 18:20 PST

Resolved
This incident has been resolved.
Posted Dec 15, 2020 - 09:26 PST
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Dec 15, 2020 - 08:38 PST
Investigating
Our BTC processing is experiencing some performance issues at this time. Our Engineering and DevOps teams are actively investigating this issue, and we will provide updates here as they become available.
Posted Dec 15, 2020 - 07:27 PST
This incident affected: Digital Assets (Bitcoin).