BitGo - Issue Processing Ripple (XRP) Transactions

Incident Report for BitGo

Postmortem

Overview

At 3:00 PM PST on December 9, 2020, an internal fullnode lost its connection to the XRP network and fell behind chain-head. Due to a spike in overall XRP network transaction traffic and resource constraints, the node was unable to fully catch up.

Timeline

(All times stated here are Pacific Standard Time)

12/9/2020, 3:00 PM Initial XRP fullnode network disruption.

12/9/2020, 4:26 PM BitGo status page was updated to announce the initial incident.

12/10/2020, 10:41 AM XRP fullnode arrives at chain-head.

12/10/2020, 10:47 AM BitGo status page was updated to announce incident resolution.

Impact

The outage impacted our ability to index and process new XRP network transactions, but did not result in any data loss or corruption. This outage did not interact with or impact any systems that handle funds or currency.

Root Cause Analysis

An initial networking disruption prevented an XRP fullnode from communicating with the network.
Once reconnected, the fullnode had fallen behind chain-head.

Due to the initial network disruption, a spike in overall XRP network transaction traffic, insufficient network throughput, and disk I/O latency, the fullnode was unable to index the XRP ledger fast enough to catch up with chain-head.
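
The catch-up condition described above can be illustrated with a small calculation. This is a minimal sketch, not BitGo's tooling: the function name, its parameters, and the rates in the usage note are hypothetical and chosen only for illustration. It assumes a node closes the gap only when it indexes ledgers faster than the network produces them.

```python
import math


def catch_up_seconds(lag_ledgers: int, index_rate: float, network_rate: float) -> float:
    """Estimate seconds until a lagging node reaches chain-head.

    lag_ledgers:  how many ledgers the node is behind (hypothetical input).
    index_rate:   ledgers the node can index per second (hypothetical input).
    network_rate: ledgers the network closes per second (hypothetical input).
    """
    if lag_ledgers <= 0:
        # Already at chain-head, nothing to catch up on.
        return 0.0
    net_gain = index_rate - network_rate
    if net_gain <= 0:
        # The node indexes no faster than the network advances,
        # so the gap never closes.
        return math.inf
    return lag_ledgers / net_gain
```

For example, with these made-up figures, `catch_up_seconds(1000, 0.5, 0.25)` returns `4000.0`, while `catch_up_seconds(1000, 0.2, 0.25)` returns `math.inf`, which corresponds to a node that cannot catch up without more resources.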

Mitigation

We approached the problem from two directions. First, we reached out to Ripple engineering for guidance on anything we could do to help mitigate the current network load. Second, we began implementing an improved hardware setup with more than enough resources to meet the node's demands. Once the new stack was operational, we were able to resume operations for XRP.

Future Remediation

We’ve completed a capacity-planning exercise and created new nodes with an improved architecture capable of handling current and projected future XRP network behavior.

Posted Dec 14, 2020 - 10:54 PST

Resolved

This incident has been resolved.

Posted Dec 10, 2020 - 10:47 PST

Monitoring

A fix has been implemented and we are monitoring the results.

Posted Dec 10, 2020 - 09:23 PST

Identified

The issue has been identified and a fix is being implemented.

Posted Dec 10, 2020 - 07:06 PST

Investigating

We are currently investigating this issue.

Posted Dec 09, 2020 - 16:26 PST

This incident affected: Digital Assets (Ripple).