The cryptocurrency market, which had anticipated a bull run ahead of the Bitcoin halving, has instead been falling for the past couple of weeks. The fall on 13 March was followed by BitMEX announcing a “hardware issue” that forced it offline for 45 minutes as the BTC price continued to plunge.
Many traders and analysts in the space, however, called it a bluff and claimed that there was no “hardware issue”. BitMEX’s Chief Executive Officer [CEO], Arthur Hayes, stated on 16 March that his team had been gathering facts about the event and would address questions and concerns transparently and comprehensively over the coming days.
BitMEX’s Chief Technology Officer [CTO], Samuel Reed, provided an explanation for the abrupt breakdown of the exchange on 13 March. The CTO claimed that the exchange was under attack from a botnet, the same one responsible for an attack on the exchange on 15 February.
According to Reed, the February botnet attack was absorbed by BitMEX’s DDoS mitigation strategies, which protect the targeted network or servers from a distributed denial-of-service [DDoS] attack at the L3 and L4 levels, so the exchange avoided downtime. On 13 March, however, the attackers appeared to have changed strategy, with two attacks taking place at 02:15 UTC and 12:56 UTC.
According to the CTO:
“The botnet found an endpoint that was consistently, reliably slow. The query they hit did a 400ms reverse sequential scan rather than using the index (Parallel Index Scan / Gather Merge for PG fans), because an ANALYZE hadn’t been automatically run for too long by RDS defaults.”
Numerous such scans running in parallel caused the database “to start swapping, pegged to 100% CPU, with over 99% of that as iowait.” The iowait metric is the share of time a CPU spends idle while at least one task is blocked waiting for an I/O [Input/Output] operation to complete.
“On AWS, this looks quite a bit like a dying EBS volume, so we failed over the database and service resumed.”
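For readers unfamiliar with the iowait figure Reed cites, here is a minimal Python sketch of how such a percentage is typically derived on Linux: take two samples of the aggregate `cpu` line from `/proc/stat` and divide the growth of the iowait counter by the growth of all counters. The function names and the sample values are invented for illustration; this is not BitMEX’s or AWS’s monitoring code.

```python
def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into integer tick counters."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
    return [int(value) for value in fields[1:]]


def iowait_percent(before, after):
    """Percentage of elapsed CPU time spent in iowait between two samples."""
    start = parse_cpu_line(before)
    end = parse_cpu_line(after)
    deltas = [e - s for s, e in zip(start, end)]
    # Field order: user, nice, system, idle, iowait, irq, softirq, steal, ...
    # Guest time is already counted inside user/nice, so sum only the first 8.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total


# Made-up samples (hypothetical values) resembling a host stuck on disk I/O.
SAMPLE_BEFORE = "cpu 100 0 50 800 50 0 0 0 0 0"
SAMPLE_AFTER = "cpu 101 0 51 800 1048 0 0 0 0 0"

if __name__ == "__main__":
    # About 99.8 for these made-up samples.
    print(f"iowait: {iowait_percent(SAMPLE_BEFORE, SAMPLE_AFTER):.1f}%")
```

A reading this high means the CPUs are almost entirely waiting on storage rather than doing work, which is why the symptom can resemble a failing disk volume.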
The team was able to identify and fix the slow query after the second attack. As a preventive measure, BitMEX has been making systematic changes to its backend and reviewing older systems. While some changes have been made, others, such as public-facing protocols around downtime, trade suspension, resumption, and communication, are still in the works.
The exchange will provide a detailed report about the attack, the liquidations, and the insurance fund shortly. It has been trying to make it up to its customers by refunding those affected on 13 March.