Ethereum Reorgs After The Merge

By July 20, 2021DeFi
Click here to view original web page at www.paradigm.xyz

There has recently been discussion about the possibility of miners adopting a hypothetical modified Ethereum client that allows them to essentially accept bribes to make a short reorg of the chain (the main use case for making such bribes being to attack DeFi protocols).

In this post, we explain how this attack vector will be harder to execute post-ETH2 merge.

A fork choice rule is a function, evaluated by the client, that takes as input the set of blocks and other messages that have been seen, and outputs to the client what the “canonical chain” is. Fork choice rules are required because there may be multiple valid chains to choose from (eg. if two competing blocks with the same parent get published at the same time).

A reorg is an event where a block that was part of the canonical chain becomes no longer part of the canonical chain because a competing block beat it out. Finality is a situation where a fork choice rule so strongly favors a block that it is mathematically impossible (or at least economically infeasible) for that block to get reorged.

In some fork choice rules (eg. Tendermint), reorgs cannot happen; the fork choice rule simply extends the existing chain by appending any blocks that have been finalized through BFT consensus. In other fork choice rules, reorgs are very frequent.

Nakamoto (eg. PoW Ethereum / Bitcoin)NoneFrequentLongest chain
GASPER (eg. PoS ethereum)Every 2-epochs (~12 min)Very occasionalChain with strongest support after last finalized block
TendermintSingle block (~1-10s)NeverOnly finalized blocks

In Proof of Work blockchains like Ethereum, we typically see the “longest chain rule” (or, more accurately, “highest-total-difficulty chain rule”). This means that when a client is shown 2 blockchains, it chooses the one with the highest total difficulty (i.e. the sum of the difficulty of all blocks in that chain).

Assuming for the sake of this example that blocks can have difficulty of either 100 or 110, imagine the below scenario:

  1. We start syncing from Block 1 with difficulty 100.
  2. Block 2a and 3a arrive each with difficulty 100 and we insert them in our chain, for a fork of total difficulty 300.
  3. Block 3b with difficulty 110 arrives which declares 2a as its parent, creating a fork of total difficulty 310. The fork-choice rule will notice that the “heaviest” chain is now the second fork and will switch over to it. This is 1-block reorg, since only block 3a was changed. Note that the blocks are not discarded altogether, since a new block may arrive that causes the fork-choice to switch back to the first fork.
  4. Block 2b and 3c arrive, each with difficulty 110 creating a new fork of total difficulty 320! This means that the fork-choice rule will now use 2b instead of 2a and 3c instead of 3b which were the blocks that were in the last canonical chain. This is a 2-block re-org.

You can see where this is going. If a new block 4a arrived declaring 3a as its parent, the fork-choice rule would switch over back to the first fork and so on.

Short reorgs happen all the time due to latency. Miner A and Miner B may find a valid block simultaneously, but due to how blocks propagate in the p2p network, a part of the network will first see A’s block and another part will see B’s block. If the two blocks have the same difficulty, there will be a tie, and clients either pick randomly or choose the earlier-seen block. Typically, the tie is eventually broken when some third miner C builds a block on top of either A’s block or B’s block, and the other block is forgotten. Occasionally, bad luck can lead to 2-5 block reorgs. Reorgs longer than that are almost always due to extreme network failure, client bugs, or malicious attacks.

Short reorgs are not fatal, but they do still have some important detrimental consequences to the network:

  • Node costs: when a re-org happens there is some memory & disk overhead due to having to switch over to the new fork, possibly replaying transactions or state edits.
  • User experience degradation: the possibility of reorgs means that users need to wait longer before they can safely treat a transaction involving them as “confirmed”. An important sub-case of this is businesses such as exchanges needing to wait longer before they accept a deposit.
  • Transaction context uncertainty: when a user sends a transaction, they have fewer assurances about what context that transaction will be executed in (eg. would the most recent N blocks be reverted?). Notably, this increases vulnerability of DeFi transactions to accidental failure, worse-than-expected trade results, or harmful MEV extraction.
  • Increased vulnerability to 51% attacks: in a longest-chain-rule-driven system, if the chain reorgs from B1 to B2, then the difficulty of B1 no longer contributes to securing the chain. Attackers no longer need to beat out all honest miners, they need to beat out the portion of honest miners that don’t get reorged. If reorging is frequent, this makes the attacker’s job significantly easier.

The worst thing that can happen

In the worst case, frequent reorgs can completely nullify a blockchain’s settlement assurances and prevent it from progressing. Normally, the “incentive compatible” tactic for a block producer should be to extend the longest chain. But what happens if one particular block’s post-state is exceptionally lucrative (in the sense that there are very high fees or MEV that can be extracted only by building a block directly after that block)? This issue was explored in the past in the context of Bitcoin without the block reward & Selfish Mining and is being explored today in the context of DeFi-related MEV in the Ethereum ecosystem.

In these cases, there is a large incentive to try to “steal” the fees or MEV by competing with instead of extending the tip of the canonical chain. In the example below, block 1’s post-state is exceptionally lucrative, and block 2a has already been mined. However, not 1 but 3 block producers have chosen to mine on top of block 1 instead of block 2a (in order to claim any MEV exposed after block 1), and this could extend to an arbitrary number of parties.

For obvious reasons, such a pattern opens a wide door for malicious 51% attacks. We call miners engaging in such reorg-mining tactics “myopically rational” because the decision to do so can be rational in the short term. However, they have an explicit (stakers) or implicit (miners) long position on ETH (since fees and the block reward are denominated in ETH), which means that any such attacks that reduce user trust in Ethereum would be against their best interest, and thus not rational long term.

In Nakamoto PoW, the blocks are solidified in the fork choice “serially”. First, a block is mined, at which point a single competing block can potentially reorg it. If the block survives as part of the canonical chain, after (on average) 13 seconds some other miner builds a second block on top. At that point, a chain of two competing blocks is needed to reorg it. As more blocks get built on top, the difficulty of reorging the chain continues to increase, but slowly.

The Ethereum beacon chain implements a PoS protocol called Gasper with a fork-choice rule called LMD-GHOST. Contrary to Nakamoto PoW, there are 2 roles during block production:

  • Proposer: A validator is tasked with proposing a block.
  • Attesters: A group of validators who vote on which block they consider the head of the canonical chain. Attester votes are called attestations and they give “weight” to a block. Controlling the attesters means controlling the fork choice rule.

Every 12 seconds there is a “slot”, which represents an opportunity to propose a block. For each slot, a shuffling algorithm pseudorandomly chooses a committee consisting of ~1/32 of all validators, where 1 validator in each committee is the proposer and the rest are the attesters. The attesters proceed to vote in parallel on blocks that they consider to be part of the canonical chain. Because committees are sampled pseudorandomly, attackers do not have a way to concentrate their validators into a single slot.

Today, the Beacon Chain has ~196k validators, meaning every slot has a committee with a size of ~6125. As a result, even single-block reorgs are extremely difficult, because an attacker controlling only a few validators has no way to beat the honest majority of thousands of attesters.

To gain some intuition on why this is true, let’s see an example with 2 slots and 24 validators, where 9 of them are malicious. The validators get split into 2 committees, where due to the random shuffle, an adversary is unlikely to ever control >50% of any of the groups they get assigned to and cause a reorg.

More formally, the probability of a malicious actor with p% of stake controlling over 50% of an N-validator size committee follows a binomial distribution (where k = N/2):

Probability of controlling >50% of a 6125-size committee
45%1.25e-15
46%1.28e-10
47%1.09e-6
48%0.008
49%0.058

We now understand that making a reorg directly requires the attacker to control close to 50% of all validators.

There are more subtle attacks that are possible if an attacker has ~25-49% of attesters (see this paper, or here for a quick summary). However, there are known fixes to these attacks that can be implemented unobtrusively, increasing security closer to an unconditional 50%.

Finally, long reorgs are not possible because all blocks that are deeper than 2 epochs in the past are considered “finalized”, i.e. it is impossible to revert past them. If an attacker caused two conflicting blocks to be finalized (e.g by controlling 67% of the stake), the system would need to fall back to social intervention to recover.

Now that we’ve seen how reorgs work across different fork choice rules, it’s worth going through a simple game-theoretic example to understand when it would make sense for a miner or validator to run software that executes reorgs to profit.

We can informally describe each scenario with a payoff matrix, where “defect” means “download and use the software that implements reorgs”. The payoffs are “myopic”, not taking into account long-term consequences.

In longest-chain PoW, short-range reorgs can be done probabilistically with even a small portion of the validator set. There will always occasionally be blocks with exceptionally lucrative post-states such that even a 1-10% success rate is worth trying to compete with an existing child of that block.

The miner could either be a medium-sized pool banking on the possibility that they will find the next 2-3 blocks in a row, or they could send some portion of their revenue into an anyone-can-claim contract to bribe others running the same software to build on their chain and help it overcome the existing canonical chain.

As a result, some miners may be tempted to run reorg clients.

In Gasper, reorgs of 1-64 slots are possible, but require the attacker to control a large portion of the entire validator set (because they cannot concentrate their stake in a specific slot, so they need to be large enough to have enough stake randomly selected in the slot range they want to attack). Adoption of reorg-mining software is useless unless a very large number of other validators also adopt it at the same time.

Hence, if 51% of validators have even the slightest level of altruism or laziness, no one running reorg-mining software is a stable equilibrium.

In Tendermint, the story is even cleaner: reorgs are impossible outright, and any violation of single-slot finality would require 1/3+ of validators to be slashed. Similar to the Gasper case, this also means that no one running reorg-mining software is a stable equilibrium.

As we can see from the above, while adoption of “reorg geth” is possible in all cases, fork-choice rules based on some notion of parallel attestation have honest equilibria which are more stable than the equilibria in Nakamoto fork-choice.

The most effective prevention measure in the context of Ethereum is to further speed up work on the merge, in particular, to quickly achieve the credible capability of making an “emergency merge” which would transition the chain to PoS. Rushing the merge would have high risk and might break infrastructure, but a credible commitment to do it anyway if many miners start reorg-attacking the chain would align incentives against such behavior.

The time close to the merge is the time of greatest risk because miners are still in charge of the system but their time horizon decreases. However, two factors mitigate this risk:

  1. Ethereum miners are often at the same time (i) miners of other blockchains, and/or (ii) Ethereum community members in other capacities, so some motivations for good behavior continue to exist.
  2. As the merge approaches, the difficulty, cost and risk of doing an emergency merge also decrease. Months before the merge’s intended date, an emergency merge would be highly disruptive. Two weeks before the merge’s intended date, it would be a parameter setting to clients that validator operators have already downloaded.

Post-merge, reorg validating will become much less of a problem, because single attesters or small groups of attesters cannot reorg a block on their own. Reorg-attacking successfully requires solving the hard coordination problem of getting a large portion of validators onboard at the same time. However, some small risk remains. If it is desired to improve security further, then Ethereum can either adjust the fork choice rule further to increase reorg attack requirements to the 50% theoretical maximum or find a way to move toward single-slot-finality consensus outright.

All Today's Crypto News In One Place