What are we talking about?
The Besu team created a new plugin that can be plugged into Besu to run a lightweight node dedicated to RPC. The main idea was to remove all the features that are not needed to answer RPC requests, such as P2P, EVM execution, and the transaction pool.
A node running the fleet plugin has a small database, syncs very quickly, and performs better.
How does it work?
The first question that comes to mind is how the node can stay in sync with the chain if there is no EVM execution.
This is possible thanks to Bonsai, Besu's implementation of the Ethereum Merkle Patricia Trie. One of the key features of Bonsai is the trie log. A trie log is simply a state diff between two consecutive blocks, N-1 and N. On Ethereum mainnet, for example, trie logs are used to handle reorgs, where the state has to be rolled back and then rolled forward. This mechanism is also used in Linea, by the state manager.
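To make the idea of a trie log concrete, here is a minimal sketch of rolling a state forward and back with a diff. It assumes a heavily simplified model: the state is a flat key-to-value map rather than a trie, and the `TrieLog` class, `roll_forward` and `roll_back` are hypothetical names for illustration, not Besu's actual types.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Hypothetical, simplified model: the state is a flat key -> value map and a
# trie log stores (old value, new value) per changed key. None means "absent".
State = Dict[str, int]
Change = Tuple[Optional[int], Optional[int]]


@dataclass
class TrieLog:
    block_number: int
    changes: Dict[str, Change] = field(default_factory=dict)


def roll_forward(state: State, log: TrieLog) -> None:
    """Move the state from block N-1 to block N by writing each new value."""
    for key, (_old, new) in log.changes.items():
        if new is None:
            state.pop(key, None)  # the key was deleted in block N
        else:
            state[key] = new


def roll_back(state: State, log: TrieLog) -> None:
    """Move the state from block N back to block N-1 by restoring old values."""
    for key, (old, _new) in log.changes.items():
        if old is None:
            state.pop(key, None)  # the key did not exist before block N
        else:
            state[key] = old


state_99: State = {"alice": 10, "bob": 5}
log_100 = TrieLog(100, {"alice": (10, 7), "carol": (None, 3), "bob": (5, None)})

state = dict(state_99)
roll_forward(state, log_100)  # state as of block 100
state_100 = dict(state)
roll_back(state, log_100)     # back to the state as of block 99
```

Because each entry keeps both the old and the new value, the same log is enough to move in either direction, which is what makes reorg handling possible without re-executing blocks.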
We chose to implement a captain-follower architecture, where:
- The captain is a vanilla Besu node that executes the blocks, generates the trie logs, and notifies all registered followers about the new head.
- Each follower asks the captain for the block data (blockchain data and state diffs) of every block missing between its own head and the head announced by the captain. This is done with the RPC call fleet_getBlock. With the data received from the captain, the follower updates its blockchain with the new head data and its state by applying the trie logs.
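The follower's catch-up step described above can be sketched as follows. This is an illustration under assumptions, not the plugin's code: the captain is an in-memory stand-in instead of a remote node, `fleet_get_block` only mimics the fleet_getBlock call (whose real parameters and response format are not described here), and `BlockData`, `Captain` and `Follower` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Change = Tuple[Optional[int], Optional[int]]  # (old value, new value)


@dataclass
class BlockData:
    """Hypothetical payload: a block number plus that block's state diff."""
    number: int
    state_diff: Dict[str, Change]


class Captain:
    """In-memory stand-in for the captain; a real one is reached over RPC."""

    def __init__(self, blocks: List[BlockData]) -> None:
        self._blocks = {block.number: block for block in blocks}

    def fleet_get_block(self, number: int) -> BlockData:
        # Stands in for the fleet_getBlock RPC call (signature assumed).
        return self._blocks[number]


class Follower:
    def __init__(self, head: int, state: Dict[str, int]) -> None:
        self.head = head
        self.state = state

    def on_new_head(self, captain: Captain, new_head: int) -> None:
        """Fetch and apply every block between our head and the captain's."""
        for number in range(self.head + 1, new_head + 1):
            block = captain.fleet_get_block(number)
            for key, (_old, new) in block.state_diff.items():
                if new is None:
                    self.state.pop(key, None)
                else:
                    self.state[key] = new
            self.head = number  # advance only after the diff is applied


captain = Captain([
    BlockData(101, {"alice": (10, 8)}),
    BlockData(102, {"bob": (None, 2)}),
    BlockData(103, {"alice": (8, None)}),
])
follower = Follower(head=100, state={"alice": 10})
follower.on_new_head(captain, 103)  # captain announces head 103
```

The follower never executes a transaction: it only replays the diffs in block order, which is why it can run without an EVM.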
You can find the whole process between the captain and the followers below.
Why have we created this new mechanism?
We wanted a scaling solution for RPC, with two goals:
- Scaling out with a new node should take only a few minutes
- The node should perform better and support more load than a vanilla Besu node
To achieve the first goal, we need a smaller database, since RPC node operators usually start new nodes from database snapshots to avoid syncing the entire node from scratch.
For example, we were able to reduce the Linea database to only 19 GiB by keeping only the last 2048 blocks and the state (trie logs) of the last 512 blocks. Both numbers are configurable. Before increasing them, keep in mind the impact of each parameter:
- Increasing the number of blocks will have an impact on the size of the database
- Increasing the number of state diffs will have an impact on the performance of some state RPC calls, as the Besu fleet node needs to roll back to the old state to execute the state call.
We recommend load testing and performance analysis before changing them.
But what does this mean exactly?
This means that a Besu fleet node is a lightweight RPC node that can only serve RPC traffic for the last 512 blocks for state calls, like eth_call, and for the last 2048 blocks for blockchain calls, like eth_getBlockByNumber.
We decided to focus only on near-head calls because more than 90% of RPC traffic on Linea targets blocks near the head.
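The two retention windows can be expressed as a small check. This is a sketch under assumptions: `can_serve` and the constant names are hypothetical, the defaults are the 2048 and 512 values quoted above, and treating the head as the first block of each window is a guess about the boundary, not the plugin's documented behaviour.

```python
BLOCKS_RETAINED = 2048      # blockchain data window (value from the text)
STATE_DIFFS_RETAINED = 512  # state (trie log) window (value from the text)


def can_serve(call_kind: str, block_number: int, head: int) -> bool:
    """Return True if a request for block_number falls inside the window.

    call_kind is "state" (e.g. eth_call) or "blockchain"
    (e.g. eth_getBlockByNumber). Counting the head as the first block of the
    window is an assumption made for this sketch.
    """
    if block_number < 0 or block_number > head:
        return False
    depth = head - block_number  # 0 means the head itself
    if call_kind == "state":
        return depth < STATE_DIFFS_RETAINED
    if call_kind == "blockchain":
        return depth < BLOCKS_RETAINED
    raise ValueError(f"unknown call kind: {call_kind}")


head = 10_000
state_ok = can_serve("state", head - 100, head)            # inside both windows
state_too_old = can_serve("state", head - 1_000, head)     # outside the state window
block_ok = can_serve("blockchain", head - 1_000, head)     # still inside the block window
block_too_old = can_serve("blockchain", head - 5_000, head)
```

A load balancer in front of a fleet could use a check of this kind to send older requests to a full node instead, although that routing layer is outside what this post describes.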
What is the performance of a Besu fleet node?
Fleet nodes can sync very quickly, in less than 10 minutes, from a fresh database snapshot (less than 24 hours old). This is possible thanks to the small size of the database and the fact that followers don't execute the blocks produced since the snapshot head; they only apply the trie logs.
We also observed almost twice the performance of a vanilla Besu node on state RPC calls. This can be explained by an excellent RocksDB cache hit ratio, around 99.9%. In this case, only about 0.1% of reads have to go to disk, and the system cache can reduce that number even further. Reaching this high hit ratio was possible because the node was 100% dedicated to RPC traffic, with no execution related to P2P requests or block processing. In a vanilla node, that work can pollute the cache and reduce its efficiency. The fleet node benefits from the temporal locality of RPC traffic, which helps it reach a high cache hit ratio on disk accesses.
This was a key finding in our load testing: making the node handle only RPC traffic improved the effectiveness of the RocksDB cache and other caches, thanks to the temporal locality of RPC traffic.
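The cache pollution effect can be illustrated with a toy simulation. Everything here is an assumption made for illustration: a plain LRU cache, a synthetic workload with a small hot set of RPC keys, and unrelated one-off reads standing in for P2P and block processing. It does not model RocksDB or reproduce the measurements above.

```python
from collections import OrderedDict
from typing import Iterable, List


def hit_ratio(accesses: Iterable[str], capacity: int, counted_prefix: str) -> float:
    """Replay accesses through an LRU cache and return the hit ratio of the
    keys that start with counted_prefix."""
    cache: "OrderedDict[str, None]" = OrderedDict()
    hits = 0
    total = 0
    for key in accesses:
        counted = key.startswith(counted_prefix)
        if counted:
            total += 1
        if key in cache:
            cache.move_to_end(key)  # mark as most recently used
            if counted:
                hits += 1
        else:
            cache[key] = None
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used key
    return hits / total if total else 0.0


# Synthetic RPC workload: the same 50 hot keys are read over and over.
HOT_KEYS = [f"rpc:{i}" for i in range(50)]
ROUNDS = 100
rpc_only: List[str] = [key for _ in range(ROUNDS) for key in HOT_KEYS]

# Same RPC reads, interleaved with unrelated reads that are never reused.
mixed: List[str] = []
for n, key in enumerate(rpc_only):
    mixed.append(key)
    mixed.append(f"other:{n}")

rpc_only_ratio = hit_ratio(rpc_only, capacity=64, counted_prefix="rpc:")
mixed_ratio = hit_ratio(mixed, capacity=64, counted_prefix="rpc:")
```

In this toy setup the hot set fits in the cache when the node serves only RPC reads, so after warm-up every read is a hit. Once unrelated reads are interleaved, they push hot keys out before they are read again, and the RPC hit ratio drops sharply.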
Each captain can handle at least 300 followers. This is a lower bound, as our infrastructure did not allow us to test with more nodes.
If you want to read more about Besu fleet mode, you can check the original blog post.