AptosBFT: all you need to know about the BFT consensus in Aptos

Crypto education

Aptos uses its own consensus algorithm called AptosBFT, based on HotStuff - which, in turn, is based on Practical Byzantine Fault Tolerance, of pBFT. Sounds complicated? Read our beginner guide, where we unpack all these terms.

What is Aptos blockchain and which consensus does it use?

Aptos is a new L1 blockchain that aims to become the fastest and most scalable in the industry. It can achieve 160,000 tps with a latency of less than 1s, meaning that transactions are confirmed within a second.

Aptos uses a new programming language called Move, as well as Move VM (virtual machine). Both technologies were originally developed for Diem – a blockchain project funded by Meta (Facebook). When the Diem project was sold to Silvergate Network in March 2022, a group of former Diem engineers went their own way to create Aptos.

Like any other blockchain, Aptos has a consensus algorithm - a procedure for approving new blocks and transactions. Aptos’ algorithm is called AptosBFT, and it is very similar to DiemBFT originally developed for Diem.

The Aptos consensus is like an onion, or a Russian doll, or those nesting gift boxes where smaller ones all go into a bigger one. We’ll peel the layers off one by one:

  • First, we’ll talk about BFT Byzantine fault tolerance and why it’s so important for blockchains.
  • Next, we’ll discuss how the standard BFT consensus works and why it’s not perfect.
  • Then we’ll look at Practical BFT consensus, or pBFT;
  • The next layer of the onion is HotStuff – a pioneering implementation of BFT that is allows a blockchain to reach a speed of 100,000 tps or more;
  • Finally, we’ll see how Aptos modified the HotStuff consensus to AptosBFT – and how it is stress-testing it through the Aptos Incentivized Testnet (AIT) to build the safest L1 blockchain.

What is Byzantine fault tolerance in blockchain?

BFT stands for ‘Byzantine Fault-Tolerant’. A network is said to be Byzantine fault-tolerant if it can continue to operate normally even if some of its members turn malicious or go offline.

The so-called Byzantine fault (failure) isn’t unique to blockchain: it’s a famous problem in game theory. Imagine a group of generals who must collectively decide if they should attack an enemy or not. There are two issues, though:

1. Each general is in his own camp, so they have to communicate by sending messages to lieutenants;

2. Some of the generals are traitors and actually want to lose the battle.. They might vote for the worst strategy, send conflicting messages to different colleagues to prevent them from reaching a consensus, or even send nothing at all.

In order to save the army, the good generals need a way to make a majority decision in spite of the presence of bad actors. For blockchains, this means that properly functioning nodes need to be able to decide correctly if a block is valid or not, even if some of the nodes act maliciously or simply crash.

“‘Acting maliciously”’ here means that a node will declare an invalid block or transaction valid, or vice versa. For example, if a transaction is a double-spend (uses the same coins that have already been spent by the same address), it’s not valid, but double-spending is something that bad actors like to do. However, a node can also become defective by accident.

Building a Byzantine-fault-tolerant blockchain is particularly tricky because the network is decentralized, so there is no way to know or check which nodes are sending correct or faulty messages.

Why ‘Byzantine’, though? It’s hard to say where the name of the problem comes from, but the Byzantines had a reputation for being treacherous, sneaky, and conniving.

Bitcoin and Ethereum’s solution versus the modern BFT consensus

Bitcoin’s mysterious creator Satoshi Nakamoto was aware of the Byzantine fault problem, and his solution is very robust, though it comes at a high price. In order to propose (broadcast) a block, a Bitcoin node has to also suggest a solution for a complex cryptographic puzzle - and all the others must accept or reject it. This way invalid blocks are identified quickly, and every attempt to fool the others will cost a malicious ‘general’ significant resources.

That’s the power of Proof-of-Work mining, but the flip side is that every node has to check everything that others submit. In other words, each general needs to read every single message sent by every other general – billions of them every second. It takes so much energy and time that Bitcoin – while probably the most Byzantine fault-tolerant blockchain out there – is slow, expensive, and consumes massive amounts of energy.

Nowadays, most new L1 blockchains,including Aptos, use some version of a Byzantine fault-tolerant Proof-of-Stake consensus rather than Proof-of-Work. One thing that many of these models have in common is that they can tolerate up to one-third of the nodes turning malicious (rather than 50% as in the case of Bitcoin). Why is this so? Let’s see.

The standard BFT Proof-of-Stake consensus: how many traitors can it tolerate?

Imagine that you are a decent Byzantine general and you have a pile of messages from your colleagues sitting on your desk, each saying ‘attack’ or ‘retreat’. What will you do?

You suspect that some of the generals are traitors, but you don’t know who exactly.  So your best option is to count all the ‘attack’ and all the ‘retreat’ messages and adhere to the strategy that received the most votes. Here we presume that all loyal generals will vote for the better strategy (let’s say to attack) and all the traitors will choose the losing strategy (retreat).

Example 1

If there are 4 generals with 1 traitor among them, every loyal general will receive 2 ‘attack’ votes from his honest colleagues and 1 ‘retreat’ message from the traitor – plus his own vote to attack. Thus, every loyal general (all 3 of them) will choose to attack, and the army will win.

But if there are 2 traitors among the 4 generals, then every loyal commander will get 1 ‘attack’ message and 2 ‘retreat’ messages. The honest general won’t know what to do, and the campaign will end in a fiasco.

When n (the number of generals) = 4, they can tolerate 1 bad actor.

Example 2

When there are 7 generals, they can surely tolerate 1 traitor. What about 2? Well, every loyal general will get 4 ‘attack’ and 2 ‘retreat’ messages, so the majority (the 6 good generals) will still attack the enemy and win.

But if there are 3 traitors, every good general will receive 3 ‘attack’ and 3 ‘retreat’ letters – and won’t know what to do. Indecision will lead to a defeat.

With n=7, the system can tolerate 2 bad actors.

It’s not difficult to show mathematically that the largest number of traitors (t) such a network can tolerate is t=(n-1)/3. Or, in other words, if you have t traitors, you need 3t+1 nodes to make the system Byzantine fault tolerant. The number of malicious nodes has to be less than one-third of the whole.

Practical BFT consensus (pBFT)

When trying to implement a BFT consensus in a real network is overhead – the excess time and resources needed for all the nodes to check each other’s messages and choose the course of action. To limit the overhead time, computer scientists Barbara Liskov and Miguel Castro proposed the so-called Practical Byzantine Fault Tolerant algorithm (pBFT) in 1999.

Barbara Liskov, a pioneer of distributed networks

In pBFT, the rule about having no more than one third of ‘bad’ nodes still holds: for every f potential traitors, you need 3f+1 nodes in total for the consensus to hold. The difference is that for each round, the nodes select a leader who is responsible for creating a new block. Also, the nodes are divided into primary (let’s call them “generals”) and secondary, or clients (lieutenants, if you wish).

The exchange of messages in pBFT happens in several steps:

1. A client sends a request to the leader: ‘Are we attacking or retreating?

2. The leader broadcasts the request to all the generals;

3. Every general – and the leader – sends an answer back to the network;

4. Once every general and the leader has collected 2f+1 matching messages (including their own), they conclude that the consensus has been reached;

5. Once the client has received f+1 matching messages, they also accept that the consensus has been successful.

6. Honest nodes also have a way to vote to replace the leader node if it goes silent or turns malicious.

One of the biggest advantages of pBFT lies in its immediate finality: once the nodes agree that a transaction or a block is valid, it is confirmed and final. You don’t need to wait for multiple blocks (confirmations) as in Ethereum or Bitcoin.

The flip side is that as the set of nodes (validators) grows, the number of messages they send to each other in each round increases very fast. For example, for a set of 7 nodes (with max 2 ‘traitors’) a single round of consensus will have at least 71 messages; once you have 13 nodes (86% more), the number of messages grows to 237 (234% more), and so on. As a result, pBFT networks have trouble scaling.

The best-known implementation of pBFT in blockchain is probably Tendermint, used by Cosmos (Tendermint has changed its name to Ignite). Compared to the original, it adds a system of leader rotation and other tweaks that allow it to scale well.

Other pBFT implementations include Zilliqa, Hyperledger Fabric, and – yes, we’re finally getting to it – HotStuff, which is the consensus used by Aptos.

HotStuff: a BFT consensus algorithm that gave rise to AptosBFT

In March 2018,  VMware Research group introduced HotStuff, a new consensus model that improves on pBFT in a number of important ways.

A clearer system of communication

Instead of sending messages to the leader and every other ‘general’, each node communicates only with the leader. The leader broadcasts the message (a suggested block) to be voted on; each node sends its vote to the leader, who collects the messages.

The leader needs to get what is called a Quorum Certificate (QC) of n-f votes, where n is the total number of nodes and f is the maximum number of malicious nodes the system can tolerate with that n. When the QC has been reached, the leader broadcasts the results; the nodes then verify the decision.

This drastically reduces the total number of messages that go through the system, making the network extremely scalable. In fact, Aptos is the most scalable L1 blockchain out there.

HotStuff’s leader-centric communication system also makes Aptos exceptionally fast: it can process 100,000-160,000 transactions per second with subsecond finality, meaning that a transaction takes less than a second to be confirmed on the blockchain. For comparison: on Solana, transactions take around 6 seconds to reach finality, and on Binance Smart Chain, around 30 seconds.

An improved process of choosing a new leader

The procedure of picking a new validator to be the leader is smoothly integrated with the rest of the consensus mechanism. By contrast, in pBFT a node that wants to propose a new leader first needs to collect enough confirmations from other nodes that they agree to pick a new leader, too.

HotStuff’s method for choosing the new leader (and making those rounds of voting seamlessly follow each other) is called PaceMaker. Having to rotate the leader all the time does result in some extra overhead as opposed to pBFT, where a change happens only if there is a problem with the leader node. But on the other hand, constant rotation helps avoid the situation where the leader goes malicious and starts to undermine the system in a subtle way without the other nodes noticing.

Pipelining

This is an improvement on the basic pBFT model: the leader has several ‘slots’ for blocks on different stages of readiness, so that it can work on several at the same time. So even if validating a single block still takes several rounds of messages (Prepare, Pre-commit, Commit, and Decide), within the same round the network manages to complete the Decide stage for Block X, the Commit stage for Block X-1, the Pre-commit operation for Block X-2, and the Prepare stage for Block X-3.

This is reflected on a diagram from the original HotStuff research paper:

AptosBFT: the consensus used by the Aptos blockchain

When Facebook set out to build its own blockchain, Libra, HotStuff was adapted to produce an even faster, more scalable, yet very robust consensus algorithm. It was named LibraBFT – but when Libra became Diem, the consensus mechanism was renamed to DiemBFT, and now we know it as AptosBFT.

Diem was sold to Silvergate in March 2022, but since the DiemBFT consensus algorithm is open-source, the team of Aptos could still use it – together with the Move language and Move VM (virtual machine).

We won’t go into too much technical detail on the differences between HotStuff and AptosBFT, but we will highlight a few important features of Aptos.

Permissionless vs. permissioned

First, Diem was developed to be a permissioned blockchain, while Aptos is truly decentralized. So Diem would have had a set of validators chosen by the company – similar to Binance Smart Chain, which has just 21 validators.

By contrast, Aptos can – and surely will – have thousands of nodes. Anyone can create a node, as long as they meet the hardware requirements, and any node can become the leader.

PaceMaker

HotStuff introduced the PaceMaker but didn’t offer detailed specifications for it: this achievement belongs to AptosBFT. PaceMaker is a functionality built into every validator node that ensures that voting rounds advance without breakdowns. It triggers the election of a new leader in three cases:

  • The current leader has successfully broadcasted a block;
  • The current leader has gone silent for a certain period of time (the maximum allowed delay is also set through the PaceMaker);
  • The leader cannot collect enough validator votes for a Quorum Certificate.

VRF-based leader selection

The PaceMaker makes sure the same validator is not selected as leader two times in a row and that nobody can manipulate the leader selection process or even predict which node will be chosen.. This is accomplished through the use of a so-called Verifiably Random Function, or VRF.

Fewer messages = less overhead

In the best-case scenario (no malicious nodes), AptosBFT requires every node to send just one message per round.

Consensus key rotation

Just like every crypto wallet comes with a private key, required to sign transactions, a node comes with a consensus key. Validators use this key to vote on blocks.

Private key theft is a serious risk to blockchains, but Aptos introduces an elegant solution: key rotation for both regular users and validators. It’s similar to changing your email password every once in a while. In addition to rotating their consensus keys, Aptos validators will have access to some novel key recovery techniques.

Incentives for nodes

AptosBFT (and DiemBFT) features Proof-of-Stake, while HotStuff in its original form doesn’t have to be combined with PoS. In Aptos, validators will need to stake coins to participate in the consensus; they will also be rewarded as long as they play by the rules. And vice versa: if a validator breaks the rules of the consensus, their stake can be slashed.

Once again, there is a big difference between Aptos and Diem: in Aptos, anyone can stake tokens to launch their own node or delegate tokens to an existing validator. Diem’s plan was to switch to such a permissionless model within five years of launch.

Will Aptos BFT consensus work?

For a new L1 blockchain like Aptos, stress-testing the consensus is essential before launching the mainnet. Will voting go as intended? Will it really take less than a second to finalize a block? And what if a group of malicious validators team up to try to manipulate the system?

Testing AptosBFT is one of the goals of Aptos’ incentivized testnet program, or AIT. It brings together hundreds of testnet node validators who look for bugs, implement upgrades, and much else, while earning rewards along the way. Many thousands of would-be validators applied for the AIT1, AIT2, and AIT3 stages.

From the Aptos meme channel. Credit: @SvyatoyRen

AIT follows the successful examples of networks like Cosmos, which pioneered incentivized testnets back in 2021 with its Game of Stakes. In that campaign, validators had the task to break the consensus – and they did. This shows the importance of testing the consensus beforehand.

As Pontem is building the first set of foundational dApps and primitives for Aptos, we are also following the same security-first approach. Pontem is currently undergoing several code audits, and in the coming months we may even launch a canary network of our own for developers to test our tooling in complete safety.

Aptos aims to be the fastest and most production-ready L1 in the industry with its 100-130k tps. This is possible only with a state-of-the-art consensus algorithm like AptosBFT, and we should see it in action soon enough, as Aptos plans to launch the mainnet in Fall 2022. We’ll continue to bring you the latest news on Aptos and its consensus – join Pontem on Telegram, Twitter, or Discord  and don’t miss the next update!

About Pontem Network

Pontem Network is a product studio building foundational dApps and development tools for Aptos, the most production-ready, scalable, and secure blockchain in the world. Together with the Aptos team, we are creating an ecosystem able to onboard the first billion blockchain users across the globe. Our products use the visionary Move coding language and Move VM – technology that was originally developed for Diem, a blockchain backed by Meta .

Install our wallet and try DEX