Big-O scaling

Computer science has this thing called “big-O notation” (that’s O as in “oh”, not zero). It is a way of describing how algorithms behave as they’re given bigger problems to solve.

During the Great Maximum Blocksize Debate, there have been repeated claims that “Bitcoin is not scalable, because it is O(n²).” N-squared scaling is not sustainable: double n, and you need four times the resources (memory or CPU).

At the Montreal Scaling Bitcoin conference, I had a chance to talk to a couple of those people and ask them what the heck they’re talking about. Turns out they’re talking about a few different things.

Some of them are taking Metcalfe’s Law (“the value of a telecommunications network is proportional to the square of the number of connected users of the system (n²)”) and applying it to Bitcoin transactions, assuming that if there are n people using Bitcoin they will collectively generate n² transactions. That’s just silly; even if everybody could theoretically transact with everybody else using Bitcoin, they won’t. I’ve transacted with probably under 100 other people or companies in the five years I’ve been using Bitcoin; the demand for transactions scales up linearly with the number of people using it.
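The difference between those two growth assumptions is easy to see with a little arithmetic (the numbers here are made up purely for illustration):

```python
def metcalfe_txs(n):
    """Metcalfe-style assumption: every user transacts with every
    other user, so total transactions grow as n squared."""
    return n * n

def linear_txs(n, txs_per_user=100):
    """Observed behavior: each user transacts with a roughly fixed set
    of counterparties, so total transactions grow linearly with users."""
    return n * txs_per_user

# Doubling the user base quadruples transactions under Metcalfe,
# but only doubles them under the linear assumption.
for n in (1_000, 2_000, 4_000):
    print(n, metcalfe_txs(n), linear_txs(n))
```

The linear assumption matches how people actually use money: a tenfold increase in users means roughly a tenfold increase in transactions, not a hundredfold.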

Then there’s the observation that, assuming growth in the number of transactions “n” over time, the entire transaction history grows O(n²) (see Patrick Strateman’s recent talk). If that history grows faster than CPU power and hard disk space, eventually nobody new will be able to validate the entire blockchain.
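The quadratic growth of the history follows from simple summation: if the number of transactions per block grows linearly over time, the cumulative history is the sum 1 + 2 + … + n, which is roughly n²/2. A quick sketch (constants are illustrative):

```python
def history_size(blocks, growth_per_block=1):
    """Cumulative transaction history when block i contains
    i * growth_per_block transactions (linear growth over time)."""
    return sum(i * growth_per_block for i in range(1, blocks + 1))

# Doubling the number of blocks roughly quadruples the total history:
a = history_size(1_000)
b = history_size(2_000)
print(b / a)
```

So even though the transaction *rate* only grows linearly, the *archive* a brand-new validator must process grows quadratically.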

If you start with the belief that you must validate the entire blockchain from the genesis block forward to be a True Bitcoiner, then this is a problem. But that’s an unnecessarily narrow view, in my opinion– I think people should be free to make trust/cost/convenience tradeoffs. For example, a new user could avoid downloading the entire historical block chain by getting a snapshot of the ledger as of some recent block (“fetch the UTXO set”) from one or more nodes on the network. That is much faster and more convenient than fetching the entire blockchain history, and is almost perfectly safe– even if you get a bad copy of the ledger, the worst that can happen is that either there are extra entries in the ledger (maybe the attacker grants themselves a million bitcoin they don’t actually have) or there are entries missing from the ledger. If there are extra entries, then an attacker could send you an invalid transaction that would never confirm. They can do essentially the same thing today by sending you a transaction and sending the rest of the network a double-spend of that transaction.
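One simple way a new node could sanity-check a snapshot is to compare its hash against what several independent peers report. This is only a hypothetical sketch – the function and peer-query names here are made up, not part of any real Bitcoin API:

```python
import hashlib

def snapshot_trusted(snapshot_bytes, peer_reported_hashes, quorum=3):
    """Accept a UTXO-set snapshot only if at least `quorum` peers
    independently report the same hash for it. (Illustrative only;
    real peer-querying machinery is not shown.)"""
    digest = hashlib.sha256(snapshot_bytes).hexdigest()
    agreeing = sum(1 for h in peer_reported_hashes if h == digest)
    return agreeing >= quorum
```

The more independent peers agree on the snapshot hash, the harder it is for an attacker to feed a new user a doctored ledger.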

If there are missing entries, your wallet might think the opposite– that a valid transaction is invalid. It will discover its mistake as soon as the transaction is verified in the block chain– and can recover by getting the ledger from a more reliable peer.

Finally, there’s this O(n²) argument:

If we assume that there are ‘n’ users, and a constant proportion of them (say 1%) run full nodes, and each user generates a certain number of transactions per day… then the total validation work done by all the full nodes every day is O(n²). (Really O((n/100)²), but constant factors are ignored in big-O notation.)

There are two things wrong with this argument. First, the assumption that a constant proportion of users will run full nodes as the network grows might be incorrect; it is likely that a larger and larger proportion of people will choose to run partially-validating nodes. In the future there might be a billion people using Bitcoin but only tens of thousands of fully-validating nodes (instead of tens of millions). That would be a bright, successful, decentralized, secure future.

The second thing wrong with that argument is that while the entire network might, indeed, perform O(n²) validation work, each of the n individuals would only perform O(n) work– and that is the important metric, because each individual doesn’t care how much work the rest of the network is doing to validate transactions, they just care about how much work their computer must do.
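The network-versus-individual distinction is easy to check with back-of-the-envelope arithmetic (the 1%-of-users-run-full-nodes ratio is just the illustrative constant from the argument above):

```python
def network_work(n):
    """Total validations across the whole network: every one of the
    n/100 full nodes validates every one of the n transactions."""
    full_nodes = n // 100
    return full_nodes * n        # grows as n squared

def per_node_work(n):
    """Validations done by any single full node: each node validates
    each transaction exactly once."""
    return n                     # grows linearly

for n in (10_000, 20_000):
    print(n, network_work(n), per_node_work(n))
```

Doubling n quadruples the aggregate network work but only doubles the work done by any individual node – and it is the individual node’s cost that determines whether running one stays affordable.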

 