One of the important issues raised during the course of the Olympic stress-net release is the large amount of data that clients need to store; in just over three months of operation, and particularly over the last month, the amount of data in each Ethereum client's blockchain folder has grown to an impressive 10–40 gigabytes, depending on which client you are using and whether or not compression is enabled. Although it is important to note that this is indeed a stress-test scenario, in which users are incentivized to dump transactions onto the blockchain paying only the free test ether as a transaction fee, and transaction throughput is thus many times higher than Bitcoin's, it is nevertheless a legitimate concern for users who, in many cases, do not have hundreds of gigabytes to spare to store other people's transaction history.
First, let us explore why the current Ethereum client database is so large. Ethereum, unlike Bitcoin, has the property that every block contains something called the “state root”: the root hash of a specialized kind of Merkle tree which stores the entire state of the system: all account balances, contract storage, contract code and account nonces are inside it.
The purpose of this is simple: it allows a node, given only the last block together with some assurance that that last block actually is the most recent block, to “synchronize” with the blockchain extremely quickly without processing any historical transactions, by simply downloading the rest of the tree from nodes in the network (the proposed HashLookup wire-protocol message will make this easier), verifying that the tree is correct by checking that all of the hashes match up, and then proceeding from there. In a fully decentralized context, this will likely be done through an advanced version of Bitcoin's headers-first verification strategy, which will look roughly as follows:
- Download as many block headers as the client can get its hands on.
- Determine the header which is on the end of the longest chain. Starting from that header, go back 100 blocks for safety, and call the block at that position P100(H) (“the hundredth-generation grandparent of the head”).
- Download the state tree from the state root of P100(H), using the HashLookup wire message (note that after the first round or two, this can be parallelized among as many peers as desired). Verify that all parts of the tree match up.
- Proceed normally from there.
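The verify-as-you-download loop at the heart of the steps above can be sketched in Python against a toy in-memory “network”; the node format and the `hash_lookup` helper are illustrative stand-ins (a real client exchanges RLP-encoded trie nodes over the wire), but the core idea is the same:

```python
import hashlib

def H(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

# Toy "network": a peer database mapping node hash -> node body.
# Bodies are either b"leaf:..." or b"branch:<hash>,<hash>" -- a
# made-up stand-in for RLP-encoded trie nodes.
leaf_a, leaf_b = b"leaf:balance=5", b"leaf:balance=9"
branch = b"branch:" + H(leaf_a).encode() + b"," + H(leaf_b).encode()
peer_db = {H(n): n for n in (leaf_a, leaf_b, branch)}
state_root = H(branch)  # what P100(H)'s header would commit to

def hash_lookup(h):
    """Stand-in for the proposed HashLookup wire-protocol message."""
    return peer_db[h]

def sync_state(root):
    """Download the whole tree under `root`, checking every hash."""
    state, queue = {}, [root]
    while queue:
        h = queue.pop()
        node = hash_lookup(h)
        # Verify before trusting: the node must hash to the value
        # that its parent (or the block header) committed to.
        assert H(node) == h, "peer sent a node that does not match its hash"
        state[h] = node
        if node.startswith(b"branch:"):
            queue += node[len(b"branch:"):].decode().split(",")
    return state

synced = sync_state(state_root)
assert len(synced) == 3  # both leaves plus the branch node
```

Because each node is checked against the hash that referenced it, a malicious peer cannot inject bogus state; this is also why the download can be parallelized freely after the first round.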
For light clients, the state root is even more advantageous: they can immediately determine the exact balance and status of any account by simply asking the network for a particular branch of the tree, without needing to follow Bitcoin's multi-step 1-of-N “ask for all transaction outputs, then ask for all transactions spending those outputs, and take the remainder” light-client model.
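As a rough illustration of what “asking for a particular branch of the tree” means, here is a minimal Merkle-branch check over a toy binary tree of four account balances (Ethereum actually uses a hexary Patricia tree, and the names and leaf encoding below are invented for the example):

```python
import hashlib

def H(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

# Toy binary Merkle tree over four account balances.
leaves = [b"A=10", b"B=20", b"C=30", b"D=40"]
l01 = (H(leaves[0]) + H(leaves[1])).encode()
l23 = (H(leaves[2]) + H(leaves[3])).encode()
root = H((H(l01) + H(l23)).encode())  # the light client knows only this

# A full node hands the light client a "branch": the leaf plus the
# sibling hashes on the path up to the root.
def verify_branch(root_hash, leaf, siblings, index):
    """Check that `leaf` sits under `root_hash`, given the sibling
    hashes; `index` encodes left/right position at each level."""
    h = H(leaf)
    for sib in siblings:
        pair = (h + sib) if index % 2 == 0 else (sib + h)
        h = H(pair.encode())
        index //= 2
    return h == root_hash

# Prove C=30 (leaf index 2): siblings are H(D=40) and H(l01).
assert verify_branch(root, b"C=30", [H(leaves[3]), H(l01)], 2)
assert not verify_branch(root, b"C=31", [H(leaves[3]), H(l01)], 2)
```

The branch is logarithmic in the size of the state, so a light client can check an account against a single trusted state root without downloading anything else.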
However, this state-tree mechanism has an important disadvantage if implemented naively: the intermediate nodes in the tree greatly increase the amount of disk space required to store all the data. To see why, consider this diagram:
The change in the tree during each individual block is fairly small, and the magic of the tree as a data structure is that most of the data can simply be referenced, not copied. However, even so, for every change to the state a logarithmically large number of nodes (i.e. ~5 at 1000 nodes, ~10 at 1000000 nodes, ~15 at 1000000000 nodes) need to be stored twice: one version for the old tree and one version for the new tree. Hence, as a node processes every block, we can expect the total disk-space utilization to be, in computer-science terms, roughly O(n*log(n)), where n is the transaction load. In practical terms, the Ethereum blockchain itself is only 1.3 gigabytes, but the size of the database including all of these extra nodes is 10–40 gigabytes.
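The logarithmic duplication can be seen concretely in a toy persistent Merkle tree: updating one leaf out of eight writes only four new nodes (one per level, i.e. depth + 1), while the rest of the old tree is shared between the two versions. This is an illustrative sketch, not any client's actual trie implementation:

```python
import hashlib

def H(x: bytes) -> str:
    return hashlib.sha256(x).hexdigest()

# Persistent (immutable) binary Merkle tree in a hash -> node store.
store = {}

def put(node: bytes) -> str:
    h = H(node)
    store[h] = node
    return h

def build(values):
    """Build a complete tree over `values`, returning the root hash."""
    layer = [put(v) for v in values]
    while len(layer) > 1:
        layer = [put((layer[i] + layer[i + 1]).encode())
                 for i in range(0, len(layer), 2)]
    return layer[0]

def update(root, depth, index, value):
    """Return a new root with leaf `index` replaced; only the path
    from that leaf to the root is copied, all else is shared."""
    if depth == 0:
        return put(value)
    node = store[root]
    left, right = node[:64].decode(), node[64:].decode()
    half = 1 << (depth - 1)
    if index < half:
        left = update(left, depth - 1, index, value)
    else:
        right = update(right, depth - 1, index - half, value)
    return put((left + right).encode())

root_old = build([b"acct%d" % i for i in range(8)])  # 8 leaves, depth 3
before = len(store)                                  # 15 nodes total
root_new = update(root_old, 3, 5, b"acct5-updated")
# Only depth + 1 = 4 new nodes were written; 11 of 15 are shared.
assert len(store) == before + 4
```

Both `root_old` and `root_new` remain fully readable from `store` afterwards, which is exactly why the extra nodes accumulate: nothing is ever deleted.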
So what can we do? One stopgap fix is to simply implement headers-first syncing, essentially resetting new users' hard-disk consumption to zero and allowing users to keep their disk consumption low by re-syncing every month or two, but that is a somewhat ugly solution. The alternative approach is to implement state-tree pruning: essentially, use reference counting to track when nodes in the tree (here using “node” in the computer-science sense of “piece of data that sits somewhere in a graph or tree structure”, not “computer on the network”) drop out of the tree, and at that point put them on “death row”: unless the node somehow becomes used again within the next X blocks (e.g. X = 5000), after that number of blocks has passed the node should be permanently deleted from the database. Essentially, we store the tree nodes that are part of the current state, and we also store recent history, but we do not store history older than 5000 blocks.
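A minimal sketch of the reference-counting-plus-death-row idea might look like this (class and method names are invented for illustration; a real client would persist all of this in its node database):

```python
X = 5000  # grace period, in blocks

class PrunedDB:
    def __init__(self):
        self.nodes = {}      # hash -> (data, reference count)
        self.death_row = {}  # block number -> set of hashes to delete

    def ref(self, h, data=None):
        """Add a reference to a node, reviving it if it was dying."""
        d, c = self.nodes.get(h, (data, 0))
        self.nodes[h] = (d, c + 1)

    def deref(self, h, block):
        """Drop a reference; at zero, schedule deletion at block + X."""
        d, c = self.nodes[h]
        self.nodes[h] = (d, c - 1)
        if c - 1 == 0:
            self.death_row.setdefault(block + X, set()).add(h)

    def commit(self, block):
        """Called when `block` is processed: execute its death row.
        Revived nodes survive because their refcount is nonzero."""
        for h in self.death_row.pop(block, ()):
            d, c = self.nodes.get(h, (None, 1))
            if c == 0:
                del self.nodes[h]

db = PrunedDB()
db.ref("n1", "stale state node")
db.deref("n1", block=1)   # dies at block 1, erased at block 1 + X
db.ref("n2", "lucky node")
db.deref("n2", block=1)
db.ref("n2")              # revived before the grace period ends
db.commit(1 + X)
assert "n1" not in db.nodes and "n2" in db.nodes
```

Note that a node re-derefed at a later block M simply lands on block M+X's death-row list, so the “deleted again in the future” case needs no special handling.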
X should be set as low as possible to conserve space, but setting X too low compromises robustness: once this technique is implemented, a node cannot revert more than X blocks without essentially completely restarting synchronization. Now, let's see how this approach can be implemented fully, taking into account all of the corner cases:
- When processing a block with number N, keep track of all nodes (in the state, transaction and receipt trees) whose reference count drops to zero. Place the hashes of these nodes into a “death row” database, in some data structure so that the list can later be recalled by block number (specifically, block number N+X), and mark the node database entry itself as deletable at block N+X.
- If a node that is on death row gets re-instated (a practical example of this is account A acquiring some particular balance/nonce/code/storage combination f, then switching to a different value g, and then account B acquiring state f while the node for f is on death row), then increase its reference count back to one. If that node is deleted again at some future block M (with M > N), then place it on the future block's death row to be deleted at block M+X.
- When you get to processing block N+X, recall the list of hashes logged during block N. Check the node associated with each hash; if the node is still marked as deletable during that particular block (i.e. not re-instated, and importantly not re-instated and then marked as deletable again later), delete it. Delete the list of hashes in the death-row database as well.
- Sometimes the new head of a chain isn’t above the previous head and you have to back up a block. For these cases, you need to keep a journal of all changes to reference counts in the database (that’s “journal” as in journaling filesystems; essentially an ordered list of changes made); When resetting a block, clear the death row list created when that block was created and undo the changes made according to the journal (and clear the journal when you’re done).
- When processing a block, delete the journal at block N−X; you are not capable of reverting more than X blocks anyway, so the journal is superfluous (and, if kept, would in fact defeat the whole point of pruning).
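Putting the journaling corner case together with the death-row bookkeeping, a simplified sketch might look as follows (names are invented for illustration, and the transaction/receipt trees are ignored):

```python
X = 5000  # maximum revert depth, in blocks

class JournaledDB:
    def __init__(self):
        self.refcount = {}   # node hash -> reference count
        self.death_row = {}  # block -> hashes scheduled for deletion
        self.journal = {}    # block -> ordered list of (hash, delta)

    def _bump(self, h, delta, block):
        # Every refcount change is journalled so the block can revert.
        self.journal.setdefault(block, []).append((h, delta))
        self.refcount[h] = self.refcount.get(h, 0) + delta
        if delta < 0 and self.refcount[h] == 0:
            self.death_row.setdefault(block + X, set()).add(h)

    def ref(self, h, block):
        self._bump(h, +1, block)

    def deref(self, h, block):
        self._bump(h, -1, block)

    def revert_block(self, block):
        """Undo the journalled refcount changes for `block` and
        discard the death-row list that this block created."""
        for h, delta in reversed(self.journal.pop(block, [])):
            self.refcount[h] -= delta
        self.death_row.pop(block + X, None)

    def finalize_block(self, block):
        """Processing block N: execute death row scheduled for N,
        and drop the journal for N - X (it can no longer revert)."""
        for h in self.death_row.pop(block, ()):
            if self.refcount.get(h, 0) == 0:
                del self.refcount[h]
        self.journal.pop(block - X, None)

db = JournaledDB()
db.ref("a", block=1)
db.deref("a", block=2)   # "a" is now scheduled for deletion at 2 + X
db.revert_block(2)       # fork! block 2 is reverted
assert db.refcount["a"] == 1 and (2 + X) not in db.death_row
```

The revived-node case works automatically here too: a `ref` during the grace period raises the count above zero, so `finalize_block` skips the deletion even though the hash is still on a death-row list.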
Once this is done, the database should only be storing state nodes associated with the last X blocks, so you will still have all the information you need from those blocks but nothing more. Beyond this, there are further optimizations. Particularly, after X blocks, transaction and receipt trees should be deleted entirely, and even blocks may arguably be deleted as well – although there is an important argument for keeping some subset of “archive nodes” that store absolutely everything so as to help the rest of the network acquire the data that it needs.
Now, how much savings can this give us? As it turns out, quite a lot! Especially if we were to take the ultra-daredevil route and go X = 0 (i.e. lose absolutely all ability to handle even single-block forks, storing no history at all), then the size of the database would essentially be the size of the state: a value which, even now (this data was grabbed at block 670000), is around 40 megabytes – the majority of which is made up of accounts like this one with storage slots filled to deliberately spam the network. At X = 100000, we would get essentially the current size of 10–40 gigabytes, since most of the growth happened in the last hundred thousand blocks, and the extra space required for storing journals and death-row lists would make up the rest of the difference. For every value in between, we can expect the disk-space growth to be linear (i.e. X = 10000 would take us about ninety percent of the way to near-zero).
Note that we may want to pursue a hybrid strategy: keeping every block but not every state-tree node; in that case, we would need to add roughly 1.4 gigabytes to store the block data. It is important to note that the cause of the blockchain size is NOT fast block times; currently, the block headers of the last three months make up roughly 300 megabytes, and the rest is transactions of the last month, so at high levels of usage we can expect transactions to continue to dominate. That said, light clients will also need to prune block headers if they are to survive in low-memory circumstances.
The strategy described above has been implemented in a very early alpha form in pyeth; it will be implemented properly in all clients in due time after the Frontier launch, as such storage bloat is only a medium-term and not a short-term scalability concern.