State pruning

One of the important issues raised during the course of the Olympic Stress Net release is the large amount of data clients need to store; In just over three months of operation, and in the last month in particular, the amount of data in each Ethereum client’s blockchain folder has grown to an impressive 10-40 gigabytes, depending on which client you’re using and whether compression is enabled or not. Although it is important to note that this is indeed a stress test scenario, where users are encouraged to offload transactions on the blockchain and only pay the free test ether as a transaction fee, and thus the transaction throughput is many times higher than bitcoin, it’s still a legitimate concern for users who, in many cases, don’t have hundreds of gigabytes to spare for storing other people’s transaction history.

First, let’s explore why the current Ethereum customer base is so large. Ethereum, unlike Bitcoin, has the property that each block contains something called the “state root”: the root hash of a specialized kind of Merkle tree that stores the entire state of the system: all account balances, contract storage, contract code and account nonces are in it.

The purpose is simple: it allows a node that only receives the last block, along with the certainty that the last block is in fact the newest block, to “synchronize” with the blockchain extremely quickly without processing any historical transactions, by simply downloading the rest of the tree of nodes in the network (the proposed HashLookup wire protocol message will make this easier), verifying that the tree is correct by checking that all hashes match, and then going from there. In a fully decentralized context, this will likely be done through an extended version of Bitcoin’s Header First Verification strategy, which will look something like this:

  1. Download as many block headers as the client can get their hands on.
  2. Determine the header that is at the end of the longest chain. From this header, for security reasons, go back 100 blocks and name the block at this position P100(H) (“the hundredth generation grandparent of the head”).
  3. Download the state tree from the state root of P100(H) by using the HashLookup opcode (note that this can be parallelized among any number of peers after the first round or two). Make sure all parts of the tree match.
  4. Proceed normally from there.

For light clients, the State Root is even more beneficial: they can instantly get the exact balance and status of any account simply by asking the network for a specific branch of the tree, without having to follow Bitcoin’s multi-level 1-of-N “After ask for all transaction outputs, then ask for all transactions that spend those outputs and take the rest” light client model.

However, this state tree mechanism has an important disadvantage when implemented naively: the intermediate nodes in the tree greatly increase the memory required to store all the data. To see why, look at this chart here:

The change in the tree during each individual block is fairly small, and the magic of the tree as a data structure is that most data can simply be referenced twice without being copied. Nevertheless, for each state change, a logarithmically large number of nodes (i.e. ~5 at 1000 nodes, ~10 at 1000000 nodes, ~15 at 1000000000 nodes) must be stored twice, one version for the old tree and one version for the new trie. So when a node eventually processes each block, we can assume that the total disk space utilization, in computer terms, is approximate O(n*log(n))Where n is the transaction load. In practical terms, the Ethereum blockchain is only 1.3 gigabytes, but the size of the database including all those extra nodes is 10-40 gigabytes.

So what can we do? A backward-looking solution is to simply implement header-first sync, which essentially resets new users’ disk consumption to zero and allows users to keep their disk consumption down by re-syncing every month or two, but that is a somewhat ugly solution. The alternative approach is to implement state tree pruning: essentially use reference counting to keep track of when there are nodes in the tree (here using “node” in computer science terms meaning “piece of data located somewhere in a diagram or a tree structure”, not “computers on the network”) fall out of the tree and put them on “death row” at that point: unless the node is somehow reused within the next X blocks (e.g. X = 5000), after this number of blocks has elapsed, the node should be permanently deleted from the database. Essentially we store the tree nodes that are part of the current state and we even store recent history, but we don’t store history older than 5000 blocks.

X should be set as low as possible to save space, but setting X too low compromises robustness: Once this technique is implemented, a node cannot do more than reset X Blocks without essentially completely restarting synchronization. Now let’s see how this approach can be fully implemented, considering all the corner cases:

  1. When editing a block with number N, keep track of all nodes (in the state, tree, and acknowledgment trees) whose reference count falls to zero. Place the hashes of these nodes in some sort of data structure in a “death row” database so that the list can later be retrieved by block number (particularly block number). N+X) and mark the node database entry itself as deletable at block N+X.
  2. When a node that is on death row is reinstated (a practical example of this is account A, which is given a specific balance/nonce/code/memory combination). fand then changes to another value Gand then the employment status of account B f while the knot for f is on death row), then the reference counter is incremented to one again. If this node is deleted again in a future block M (With M > N), then put it back into the future block’s death row to be erased at block M+X.
  3. When you get to the processing block N+Xget the list of hashes you reported back while blocking N. Check the node associated with each hash; If the node is still marked for deletion during that particular block (i.e. not restored, and most importantly not restored and later marked for deletion again), delete it. Also clear the list of hashes in the death row database.
  4. Sometimes the new head of a chain isn’t above the previous head and you have to back up a block. For these cases, you need to keep a journal of all changes to reference counts in the database (that’s “journal” as in journaling filesystems; essentially an ordered list of changes made); When resetting a block, clear the death row list created when that block was created and undo the changes made according to the journal (and clear the journal when you’re done).
  5. When editing a block, clear the journal on the block NX; You can’t do more than restore X Blocks anyway, so the journal is redundant (and would actually defeat the whole point of pruning if it were kept).

Once this is done, the database should only store state nodes associated with the last one X Blocks so you still have all the information you need from those blocks but no more. There are also other optimizations. Especially afterwards X Blocks, transaction and receipt trees should be deleted entirely, and even blocks arguably can be deleted as well – although there is an important argument for keeping a subset of “archive nodes” that store absolutely everything to make available to the rest of the network upon acquisition help the data it needs.

Now how much savings can this bring us? As it turns out, quite a lot! Especially if we took the ultimate daredevil route and got started X = 0 (i.e. lose absolutely any ability to handle even single-block forks without storing any history), then the size of the database would essentially be the size of the state: a value that even now (this data was captured at block 670000 ) is around 40 megabytes – the majority of which is made up of accounts like this one having their slots filled to intentionally spam the network. at X = 100000, we would essentially get the current size of 10–40 gigabytes, since most of the growth happened in the last hundred thousand blocks, and the extra space required to store magazines and death row lists would account for the remainder of the difference. For any value in between, we can assume that the increase in storage space is linear (i.e X = 10000 would take us about ninety percent of the way there to near zero).

Note that we might want to use a hybrid strategy: keep every block but not every state tree node; In this case, we would need to add about 1.4 gigabytes to store the block data. It is important to note that the root cause of blockchain size is NOT fast block times; Currently, the block headers from the last three months are about 300 megabytes, the rest are transactions from the last month, so if usage is high we can expect transactions to continue to dominate. However, light clients must also trim block headers if they are to survive in low-memory situations.

The strategy described above was implemented in Pyeth in a very early alpha form; It will be properly implemented in all clients in time after the launch of Frontier, since such memory bloat is only a mid-term and not a short-term scalability issue.


Share post on

Sonic BTC is reader-supported. When you buy through links on our site, we may earn an affiliate commission.

Mihai’s Ethereum Venture Replace. The First 12 months. Ethereum

Mihai’s Ethereum Venture Replace. The First 12 months.

Into the ether Hello, fellow Ethereans. I am Mihai Alisie, one of the Ethereum...

By Roger Gofman
Ethereum

Ethereum Builders: Ein vorgeschlagenes Experiment

Hallo, liebe Ethereer. Ich bin Mihai Alisie, einer der Gründer von Ethereum, und seit...

By Roger Gofman
The surprising future course of Ethereum Ethereum

The surprising future course of Ethereum

Ethereum has prepared what we believe to be the most exciting digital alliance since...

By Roger Gofman
Implementation of Vitalik’s imaginative and prescient Ethereum

Implementation of Vitalik’s imaginative and prescient

Although the core concept was proven last summer when lead developers Vitalik Buterin, Gavin...

By Roger Gofman
Ethereum

Ethereum Builders: Erschließung des kollaborativen Potenzials

Hallo, liebe Ethereer. Ich bin Mihai Alisie, einer der Gründer von Ethereum, und seit...

By Roger Gofman
Ethereum Basis open tender concerning: Board choice Ethereum

Ethereum Basis open tender concerning: Board choice

As Ethereum approaches its first official launch with Frontier, we spent the last week...

By Roger Gofman
Visions, Half 1: The Worth of Blockchain Know-how Ethereum

Visions, Half 1: The Worth of Blockchain Know-how

One of the questions that has perhaps been central to my own research in...

By Roger Gofman
Visions, Half 2: The Drawback of Belief Ethereum

Visions, Half 2: The Drawback of Belief

Special thanks to: Robert Sams, Gavin Wood, Mark Karpeles and countless cryptocurrency critics on...

By Roger Gofman

Latest Posts

Dot sample [DOT]: Does a drop on this entrance imply a drop on this entrance – AMBCrypto Information Polkadot

Dot sample [DOT]: Does a drop on this entrance imply a drop on this entrance – AMBCrypto Information

Dot pattern, she had one high development activity For quite some time we've been...

By Roger Gofman
WEF Launches Crypto Sustainability Coalition to Use Web3 Applied sciences in Combat In opposition to Local weather Change – Blockchain Bitcoin Information – Bitcoin Information Stellar

WEF Launches Crypto Sustainability Coalition to Use Web3 Applied sciences in Combat In opposition to Local weather Change – Blockchain Bitcoin Information – Bitcoin Information

The World Economic Forum (WEF) created the Crypto Sustainability Coalition, an initiative dedicated to...

By Roger Gofman
Mihai’s Ethereum Venture Replace. The First 12 months. Ethereum

Mihai’s Ethereum Venture Replace. The First 12 months.

Into the ether Hello, fellow Ethereans. I am Mihai Alisie, one of the Ethereum...

By Roger Gofman
Is Binance USD (BUSD) Heading within the Proper Route on Sunday?  – Traders Observer Binance

Is Binance USD (BUSD) Heading within the Proper Route on Sunday? – Traders Observer

Binance USD receives a strong long-term technical score of 71 from InvestorsObserver's research based...

By Roger Gofman
Ethereum co-founder Vitalik Buterin expects Dogecoin and Zcash to maneuver to PoS |  Bitcoinist.com – Bitcoinist Dogecoin

Ethereum co-founder Vitalik Buterin expects Dogecoin and Zcash to maneuver to PoS | Bitcoinist.com – Bitcoinist

Ethereum co-founder Vitalik Buterin believes that other blockchains like Dogecoin and Zcash should follow...

By Roger Gofman
MASSIVE BITCOIN PROBLEM TODAY!!!! [careful now] BITCOIN PRICE PREDICTION 2022 // BITCOIN NEWS TODAY Videos

MASSIVE BITCOIN PROBLEM TODAY!!!! [careful now] BITCOIN PRICE PREDICTION 2022 // BITCOIN NEWS TODAY

Bitcoin News Today - MASSIVE BITCOIN PROBLEM TODAY!!!! [careful now] BITCOIN PRICE PREDICTION 2022...

By Roger Gofman
Cardano Vasil: Charles Hoskinson Talks One other Factor That May Profit Blockchain – U.As we speak Cardano

Cardano Vasil: Charles Hoskinson Talks One other Factor That May Profit Blockchain – U.As we speak

Tomiwabold Olajide Cardano is the third largest blockchain with a staking market cap of...

By Roger Gofman
Ethereum

Ethereum Builders: Ein vorgeschlagenes Experiment

Hallo, liebe Ethereer. Ich bin Mihai Alisie, einer der Gründer von Ethereum, und seit...

By Roger Gofman