- To be an active block- and transaction-verifying participant in the Ethereum network (a majority of dapps, miners, Infura, etc.), a full node currently requires ~130-150 GB of disk space.
- The growth in Ethereum’s chain size is well known (+200% Y/Y); however, size itself isn’t the only friction, as fully synced nodes must continually perform cryptographically linked verification to confirm transactions.
- The cost to run a full node varies dramatically by end user. Many users can run a full node on AWS for $50-$100 a month, or on a local instance (~300 GB of SSD) for as little as ~$30/month.
- Considering the rapid decline in active Ethereum nodes (down ~66% since the start of 2018), the costs of minimizing trust may be getting too high for the average active network participant.
This report, with the help of the TokenAnalyst team, is an indirect follow-up to our recent Infura piece, exploring the costs to run nodes for various users in the Ethereum ecosystem and how the cost tradeoffs impact trust.
What is an Ethereum full node, and which users run one?
One of the bigger points of confusion surrounding Ethereum is the distinction between ‘full nodes’ and ‘archive nodes’. A full node is any computer or server that connects to the Ethereum network, downloads the entire blockchain, validates blocks and state transitions against the protocol’s consensus rules, and can serve the network with data requests and block validation. An archive node is a full node that also keeps a snapshot of every historical state at every block, and is commonly used by block explorers and for deep analytics on the Ethereum network.
The main node clients are Geth and Parity, both of which offer full node implementations:
- geth: The default Geth sync mode (“fast” sync), which speeds up syncing by downloading block bodies and receipts along with a recent copy of the state database, rather than re-executing every transaction since genesis.
- geth --syncmode full: A slower Geth sync mode that verifies all blocks and transactions starting at genesis.
- parity: The default Parity sync mode (“warp” sync), which downloads a snapshot of the most recent 30,000 “best blocks” (the chain of latest valid blocks with the greatest accumulated work behind it) along with the current state database. Once this snapshot is synchronized, the client backfills the earlier history; once that backfill completes, a default Parity node becomes a full node.
- parity --no-warp: A slower Parity sync mode that verifies all blocks and transactions starting at genesis.
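Parity’s “best blocks” above means the chain with the most accumulated work, which is not necessarily the longest chain. The toy sketch below illustrates that selection rule using hypothetical block names and difficulty values. It is not Parity’s implementation: a real client also validates each block’s proof-of-work and state transition before counting its work.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Block:
    hash: str
    parent: Optional[str]  # None marks the genesis block
    difficulty: int        # toy "work" value for this block


def total_difficulty(blocks: Dict[str, Block], tip: str) -> int:
    """Accumulated work: sum of difficulties from `tip` back to genesis."""
    total = 0
    current: Optional[str] = tip
    while current is not None:
        block = blocks[current]
        total += block.difficulty
        current = block.parent
    return total


def best_tip(blocks: Dict[str, Block]) -> str:
    """Among blocks with no children, pick the one with the most accumulated work."""
    has_child = {b.parent for b in blocks.values() if b.parent is not None}
    tips = [h for h in blocks if h not in has_child]
    return max(tips, key=lambda h: total_difficulty(blocks, h))


def recent_best_blocks(blocks: Dict[str, Block], n: int) -> List[str]:
    """Hashes of the last `n` blocks on the best chain, oldest first."""
    chain: List[str] = []
    current: Optional[str] = best_tip(blocks)
    while current is not None and len(chain) < n:
        chain.append(current)
        current = blocks[current].parent
    return list(reversed(chain))


# Hypothetical fork: branch "a" is longer, branch "b" carries more work.
blocks = {
    "genesis": Block("genesis", None, 1),
    "a1": Block("a1", "genesis", 10),
    "a2": Block("a2", "a1", 10),
    "b1": Block("b1", "genesis", 25),
}
print(best_tip(blocks))               # b1 (total difficulty 26 vs. 21)
print(recent_best_blocks(blocks, 2))  # ['genesis', 'b1']
```

A warp-syncing node applies the same idea at scale: it starts from the tip of the heaviest chain and works backwards, rather than replaying from genesis.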
In theory, anyone can run a full Ethereum node on their computer, validating transactions and blocks on the Ethereum blockchain, provided they meet the entry-level hardware and bandwidth requirements. Confusion about these requirements arises primarily from differing perceptions of the total amount of data downloaded (i.e., what counts as the ‘entire’ blockchain) and of the hardware and bandwidth needed to keep a node in sync. The multitude of node client implementations, each with its own parameters and configuration, adds further confusion.
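One practical way to check whether a node is keeping up is the standard JSON-RPC method `eth_syncing`, which both Geth and Parity expose. The sketch below assumes HTTP RPC is enabled on the default `127.0.0.1:8545`. During Geth’s fast sync the state download can keep a node “syncing” after block progress already looks nearly complete, so this ratio measures blocks only.

```python
import json
import urllib.request
from typing import Any, Dict, Union


def sync_progress(result: Union[bool, Dict[str, str]]) -> float:
    """Fraction of blocks imported, from an `eth_syncing` result.

    `eth_syncing` returns False when the node is not syncing, or an object
    whose block fields are hex-encoded quantities.
    """
    if result is False:
        return 1.0  # not syncing; assumed caught up (check peer count too in practice)
    start = int(result["startingBlock"], 16)
    current = int(result["currentBlock"], 16)
    highest = int(result["highestBlock"], 16)
    if highest <= start:
        return 1.0
    return (current - start) / (highest - start)


def fetch_eth_syncing(url: str = "http://127.0.0.1:8545") -> Any:
    """Call eth_syncing over HTTP JSON-RPC (assumes RPC is enabled at `url`)."""
    payload = json.dumps(
        {"jsonrpc": "2.0", "method": "eth_syncing", "params": [], "id": 1}
    ).encode()
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)["result"]


# Offline example with a hand-written response (0x32 = 50, 0x64 = 100):
print(sync_progress({"startingBlock": "0x0", "currentBlock": "0x32", "highestBlock": "0x64"}))  # 0.5
```

Against a live node, `sync_progress(fetch_eth_syncing())` gives the same ratio.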
Three types of Ethereum nodes, with typical end users and disk size requirements
Sources: EthHub, TokenAnalyst, The Block
Note: Light clients store just the header chain, and request everything else on demand. Used for low capacity devices, like mobile.
In short, to be an active block- and transaction-verifying participant in the Ethereum network (a majority of dapps, miners, Infura, etc.), a full node currently requires ~130-150 GB of disk space.
What does historical ‘state’ information include?
The growth in Ethereum’s chain size is well known (+200% Y/Y); however, size itself isn’t the only friction, as fully synced nodes must continually perform cryptographically linked verification to confirm transactions. These proofs, combined with the 50 million-plus unique address accounts (each with its own associated data), make up a complex data structure known as the state trie. While some may consider ~130 GB of storage relatively modest (high-end consumer laptops typically have 500 GB to 1 TB of disk), the need for a fully synced node to continually verify changes in state places a heavier burden on the node operator to stay in sync.
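To see why every state change costs verification work, consider a toy commitment over account balances: changing any single account changes the root, so a synced node must recompute hashes on every block. This sketch uses a plain binary Merkle tree and SHA-256 as stand-ins; Ethereum’s actual state trie is a Merkle Patricia trie hashed with Keccak-256, and the accounts below are hypothetical.

```python
import hashlib
from typing import Dict, List


def _hash(data: bytes) -> bytes:
    # SHA-256 stands in for Ethereum's Keccak-256 in this toy.
    return hashlib.sha256(data).digest()


def toy_state_root(balances: Dict[str, int]) -> str:
    """Binary Merkle root over (address, balance) leaves sorted by address.

    A simplification: Ethereum's real state trie is a Merkle Patricia trie
    keyed by hashed address, storing nonce, balance, storage root, and code hash.
    """
    if not balances:
        return _hash(b"").hex()
    level: List[bytes] = [
        _hash(f"{addr}:{bal}".encode()) for addr, bal in sorted(balances.items())
    ]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd-sized levels
        level = [_hash(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0].hex()


before = {"0xaa": 100, "0xbb": 50, "0xcc": 7}  # hypothetical accounts
after = dict(before, **{"0xbb": 51})           # one balance changes by 1 wei
print(toy_state_root(before) != toy_state_root(after))  # True
```

With 50 million-plus accounts instead of three, keeping that root current on every block is what turns raw disk size into an ongoing compute and I/O burden.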
Sources: Etherscan.io, Blockchain, The Block
For this reason, growth in chain size and growth in state size are two separate considerations when maintaining a fully synced node. Current state demands require SSDs rather than magnetic hard drives, and the pace of growth in both chain and state could force new hardware requirements in the coming years. However, these issues can be mitigated to a degree by scheduled ETH 1.x scaling upgrades (better caching, data structures, etc.).
Furthermore, while a full node contains the complete historical dataset of the blockchain’s core components (blocks, transactions, logs, and receipts), archive nodes maintain this core information as well as the ‘state’ of the blockchain at every point in time (at every block height). This extra ‘state’ information includes:
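To make the hardware concern concrete, here is a back-of-the-envelope projection that takes the +200% Y/Y figure at face value and, purely as an illustrative assumption, holds it constant. It is not a forecast: ETH 1.x improvements or pruning could change the curve entirely.

```python
def projected_size_gb(current_gb: float, yoy_growth: float, years: int) -> float:
    """Size after `years` of constant year-over-year growth (2.0 means +200%)."""
    return current_gb * (1 + yoy_growth) ** years


def years_until_exceeds(current_gb: float, capacity_gb: float, yoy_growth: float) -> int:
    """Whole years until the projected size first exceeds `capacity_gb`."""
    if yoy_growth <= 0 and current_gb <= capacity_gb:
        raise ValueError("size never exceeds capacity without positive growth")
    years, size = 0, current_gb
    while size <= capacity_gb:
        years += 1
        size *= 1 + yoy_growth
    return years


# Upper end of today's full-node range, +200% Y/Y, against a ~300 GB SSD and a 1 TB disk.
print(projected_size_gb(150, 2.0, 1))        # 450.0
print(years_until_exceeds(150, 300, 2.0))    # 1
print(years_until_exceeds(150, 1_000, 2.0))  # 2
```

Under that assumption, a +200% rate triples the footprint each year, so even a 1 TB drive would last only about two years.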
- Transaction traces (used to see function calls between smart contracts and events that are propagated as a result of computation on a contract)
- Historical address balances
- Smart contract creation, code, and historical changes in code
- Smart contract storage throughout history
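The four kinds of archive data listed above map onto JSON-RPC calls pinned to a historical block number. The sketch below only builds the request bodies. The contract address is hypothetical, `trace_block` belongs to Parity’s trace module (so it needs a Parity node with tracing enabled), and a pruned full node will typically refuse the historical-state calls for blocks far behind the head.

```python
from typing import Any, Dict, List


def rpc_payload(method: str, params: List[Any], request_id: int = 1) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 request body."""
    return {"jsonrpc": "2.0", "method": method, "params": params, "id": request_id}


def historical_state_queries(address: str, block_number: int) -> Dict[str, Dict[str, Any]]:
    """Requests for each kind of archive-only data, pinned to an old block."""
    block = hex(block_number)  # JSON-RPC expects hex-encoded block numbers
    return {
        # Parity trace module: call traces for every transaction in the block.
        "traces": rpc_payload("trace_block", [block]),
        # Balance of `address` as of `block`.
        "balance": rpc_payload("eth_getBalance", [address, block]),
        # Contract bytecode as of `block`.
        "code": rpc_payload("eth_getCode", [address, block]),
        # Storage slot 0 of the contract as of `block`.
        "storage": rpc_payload("eth_getStorageAt", [address, "0x0", block]),
    }


# Hypothetical contract address; block 2,300,000 is used only as an old height.
queries = historical_state_queries("0x" + "ab" * 20, 2_300_000)
print(queries["balance"])
```

Repeating these calls for every address and every block is what drives the storage and compute needs of archive-level analytics.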
While only a few end users need an archive node (chain analytics providers, auditors, block explorers, etc.), those users face additional storage requirements for ‘state’-related information beyond the typical chain size. According to TokenAnalyst, an on-chain infrastructure and data provider, the extra ‘state’-related information in TokenAnalyst’s database totals approximately 640 GB.
What are the costs to run full nodes for different users?
The cost to run a full node varies dramatically by end user. Many users can run a full node on AWS for $50-$100 a month, or on a local instance (~300 GB of SSD) for as little as ~$30/month. Meanwhile, for the select few users who need an archive node, a standard archive setup with 2-3 TB can cost ~$270-$370 a month.
You could probably get away with downgrading that to a medium instance once the initial sync is complete. That would bring it down to $60-70 per month total. An archival node needs 2-3tb, which would be around $270-370 per month.
— Lane Rettig (@lrettig) January 2, 2019
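Annualizing the quoted monthly figures makes the gap between setups easier to compare. The arithmetic below uses only the numbers in this section; actual AWS pricing varies by region, instance type, and storage class.

```python
from typing import Dict, Tuple

# Monthly USD ranges quoted above; the local figure is a floor ("as low as").
MONTHLY_COST_USD: Dict[str, Tuple[int, int]] = {
    "full node on AWS": (50, 100),
    "full node, medium instance after initial sync": (60, 70),
    "local full node (~300 GB SSD)": (30, 30),
    "archive node (2-3 TB)": (270, 370),
}


def annual_range(monthly: Tuple[int, int]) -> Tuple[int, int]:
    """Convert a (low, high) monthly cost range into a yearly one."""
    low, high = monthly
    return low * 12, high * 12


for setup, monthly in MONTHLY_COST_USD.items():
    low, high = annual_range(monthly)
    print(f"{setup}: ${low:,}-${high:,} per year")
```

At these rates an archive node costs $3,240-$4,440 a year, against $600-$1,200 for a full node on AWS.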
Elsewhere, at the extreme end of performance requirements, users who need maximally efficient nodes to speed up retrieval of traces, balance diffs, and storage diffs across the entire Ethereum blockchain (for real-time deep historical analysis and audit functions) will need many full-sync archive nodes, each dedicated to a different range of blocks, which inevitably pushes costs to the extreme. One example is a monthly snapshot of TokenAnalyst’s AWS bills for December, during which they scaled up to 95 full-sync archive Parity nodes and at one point spent approximately $3,400 per day.
The caveat is that these costs reflect not only the nodes but also a few tertiary machines that helped facilitate the data pipeline and accelerate the processing of raw on-chain data. Additionally, this intensive setup is a case study of the highest possible requirements for data retrieval: extracting every granular data point from the entire blockchain, including the state at every historical block, within two weeks. Furthermore, a significant share of these resources went to extracting information from blocks 2.3M-2.8M, the span in which Ethereum suffered a denial-of-service attack that bloated the blockchain and stuffed transactions in that range with thousands of traces.
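Dividing the peak bill by the node count gives a rough sense of scale. Treat these figures as loose bounds built only from the numbers above: the daily bill also covers the tertiary machines, and peak spend was not necessarily sustained for the full two weeks.

```python
PEAK_DAILY_USD = 3_400  # "at one point" daily spend
ARCHIVE_NODES = 95      # full-sync archive Parity nodes at peak
DAYS = 14               # "within two weeks"

# Upper bound per node: the daily bill also covers the tertiary machines.
per_node_per_day = PEAK_DAILY_USD / ARCHIVE_NODES

# Ceiling for the whole run, assuming (unrealistically) peak spend every day.
two_week_ceiling = PEAK_DAILY_USD * DAYS

# A standard archive node at $270-370/month, spread over a ~30-day month.
standard_archive_per_day = (270 / 30, 370 / 30)

print(f"~${per_node_per_day:.2f} per node per day at peak")   # ~$35.79
print(f"<= ${two_week_ceiling:,} for two weeks at peak spend")  # <= $47,600
print(f"${standard_archive_per_day[0]:.2f}-${standard_archive_per_day[1]:.2f} per day for a standard archive node")
```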
The costs of trust
While running a full node does allow the user to independently verify the validity of the network, it still requires trusting the client implementation (Geth or Parity in most cases) to some extent. What does trust really mean here? It means trusting the work of independent teams of developers actively maintaining these clients, whose code is not infallible, as past client bugs have demonstrated.
One could argue that true trustworthiness and accountability require exporting the full data in a human-readable, non-hashed format and verifying for yourself that the “numbers add up” (especially after chain re-orgs, forks, and network upgrades), rather than leaving the brunt of validation to the node client, to a provider like Infura, or to a block explorer like Etherscan. This, however, requires significant compute and hardware resources and is increasingly challenging for an average user with a consumer laptop. Given the rapid decline in active Ethereum nodes (down ~66% since the start of 2018), the cost of minimizing trust may be getting too high for the average active network participant.
Sources: Ethernodes.org, Coin.dance, Webarchive, The Block
Alternatively, if the user’s goal is to minimize trust as far as possible, one could connect to a more diverse set of peers, spreading nodes across geographies, client implementations, and node providers. Combining diverse full nodes with data pulled and verified from archive nodes disperses trust across different vectors, albeit at a much higher cost. Recognizing these tradeoffs helps users understand and evaluate the higher cost of stronger trust-minimized verification, if they want it.
For many dapps, miners, and infrastructure providers, one full node at ~130 GB is sufficient for now.
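A minimal sketch of dispersing trust across sources: ask several independent endpoints (different clients, providers, and regions) for the hash of the same block, for example via the standard `eth_getBlockByNumber` JSON-RPC method, and flag any that disagree. The source names and truncated hashes below are hypothetical. The check should target a height a few blocks behind the head, since a mismatch at the very tip may just be a short-lived reorg rather than a faulty source.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def cross_check(reported: Dict[str, str]) -> Tuple[Optional[str], List[str]]:
    """Compare block hashes reported for the same height by independent sources.

    Returns (agreed hash, dissenting sources). If no hash has a strict
    majority, returns (None, all sources) so every source is treated as suspect.
    """
    if not reported:
        raise ValueError("need at least one source")
    top_hash, top_count = Counter(reported.values()).most_common(1)[0]
    if top_count * 2 <= len(reported):
        return None, sorted(reported)
    dissenters = sorted(src for src, h in reported.items() if h != top_hash)
    return top_hash, dissenters


# Hypothetical sources and truncated hashes for one block height.
agreed, dissenters = cross_check(
    {"geth-home": "0xaa11", "parity-cloud": "0xaa11", "hosted-api": "0xbb22"}
)
print(agreed, dissenters)  # 0xaa11 ['hosted-api']
```

Requiring a strict majority is a deliberate choice: with an even split, the sketch refuses to pick a winner rather than trusting whichever source happened to answer first.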