The Need for Decentralized Cloud Storage

Web3, the next iteration of the internet, will be built on decentralized technology across three fundamental pillars: consensus, storage, and computation.

Blockchain technology set off a revolution of decentralization and gave rise to the concept of Web3: the idea of not only decentralizing consensus, but also using this technology to decentralize the rest of the internet.

“Web3 is the stack of protocols that enable fully decentralized applications.” – Nader Dabit

Just like Web2, Web3 is a complex amalgamation of a wide array of technologies that together form the Web3 ecosystem. Despite its complexity, we can break down the ecosystem into three key infrastructural pillars that need to be developed to achieve full decentralization of the internet: consensus, storage, and computation.

Consensus has matured quickly since Bitcoin’s launch in 2009, and dozens of other successful models of decentralized consensus have been brought to life since then. Over time, attempts at decentralizing storage and computation have emerged to complement decentralized consensus and build the remaining pillars of a truly decentralized internet.

Figure 1: Illustrative slice of projects enabling each of the Web3 pillars

In this piece we look at decentralized storage: peer-to-peer networks in which members pool their disk space to create what is essentially a global hard drive that is trustless, immutable, and in some cases permanent and censorship-resistant.

The Need for Decentralized Cloud Storage

In this section we take a close look at why we need decentralized storage at all. First, we look at decentralized storage from a blockchain perspective. Then, we examine NFTs and dApps in terms of decentralization, immutability, and permanence to understand why decentralized storage is preferred over centralized Web2 storage approaches.

From a blockchain perspective, the need for decentralized storage comes down to two primary considerations:

Economic: storing data on chain is very expensive. Data that does not need to be stored on a blockchain should not be stored on a blockchain.

Technical: storing data on chain is very inefficient, and blocks only have a limited size. To prevent blocks from being filled up with useless data, we need to offload that data elsewhere.

The economic perspective: how expensive is it to store data?

Storing data directly on a blockchain is extremely expensive, which is why blockchains primarily store transactions (or their outcomes as state data), and also why smart contracts are kept to as few lines of code as possible. This is where decentralized storage networks come in: they store data that is too expensive to store on a blockchain but still needs to be permanent, immutable, and resistant to censorship.

If we wanted to store the image file of the Bored Ape Yacht Club #3368 NFT on the Bitcoin network, we would require at least 1,700 OP_RETURN transactions (a conservative estimate) to save the entire file, assuming default node relay policy (80 bytes of arbitrary data per OP_RETURN output, at most one OP_RETURN output per transaction). At a fee rate of 12 sats/vB, that comes to 0.028 BTC for a single image of the 10,000-piece collection.
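The OP_RETURN estimate is straightforward arithmetic: split the file into 80-byte chunks, pay for one transaction per chunk, and multiply by the fee rate. The sketch below assumes a file of 136,000 bytes (what 1,700 × 80 bytes implies) and a size of roughly 137 vB per OP_RETURN transaction; neither figure is stated in the text, and real transaction sizes depend on their inputs and outputs. Under those assumptions the result lands close to the 0.028 BTC quoted above.

```python
import math

# Parameters from the text: 80 bytes of arbitrary data per OP_RETURN output,
# at most one OP_RETURN per transaction, fee rate of 12 sats/vB.
OP_RETURN_PAYLOAD_BYTES = 80
FEE_RATE_SATS_PER_VB = 12
SATS_PER_BTC = 100_000_000

# Assumption (not from the text): a minimal transaction carrying one
# OP_RETURN output is roughly 137 vB. Real sizes vary with inputs/outputs.
ASSUMED_TX_VSIZE_VB = 137


def op_return_storage_cost(file_size_bytes: int) -> tuple[int, float]:
    """Return (number of transactions, total fee in BTC) to store a file."""
    # One transaction per 80-byte chunk, rounding up for a final partial chunk.
    tx_count = math.ceil(file_size_bytes / OP_RETURN_PAYLOAD_BYTES)
    total_fee_sats = tx_count * ASSUMED_TX_VSIZE_VB * FEE_RATE_SATS_PER_VB
    return tx_count, total_fee_sats / SATS_PER_BTC


# Hypothetical file size: 1,700 transactions x 80 bytes = 136,000 bytes.
txs, fee_btc = op_return_storage_cost(136_000)
print(txs, round(fee_btc, 4))
```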

Storing the same image data on the Ethereum network’s permanent storage would cost roughly 7.9 ETH at ~95 gwei gas fees requiring nearly 23m gas units in a single smart contract deployment. For most applications, such storage costs are just not feasible.

Figure 2: Projects with active mainnets. A storage duration of 200 years was selected to match Arweave’s minimum definition of permanence. Sources: network documentation, Arweave Storage Calculator

If we further compare these costs against the cost of storing the same data on a decentralized storage network, we can quickly see that purpose-built storage networks are far more cost-efficient at storing files, while also ensuring permanence, immutability, and censorship resistance – more on that later.

The technical perspective: why should we want to avoid storing data directly on public blockchains?

Blockchains, as the name suggests, consist of blocks that are linked to one another to form a chronological sequence. Each block references the hash of the previous block, so data in past blocks cannot be altered without invalidating every block that follows. The data contained in the blocks consists of transactions or state descriptors. Thousands of nodes around the world ensure that nobody cheats the system and that consensus between the nodes is maintained.
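To make the linking concrete, here is a toy hash chain in Python. The block format (a JSON dict with `prev_hash` and `transactions`) is hypothetical and much simpler than any real blockchain; it only shows why editing an old block is detectable. In a real network, it is consensus (for example proof-of-work) that stops an attacker from simply recomputing every later hash.

```python
import hashlib
import json


def block_hash(block: dict) -> str:
    # Hash a canonical JSON encoding of the block (toy format, not Bitcoin's).
    encoded = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()


def build_chain(tx_batches: list[list[str]]) -> list[dict]:
    chain = []
    prev = "0" * 64  # placeholder "previous hash" for the first (genesis) block
    for txs in tx_batches:
        block = {"prev_hash": prev, "transactions": txs}
        chain.append(block)
        prev = block_hash(block)
    return chain


def is_valid(chain: list[dict]) -> bool:
    # Every block must reference the hash of the block right before it.
    for prev_block, block in zip(chain, chain[1:]):
        if block["prev_hash"] != block_hash(prev_block):
            return False
    return True


chain = build_chain([["alice->bob:5"], ["bob->carol:2"], ["carol->dave:1"]])
print(is_valid(chain))  # True

chain[0]["transactions"][0] = "alice->bob:500"  # tamper with a past block
print(is_valid(chain))  # False: block 1 no longer matches block 0's hash
```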

With every block, a set of transactions is added that changes the state of values within the network. Since block size is capped, only a limited number of transactions can be processed per block. This gives blockspace an implicit time value, which is reflected in the fees that network participants are willing to pay to have their transactions confirmed and included in a block.

When blocks are full, pending transactions wait in nodes’ mempools until they are included in a later block. If a transaction remains unconfirmed for an extended period, it may be affected by slippage or front-run by bots. Storing arbitrary data on blockchains amplifies this issue by occupying blockspace and pushing other transactions into later blocks.

The limited supply of blockspace coupled with the demand for transactions to be included in a block thus drives up transaction fees for the entire network, which can dissuade users from interacting with the network.
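The competition for blockspace can be sketched as a simplified block builder that fills a size-capped block with the highest fee-rate transactions first. The transaction names, sizes, fees, and the 1,000 vB block limit are all hypothetical, and real miners use more elaborate selection (for example, accounting for chains of dependent transactions); the point is only that a large data-carrying transaction can push an ordinary payment into a later block.

```python
from dataclasses import dataclass


@dataclass
class Tx:
    txid: str
    vsize: int  # virtual size in vB
    fee: int    # total fee in sats

    @property
    def fee_rate(self) -> float:
        return self.fee / self.vsize


def build_block(mempool: list[Tx], max_vsize: int) -> tuple[list[Tx], list[Tx]]:
    """Greedily fill a block by fee rate; return (included, left_in_mempool)."""
    included, waiting, used = [], [], 0
    for tx in sorted(mempool, key=lambda t: t.fee_rate, reverse=True):
        if used + tx.vsize <= max_vsize:
            included.append(tx)
            used += tx.vsize
        else:
            waiting.append(tx)  # stays in the mempool for a later block
    return included, waiting


mempool = [
    Tx("payment-1", vsize=150, fee=3_000),   # 20 sats/vB
    Tx("payment-2", vsize=150, fee=1_500),   # 10 sats/vB
    Tx("data-blob", vsize=800, fee=12_800),  # 16 sats/vB, carries arbitrary data
]
included, waiting = build_block(mempool, max_vsize=1_000)
print([t.txid for t in included])  # ['payment-1', 'data-blob']
print([t.txid for t in waiting])   # ['payment-2']
```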

Decentralized storage networks can reduce the amount of arbitrary data on blockchains by taking on those data loads while offering characteristics similar to those of public blockchains.

But why not store files on centralized networks?

The previous perspectives explain why we shouldn’t store data on blockchains. The next question, however, is: why store data on decentralized networks at all? Data could just as easily be stored on a centralized Web2 server. The answer is quite simple: to ensure immutability and trustlessness, and to enable permanence and censorship resistance of the data.

The case for NFTs

Let’s take a look at NFTs: non-fungible tokens (NFTs) represent a unique (i.e., non-fungible) ownership token that is stored on a blockchain and is controlled by a smart contract. The blockchain records who owns the unique token and points to something called metadata, which describes what the token represents. The metadata includes details about the NFT as well as links to other data such as media files – this is what gives the NFT context and meaning.
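A minimal sketch of this pointer structure, assuming an ERC-721-style setup in which the contract stores a `tokenURI` per token and the metadata is a JSON document with `name`, `description`, and `image` fields (the `attributes` list is a marketplace convention rather than part of ERC-721). The in-memory dictionaries stand in for the blockchain and for the metadata host; all addresses, URIs, and values are hypothetical.

```python
import json

# Toy stand-in for on-chain state: token ID -> owner and metadata pointer.
# Mirrors the ERC-721 idea of a per-token tokenURI; all values are hypothetical.
onchain_tokens = {
    1: {"owner": "0xOwnerA", "token_uri": "https://example.com/meta/1.json"},
}

# Toy stand-in for wherever the metadata is hosted (web server, IPFS, etc.).
metadata_host = {
    "https://example.com/meta/1.json": json.dumps({
        "name": "Example #1",
        "description": "A hypothetical NFT used for illustration.",
        "image": "https://example.com/images/1.png",
        # "attributes" is a marketplace convention, not part of ERC-721 itself.
        "attributes": [{"trait_type": "Background", "value": "Blue"}],
    }),
}


def resolve_metadata(token_id: int) -> dict:
    """Follow the token's on-chain pointer and parse the metadata it points to."""
    uri = onchain_tokens[token_id]["token_uri"]
    return json.loads(metadata_host[uri])


meta = resolve_metadata(1)
print(meta["name"], meta["image"])  # Example #1 https://example.com/images/1.png

# Whoever controls the host can swap the content behind the unchanged pointer.
metadata_host["https://example.com/meta/1.json"] = json.dumps({"name": "Replaced"})
print(resolve_metadata(1)["name"])  # Replaced
```

Only the pointer lives on-chain, so whoever controls the host can change what the token refers to without touching the blockchain.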

Figure 3: Simplified illustration of a blockchain, blocks, an NFT, and off-chain metadata.

Metadata can be stored anywhere. As long as the data is accessible through the pointer embedded in the NFT smart contract, its contents will be available to the NFT. If metadata is stored on a centralized server, the data could be tampered with, the server could be destroyed, or access to the data could be restricted, stripping the NFT of its context and meaning. When an NFT collection has facilitated hundreds of thousands of ETH in transactions, has a floor price well above US$100k per NFT, and reaches prices of up to US$70k per kB of image data, users expect every byte of metadata to be just as immutable and permanent as the on-chain NFT.

Figure 4: Crypto Punk Floor Price based on last sale (no floor available at time of writing); Crypto Punk image size based on byte-length of Crypto Punks V2 on-chain byte string. Data as of May 10th, 2022. Sources: OpenSea, on-chain data, IPFS metadata.

Arguably, the value of NFTs is driven less by the image data they refer to than by the communities that build a movement and an ecosystem around their collections. Nonetheless, without the metadata the NFTs would have no meaning, and without meaning, communities could not form.

NFTs are not limited to art collectibles. The metadata can hold anything that can be represented as data: legal documents denoting ownership of tangible assets, such as the deeds to a property or ownership certificates for financial instruments, could be referenced in an NFT. Such data holds extrinsic off-chain value, and preserving every byte of it is at least as valuable as the NFT itself: if changing a single byte could invalidate the entire dataset, immutability of the original data becomes even more important.

How secure is NFT metadata really?

Apart from being the top three NFT collections on OpenSea by total trade volume at the time of writing, Crypto Punks, BAYC, and MAYC each use a different approach to storing metadata, with varying levels of security (i.e., immutability and permanence).

Figure 5: Crypto Punks NFT collection smart contract and metadata storage addresses.

Crypto Punks is the most secure, with all metadata and image data stored directly on-chain. By querying the smart contract that holds the metadata and image data, you can directly retrieve the NFT attributes and raw image data. Because all data is stored on-chain, the NFTs inherit the security properties of the Ethereum chain: they are immutable and will remain on-chain and accessible for as long as the Ethereum network exists.

Figure 6: BAYC NFT collection smart contract and metadata storage addresses.

BAYC stores metadata and image data using the InterPlanetary File System (IPFS), a peer-to-peer hypermedia protocol for decentralized content addressing. On IPFS, content is identified by a content ID (CID) derived from a hash of the content itself. The CID also acts as a link to the data, so the content behind a given CID cannot be changed. While IPFS is considered censorship-resistant, data can still be removed from IPFS nodes: if no node keeps a copy, the NFT metadata would eventually disappear.
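Content addressing can be illustrated with a toy store that identifies each blob by the hash of its bytes. This is a simplification: real IPFS CIDs wrap the hash in multihash, multicodec, and multibase encodings, and the `ContentAddressedStore` class and its `forget` method are hypothetical. The sketch shows both properties described above: changing a single byte yields a different identifier, and nothing in the identifier guarantees that someone still stores the data.

```python
import hashlib


def content_id(data: bytes) -> str:
    # Simplified stand-in for an IPFS CID: the bare SHA-256 hex digest.
    # Real CIDs add multihash/multicodec/multibase prefixes on top of the hash.
    return hashlib.sha256(data).hexdigest()


class ContentAddressedStore:
    """Hypothetical toy store: blobs are keyed by the hash of their bytes."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        cid = content_id(data)
        self._blobs[cid] = data
        return cid

    def get(self, cid: str) -> bytes:
        data = self._blobs[cid]  # raises KeyError if nobody stores the data
        # Anyone retrieving the data can verify it against its identifier.
        if content_id(data) != cid:
            raise ValueError("content does not match its identifier")
        return data

    def forget(self, cid: str) -> None:
        # Models every node dropping its copy (e.g. unpinned, garbage-collected).
        self._blobs.pop(cid, None)


store = ContentAddressedStore()
original = b'{"name": "Example #1"}'
cid = store.put(original)

edited = original.replace(b"#1", b"#2")  # a single-byte change
print(content_id(edited) == cid)  # False: different content, different ID
print(store.get(cid) == original)  # True

store.forget(cid)
try:
    store.get(cid)
except KeyError:
    print("data unavailable: the CID alone does not keep it alive")
```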

Figure 7: MAYC NFT collection smart contract and metadata storage addresses.

Finally, MAYC stores all NFT metadata on a centralized webserver, which points to images that are hosted on IPFS. While the images are retrievable through their IPFS CIDs, the metadata stored on the webserver can be changed at any time. This means that all NFTs within the MAYC NFT collection can have their traits and images removed, or have their images replaced with other images.

Out of the top three collections, MAYC is the least secure, lacking metadata immutability, permanence, and censorship resistance.

Ultimately, how the developers implement metadata hosting determines how secure the NFT metadata is. On-chain storage is the most secure but extremely expensive, and hence not always a good option. Centralized servers carry impermanence and mutability risks. Decentralized storage networks present a middle ground that balances lower costs with permanence, immutability, and censorship resistance.

The case for dApps

dApps (decentralized applications) are fundamentally different from NFTs in that dApps provide services that facilitate interaction with a blockchain. A dApp consists of a front-end user interface and sometimes a back-end that enables and facilitates interaction with smart contracts. A smart contract is a self-executing piece of code on a decentralized blockchain network that users can interact with. In contrast to dApps, regular apps run their back-end code on centralized servers or on individual devices.

What is special about smart contracts is that all aspects of the smart contract’s operation are written directly into the code and can be publicly reviewed before interacting with the code. Interacting directly with a smart contract, however, requires some technical background and an understanding of how the blockchain and smart contract engine of that blockchain work. To bridge this gap, dApps provide an easy interface for users to interact with a decentralized blockchain network.

Figure 8: Simplified illustration of dApp interaction with a blockchain.

On decentralized networks that support smart contract execution, every write operation comes at a cost. Sometimes these operations can be quite complex, and this is where dApp back-ends come in. dApp back-ends differ from smart contracts in that they exist primarily either to convert inputs into smart-contract-compatible inputs or to shift certain computational loads away from smart contracts to reduce gas costs.
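As an illustration of the input-conversion role, consider token amounts. ERC-20 contracts work with integers, and many tokens expose a `decimals` value that defines how a human-readable amount maps to integer base units. The helper below is a hypothetical sketch of that conversion, using Python’s `decimal` module to avoid floating-point rounding and rejecting amounts with more precision than the token supports.

```python
from decimal import Decimal, InvalidOperation


def to_base_units(amount: str, decimals: int) -> int:
    """Hypothetical helper: turn a human-readable token amount such as "1.5"
    into the integer base units a token contract expects (amount * 10**decimals)."""
    try:
        value = Decimal(amount)
    except InvalidOperation:
        raise ValueError(f"not a number: {amount!r}") from None
    if not value.is_finite() or value < 0:
        raise ValueError(f"amount must be a finite, non-negative number: {amount!r}")
    scaled = value.scaleb(decimals)  # shift the decimal point right by `decimals`
    if scaled != scaled.to_integral_value():
        raise ValueError(f"{amount!r} has more than {decimals} decimal places")
    return int(scaled)


print(to_base_units("1.5", 18))      # 1500000000000000000
print(to_base_units("0.000001", 6))  # 1
```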

The value proposition of dApps is fundamentally different from that of NFTs, as dApps provide users with a service instead of locking value in an asset. dApps are in a constant state of change: improvements and bug fixes are applied regularly, which causes the underlying data to change over time. As a result, a dApp cannot be valued by its underlying data. Instead, the value of the service needs to be measured in the context of the specific dApp. For DeFi (decentralized finance) dApps, value can be measured by the asset transfer volume facilitated through the dApp or by total value locked (TVL), while social dApps may focus more on interaction and user metrics.
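TVL itself is a simple aggregate: the amount of each asset held by the dApp’s contracts, multiplied by that asset’s price, summed over all assets. The balances and prices below are hypothetical; real TVL trackers read balances on-chain and take prices from market data feeds.

```python
# Hypothetical balances held by a dApp's contracts and hypothetical USD prices.
balances = {"ETH": 1_000, "USDC": 2_500_000, "WBTC": 50}
prices_usd = {"ETH": 2_500.0, "USDC": 1.0, "WBTC": 30_000.0}


def total_value_locked(balances: dict[str, float], prices: dict[str, float]) -> float:
    """TVL = sum over assets of (amount locked x USD price)."""
    return sum(amount * prices[token] for token, amount in balances.items())


print(total_value_locked(balances, prices_usd))  # 6500000.0
```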


Figure 9: The most popular dApps by US$ volume as reported by DappRadar as of May 11th, 2022.

The above list by DappRadar shows the top ten dApps by volume, which collectively facilitated transfers of over US$150bn in the 30 days before the time of writing. While the dApps listed here are primarily DeFi and exchange dApps, dApps can serve any purpose. As long as an application interacts with blockchains by way of smart contracts through some sort of user interface, it can be considered a dApp. Other popular dApp categories include games, metaverses, marketplaces, social media, and name services.

But why should dApps be decentralized if users can already interact with the dApp’s core mechanics through smart contracts on a blockchain? The answer lies in ensuring service availability and permanence. By using a decentralized storage network that replicates the data to dozens of nodes, dApps reduce the likelihood of going offline due to server malfunctions, improve resistance to DNS hijacking, and live on even if development comes to an end. Depending on the decentralized storage network, a certain level of censorship resistance is also introduced, in that no single centralized entity can easily remove the data.
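The availability benefit of replication can be estimated with a back-of-the-envelope model. Assuming each node is offline with probability p, independently of the others, the data is unreachable only if all n replicas are offline at once, which happens with probability p^n. The independence assumption is a strong simplification (nodes hosted by the same provider or in the same region tend to fail together), and the 5% per-node figure is hypothetical.

```python
def prob_all_replicas_offline(p_node_offline: float, replicas: int) -> float:
    """Probability that every replica is unreachable at the same time,
    assuming nodes fail independently (a strong simplification)."""
    return p_node_offline ** replicas


def availability(p_node_offline: float, replicas: int) -> float:
    # The data is reachable as long as at least one replica is online.
    return 1 - prob_all_replicas_offline(p_node_offline, replicas)


# Hypothetical: each node is offline 5% of the time.
for n in (1, 3, 12):
    print(n, availability(0.05, n))
```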

Figure 10: Aave founder Stani Kulechov tweets that the Aave dApp front-end went offline on Jan 20th 2022, but was still accessible through an IPFS-hosted copy of the website. https://twitter.com/StaniKulechov/status/1487754439691845633

How decentralized are dApps really?

A common misconception is that dApps are decentralized in all aspects (as the name “dApp” implies). While some dApps, such as Uniswap and Aave, go the extra mile to make their dApps accessible both through centralized servers and through decentralized networks, many dApps opt to host their services only on centralized servers. Still, as long as these applications access and interact with smart contracts on decentralized blockchains, they are considered dApps.

When measuring the extent of decentralization of dApps in terms of service accessibility, there are a few factors to consider:

  1. Is the dApp front-end accessible through decentralized networks?
  2. If yes, to what extent is the data on those networks immutable and permanent?
  3. Could users locally host the dApp front-end to access the services (i.e., is the dApp source code open source?)

Looking more closely at the top three dApps by volume from the list above, we find that all three publish their source code, allowing individuals to deploy the dApp front-ends to decentralized storage providers. Uniswap and Curve go a step further and publish regularly updated CIDs directly in their ENS (Ethereum Name Service) records.
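To illustrate how a published CID makes a front-end reachable, the sketch below models an ENS name whose record points at the CID of the latest front-end build, and lists ways to open it: a public IPFS path gateway (`https://ipfs.io/ipfs/<CID>`), a native `ipfs://` URI, and an ENS-aware gateway such as eth.limo. The name `example-dapp.eth` and both CIDs are hypothetical placeholders, and the real on-chain encoding of ENS content hashes (EIP-1577) is omitted.

```python
# Hypothetical ENS records: name -> IPFS CID of the latest front-end build.
# Both the name and the CIDs are placeholders, not real deployments.
ens_records = {"example-dapp.eth": "bafy-placeholder-build-1"}


def access_urls(ens_name: str) -> list[str]:
    """List ways a user could open the front-end currently referenced by ens_name."""
    cid = ens_records[ens_name]
    return [
        f"https://ipfs.io/ipfs/{cid}/",  # public IPFS path gateway, addressed by CID
        f"ipfs://{cid}",                  # native URI for IPFS-aware browsers/clients
        f"https://{ens_name}.limo/",      # ENS gateway that follows the name's record
    ]


print(access_urls("example-dapp.eth"))

# A new release is deployed and the ENS record is updated to the new CID.
ens_records["example-dapp.eth"] = "bafy-placeholder-build-2"
print(access_urls("example-dapp.eth"))
```

Because the ENS record is updated with each release, the ENS-based URL stays the same while the CID-based URLs change.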

Figure 11: Uniswap and Curve deployed on IPFS with latest links updated in their ENS records

While the above is only an illustrative example of best practices, currently only a few dApps actively decentralize their user interfaces. The premise of DeFi, often also referred to as Open Finance, is to do away with restrictions on who can access and trade financial assets. While the underutilization of decentralized storage does create a break in that narrative, it also creates opportunities for future growth and adoption.

The next piece of this series looks at individual decentralized storage solutions in more detail, covering decentralization mechanics, storage pricing algorithms, and tokenomics.

