17/06/22
The Need for Decentralized Cloud Storage
Web3, the next iteration of the internet, will be built on decentralized technology across three fundamental pillars: consensus, storage, and computation.
Blockchain technology set off a revolution of decentralization and brought about the concept of Web3: the idea of using this technology not only to decentralize consensus but to decentralize the rest of the internet as well.
“Web3 is the stack of protocols that enable fully decentralized applications.” – Nader Dabit
Just like Web2, Web3 is a complex amalgamation of a wide array of technologies that together form the Web3 ecosystem. Despite its complexity, we can break down the ecosystem into three key infrastructural pillars that need to be developed to achieve full decentralization of the internet: consensus, storage, and computation.
Consensus has matured quickly since Bitcoin's launch in 2009, and dozens of other successful models of decentralized consensus have been brought to life since then. Over time, attempts at decentralizing storage and computation have also emerged, aiming to complement consensus and build the remaining pillars of a truly decentralized internet.
In this piece, we look at decentralized storage: peer-to-peer networks in which members pool their disk space to create what is essentially a global hard drive that is trustless, immutable, and in some cases permanent and censorship-resistant.
The Need for Decentralized Cloud Storage
In this section, we take a close look at why we need decentralized storage at all. First, we look at decentralized storage from a blockchain perspective. Then, we examine NFTs and dApps in terms of decentralization, immutability, and permanence to understand why decentralized storage is preferred over centralized Web2 storage approaches.
From a blockchain perspective, the need for decentralized storage can be examined from two primary angles:
- Economic: Storing data on-chain is very expensive. Data that does not need to be stored on a blockchain should not be stored on a blockchain.
- Technical: Storing data on-chain is very inefficient, and blocks have a limited size. To prevent blocks from filling up with arbitrary data, that data needs to be offloaded elsewhere.
The economic perspective: how expensive is it to store data?
Storing data directly on a blockchain is extremely expensive, which is why blockchains primarily store transactions (or their outcomes as state data) and why smart contracts are kept as short as possible. This is where decentralized storage networks come in: they store data that is too expensive to keep on the blockchain but still needs to be permanent, immutable, and resistant to censorship.
If we wanted to store the image file of the Bored Ape Yacht Club #3368 NFT on the Bitcoin network, we would need at least 1,700 OP_RETURN transactions (a conservative estimate) to save the entire file, assuming default node relay policy (80 bytes of arbitrary data per OP_RETURN output, at most one OP_RETURN output per transaction). At a fee rate of 12 sat/vB, that comes to 0.028 BTC for a single image of the 10,000-piece collection.
Storing the same image data in the Ethereum network's permanent storage would cost roughly 7.9 ETH at a gas price of ~95 gwei, which corresponds to roughly 83 million gas units (7.9 ETH ÷ 95 gwei). For most applications, such storage costs are simply not feasible.
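The Bitcoin figure is straightforward arithmetic: split the file into 80-byte chunks, pay for one transaction per chunk, and multiply each transaction's virtual size by the fee rate. The text does not state the file size or the size of each transaction, so this sketch assumes a file of about 136 KB (1,700 × 80 bytes) and about 137 vB per transaction, which is roughly what the 0.028 BTC total implies (0.028 BTC ÷ (1,700 × 12 sat/vB)). The function name and both inputs are illustrative assumptions, not measured values.

```python
import math

SATS_PER_BTC = 100_000_000

def op_return_storage_cost(file_bytes: int, fee_rate_sat_per_vb: float,
                           vbytes_per_tx: float, payload_per_tx: int = 80):
    """Estimate (transaction count, total fee in BTC) for writing a file via OP_RETURN.

    Assumes one OP_RETURN output carrying `payload_per_tx` bytes per transaction
    and the same (hypothetical) virtual size for every transaction.
    """
    num_txs = math.ceil(file_bytes / payload_per_tx)  # one data chunk per transaction
    total_sats = num_txs * vbytes_per_tx * fee_rate_sat_per_vb
    return num_txs, total_sats / SATS_PER_BTC

# Assumed inputs: a ~136 KB file and ~137 vB per transaction
txs, btc = op_return_storage_cost(136_000, fee_rate_sat_per_vb=12, vbytes_per_tx=137)
print(txs, round(btc, 4))  # 1700 0.0279
```

Because the cost scales linearly with file size, doubling the image size doubles both the transaction count and the fee.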
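For contract storage, a rough lower bound follows from the cost of writing fresh storage slots: setting a zero 32-byte slot to a non-zero value costs 20,000 gas. The sketch below applies that to the same assumed ~136 KB file (the text does not give the exact size) at 95 gwei, and it lands in the same range as the ~7.9 ETH quoted above. It is an estimate under those assumptions, not a reconstruction of how that figure was computed, and the helper name is hypothetical.

```python
import math

GWEI_PER_ETH = 10**9
SSTORE_SET_GAS = 20_000  # gas to set a zero storage slot to a non-zero value (EIP-2200)
SLOT_BYTES = 32          # each contract storage slot holds one 32-byte word

def contract_storage_cost(file_bytes: int, gas_price_gwei: float):
    """Lower-bound (gas, ETH) for writing raw bytes into fresh contract storage slots.

    Ignores the 21,000 base gas per transaction, calldata costs, cold-access
    surcharges, and the per-block gas limit (about 30 million gas in 2022),
    which means this much data would have to be split across several blocks.
    """
    slots = math.ceil(file_bytes / SLOT_BYTES)
    gas = slots * SSTORE_SET_GAS
    return gas, gas * gas_price_gwei / GWEI_PER_ETH

gas, eth = contract_storage_cost(136_000, gas_price_gwei=95)
print(f"{gas:,} gas, {eth:.3f} ETH")  # 85,000,000 gas, 8.075 ETH
```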
If we further compare these costs against the cost of storing the same data on a decentralized storage network, we can quickly see that purpose-built storage networks are far more cost-efficient at storing files, while also ensuring permanence, immutability, and censorship resistance – more on that later.
The technical perspective: why should we want to avoid storing data directly on public blockchains?
Blockchains, as the name suggests, consist of blocks linked to one another in a chronological sequence. Each block contains the hash of the previous block, so altering data in a past block would change its hash and break the link to every block after it; this is what prevents data in past blocks from being adjusted. The data contained in the blocks consists of transactions or state descriptors. Thousands of nodes around the world ensure that nobody cheats the system and that consensus between the nodes is maintained.
With every block, a set of transactions is added that changes the state of values within the network. Since block size is capped, only a limited number of transactions can be processed per block. This gives blockspace an implicit time value, which is reflected in the fees that network participants are willing to pay to have their transactions confirmed and included in a block.
When blocks are full, pending transactions wait in nodes' mempools until they are included in a later block. If a transaction remains unconfirmed for an extended period, it may be affected by slippage or front-run by bots. Storing arbitrary data on blockchains amplifies this issue by occupying blockspace and pushing other transactions into later blocks.
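The hash linking described above can be shown with a toy chain. This is a minimal sketch under simplifying assumptions: each block is a JSON-serializable dict hashed with a single SHA-256, whereas real blockchains hash binary block headers (Bitcoin, for example, uses double SHA-256 over a header containing a Merkle root). The function names are hypothetical.

```python
import hashlib
import json

def block_hash(block: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the block."""
    encoded = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()

def build_chain(tx_batches: list[list[str]]) -> list[dict]:
    chain = []
    prev_hash = "0" * 64  # the first block has no predecessor
    for txs in tx_batches:
        block = {"prev_hash": prev_hash, "transactions": txs}
        chain.append(block)
        prev_hash = block_hash(block)  # the next block commits to this one
    return chain

def verify_chain(chain: list[dict]) -> bool:
    """Check that every block still points at the hash of its predecessor."""
    return all(
        block["prev_hash"] == block_hash(prev)
        for prev, block in zip(chain, chain[1:])
    )

chain = build_chain([["alice->bob:5"], ["bob->carol:2"], ["carol->dave:1"]])
print(verify_chain(chain))                      # True
chain[0]["transactions"][0] = "alice->bob:500"  # rewrite history in the first block
print(verify_chain(chain))                      # False
```

Editing any block other than the newest one changes its hash and breaks the link from its successor, which every verifying node would notice.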
The limited supply of blockspace coupled with the demand for transactions to be included in a block thus drives up transaction fees for the entire network, which can dissuade users from interacting with the network.
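A toy model makes the competition for blockspace concrete: block producers generally prefer transactions paying the highest fee per virtual byte, so when space is scarce, lower-paying transactions keep waiting. This sketch uses a simple greedy fill and a made-up 500 vB block; real block assembly (for example in Bitcoin Core) also accounts for dependent ancestor transactions, and real blocks are far larger. The `Tx` type and `fill_block` helper are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Tx:
    txid: str
    vsize: int  # virtual size in vbytes
    fee: int    # total fee in satoshis

    @property
    def fee_rate(self) -> float:
        return self.fee / self.vsize  # sat/vB

def fill_block(mempool: list[Tx], max_vsize: int) -> tuple[list[Tx], list[Tx]]:
    """Greedily include the highest fee-rate transactions that still fit."""
    included, waiting, used = [], [], 0
    for tx in sorted(mempool, key=lambda t: t.fee_rate, reverse=True):
        if used + tx.vsize <= max_vsize:
            included.append(tx)
            used += tx.vsize
        else:
            waiting.append(tx)  # stays in the mempool for a later block
    return included, waiting

payments = [Tx("pay1", 200, 2_000), Tx("pay2", 200, 1_600), Tx("pay3", 200, 1_200)]
inc, _ = fill_block(payments, max_vsize=500)
print([t.txid for t in inc])  # ['pay1', 'pay2']

# A data-carrying transaction that outbids the payments takes their blockspace
inc, wait = fill_block(payments + [Tx("data", 300, 4_500)], max_vsize=500)
print([t.txid for t in inc], [t.txid for t in wait])  # ['data', 'pay1'] ['pay2', 'pay3']
```

To get back into the block, the displaced payments would have to raise their fee rates, which is the fee pressure described above.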
Decentralized storage networks can reduce the amount of arbitrary data on blockchains by taking on those data loads while offering characteristics similar to those of public blockchains.
But why not store files on centralized networks?
The previous perspectives explain why we shouldn't store data on blockchains. The next question, however, is: why store data on decentralized networks at all, when it could just as easily be stored on a centralized Web2 server? The answer is simple: to ensure immutability and trustlessness, and to enable permanence and censorship resistance of the data.
The case for NFTs
Let’s take a look at NFTs: non-fungible tokens (NFTs) represent a unique (i.e., non-fungible) ownership token that is stored on a blockchain and is controlled by a smart contract. The blockchain records who owns the unique token and points to something called metadata, which describes what the token represents. The metadata includes details about the NFT as well as links to other data such as media files – this is what gives the NFT context and meaning.
Metadata can be stored anywhere. As long as the data is accessible through the pointer embedded in the NFT smart contract, its contents remain available to the NFT. If metadata is stored on a centralized server, the data could be tampered with, the server could be destroyed, or access to the data could be restricted, stripping the NFT of its context and meaning. When an NFT collection facilitates hundreds of thousands of ETH in transactions, has a floor price well above US$100k per NFT, and commands prices of up to US$70k per kB of image data, users expect every byte of metadata to be just as immutable and permanent as the on-chain NFT.
Arguably, the value of NFTs is driven less by the image data they refer to than by the communities that build movements and ecosystems around their collections. Nonetheless, without the metadata the NFTs would have no meaning, and without meaning, communities could not form.
NFTs are not limited to art collectibles. The metadata could contain anything that can be saved as data: legal documents denoting ownership of tangible assets, such as property deeds or ownership certificates for financial instruments, could be referenced in an NFT. Such data holds extrinsic, off-chain value, and preserving every byte of it is at least as important as the NFT itself: if changing a single byte could invalidate the entire dataset, the immutability of the original data matters even more.
How secure is NFT metadata really?
CryptoPunks, Bored Ape Yacht Club (BAYC), and Mutant Ape Yacht Club (MAYC) were the top three NFT collections on OpenSea by total trading volume at the time of writing. Each of them also uses a different approach to storing metadata, with varying levels of security (i.e., immutability and permanence).
CryptoPunks is the most secure, with all metadata and image data stored directly on-chain. By querying the smart contract that holds the metadata and image data, you can retrieve NFT attributes and raw image data directly. Because all data is stored on-chain, the NFTs inherit the security properties of the Ethereum chain: they are immutable, will always be on-chain, and will remain accessible as long as the Ethereum network exists.
BAYC stores metadata and image data using the InterPlanetary File System (IPFS), a peer-to-peer hypermedia protocol for decentralized content addressing. On IPFS, content is identified by a content ID (CID) derived from the content's hash, and the CID also acts as the link to the data. The content behind a given CID therefore cannot be changed, since changed content would produce a different CID. While IPFS is considered censorship-resistant, data can still be removed from IPFS nodes, and if no node keeps (pins) it, the NFT metadata can eventually disappear.
Finally, MAYC stores all NFT metadata on a centralized web server, which points to images hosted on IPFS. While the images are retrievable through their IPFS CIDs, the metadata stored on the web server can be changed at any time. This means that every NFT in the MAYC collection could have its traits and images removed, or have its images replaced with other images.
Of the top three collections, MAYC is the least secure, as its metadata lacks immutability, permanence, and censorship resistance.
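Content addressing means the address is computed from the data itself. As a simplified sketch: for content stored as a single raw block, a CIDv1 is a version byte, a codec code, and a SHA-256 multihash, encoded as lowercase base32 with a `b` prefix. Files added to IPFS with default settings are chunked and wrapped differently (and many NFT CIDs are older CIDv0 `Qm...` strings), so real CIDs for such files will not match this function. The point here is only that changing any byte yields a different CID. The function name is hypothetical.

```python
import base64
import hashlib

CID_VERSION = 0x01  # CIDv1
RAW_CODEC = 0x55    # multicodec code for raw bytes
SHA2_256 = 0x12     # multihash code for SHA-256
DIGEST_LEN = 0x20   # 32-byte digest

def raw_cid_v1(data: bytes) -> str:
    """CIDv1 (raw codec, sha2-256, base32) for content stored as one raw block."""
    digest = hashlib.sha256(data).digest()
    # All codes are below 0x80, so each varint-encoded field is a single byte
    cid_bytes = bytes([CID_VERSION, RAW_CODEC, SHA2_256, DIGEST_LEN]) + digest
    # Multibase prefix "b" means lowercase base32 without padding
    return "b" + base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")

original = raw_cid_v1(b'{"name": "Example #1"}')
edited = raw_cid_v1(b'{"name": "Example #2"}')
print(original.startswith("bafkrei"), original != edited)  # True True
```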
Ultimately, how developers implement metadata hosting determines how secure the NFT metadata is. On-chain storage is the most secure but extremely expensive, and hence not always a good option. Centralized servers carry risks of impermanence and mutability. Decentralized storage networks offer a middle ground that balances lower costs with permanence, immutability, and censorship resistance.
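Which of these categories a given NFT falls into can often be guessed from the URI its contract returns for the metadata (for ERC-721 tokens, the optional `tokenURI` function). The helper below is a hypothetical heuristic, not a library API: it only inspects the URI string, so it cannot tell whether IPFS content is still pinned, and it does not recognize every gateway URL style. The example URIs are made up.

```python
from urllib.parse import urlparse

def classify_metadata_uri(uri: str) -> str:
    """Rough guess at where an NFT's metadata lives, based only on its URI."""
    parsed = urlparse(uri)
    scheme = parsed.scheme.lower()
    if scheme == "data":
        return "on-chain"  # metadata embedded directly in the returned URI
    if scheme == "ipfs":
        return "content-addressed (IPFS)"
    if scheme in ("http", "https"):
        if parsed.path.startswith("/ipfs/"):
            # Content-addressed, but reached through a single HTTP gateway
            return "IPFS via HTTP gateway"
        return "centralized server"  # the server can change or remove the content
    return "unknown"

for uri in [
    "data:application/json;base64,e30=",
    "ipfs://bafkreiexamplecid/1",
    "https://ipfs.io/ipfs/bafkreiexamplecid/1",
    "https://api.example.com/metadata/1",
]:
    print(classify_metadata_uri(uri))
```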
The case for dApps
dApps (decentralized applications) are fundamentally different from NFTs in that dApps provide services that facilitate interaction with a blockchain. A dApp consists of a front-end user interface and sometimes a back-end that enable and facilitate interaction with smart contracts. A smart contract is a self-executing piece of code on a decentralized blockchain network that users can interact with. In contrast to dApps, regular apps run their back-end code on centralized servers or on individual devices.
What is special about smart contracts is that all aspects of the smart contract’s operation are written directly into the code and can be publicly reviewed before interacting with the code. Interacting directly with a smart contract, however, requires some technical background and an understanding of how the blockchain and smart contract engine of that blockchain work. To bridge this gap, dApps provide an easy interface for users to interact with a decentralized blockchain network.
On decentralized networks that support smart contract execution, every write operation comes at a cost, and some of these operations can be quite complex. This is where dApp back-ends come in. dApp back-ends differ from smart contracts in that they exist primarily to convert inputs into smart-contract-compatible inputs or to shift certain computational loads away from smart contracts to reduce gas costs.
The value proposition of dApps is fundamentally different from that of NFTs, as dApps provide users with a service rather than locking value in an asset. dApps are in a constant state of change: improvements and bug fixes are applied regularly, which causes the underlying data to change over time. As a result, a dApp's value cannot be measured by the value of its underlying data. Instead, the value of the service needs to be measured in the context of the specific dApp. For DeFi (decentralized finance) dApps, value can be measured by the volume of asset transfers facilitated through the dApp or by total value locked (TVL), while social dApps may focus more on interaction and user metrics.
At the time of writing, the top ten dApps by volume listed by DappRadar collectively facilitated transfers of over US$150bn within the previous 30 days. While those dApps are primarily DeFi and exchange dApps, dApps can fulfill any purpose: as long as an application interacts with blockchains through smart contracts via some sort of user interface, it can be considered a dApp. Other popular dApp categories include games, metaverses, marketplaces, social media, and name services.
But why should dApp front-ends be decentralized if users can interact with the core mechanics of a dApp through smart contracts on a blockchain? The answer lies in service availability and permanence. By hosting their front-ends on a decentralized storage network that replicates the data across dozens of nodes, dApps reduce the likelihood of going offline due to server malfunctions, improve their resistance to DNS hijacking, and live on even if development comes to an end. Depending on the decentralized storage network, they also gain a degree of censorship resistance, since no single centralized entity can easily remove the data.
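A simple model shows why replication helps: if each of n replicas is unavailable with probability p and failures are independent, the content is unreachable only when all n are down, which happens with probability pⁿ. Independence is a strong assumption (nodes can share hosting providers, regions, or software bugs), so the numbers below are illustrative only, and the 5% per-node figure is made up.

```python
def availability(replicas: int, p_node_offline: float) -> float:
    """Probability that at least one replica is reachable, assuming independent failures."""
    return 1 - p_node_offline ** replicas

for n in (1, 3, 12):
    print(n, availability(n, p_node_offline=0.05))
```

Each additional replica multiplies the chance of total unavailability by p, which is why spreading a front-end across many independent nodes pays off quickly.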
How decentralized are dApps really?
A common misconception is that dApps are decentralized in every respect (as the word “dApp” implies). While some dApps, such as Uniswap and Aave, go the extra mile to make their front-ends accessible both from centralized servers and through decentralized networks, many dApps host their services only on centralized servers. Still, as long as these applications access and interact with smart contracts on decentralized blockchains, they are considered dApps.
When measuring the extent of decentralization of dApps in the context of service accessibility, there are a few factors to consider:
- Is the dApp front-end accessible through decentralized networks?
- If yes, to what extent is the data on those networks immutable and permanent?
- Could users locally host the dApp front-end to access the services (i.e., is the dApp source code open source?)
Looking more closely at the top three dApps by volume, we find that all three publish their source code, allowing individuals to deploy the dApp front-ends to decentralized storage providers. Uniswap and Curve go a step further and provide regularly updated CIDs directly in their ENS (Ethereum Name Service) records.
While the above only illustrates best practices, currently only a few dApps actively decentralize their user interfaces. The premise of DeFi, often also referred to as Open Finance, is to do away with restrictions on who can access and trade financial assets. While the underutilization of decentralized storage breaks with that narrative, it also creates opportunities for future growth and adoption.
The next piece of this series looks at individual decentralized storage solutions in more detail, covering decentralization mechanics, storage pricing algorithms, and tokenomics.