06/09/22

Swarm: an Ethereum-based P2P Storage Protocol

Swarm is a decentralized storage network built on the Ethereum network and incentivized through the BZZ token, an Ethereum-based ERC20 token. Swarm’s vision is to “extend the blockchain with peer-to-peer storage and communication to realize the world computer that can serve as an operating system and deployment environment for decentralized applications”.

Swarm also aims to provide freedom of information through permissionless publishing and privacy through features such as anonymous browsing, deniable storage, untraceable messaging, and file representation formats that leak no metadata.

Anybody with spare hard drive space and bandwidth can join the Swarm network.

The network has been in development since as early as 2015, and as of February 2022 it could reliably upload and download roughly 5 MB of data at speeds of 6.47 MiB/s up and 12.47 MiB/s down. Finally, data on Swarm is accessible through human-readable formats and can be resolved through ENS domains.

Token

The utility token of the Swarm network is the BZZ token, which is an ERC20 token that lives on the Ethereum blockchain. The token supply is dynamic and changes with purchases/sales of the token from and to the bonding curve, which determines the price of the token. More details in the tokenomics section.

Storage Technology & Consensus Mechanism

Swarm consists of four interconnected yet clearly separable layers that together form the infrastructure of Swarm, of which the overlay network and an API to access that network form the core of the Swarm protocol.

Figure 1: Swarm’s layered design. Source: The Book of Swarm

The overlay network determines how files are stored and represents the protocol's underlying storage model. The storage model developed by Swarm is called the Distributed Immutable Store of Chunks (DISC) and forms the basis of how nodes communicate with each other. Swarm is essentially a slightly different interpretation of a distributed hash table (DHT), similar to how IPFS nodes manage and keep track of the various nodes they have connected to, but instead of storing where files are to be found, DISC directly stores tiny pieces of files (i.e., chunks); more on that later.

In Swarm, nodes are expected to decide which other nodes to connect to based on their proximity, so that local connection decisions result in globally optimal routing of messages (known as Kademlia connectivity). Every node tracks both the network address and the Swarm address of other nodes; the latter makes it possible to define the proximity of two addresses to each other.

Using the degree of proximity, nodes that are closest to each other form a fully connected neighborhood of at least 8 nodes. Each node also connects to 8 further nodes at each progressively lower degree of proximity (i.e., nodes that are farther away from the neighborhood).

Figure 2: Kademlia connectivity used in Swarm. Source: Swarm Whitepaper

This network design ensures that messages intended for nodes that are very far away from each other can always reach their destination, even if the nodes are not directly connected.
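The proximity measure behind this connectivity can be sketched in a few lines. The sketch assumes that proximity is the number of leading bits two addresses have in common, which is the usual Kademlia-style definition; the function names and the short example addresses are illustrative and are not taken from the Swarm codebase.

```python
def proximity_order(a: bytes, b: bytes) -> int:
    """Count the leading bits that two equal-length addresses share."""
    if len(a) != len(b):
        raise ValueError("addresses must have the same length")
    for i, (x, y) in enumerate(zip(a, b)):
        diff = x ^ y
        if diff:
            # 8 - bit_length() is the number of leading zero bits in this byte
            return i * 8 + (8 - diff.bit_length())
    return len(a) * 8  # identical addresses share every bit


def bucket_peers(own: bytes, peers: list[bytes]) -> dict[int, list[bytes]]:
    """Group peers into bins keyed by their proximity order to `own`."""
    bins: dict[int, list[bytes]] = {}
    for peer in peers:
        bins.setdefault(proximity_order(own, peer), []).append(peer)
    return bins
```

A node would keep a handful of peers in every bin, so that a message can always be handed to a peer that shares at least one more leading bit with the destination.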

Swarm stores data on these nodes as chunks, which represent 4 kilobytes of data with an address that exists in the same address space as node addresses, enabling the calculation of proximity of nodes and chunks. Swarm requires nodes that are in close proximity of a data chunk to store that same chunk locally, thus creating clusters of replicated data within a neighborhood. Since a chunk is essentially just a segment of a larger file, without the context of what the full file is meant to be, nodes are unable to rebuild the full file. Chunks can be encrypted for additional privacy. Furthermore, to ensure file redundancy and consistent availability when nodes leave or join the network, nodes continuously synchronize chunks with their neighbors.
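As a rough illustration of the paragraph above, the following sketch splits a file into 4-kilobyte chunks and picks the nodes whose addresses are closest to a chunk address. It makes two simplifying assumptions: plain SHA-256 stands in for Swarm's actual chunk hash, and closeness is measured as XOR distance. All function names are hypothetical.

```python
import hashlib

CHUNK_SIZE = 4096  # 4 kilobytes, as described in the text


def split_into_chunks(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Cut a byte string into consecutive chunks of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def placeholder_address(chunk: bytes) -> bytes:
    """Stand-in chunk address: SHA-256, NOT the hash Swarm actually uses."""
    return hashlib.sha256(chunk).digest()


def closest_nodes(chunk_addr: bytes, node_addrs: list[bytes], k: int) -> list[bytes]:
    """Return the k node addresses with the smallest XOR distance to the chunk."""
    target = int.from_bytes(chunk_addr, "big")
    return sorted(node_addrs, key=lambda n: int.from_bytes(n, "big") ^ target)[:k]
```

Because chunk and node addresses share one address space, the same distance function answers both "which peer is near me" and "which nodes should store this chunk".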

To retrieve a chunk from a neighborhood, a client sends a retrieval request to a node in close proximity to itself. Using the chunk address and the Kademlia algorithm, the nodes recursively forward the request through nodes in varying proximity layers until it reaches the neighborhood hosting the chunk, which then returns the chunk along the same route. If any node along the way happens to have the chunk in its local storage, it sends the chunk back as a response instead. Nodes are incentivized to cache chunks of data to reduce bandwidth usage of the network. This is achieved through opportunistic caching, which refers to the caching of chunks from distant neighborhoods in order to receive payment for the retrieval of those chunks. Cached data lives in the caching subsystem of a Swarm node.
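The forwarding behavior can be simulated with a toy network. This is a minimal sketch, assuming greedy forwarding to the peer with the smallest XOR distance to the chunk address and small integer addresses; the `Node` class and its fields are hypothetical and leave out payments, timeouts and neighborhoods.

```python
from typing import Optional


class Node:
    def __init__(self, address: int):
        self.address = address
        self.peers: list["Node"] = []
        self.store: dict[int, bytes] = {}  # chunks this node is responsible for
        self.cache: dict[int, bytes] = {}  # chunks cached while forwarding

    def retrieve(self, chunk_addr: int) -> Optional[bytes]:
        """Return the chunk, forwarding the request towards its address."""
        if chunk_addr in self.store:
            return self.store[chunk_addr]
        if chunk_addr in self.cache:
            return self.cache[chunk_addr]
        own_distance = self.address ^ chunk_addr
        # Only forward to peers that are strictly closer, so requests cannot loop.
        closer = [p for p in self.peers if (p.address ^ chunk_addr) < own_distance]
        if not closer:
            return None
        next_hop = min(closer, key=lambda p: p.address ^ chunk_addr)
        data = next_hop.retrieve(chunk_addr)
        if data is not None:
            self.cache[chunk_addr] = data  # opportunistic caching on the way back
        return data
```

After one successful retrieval, every node on the route holds the chunk in its cache and can answer the next request directly.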

In Swarm, each node has two local subsystems, namely the reserve and the cache. In simple terms, the reserve stores chunks that have postage stamps attached to them. Postage stamps are purchased with BZZ tokens and indicate the value a user places on storing these files on Swarm. When a file is stored on a node, the postage stamp acts as a sort of rent, and its value decreases over time. Once the value of the stamp falls to a certain threshold, the chunk is moved from the reserve (i.e., paid-for storage) to the cache.

The cache stores chunks that are not protected by the reserve, either because the value of the postage stamp has decreased over time, or because the cached chunk is from a distant node. Chunks in the cache are ranked by their latest retrieval as a means of indicating the popularity of the chunk and whether it is worth continuing to store it. The cache is regularly cleared of unpopular chunks, ensuring that popular content spreads across the network and is easily retrievable, while also maximizing income for nodes: when nodes on the network return a chunk in response to a retrieval request, they earn BZZ tokens, which economically incentivizes holding as many chunks as possible.
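Ranking chunks by their latest retrieval amounts to a least-recently-used policy. The sketch below shows that idea with a fixed-capacity cache; the class name and the capacity parameter are assumptions for illustration, and a real node would also account for stamp values and storage size.

```python
from collections import OrderedDict
from typing import Optional


class ChunkCache:
    """Keeps the most recently retrieved chunks and evicts the oldest ones."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._chunks: OrderedDict[bytes, bytes] = OrderedDict()

    def put(self, address: bytes, data: bytes) -> None:
        self._chunks[address] = data
        self._chunks.move_to_end(address)  # newest entries live at the end
        while len(self._chunks) > self.capacity:
            self._chunks.popitem(last=False)  # drop the least recently used chunk

    def get(self, address: bytes) -> Optional[bytes]:
        if address not in self._chunks:
            return None
        self._chunks.move_to_end(address)  # a retrieval refreshes the ranking
        return self._chunks[address]
```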

The method of retrieving and transferring files described above increases anonymity in the network, because a forwarded request and an original request are identical in structure. This ambiguity obfuscates the identity of those retrieving files.

However, this approach causes unpopular chunks of data to disappear from nodes over time, which reduces the permanence of the system. To combat this, Swarm implements a postage lottery system called “RACE” (raffle, apply, claim and earn) that is executed through smart contracts on the Ethereum blockchain. While the detailed mechanism of RACE goes beyond the scope of this research, it suffices to know that these raffles:

  • act as spot checks on nodes
  • present nodes with an opportunity to earn additional income
  • encourage nodes to stay online as they would otherwise miss raffles
  • require nodes to store the right data and properly maintain the stored chunks

Finally, in order to reconstruct files, nodes need to be able to understand which chunks belong to which files, and the downloader needs to be able to verify the correctness of the chunks. In Swarm, every chunk address is unique, implying a unique address-payload association. This uniqueness creates the immutability of the chunk, as only that chunk can contain the data embedded in it. The canonical content addressed chunk in Swarm is called a binary Merkle tree chunk (BMT chunk), and the address of BMT chunks is calculated using the binary tree hash algorithm (BMT hash).

Swarm has two kinds of chunks: a content-addressed chunk and a single-owner chunk. While these differ in terms of data structure, both use the BMT hash to verify chunk integrity and to reconstruct the full file. Ultimately, users can use their Swarm hash (also known as bzzhash) to signal to the network to retrieve all chunks and recreate the file. For more details on the BMT hashing algorithm, please refer to The Book of Swarm.
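To make the idea of a binary Merkle tree hash more concrete, here is a simplified sketch. It assumes 32-byte segments, zero padding up to the 4096-byte chunk size, and an address formed by hashing the payload length together with the tree root. It also substitutes Python's `hashlib.sha3_256` for the Keccak-256 function used in the Ethereum ecosystem, so its output will not match real Swarm addresses; The Book of Swarm remains the reference for the exact construction.

```python
import hashlib

SEGMENT_SIZE = 32   # assumed segment size in bytes
CHUNK_SIZE = 4096   # maximum chunk payload in bytes


def _h(data: bytes) -> bytes:
    # Stand-in hash: SHA3-256 is NOT identical to Ethereum's Keccak-256.
    return hashlib.sha3_256(data).digest()


def bmt_root(payload: bytes) -> bytes:
    """Hash pairs of segments level by level until one root remains."""
    if len(payload) > CHUNK_SIZE:
        raise ValueError("payload exceeds one chunk")
    padded = payload.ljust(CHUNK_SIZE, b"\x00")
    level = [padded[i:i + SEGMENT_SIZE] for i in range(0, CHUNK_SIZE, SEGMENT_SIZE)]
    while len(level) > 1:
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def chunk_address(payload: bytes) -> bytes:
    """Bind the payload length to the tree root (assumed 8-byte little-endian)."""
    span = len(payload).to_bytes(8, "little")
    return _h(span + bmt_root(payload))
```

A downloader that recomputes the address from the received payload and compares it with the requested address can detect any tampering with the chunk.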

For additional protection against data loss, caused by nodes going offline or being otherwise unable to access data, Swarm applies Cauchy-Reed-Solomon erasure coding to 4 kilobyte sized chunks of the file before they are hashed into the Merkle tree. This allows the network to retrieve data, even when a portion of the chunks are inaccessible.
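The principle behind erasure coding can be shown with a much simpler scheme than the one Swarm uses. The sketch below is not Cauchy-Reed-Solomon coding: it adds a single XOR parity chunk, which is enough to rebuild exactly one missing chunk, whereas Reed-Solomon codes can tolerate several losses. The function names are illustrative.

```python
from typing import Optional


def xor_parity(chunks: list[bytes]) -> bytes:
    """Compute one parity chunk as the byte-wise XOR of equal-sized chunks."""
    size = len(chunks[0])
    parity = bytearray(size)
    for chunk in chunks:
        if len(chunk) != size:
            raise ValueError("all chunks must have the same size")
        for i, byte in enumerate(chunk):
            parity[i] ^= byte
    return bytes(parity)


def recover_missing(chunks: list[Optional[bytes]], parity: bytes) -> list[bytes]:
    """Rebuild at most one missing chunk (marked as None) from the parity chunk."""
    missing = [i for i, c in enumerate(chunks) if c is None]
    if len(missing) > 1:
        raise ValueError("a single parity chunk can repair only one loss")
    present = [c for c in chunks if c is not None]
    if missing:
        # XOR of all surviving chunks and the parity yields the lost chunk.
        repaired = xor_parity(present + [parity])
        return [repaired if c is None else c for c in chunks]
    return present
```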

Finally, Swarm includes a pinning function which allows nodes to save all chunks locally and prevent the chunks from being removed.

Data Permanence & Pricing Mechanics

Swarm applies the Swarm Accounting Protocol (SWAP) to incentivize nodes to collaborate with each other in routing messages, while decreasing frivolous bandwidth use. Nodes track their relative bandwidth consumption with the peers they connect to, creating a mechanism that balances debts and credits between any two nodes at any given point in time. When node A requests data from node B, and node B responds, then node B has a credit surplus, while node A has a debt. This can continue until a certain threshold is reached, after which node B will not accept further requests until node A has repaid its liabilities. To bring both nodes back to a balanced state within the threshold, node A can either wait for reciprocal requests from node B, which would reduce node A’s debt, or send node B a cheque that can be cashed out for BZZ tokens to pay back the debt. This creates a “service-for-service” relationship between nodes.
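The debt-and-credit bookkeeping described above can be sketched from node B's point of view. This is a minimal model under stated assumptions: costs are abstract integer units, the threshold is a single number, and the class and method names are hypothetical rather than part of any Swarm client.

```python
class PeerAccount:
    """Node B's running balance with one peer (node A), in abstract units."""

    def __init__(self, payment_threshold: int):
        self.payment_threshold = payment_threshold
        self.peer_debt = 0  # positive: the peer owes us; negative: we owe the peer

    def serve_request(self, cost: int) -> bool:
        """Serve the peer only if its debt stays within the threshold."""
        if self.peer_debt + cost > self.payment_threshold:
            return False  # refuse until the peer settles
        self.peer_debt += cost
        return True

    def receive_service(self, cost: int) -> None:
        """The peer served one of our requests, which offsets its debt."""
        self.peer_debt -= cost

    def receive_cheque(self, amount: int) -> None:
        """The peer paid by cheque instead of waiting for reciprocal requests."""
        self.peer_debt -= amount
```

In this model, two nodes that consume roughly equal bandwidth from each other never need to exchange a cheque at all.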

Figure 3: Swarm Accounting Protocol. Source: Swarm whitepaper.

Cheques are handled on-chain by a smart contract. Nodes must decide for themselves whether to cash a cheque upon receipt, or to wait in order to reduce transaction costs on the Ethereum network. If a node waits, however, it increases the risk of settlement failure, i.e., the cheque can bounce because the cheque issuer has moved funds out of their chequebook wallet. This is where Swarm employs a reputation system: because the smart contract records failed cheque withdrawals, nodes can see publicly which other nodes did not make good on their cheques and can refrain from communicating with those nodes in the future.

Apart from the SWAP protocol, nodes can also earn additional BZZ by holding unpopular data and participating in the RACE lottery system.

The SWAP and RACE systems are positive incentive mechanisms. Swarm also employs a negative incentive system called “competitive insurance”. Competitive insurance requires nodes to store every bit of data they have promised to store, and failure to do so is not only unprofitable, but outright catastrophic to the insurer. While SWAP incentivizes short-term data storage, and RACE incentivizes long-term data storage of popular files, competitive insurance incentivizes long-term storage of any file on the network regardless of its popularity. It also prevents users from spinning up new nodes to sell empty long-term storage promises, only to cash out and deactivate their nodes shortly after.

The competitive insurance system works with a deposit system. Nodes that want to sell long-term storage (also known as promissory storage) must have a stake verified and locked in an Ethereum-based smart contract at the time of making their promise, which is essentially a security deposit. Once the security deposit has been locked, the node is entitled to make storage promises for up to the duration of the locked stake. If, during the promise period, a node fails to prove ownership of the data it promised to store, it loses its entire security deposit. If a user or a node finds that content with a promise is inaccessible, they can submit a challenge to a smart contract that handles the verification process.

Nodes are compensated for their promises over time. When a user stores data on a node, they pay up-front for the entire storage duration. This amount is locked, and is released in installments to the node as long as they can provide proof of custody of the files.
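The interplay of the locked deposit and the installment payments can be modeled as a small state machine. This sketch assumes equal installments per period and a deposit that is forfeited entirely on the first failed proof; what happens to the user's remaining prepayment in that case is not covered by the text and is left out. All names are hypothetical, and the real logic would live in a smart contract.

```python
from dataclasses import dataclass


@dataclass
class StoragePromise:
    deposit: int           # node's locked security deposit
    prepaid: int           # user's up-front payment, locked at the start
    periods: int           # number of installments over the promise duration
    paid_out: int = 0
    periods_done: int = 0
    slashed: bool = False

    def settle_period(self, proof_ok: bool) -> int:
        """Release one installment for a valid proof, or slash the deposit."""
        if self.slashed or self.periods_done >= self.periods:
            return 0
        if not proof_ok:
            self.slashed = True
            self.deposit = 0  # the entire security deposit is lost
            return 0
        self.periods_done += 1
        if self.periods_done == self.periods:
            installment = self.prepaid - self.paid_out  # absorb rounding remainder
        else:
            installment = self.prepaid // self.periods
        self.paid_out += installment
        return installment
```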

Swarm lets end users, through client software, determine how much data is stored on the network and for how long. The price for storage is automatically calculated through the client software and smart contracts. Swarm allows for different levels of data retention:

  • minimal – a few hours
  • temporary – a week
  • long term – a year
  • forever – 10 years

The purchase of storage space over time in Swarm is called a postage subscription. Subscriptions are managed by the postage subscription API, which shows users how much data has been uploaded under a specific subscription and for how long it can be stored at its current price (e.g., 88/100 megabytes for 23 days).

To access one’s content on Swarm, one either needs to run a Swarm node or use a public gateway. This setup is similar to that of IPFS. Swarm recommends retrieving encrypted content only through one’s own node: when a public gateway is used, the content is no longer encrypted once it leaves the gateway and is transmitted over HTTP to the user.

Tokenomics

While the project has been with the Ethereum Foundation since 2015 within their Geth team, Swarm only launched a public token sale in June 2021. Details of the token distribution can be found below:

  • 27.6 million BZZ (42%) – Early Token Sale (Early backers in Sept 2020, Private round in Dec 2020)
  • 5.17 million BZZ (8%) – Public Token Sale (June 2021, unlocked in August 2021)
  • 15.87 million BZZ (23%) – Ecosystem (Infrastructure for L1 solutions, development, airdrops, grants, donations)
  • 12.50 million BZZ (19%) – Present and Future Team Members
  • 4.9 million BZZ (7%) – Swarm Foundation (protocol, network and business development, marketing and community support)

The price of the BZZ token is determined by a bonding curve that is controlled by smart contracts instead of traditional market makers, where purchases and sales of tokens to the bonding curve will directly adjust the price that users pay for new tokens. This makes it prohibitively expensive to buy or dump large amounts of tokens at once, thus protecting the utility of the token against speculative actions.

Figure 4: Swarm bonding curve explanation on Bzzaar exchange (bzz.exchange). Source: https://medium.com/ethereum-swarm/swarm-and-its-bzzaar-bonding-curve-ac2fa9889914

As a result of this bonding curve, the actual circulating supply of tokens fluctuates constantly. Although there is a hardcoded maximum of 125 million BZZ tokens, it is extremely unlikely that the circulating supply will ever approach that maximum, because the bonding curve steepens heavily at higher supply levels.

Figure 5: Shape of BZZ bonding curve. Source: https://medium.com/ethereum-swarm/swarm-and-its-bzzaar-bonding-curve-ac2fa9889914
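The effect of a bonding curve on large trades can be illustrated with a toy model. The sketch assumes a simple power-law price function, price = k * supply^n, with made-up parameters `k` and `n`; the actual Bzzaar curve uses its own formula and constants, which are not reproduced here.

```python
def spot_price(supply: float, k: float = 1e-6, n: int = 2) -> float:
    """Price of the next token at a given supply (hypothetical curve)."""
    return k * supply ** n


def cost_to_buy(supply: float, amount: float, k: float = 1e-6, n: int = 2) -> float:
    """Total cost of minting `amount` tokens: the area under the price curve."""
    upper = supply + amount
    return k / (n + 1) * (upper ** (n + 1) - supply ** (n + 1))


def refund_for_sale(supply: float, amount: float, k: float = 1e-6, n: int = 2) -> float:
    """Amount returned when `amount` tokens are sold back and burned."""
    return cost_to_buy(supply - amount, amount, k, n)
```

Since the price rises with every token minted, the average price paid in a single large purchase is higher than in a small one, and selling a large amount pushes the price down along the same curve.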

The bonding curve smart contract address can be found here: https://etherscan.io/address/0x4f32ab778e85c4ad0cead54f8f82f5ee74d46904

Although the bonding curve is fully automated, the Swarm Foundation maintains control to manually shut down the bonding curve in emergency situations, which include:

  • A critical or exploitable bug in the bonding curve contract is discovered
  • MakerDAO discovers a critical bug or is shut down for any reason
  • DAI loses its peg to the USD

As of writing, there are roughly 63.5 million BZZ tokens in circulation, which is 2.7 million fewer than the 66.2 million tokens that were minted and distributed during the token launch. This indicates that tokens have likely been sold back to the bonding curve since launch.

Real-time circulating supply: https://tokenservice.ethswarm.org/circulating_supply

Real-time bonding curve: https://tokenservice.ethswarm.org/token_price

References

Altcoin Disrupt (2021) What is Swarm? ICO Upcoming? Will Swarm 100X? Decentralized? $10K – $100K. Available at: https://www.youtube.com/watch?v=rxPlYf9Pe2A

Coding Bootcamps (n.d.) How to Work with Ethereum Swarm Storage. Available at: https://www.coding-bootcamps.com/blog/how-to-work-with-ethereum-swarm-storage.html

ETHDenver (2022) The State Of Ethereum Swarm – Angela Vitzthum. Available at: https://www.youtube.com/watch?v=22HfkeEmOK4

Ethereum Wiki (2020) Swarm Hash. Available at: https://eth.wiki/concepts/swarm-hash

Munair (2021) A Case for Swarming Medical History. Available at: https://munair.medium.com/a-case-for-swarming-medical-history-77baa5e40424

Swarm (n.d.) Swarm Docs. Available at: https://docs.ethswarm.org/docs/

Swarm Hive (2021) BZZ Tokenomics. Available at: https://medium.com/ethereum-swarm/swarm-tokenomics-91254cd5adf

Swarm Hive (2021) Swarm and its “Bzzaar” Bonding Curve. Available at: https://medium.com/ethereum-swarm/swarm-and-its-bzzaar-bonding-curve-ac2fa9889914

Swarm team (2021) SWARM: Storage and communication infrastructure for a self-sovereign digital society. Available at: https://www.ethswarm.org/swarm-whitepaper.pdf

Thebojda (2022) A Brief Introduction to Ethereum Swarm. Available at: https://hackernoon.com/a-brief-introduction-to-ethereum-swarm

Trón, V. (2021) The book of Swarm: storage and communication infrastructure for self-sovereign digital society back-end stack for the decentralised web. V1.0 pre-release 7. Available at: https://www.ethswarm.org/The-Book-of-Swarm.pdf
