What is an Archive?

Feb 28, 2022

To quote Subsquid CEO Dmitry Zhelezov (you can subscribe to his newsletter here), “Web3 is eating the software world.” This is for good reason.

Web2 companies, the likes of Facebook, Twitter, Netflix, and Amazon, silo application data in walled gardens. For these industry behemoths, this data is used as a moat, meant to set their businesses apart from the competition. Web3 tears down these walls, making data fully available on-chain.

Thanks to the innovations in distributed computing and data storage that started with Bitcoin and continue through to the Polkadot ecosystem, developers and startups can access this data and use it to build amazing applications. This, of course, enables an enormous amount of innovation.

However, up until now, Web3 builders have faced a problem: processing the data. To be truly user-friendly (which implies being cross-chain), DApps need to process many terabytes of blockchain data stored across hundreds of networks. Blockchains, despite their inherent transparency, are poorly suited to serve as databases for searching and retrieving on-chain data. Much of this complexity comes from the very mechanism by which blockchains work, with new ‘blocks’ being added regularly, potentially indefinitely. Moreover, blockchains receive occasional updates, such as forks or, in the case of parachains, runtime upgrades, which DApps need to take into account.

The solutions developers have proposed for processing on-chain data have so far either failed or proven inefficient. For instance, some Web3 builders have built centralised API infrastructure into their apps, undermining their efforts at decentralisation elsewhere. Others have sought to create their own backends from scratch. Building such a solution, however, can be extremely expensive and time-consuming, and even just maintaining the infrastructure can be very costly, especially for new startups or newly formed DAOs.

Subsquid: Powerful APIs for Web3 Builders

Subsquid exists to make on-chain data easily and quickly accessible to Web3 builders and the users of their applications. Our ultimate mission is to help build a Web3 future where sophisticated software products, including highly-scalable multichain DApps, can enjoy the benefits of decentralised backends at significantly reduced cost and hassle.

Subsquid does this through a multi-layer architecture made up of Squids and Archives. We’ve previously described Squids, the customisable APIs that developers build on the Subsquid Framework, in this article. Now, we’d like to provide a brief overview of Archives.

What is an Archive?

In simple terms, a Subsquid Archive is a bit like a ‘crawler’ for Web3: specialised software that systematically browses blockchains in order to collect historical data for processing by Squids (APIs). From a technical perspective, an Archive is an essential component of Subsquid’s multi-layer architecture, responsible for continuously ingesting raw data from the blockchain, covering both historical blocks and new blocks as they are created. This data is then saved in a database, along with Events and Extrinsics, for easier access through a GraphQL gateway.

Archives should not be confused with Archive nodes, although the concept is vaguely similar, since both preserve the full blockchain history without any pruning. The difference is that Subsquid Archives are special services with specific endpoints, tailored for data lookup and retrieval.

Importantly, Archives at Subsquid are intended to be run and hosted on a distributed network of infrastructure operators. The work of decentralising this network is running at full speed, with multiple noteworthy node operators already committed to running Subsquid Archives.

Following the Subsquid TGE, this system will be supported and secured by the SQD token. Our calculations show that following the token launch, infrastructure providers will be able to earn APYs up to 40%. The setup procedure for Archives is quite simple, and operators will be able to compound their earnings by processing data on multiple networks.

Archive Architecture

The following diagram shows a complete overview of the architecture of a Subsquid Archive:

In essence, the Substrate Archive works to extract block information, Events, and Extrinsics from the blockchain. It then stores this data in a Postgres database, while status updates are saved in a Redis key-value database.
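The flow described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the type names, the `ArchiveStore` and `StatusStore` classes, and the `lastIndexedHeight` key are all hypothetical, and in-memory objects stand in for the Postgres and Redis databases. It is not the actual Archive implementation or schema.

```typescript
// Hypothetical shapes; the real Archive schema is not described in this article.
interface Extrinsic { id: string; blockHeight: number; name: string }
interface ChainEvent { id: string; blockHeight: number; name: string }
interface Block {
  height: number;
  hash: string;
  extrinsics: Extrinsic[];
  events: ChainEvent[];
}

// In-memory stand-in for the Postgres database that holds archived data.
class ArchiveStore {
  readonly blocks: Record<number, { height: number; hash: string }> = {};
  readonly extrinsics: Extrinsic[] = [];
  readonly events: ChainEvent[] = [];
}

// In-memory stand-in for the Redis key-value database that holds status updates.
class StatusStore {
  private readonly kv: Record<string, string> = {};
  set(key: string, value: string): void { this.kv[key] = value; }
  get(key: string): string | undefined { return this.kv[key]; }
}

// Extract block information, Events, and Extrinsics from one block and store them,
// then record progress in the status store.
function ingestBlock(block: Block, store: ArchiveStore, status: StatusStore): void {
  store.blocks[block.height] = { height: block.height, hash: block.hash };
  store.extrinsics.push(...block.extrinsics);
  store.events.push(...block.events);
  status.set("lastIndexedHeight", String(block.height)); // hypothetical key name
}

// Ingest every block that has not been indexed yet, so a restart resumes
// from the recorded status instead of duplicating data.
function ingestAll(chain: Block[], store: ArchiveStore, status: StatusStore): void {
  const last = Number(status.get("lastIndexedHeight") ?? "-1");
  for (const block of chain) {
    if (block.height > last) {
      ingestBlock(block, store, status);
    }
  }
}
```

Keeping the progress marker separate from the archived data is what lets the ingestion loop pick up where it left off as new blocks are created.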

Once stored in the database, all archived data becomes immediately available for client queries. This is made possible through a GraphQL server gateway implemented into all Subsquid Archives.
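To give a feel for what a client query against such a gateway might look like, here is a minimal sketch that only builds a GraphQL request body. The field names (`events`, `blockHeight`, `name`), the `where` and `limit` arguments, and the helper names are assumptions made for illustration; the real Archive schema may differ and should be checked against the gateway itself.

```typescript
// A GraphQL request is conventionally sent as JSON with a query string and variables.
interface GraphQLRequest {
  query: string;
  variables: Record<string, string | number>;
}

// Build a request for events with a given name.
// The schema used here is hypothetical, not the documented Archive schema.
function buildEventsRequest(eventName: string, limit: number): GraphQLRequest {
  const query = `
    query EventsByName($name: String!, $limit: Int!) {
      events(where: { name: $name }, limit: $limit) {
        blockHeight
        name
      }
    }`;
  return { query, variables: { name: eventName, limit: limit } };
}

// Serialise the request into the JSON body an HTTP client would POST to the gateway.
function toHttpBody(request: GraphQLRequest): string {
  return JSON.stringify(request);
}
```

A client would send the resulting body with a `Content-Type: application/json` header to the Archive's GraphQL endpoint.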

It is worth noting that an Archive can be shared by multiple Squids. This means that it is possible to segment how data is presented without having to replicate the data source. This opens up many new customisation opportunities for application developers.
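As a rough illustration of this sharing, the sketch below has two Squid-like functions reading the same archived events and presenting them differently. The event names, the `args` layout, and the function names are hypothetical, and a plain array stands in for the Archive.

```typescript
// Hypothetical archived event shape, for illustration only.
interface ArchivedEvent {
  blockHeight: number;
  name: string;
  args: Record<string, string | number>;
}

// One shared data source (a stand-in for a single Archive).
const archive: ReadonlyArray<ArchivedEvent> = [
  { blockHeight: 1, name: "balances.Transfer", args: { from: "alice", to: "bob", amount: 10 } },
  { blockHeight: 2, name: "balances.Transfer", args: { from: "bob", to: "carol", amount: 4 } },
  { blockHeight: 2, name: "staking.Bonded", args: { who: "carol", amount: 4 } },
];

// "Squid A": total amount received per account, derived from transfer events.
function receivedTotals(events: ReadonlyArray<ArchivedEvent>): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const e of events) {
    if (e.name !== "balances.Transfer") continue;
    const to = String(e.args["to"]);
    const amount = Number(e.args["amount"]);
    totals[to] = (totals[to] ?? 0) + amount;
  }
  return totals;
}

// "Squid B": number of events per event name, derived from the same source.
function eventCounts(events: ReadonlyArray<ArchivedEvent>): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const e of events) {
    counts[e.name] = (counts[e.name] ?? 0) + 1;
  }
  return counts;
}
```

Both views are computed from the one `archive` array, so neither needs its own copy of the underlying data.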

Become a Subsquid infrastructure provider and run an Archive

If you would like to start running an Archive using your existing computing infrastructure, please fill out this form and a member of our team will get in touch as soon as possible.

Otherwise, please make sure to follow Subsquid social media and subscribe to our newsletter to stay up-to-date on the status of the project.

Our Channels

Website: http://subsquid.io/

Newsletter: http://eepurl.com/hBOqLT

Twitter: https://www.twitter.com/subsquid

Discord: https://discord.gg/5cQBWHWJvW

Telegram: http://t.me/subsquid

Subsocial: https://app.subsocial.network/@subsquid

Written by Subsquid