The archive node will store its data in a PostgreSQL database that node operators host on a provider of their choice, including self-hosting, if desired. However, for redundancy, archive node data can also be stored to an object storage (e.g. Google Cloud Storage; soon S3 & others) or to a
mina.log file, which can live on your computer or be streamed to any typical logging service (e.g. LogDNA).
Archive data is critical for applications that require historical lookup. On the protocol side, archive data is currently important for disaster recovery to reconstruct a certain state, but may not be required in a future version of Mina. To that end, having a single archive node setup might not be sufficient. If the daemon that sends blocks to the archive process or if the archive process itself fails for some reason, there can be missing blocks in the database. To minimize the risk of archive data loss there are a few redundancy techniques that can be employed.
A single archive node setup has a daemon sending blocks to an archive process which writes them to the database.
It is possible to connect multiple daemons to the archive process by specifying the address of an archive process in multiple daemons, thereby reducing the dependency on a single daemon to provide blocks to the archive process.
For example, the server-port of an archive process is 3086, then the daemons can connect to it using the flag
mina daemon \ ..... -archive-address <Ip-address>:3086\
Similarly, it is possible to have multiple archive processes write to the same database. In this case the postgres uri passed to the archive process would be same across multiple archive processes.
However, multiple archive processes writing to a database concurrently could cause data inconsistencies (explained in https://github.com/MinaProtocol/mina/issues/7567). To avoid this, set the transaction isolation level of the archive database to
Serializable using the following query:
ALTER DATABASE <DATABASE NAME> SET DEFAULT_TRANSACTION_ISOLATION TO SERIALIZABLE ;
This should be done after creating the database and before connecting an archive process to it.
To further ensure there that archive data can be restored one can use the following features to backup block data and restore them when necessary.
We have a mechanism in place for logging a high-fidelity machine-readable representation of blocks using JSON including some opaque information deep within. We use these logs internally to quickly replay blocks to get to certain chain-states for debugging. This information suffices to recreate exact states of the network.
Some of the internal data look like this:
This JSON will evolve as the format of the block and transaction payloads evolve in the network.
To indicate a daemon to upload block data to Google Cloud Storage, pass the flag
--upload-blocks-to-gcloud . To successfully upload the file, daemon requires the following environment variables to be set:
GCLOUD_KEYFILE : Key file for authentication
NETWORK_NAME: Network name to be used in the filename to easily distinguish between blocks in different networks (main-net and testnets)
GCLOUD_BLOCK_UPLOAD_BUCKET : Google Cloud Storage bucket where the files are uploaded
The daemon generates a file for each block with the name
<network-name>-<protocol-state-hash>.json . These are called precomputed blocks and will have all the fields of a block.
The daemon also logs the block data if the flag
-log-precomputed-blocks is passed. The log to look for is
Saw block with state hash $state_hash that contains
precomputed_block in the metadata and has the block information. This is the same information (precomputed blocks) that gets uploaded to Google Cloud Storage.
From a fully synced archive database, one can generate block data for each block using the
The tool takes an
--end-state-hash, and an optional --start-state-hash and writes all the blocks in the chain starting from start-state-hash and ending at end-state-hash (including start and end).
If only the end hash is provided, then the tool generates blocks starting with the unparented block closest to the end block. This would be the genesis block if there are no missing blocks in between. The tool generates a file with name
<protocol-state-hash>.json for each block. The block data in these files are called extensional blocks. Since these are generated from the database, they would have only the data stored in the archive database and would not contain any other information pertaining to a block (for example, blockchain snark) that the precomputed blocks would have and therefore, can only be used to restore blocks in the archive database.
Alternatively, instead of specifying state hashes, you can provide the flag
--all-blocks, and the tool will write out all blocks contained in the database.
mina-missing-block-auditor can be used to determine any missing blocks in an archive database. The tool outputs a list of state hashes of all the blocks in the database that are missing a parent. This can be used to monitor the archive database for any missing blocks. The URI of the postgres database can be specified using the flag
Missing blocks in an archive database can be restored if there is block data (precomputed or extensional) available from the options listed above using the tool
mina-archive-blocks --precomputed --archive-uri <postgres uri> FILES
- For extensional blocks: (Generated from option 3)
mina-archive-blocks --extensional --archive-uri <postgres uri> FILES
Staking ledgers are used to determine slot winners for each epoch. Mina daemon stores staking ledger for the current and the next epoch (after it is finalized). When transitioning to a new epoch, the "next" staking ledger from the previous epoch is used to determine slot winners of the new epoch and a new "next" staking ledger is chosen. Since staking ledgers for older epochs are no longer accessible, users may want to still keep them around for reporting or other purposes.
Currently these ledgers can be exported using the cli command-
mina ledger export [current-staged-ledger|staking-epoch-ledger|next-epoch-ledger]
Epoch ledger transition happens once every 14 days (given slot-time = 3mins and slots-per-epoch = 7140). The window to backup a staking ledger is ~27 days considering "next" staking ledger is finalized after k (currently 290) blocks in the current epoch and therefore will be available for the rest of the current epoch and the entire next epoch.