3rd update of 2022 on BlockTrades work on Hive software

![blocktrades update.png](https://images.hive.blog/DQmSihw8Kz4U7TuCQa98DDdCzqbqPFRumuVWAbareiYZW1Z/blocktrades%20update.png)

Below are highlights of some of the Hive-related programming issues worked on by the BlockTrades team during the past month.

# Hived (blockchain node software) work

We finally got around to a long-needed overhaul of the communication between the blockchain thread (which processes blocks, transactions, and operations) and the peer-to-peer thread (which gets all that data from peer nodes in the Hive p2p network and puts it on a “write_queue” for the blockchain thread to process).

The p2p code was originally written for another blockchain (BitShares), and when it was incorporated into Steem’s code (and later Hive’s code), a few things were somewhat broken.

The most problematic issue was that the p2p thread employs cooperative multi-tasking (non-OS-level fibers/tasks implemented in a library called “fc”) to service requests from peers in the network in a timely fashion. Whenever a peer provides a new block or transaction, the p2p task assigned to this peer adds the new item to the blockchain’s write_queue and blocks until the blockchain signals that it has processed the item (the task is waiting to find out whether the blockchain considers the new item valid, so that it knows whether or not to share it with other peers).

When all the code was using fc-task-aware blocking primitives (fc::promise), a new p2p task could be started as soon as the one placing the item on the queue blocked. But when this code was transplanted to Steem, the blocking code was switched to use boost::promise objects, which are not fc-task-aware. As a result, no new p2p task would get scheduled whenever a p2p task placed a new item on the write_queue, effectively stopping the p2p thread from doing any more work until the blockchain finished processing the new item.
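The stall described above is easy to reproduce in any cooperative scheduler. The following is a minimal sketch in Python that uses asyncio as a stand-in for fc tasks; it is an analogy, not hived code. All names here (`blockchain_validate`, `heartbeat`, `submit_blocking`, `submit_cooperative`) are hypothetical. A scheduler-unaware wait plays the role of the boost promise, and an awaitable wait plays the role of the fc::promise.

```python
import asyncio
import concurrent.futures
import threading
import time


def blockchain_validate(done, delay=0.2):
    """Stand-in for the blockchain thread: 'process' an item, then report validity."""
    time.sleep(delay)
    done.set_result(True)


async def heartbeat(ticks, interval=0.02):
    """Stand-in for the other peer tasks that should keep running."""
    while True:
        ticks.append(time.monotonic())
        await asyncio.sleep(interval)


async def submit_blocking():
    """Scheduler-unaware wait: blocks the whole event-loop thread."""
    done = concurrent.futures.Future()
    threading.Thread(target=blockchain_validate, args=(done,)).start()
    return done.result()  # no other task on this loop can run meanwhile


async def submit_cooperative():
    """Scheduler-aware wait: yields to other tasks until the result arrives."""
    done = concurrent.futures.Future()
    threading.Thread(target=blockchain_validate, args=(done,)).start()
    return await asyncio.wrap_future(done)


async def count_ticks_during(submit):
    """Return (validity, number of heartbeat ticks that ran while waiting)."""
    ticks = []
    beat = asyncio.create_task(heartbeat(ticks))
    await asyncio.sleep(0)  # let the heartbeat task record its first tick
    before = len(ticks)
    valid = await submit()
    made = len(ticks) - before
    beat.cancel()
    return valid, made


if __name__ == "__main__":
    print("blocking wait:   ", asyncio.run(count_ticks_during(submit_blocking)))
    print("cooperative wait:", asyncio.run(count_ticks_during(submit_cooperative)))
```

With the blocking wait, the heartbeat task makes no progress while the item is being validated; with the cooperative wait, it keeps ticking throughout.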
If the blockchain took a while to process items on the queue, this could even result in a loss of peers, because they decided that this hived node wasn’t responding in a timely manner and disconnected from it. But generally it just slowed down the speed of the p2p network.

To resolve this problem, we changed the code so that the p2p tasks now block using an fc::promise when placing items on the write_queue. We also create a new task that waits on this response, so that the primary task for the peer can continue to manage communication with the peer while the new task is awaiting a response from the blockchain thread.

Now, there is one other way this write queue can be written to: via an API call to broadcast_transaction (or its bad cousin, broadcast_transaction_synchronous). These calls still use a boost promise (and should do so). So the write queue processor in the blockchain thread has to respond to each item that gets processed by waking up the associated blocked task using either an fc promise or a boost promise, depending on the source of the new item (either the p2p thread or the thread used to process the API call).

Another problem we fixed was that the “potential peer” database was not getting saved to the peers.json file when hived exited. This meant that when a hived node was restarted, it always had to first connect to at least one of Hive’s public “seed nodes” that are hardcoded into hived itself before it could find other peers in the network. This created an undesirable centralization point for hived nodes that needed to re-connect to the network after being restarted. Now that this file is being properly saved off at shutdown, a restarting node can try to connect to any of the peers in its peer database, not just the seed nodes.

The new code also periodically saves the peer database, allowing a user to inspect the “quality” of the peers connected to their node (e.g. how long since a given peer has failed to respond to a request).
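The two-promise arrangement for the write queue described above can be illustrated with a small sketch. This is again a Python analogy rather than hived’s C++: an `asyncio.Future` stands in for the fc promise used by p2p tasks, a `concurrent.futures.Future` stands in for the boost promise used by API threads, and every name (`write_queue`, `complete`, `p2p_submit`, `api_submit`) and the placeholder validation rule are assumptions made for illustration.

```python
import asyncio
import concurrent.futures
import queue
import threading

write_queue = queue.Queue()  # hypothetical stand-in for hived's write_queue


def complete(promise, loop, result):
    """Wake the waiter using the mechanism that matches the item's source."""
    if isinstance(promise, asyncio.Future):
        # Task-aware promise (p2p side): must be resolved on its own loop thread.
        loop.call_soon_threadsafe(promise.set_result, result)
    else:
        # Thread-blocking promise (API side): safe to resolve from any thread.
        promise.set_result(result)


def blockchain_thread():
    """Process queued items one at a time and report validity to the sender."""
    while True:
        entry = write_queue.get()
        if entry is None:  # shutdown sentinel
            break
        item, promise, loop = entry
        is_valid = bool(item.get("valid", True))  # placeholder for real validation
        complete(promise, loop, is_valid)


async def p2p_submit(item):
    """p2p side: wait cooperatively so other peer tasks keep running."""
    loop = asyncio.get_running_loop()
    promise = loop.create_future()
    write_queue.put((item, promise, loop))
    return await promise


def api_submit(item):
    """API side: an ordinary thread that simply blocks until processed."""
    promise = concurrent.futures.Future()
    write_queue.put((item, promise, None))
    return promise.result(timeout=5)


def demo():
    worker = threading.Thread(target=blockchain_thread)
    worker.start()
    try:
        api_ok = api_submit({"id": "tx-from-api", "valid": True})
        p2p_ok = asyncio.run(p2p_submit({"id": "block-from-peer", "valid": False}))
    finally:
        write_queue.put(None)
        worker.join()
    return api_ok, p2p_ok


if __name__ == "__main__":
    print(demo())
```

The point of the design is that the queue consumer does not need to know how each producer waits; it only needs each queued item to carry the right kind of completion handle.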
While we were fixing this problem, we also took the opportunity to improve the algorithm used by the node to select from potential peers stored in the database. Now the node will first try to connect to the peers it saw most recently and didn’t have any communication problems with. Next it will retry connecting to peers that it did experience an error with, trying first the peers where the error happened longest ago.

We had reports that the p2p layer was also under-performing when a new node was synced to other peers from scratch. For this reason, most experienced hived node operators don’t use this method, but instead download a block_log file with all the blockchain data from a trusted source. But since we were working in the p2p code anyway, we decided to spend some time optimizing the sync performance of the p2p layer.

We made many improvements to the algorithms used during sync, and our tests have shown that sync time is now solely bottlenecked by the time required by the blockchain to process blocks (i.e. any further improvements to the p2p syncing process would not further speed up the overall syncing process; they would only lower CPU usage by the p2p thread).

This is true even when we set a block_number/block_hash checkpoint when launching a sync of hived. Setting a checkpoint allows the blockchain thread to do less work. Observing the speed with which blocks were processed during our testing, I would guess the blockchain was almost 3x faster at processing blocks before the checkpoint block number. So even when the blockchain thread is configured to do the least amount of work, it is still the performance bottleneck now, and we would need to substantially speed up blockchain processing before it would make sense to look at making further improvements to p2p sync performance.

### Command-line-interface (CLI) wallet improvements

The CLI wallet is a command-line wallet that is mostly used by a few expert Hive users and cryptocurrency exchanges.
It is also useful for automated testing purposes.

We’ve been refactoring the CLI wallet code to ease future maintenance of this application, and we’ve also been improving the wallet API (this is an API provided by the CLI wallet process that can be used by external processes such as Hive-based applications or automated scripts).

As part of the improvement process, we’ve also added an option to API calls to control how the output from the API call is formatted (for example, as web-client-friendly JSON or as human-friendly tabular data).

# Hive Application Framework (HAF)

We’re currently adding filtering options for operating “lightweight” HAF servers that store less blockchain data. The first filtering option we’re adding uses the same syntax used by the account history plugin to limit the operations and transactions stored to those which impact a specific set of accounts.

This form of filtering will allow hafah servers to duplicate the lightweight behavior of a regular account history node that is configured to only record data for a few accounts (for example, exchanges often operate their own account history node in this mode to save storage space).

We’ve spent a lot of time creating more tests for account history functionality, and further verifying the results of hafah against the latest development version of hived’s account history plugin (and we’ve also verified the performance of that versus the master branch of hived’s account history plugin deployed in production now).

# HAF account history app (aka hafah)

We’re periodically testing hafah on our production system, then making improvements whenever this exposes a performance problem not discovered by automated testing.

We’re also finishing up work now on creating dockerized HAF servers and modifying the continuous integration process (i.e. automated testing) for all HAF-based apps to re-use existing dockerized HAF servers when possible. We’re using hafah’s CI as the guinea pig for this process.
This will allow for much faster testing time on average, especially when we want to run benchmarks on a HAF app with a fully populated HAF database. Currently it takes over 10 hours to fully populate a HAF database with 60M+ blocks from scratch.

Conceptually, the idea is simple: if a HAF app (e.g. hafah) needs a specific version of a HAF server populated with a certain number of blocks, the automated testing system will first see if it can download a pre-populated docker image from gitlab’s docker registry (or a local cache on the builder) with the proper version of HAF and the required amount of block data. Only if this fails will it be required to create one itself (which can then be stored in the docker registry and re-used in subsequent test runs).

# Hivemind (social media middleware server used by web sites)

We’ve added a new person to the Hive development team who is working on the conversion of Hivemind to a HAF-based app (under Bartek’s supervision).

# What’s next?

* Modify the one-step script for installing HAF to optionally download a trusted block_log and block_log.index file (or maybe just allow an option for fast-syncing using a checkpoint to reduce block processing time, now that the peer syncing process is faster and may actually perform better than downloading a block_log and replaying it).
* Continue work on filtering of operations by sql_serializer to allow for smaller HAF server databases.
* Collect benchmarks for hafah operating in “irreversible block mode” and compare to hafah operating in “normal” mode.
* Further testing of hafah on production servers (api.hive.blog).
* Finish conversion of hivemind to a HAF-based app.
* Complete testing of new P2P code under forking conditions and various live mode scenarios, and in a mirrornet testnet using only hived servers with the new P2P code (tests so far have only been performed on the mainnet in a mixed-hived environment).
* Experiment with methods of improving block finality time.
* Complete work on resource credit rationalization.

Current expected date for the next hardfork is the latter part of April, assuming no serious problems are uncovered during testing over the next month.

See: 3rd update of 2022 on BlockTrades work on Hive software by @blocktrades