4th update of 2022 on BlockTrades work on Hive software
Below are highlights of some of the Hive-related programming issues worked on by the BlockTrades team since my last post.
# Hived (blockchain node software) work
### Optimization of Resource Credit (RC) calculations
During our work on rationalizing the resource credit costs of blockchain operations, we discovered that the same resource credit calculations were being performed multiple times per block, so we optimized the code to eliminate the redundant work.
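The actual change is in hived's C++ code; purely to illustrate the general idea (compute the costs once per block and reuse them), here's a minimal Python sketch with hypothetical names:

```python
# Minimal sketch of the optimization idea only (hypothetical names, not the
# actual hived C++ code): compute the RC cost table once per block and reuse
# it, rather than recomputing it for every operation in the block.

_rc_cost_cache = {}

def rc_costs_for_block(block_num, compute_costs):
    """Return the RC cost table for a block, computing it at most once."""
    if block_num not in _rc_cost_cache:
        _rc_cost_cache[block_num] = compute_costs(block_num)
    return _rc_cost_cache[block_num]
```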
### Testing and fixing bugs in new wallet_api code
We continued testing and fixing bugs that the tests uncovered in the new wallet_api code that I discussed last week. I think we’re nearly done with this task.
### Enhancing our testing system
As we get closer to a potential hardfork date, the capabilities of our testing system become more important, especially since we’ve made changes to some very low-level code in the upcoming release. So we’re expanding our “testtools” testing system to model more complex networks that include not only varying numbers of witness nodes but also multiple API nodes.
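As a rough illustration of the kind of network description this enables (the helper names below are hypothetical, not the real testtools API):

```python
# Hypothetical sketch, not the real testtools API: a single test describes a
# network containing both block-producing witness nodes and API nodes that
# only follow the chain and answer queries.

from dataclasses import dataclass

@dataclass
class NodeSpec:
    role: str   # "witness" or "api"
    name: str

def describe_network(num_witnesses, num_api_nodes):
    witnesses = [NodeSpec("witness", f"witness-{i}") for i in range(num_witnesses)]
    api_nodes = [NodeSpec("api", f"api-{i}") for i in range(num_api_nodes)]
    return witnesses + api_nodes

# e.g. a 21-witness network with two API nodes serving read traffic
network = describe_network(num_witnesses=21, num_api_nodes=2)
```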
### Testing the new p2p code with testtools-based testnets
Using testtools, we’ve been creating new testnet-based tests to look for bugs in the updated peer-to-peer (p2p) network code that I discussed last week.
The first bug we found turned out not to be a bug in the p2p code at all: the new p2p code was so much faster than the old code that it exposed a problem in the test itself. The test assumed that the node would take a certain amount of time before it received a block, but with the faster p2p network, the node got the block much earlier, breaking the test’s expectation and causing it to report a failure. So in this case, we ended up fixing the test itself.
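The general lesson is that tests shouldn’t encode fixed timing assumptions; they should poll for the expected condition with a timeout instead. A minimal sketch of that pattern (not the actual test code; `node_has_block` is a hypothetical helper):

```python
import time

def wait_until(condition, timeout=30.0, poll_interval=0.25):
    """Poll `condition` until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(poll_interval)
    return False

# Instead of assuming a block can't have arrived yet (or must have arrived)
# after a fixed sleep, wait for the actual condition:
# assert wait_until(lambda: node_has_block(node, block_num))
```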
### Testing the new p2p code using a mirrornet
In addition to using testtools, we’ve set up a mirrornet to test with. A mirrornet is a testnet, created with testtools, whose blockchain history “mirrors” the blocks on the mainnet; it also translates transactions it observes on the mainnet to the mirrornet in real time, so that we can more accurately model real-world loading conditions in our testing.
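Conceptually, the real-time translation works something like the sketch below (hypothetical helper names, with details such as signature handling glossed over):

```python
# Conceptual sketch only (hypothetical names, not the actual mirrornet
# tooling): stream transactions as they appear on the mainnet and rebroadcast
# them to a mirrornet node, so the test network sees realistic traffic.

def relay_transactions(mainnet, mirrornet, translate):
    """Forward each transaction observed on the mainnet to the mirrornet."""
    for tx in mainnet.stream_transactions():        # hypothetical streaming call
        mirrornet.broadcast_transaction(translate(tx))
```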
The mirrornet is particularly important when we want to do performance testing, as it allows us to run tests that we couldn’t easily do otherwise, even on the mainnet (on the mainnet, we can’t examine scenarios where all the nodes are running the new code: it is a “mixed-node” environment consisting of old and new nodes, and we can’t control which of those nodes are the block producers).
We found one bug already using the mirrornet (it was a latent bug in the fork database that got exposed by the p2p changes) and we’re planning to create further tests to look for more bugs.
### Planning to do flood testing next
We’re also planning to extend the mirrornet’s capability to allow for “flood testing” where we broadcast a large number of transactions to the API nodes in a short period of time. This will allow us to stress-test the network and look for more areas where we may be able to further speed up node performance.
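A minimal sketch of what such flood testing might look like (the node client calls are hypothetical): spread a large batch of transactions across the API nodes as quickly as possible and watch how the network copes.

```python
# Hypothetical sketch: broadcast a large batch of transactions to several API
# nodes concurrently to stress-test the network.

from concurrent.futures import ThreadPoolExecutor

def flood(api_nodes, transactions, workers=32):
    """Round-robin transactions across API nodes using a thread pool."""
    def send(numbered_tx):
        i, tx = numbered_tx
        api_nodes[i % len(api_nodes)].broadcast_transaction(tx)  # hypothetical call

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(send, enumerate(transactions)))
```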
### Speeding up build-and-test time (CI)
We’ve updated our build-and-test system (CI) for hive and HAF to use the [ninja build tool](https://ninja-build.org/) instead of “make” to speed up our build times. Among other things, ninja is better at determining the real dependencies between build steps and therefore allows more of them to execute in parallel.
By switching to ninja and making a few other configuration changes to our CI runner systems, we were able to cut build times for the hive repo from a previous range of 40-60 minutes down to an average of about 25 minutes.
Similar changes to the HAF CI flow led to even better improvements in the overall build-and-test time: previously it took 60-90 minutes, and it now completes in 30-40 minutes.
It is worth noting that devs can also use ninja for their local builds instead of make (this isn’t just for speeding up CI times), and we highly recommend it as a faster alternative.
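For anyone who wants to try it, the switch is mostly a matter of telling CMake to generate ninja build files instead of makefiles. A sketch of a small local build helper (standard CMake/ninja invocations; any hive-specific CMake options are omitted here):

```python
# Sketch of a local build helper that uses CMake's Ninja generator.
# These are standard CMake/ninja invocations; any hive-specific CMake
# options are omitted.

import subprocess

def build(source_dir=".", build_dir="build", build_type="Release"):
    subprocess.run(
        ["cmake", "-S", source_dir, "-B", build_dir, "-G", "Ninja",
         f"-DCMAKE_BUILD_TYPE={build_type}"],
        check=True,
    )
    subprocess.run(["ninja", "-C", build_dir], check=True)

if __name__ == "__main__":
    build()
```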
# Hive Application Framework (HAF)
### Filtering of operations using regexes to allow for “small” HAF servers
We completed filtering of hive operations based on account names, and we’re now working on filtering of operations based on regexes in the sql_serializer plugin. This latter type of filtering should allow standalone HAF apps to operate with very small HAF databases. I estimate that just filtering out splinterlands and hive-engine related custom_json operations should drop storage requirements for an app by around 90%. I expect this work will be completed in the next week.
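To give a sense of the kind of filtering involved, here is a conceptual Python sketch (not the sql_serializer C++ implementation, and the custom_json ids used in the patterns are examples only):

```python
import re

# Conceptual sketch only, not the sql_serializer implementation. A HAF app
# that doesn't need game/sidechain traffic might exclude custom_json
# operations whose id matches patterns like these (example ids only):
EXCLUDE_PATTERNS = [re.compile(p) for p in (r"^sm_.*", r"^ssc-mainnet-hive$")]

def keep_operation(op):
    """Return True if the operation should be stored in the HAF database."""
    if op.get("type") != "custom_json_operation":
        return True
    op_id = op.get("value", {}).get("id", "")
    return not any(p.match(op_id) for p in EXCLUDE_PATTERNS)
```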
### Benchmarking of alternative file systems for HAF-based servers
Although filtering of operations should enable stand-alone HAF apps to operate with very small databases, public API nodes using HAF will still want to store all the blockchain data in order to support account history API calls (via hafah) and also to enable them to run any open-source HAF app.
Unfortunately, on an ext4 file system, a HAF database with the entire blockchain consumes around 2.7TB. This isn’t horrible, but it’s not small either, so we’ve started investigating using ZFS and compressed ZFS filesystems as an alternative.
We’re still deep into performing these benchmarks because they take a long time to run and consume a lot of hardware resources, but preliminary results are very encouraging. Using a 2TB fast NVMe drive configured with a ZFS file system and lz4 compression, a 62M block HAF database was filled in only 12 hours and fit in 1.2TB.
This is a particularly appealing hardware setup, because it would give full API nodes a relatively cheap option for their storage needs (and would likely fit on the existing servers used by most public API nodes).
The blockchain itself (the block_log file) also compresses nicely on these drives: 361GB compressed vs. 549GB uncompressed.
We still need to do more performance testing on the serving side of HAF in this configuration (i.e. API performance) and test more hardware and software configurations to determine the optimal setup, and we’ll be continuing that work in the coming week.
# HAF account history app (aka hafah)
We’re periodically testing hafah on our production system, then making improvements whenever this exposes a performance problem not discovered by automated testing.
### Improving API performance
One performance problem that came up recently was an API call where hafah was taking 7 minutes to respond. We found a relatively simple initial solution that sped it up to an acceptable level, but it was still slower than a standard account history node. Nobody likes to lose to existing software on a performance metric, however, so we looked further and found a slightly more code-intensive solution that dropped the response time to 0.3s (which we’re pretty happy about).
This solution will unfortunately require a little more data to be written to the HAF database by the sql_serializer plugin, so it won’t be fully completed until sometime next week.
### Dockerized HAF servers for app testing
We’re continuing to work on creating dockerized HAF servers and modifying the continuous integration process (i.e. automated testing) for all HAF-based apps to re-use existing dockerized HAF servers when possible. This work is primarily being done in the Hafah repo.
We made some good progress on this task, but then got sidetracked by the need to optimize the performance of the slow queries mentioned above. Work on this task will resume shortly.
# Hivemind (social media middleware server used by web sites)
We continued to work on the conversion of Hivemind to a HAF-based app. We now have a version of hivemind that no longer needs to make any API calls to a hived node (i.e. all of its indexing needs are now filled from the data inside the HAF database).
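In practice this means the indexer reads blocks and operations with SQL from the HAF database rather than calling hived APIs. A simplified sketch (the view name here is illustrative and may not match the exact HAF schema):

```python
# Simplified illustration of indexing from a HAF database instead of hived
# API calls. The view name is illustrative and may not match the exact HAF
# schema.

import psycopg2

def fetch_operations(db_url, first_block, last_block):
    """Fetch the operations for a block range from the HAF database."""
    with psycopg2.connect(db_url) as conn, conn.cursor() as cur:
        cur.execute(
            """
            SELECT block_num, body
              FROM hive.operations_view   -- illustrative view name
             WHERE block_num BETWEEN %s AND %s
             ORDER BY block_num
            """,
            (first_block, last_block),
        )
        return cur.fetchall()
```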
# What’s next?
* Modify the one-step script for installing HAF to optionally download a trusted block_log and block_log.index file (or maybe just allow an option for fast-syncing using a checkpoint to reduce block processing time, now that the peer syncing process is faster and may actually perform better than downloading a block_log and replaying it). This task is on hold until we have someone free to work on it.
* Continue work on filtering of operations by the sql_serializer using regexes to allow for smaller HAF server databases.
* Collect benchmarks for hafah operating in “irreversible block mode” and compare them to hafah operating in “normal” mode.
* Further testing of hafah on production servers (api.hive.blog).
* Finish conversion of hivemind to a HAF-based app.
* More testing of the new P2P code under forking conditions, in various live-mode scenarios, and in a mirrornet testnet using only hived servers running the new P2P code.
* Experiment with methods of improving block finality time.
* Complete work on resource credit rationalization.
* Continue benchmarking of HAF and Hafah on ZFS and EXT4 file systems with various hardware and software configurations.
Current expected date for the next hardfork is still the end of April assuming no serious problems are uncovered during testing over the next week or two. But that date is rapidly approaching, so it is going to take a fairly herculean effort if we are to complete the key tasks in this timeframe.