6th update of 2022 on BlockTrades work on Hive software
![blocktrades update.png](https://images.hive.blog/DQmSihw8Kz4U7TuCQa98DDdCzqbqPFRumuVWAbareiYZW1Z/blocktrades%20update.png)
Below are highlights of some of the Hive-related programming issues worked on by the BlockTrades team since my last post.
# Hived (blockchain node software) work
### Optimization of Resource Credit (RC) calculations
We’re currently analyzing how proposed changes to RC calculations will affect different types of Hive users. As a first step to this, we’re adding code to insert RC data generated during the replay of the blockchain into a database to give us a flexible analysis environment.
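As a rough illustration of the kind of analysis this enables (the database, table, and column names below are hypothetical, not the actual schema our devs are using), the RC data dumped during replay could then be queried directly with SQL:

```python
# Hypothetical sketch: query RC data dumped during a replay to see how a proposed
# cost formula would affect different classes of users. All names here are
# illustrative only -- they are not the real schema.
import psycopg2

conn = psycopg2.connect("dbname=rc_analysis")
with conn, conn.cursor() as cur:
    cur.execute("""
        SELECT account, operation_type,
               avg(new_rc_cost) AS avg_new_cost,
               avg(old_rc_cost) AS avg_old_cost
        FROM rc_replay_costs          -- hypothetical table filled during replay
        GROUP BY account, operation_type
        ORDER BY avg_new_cost DESC
        LIMIT 20
    """)
    for row in cur.fetchall():
        print(row)
```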
### Testing and fixing bugs in rc delegation and new wallet_api code
We’ve continued to create new tests to cover bugs discovered (and fixed) in RC delegation and wallet_api code. I believe this work will be completed by tomorrow.
### Testing the new p2p code with testtools-based testnets
Our new tests that simulate network forks enabled us to identify a few more bugs in the updated p2p/locking code (a couple of the bugs were actually in the existing fork database code, but that code was never exercised during a fork switch before, because the old locking scheme always froze the p2p thread while a fork switch was in progress). We fixed the bugs and also made further improvements to logging to ease diagnosis of any bugs we find later (and to help with performance optimization).
Currently there are no further known bugs associated with the p2p/locking changes, but our testers are scheduled to create more test scenarios (e.g. testing the transition of nodes to the new hardfork protocol).
### Mirrornet (testnet that mirrors traffic from mainnet) to test p2p code
After a couple more bug fixes, the mirrornet code is “mostly working” now, but there’s still one issue that shows up when a node is replayed and first enters live sync. This bug will probably be fixed tomorrow, so I expect we can launch a new public mirrornet early next week. At that point it would be good to get more people to launch nodes on the mirrornet so we can get a nicely distributed network with differently configured nodes.
We plan to use the new mirrornet to look at potential optimizations of the new hived nodes with better locking when they are under heavy load conditions (and to perform further testing, of course).
### Completed implementation of block finality protocol
The new block finality protocol has been coded and very lightly tested.
Next we’ll begin creating CI tests to exercise this code under various stressed network conditions (e.g. simulating breaks in the p2p network connections). Although it hasn’t been heavily tested yet, based on the cleanness of the design I don’t expect many bugs in this code.
# Hive Application Framework (HAF)
### Reduced size of hashes stored in HAF databases
One of our new HAF app developers noticed a discrepancy in the way HAF was storing transaction hashes (and block hashes as well), and we ultimately found that the sql_serializer plugin was storing the hex values as ASCII strings instead of encoded binary (so we were using 40 bytes to store a hash when we only needed 20).
We’ve changed the storage mechanism now to store them as a pure binary hash, which will reduce the size of the transactions table (and some indexes) and also should improve performance somewhat. We’re currently running a benchmark to determine how much smaller the affected tables get after a full replay.
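To make the size difference concrete: a Hive transaction id is a 20-byte hash, which takes 40 characters when written as hex. Here is a minimal sketch of the difference (the SQL mentioned in the comment is the generic PostgreSQL conversion, not the exact HAF migration):

```python
# A Hive transaction id is a 20-byte hash; its hex representation doubles the size.
trx_id_hex = "6fde0190a97a06ce9c85b77c8ff3603cf2eb9a4b"  # illustrative 40-char hex string

as_text = trx_id_hex.encode("ascii")      # 40 bytes if stored as an ASCII string
as_binary = bytes.fromhex(trx_id_hex)     # 20 bytes if stored as raw binary (bytea)

print(len(as_text), len(as_binary))       # -> 40 20

# In PostgreSQL the equivalent conversion is decode(trx_id_hex, 'hex'),
# storing the result in a bytea column instead of a text/varchar column.
```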
### Fix performance problem when a HAF app “breaks”
We found that if a HAF app stops processing blocks (e.g. because of a bug in the app’s code), reversible data could begin accumulating in the database instead of being moved to an irreversible state, which could slow down other still-working HAF apps running on the same server. This was undesirable, since we want HAF apps to be isolated from each other by default. Fortunately, we found a quick fix that essentially alleviates this issue, and the fix was committed today.
### Filtering of operations to create small HAF servers
I benchmarked HAF database size for both forms of filtering available for a HAF server (1. account-based filtering similar to what is used by the account-history plugin and 2. operations-regex filtering, which enables filtering out specific types of operations with particular data) and the news is quite good, although not unexpected.
For example, using the operations-regex filtering to filter out just Splinterlands game operations (but not Hive Engine ones), the HAF database size on a compressed ZFS drive dropped from 1200GB down to 359GB. This also reduced the time to recreate the database table indexes after the replay on our “typically configured server” from 5.5 hours down to 1.1 hours.
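As a rough sketch of what regex-based operation filtering means in practice (this does not show the actual sql_serializer filter syntax, and the “sm_” prefix convention for Splinterlands custom_json ids is an assumption on my part):

```python
import re

# Hypothetical filter: drop custom_json operations whose id starts with "sm_"
# (the prefix commonly used by Splinterlands game operations), keep everything else.
SPLINTERLANDS_ID = re.compile(r"^sm_")

def keep_operation(op_type: str, op_value: dict) -> bool:
    """Return False for operations that should be filtered out of the HAF database."""
    if op_type == "custom_json_operation":
        return not SPLINTERLANDS_ID.match(op_value.get("id", ""))
    return True

# Example: a Splinterlands game op is dropped, a normal transfer is kept.
print(keep_operation("custom_json_operation", {"id": "sm_find_match", "json": "{}"}))  # False
print(keep_operation("transfer_operation", {"from": "alice", "to": "bob"}))            # True
```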
Even more impressively, a HAF database configured just to store account history data for popular Hive exchanges only consumed 5GB on a ZFS compressed drive (as opposed to 1200GB for a full HAF database). And most of that 5GB was consumed by the transactions table, so we may cut that size in half with the new reduced-size hashes mentioned above.
Note that for many “stand-alone” HAF servers that are dedicated to a particular app or set of apps that mostly rely on custom_json, 5GB size is a realistic “minimum database size” requirement, with any additional data just being whatever is required by the apps themselves (i.e. storage of the app-specific custom_json and state tables for the app).
### Benchmarking of various HAF server configurations
We’ve continued to benchmark HAF running in various hardware and software configurations to figure out optimal configurations for HAF servers in terms of performance and cost effectiveness.
Based on that benchmarking, we’ve increased the maintenance_work_mem setting from the PostgreSQL default of 64MB to 6GB while re-building the indexes after a HAF replay. We’ll also be checking whether this change is beneficial when performing table vacuums of various types (and therefore whether it should be raised at other times, or even permanently for the server).
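For reference, this setting can be raised just for the session that rebuilds the indexes rather than server-wide; a minimal sketch (the database and index names below are placeholders, not the real HAF schema):

```python
import psycopg2

# Raise maintenance_work_mem for this session only, then rebuild an index.
# The 6GB value matches the setting mentioned above; the database and index
# names are placeholders.
conn = psycopg2.connect("dbname=haf_db")
with conn, conn.cursor() as cur:
    cur.execute("SET maintenance_work_mem = '6GB'")
    cur.execute("REINDEX INDEX my_app.operations_idx")  # placeholder index name
conn.close()
```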
Currently we’re analyzing temporarily disabling filesystem sync acknowledgment during the initial population of a HAF database. Surprisingly, we haven’t seen a benefit from this yet, so we need to investigate further to figure out why. However, we did see substantial benefits from configuring the drives used by the database with “noatime”.
# HAF account history app (aka hafah)
We’re currently enabling automated testing of the version of HAfAH that uses PostgREST instead of a Python-based web server. Our benchmarking so far continues to show a substantial benefit from using PostgREST, with one of the latest tests showing a 2x improvement in response time (100ms for a 500KB response versus 200ms for the Python-based server) for a `get_ops_in_block` call (which was previously the biggest bottleneck in our HAfAH benchmark test). I believe this work will be completed by tomorrow.
In related work, we’re making further changes to our benchmarking script so that its data can be analyzed easily by our automated testing system, allowing us to easily see improvements or regressions whenever the code is changed.
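For context on what the PostgREST path looks like: PostgREST exposes SQL functions as HTTP endpoints under `/rpc/`, so a request like the sketch below goes straight to the database function with no Python layer in between. The host, port, and parameter names here are assumptions for illustration, not the confirmed HAfAH API:

```python
import requests

# Hedged example: call get_ops_in_block through a PostgREST endpoint.
# The URL and parameter names are illustrative assumptions.
HAFAH_URL = "http://localhost:3000/rpc/get_ops_in_block"

response = requests.post(
    HAFAH_URL,
    json={"block_num": 5000000, "only_virtual": False},
    timeout=10,
)
response.raise_for_status()
ops = response.json()
print(f"received {len(ops)} operations")
```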
# Hivemind (social media middleware server used by web sites)
We continued to work on conversion of Hivemind to a HAF-based app. The HAF-based version has been tested using our standard hivemind CI tests and only a small number of tests failed, so I expect we’ll have a usable version available soon that we can begin benchmarking.
# HAF-based block explorer
We’re still in the early stages of this project, but this work is what led to the discovery of the inefficiently stored transaction hashes and of the issue with retaining reversible data for too long when a HAF app breaks, so it’s already yielding useful benefits as a means of testing and improving HAF itself.
# Some upcoming tasks
* Modify the one-step script for installing HAF to optionally download a trusted block_log and block_log.index file (or maybe just allow an option for fast-syncing using a checkpoint to reduce block processing time, now that the peer syncing process is faster and may actually perform better than downloading a block_log and replaying it). This task is on hold until we have someone free to work on it.
* Collect benchmarks for HAfAH operating in “irreversible block mode” and compare them to HAfAH operating in “normal” mode. This task is on hold until we’ve finished optimization of HAfAH (mainly just PostgREST benchmarking now).
* Further testing of hafah on production servers (api.hive.blog).
* Finish conversion of hivemind to a HAF-based app.
* More testing of the new P2P code under forking conditions and various live mode scenarios, and in a mirrornet testnet using only hived servers running the new P2P code.
* Finish testing of the new block finality code.
* Complete work on resource credit rationalization.
# When hardfork 26?
We’re mostly in a “test and benchmarking” mode now, so if all goes well with testing in the next week, we may be able to schedule the next hardfork in late May.