5th update of 2022 on BlockTrades work on Hive software
![blocktrades update.png](https://images.hive.blog/DQmSihw8Kz4U7TuCQa98DDdCzqbqPFRumuVWAbareiYZW1Z/blocktrades%20update.png)
Below are highlights of some of the Hive-related programming issues worked on by the BlockTrades team since my last post.
# Hived (blockchain node software) work
### Optimization of Resource Credit (RC) calculations
We made and tested changes that add an extra cost to the custom operations used for RC delegations (while these operations aren’t strictly consensus-related, they impose more costs on a hived node than other custom operations).
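As a rough illustration of the idea (hypothetical names and cost values, not hived’s actual RC plugin code), a cost function might surcharge custom operations whose id marks them as RC delegation traffic:

```cpp
// Illustrative sketch only (hypothetical names and cost values, not
// hived's actual RC plugin): charge a surcharge for custom operations
// whose id marks them as RC delegation traffic, since they modify node
// state beyond merely storing a payload.
#include <cstdint>
#include <string>

struct custom_json_operation
{
  std::string id;   // application identifier, e.g. "rc" for RC delegations
  std::string json; // operation payload
};

int64_t execution_cost( const custom_json_operation& op )
{
  constexpr int64_t base_custom_op_cost     = 100; // assumed baseline cost
  constexpr int64_t rc_delegation_surcharge = 400; // assumed extra cost

  if( op.id == "rc" ) // RC delegation operations ride on custom_json
    return base_custom_op_cost + rc_delegation_surcharge;
  return base_custom_op_cost;
}
```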
### Testing and fixing bugs in new wallet_api code
We found and fixed some more bugs in the wallet_api code. I think this task will be closed out soon.
### Testing the new p2p code with testtools-based testnets
We fixed a minor race condition in the new sync code that got exposed during testing: https://gitlab.syncad.com/hive/hive/-/commit/c62803aa6a1627771ee0f05f95710154263a843c
Testing of the new p2p code also exposed a latent bug in the fork_database code: the fork database’s copy of the head block didn’t get updated properly when a fork switch occurred. This could cause problems for the new p2p code during syncing, because it now uses this copy of the head block instead of the one in chainbase to reduce mutex contention on chainbase. Now the head block is properly updated during fork switches: https://gitlab.syncad.com/hive/hive/-/commit/5c0b7fc0e290859d6d1809234a2c87cedecc760c
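To make the bug class concrete, here’s a minimal sketch (a hypothetical structure, not hived’s actual fork_database code): a cached head-block pointer that was refreshed on the normal push path but, before the fix, not on the fork-switch path:

```cpp
// Minimal sketch of the bug class (hypothetical structure, not hived's
// actual fork_database): the cached head-block pointer was refreshed on
// the normal push path but not when a fork switch rewound to another
// branch, so readers of the cached copy could see a stale head.
#include <cstdint>
#include <memory>

struct block { uint32_t num = 0; };
using block_ptr = std::shared_ptr<block>;

class fork_database_sketch
{
  block_ptr _head; // p2p code reads this copy to avoid locking chainbase
public:
  void push_block( const block_ptr& b )
  {
    if( !_head || b->num > _head->num )
      _head = b; // always updated correctly on this path
  }

  void switch_fork( const block_ptr& new_branch_tip )
  {
    // ... unlink blocks from the old branch, apply the new branch ...
    _head = new_branch_tip; // the fix: this update was previously missing
  }

  block_ptr head() const { return _head; }
};
```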
During this testing, we also found a longstanding problem with the way the mutex locks that protect access to critical resources were being handled: many of the read locks take a timeout parameter (i.e. if a read lock isn’t obtained within 1s, the attempt fails and the calling code must handle that failure). It turns out that these timed locks can also fail spuriously (even before the timeout expires), which means the code using them must explicitly check the result of each lock attempt, and those checks weren’t being performed. Such spurious failures are rare (maybe one in a million attempts), but over long periods of time this has no doubt resulted in occasional unexpected failures inside the code.
To fix this problem, read locks are now “untimed” by default (they block until they acquire the lock), and only the API server uses locks with timeouts: API calls are allowed to fail, and the timeouts prevent them from taking up too much of chainbase’s access time and potentially starving the critical execution of the write_queue that writes blockchain data to chainbase. We also replaced the boost::interprocess locks with standard locks, since the locks are only used within a single hived process and there didn’t seem to be a need for these presumably more expensive locks. This work was merged in here: https://gitlab.syncad.com/hive/hive/-/merge_requests/401
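The pattern looks roughly like this in standard C++ (a minimal sketch using std::shared_timed_mutex as a stand-in for hived’s actual lock types):

```cpp
// A minimal sketch of the two locking styles, using standard C++
// primitives as stand-ins for hived's actual lock types. A timed shared
// lock can fail to acquire (timeout or spurious failure), so its result
// must be checked; an untimed shared lock simply blocks until acquired.
#include <chrono>
#include <shared_mutex>

std::shared_timed_mutex chainbase_mutex; // stand-in for the real lock

// API readers: allowed to fail, so they use a timeout and check the result.
bool read_for_api()
{
  std::shared_lock<std::shared_timed_mutex> lock(
      chainbase_mutex, std::chrono::seconds( 1 ) );
  if( !lock.owns_lock() )
    return false; // lock attempt failed; the API call reports an error
  // ... serve the API call from chainbase ...
  return true;
}

// Internal readers: the new default is an untimed lock that blocks
// until the lock is held, so no failure path needs to be handled.
void read_internal()
{
  std::shared_lock<std::shared_timed_mutex> lock( chainbase_mutex );
  // ... read from chainbase safely ...
}
```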
### Mirrornet (testnet that mirrors traffic from mainnet) to test p2p code
While trying to set up the mirrornet to test the new p2p code, we found some further problems in the mirrornet code; these are currently being fixed.
Once they’re fixed, we’ll resume testing the p2p code under heavy load, but in the meantime we decided to rely on the “tried-and-true” method of deploying the new code to our production environment (watching it carefully, of course) to test the locking changes described above.
We also exercised the new locking code by running the new API benchmark/stress tests (the same ones we used to test account history nodes) while the node was serving sync data to another node. Neither test exposed any bugs or performance regressions.
### Completed initial design of block finality protocol
We completed our design for the new code to improve block finality time and we’ve begun implementation of the new code (sometimes with distractions to work on other tasks, unfortunately). I’ll write some more on this topic in a separate post once we’ve proved out the design more.
# Hive Application Framework (HAF)
### Filtering of operations using regexes to allow for “small” HAF servers
I believe the code for using regexes to filter operations is complete or nearly so, and tests are now being developed for it, but I forgot to get an update today on the status of this work, so I’ll update on this point tomorrow.
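As a rough sketch of the technique (a hypothetical interface, not the sql_serializer’s actual one), the filtering boils down to matching operation names against user-supplied regexes and only persisting the matches:

```cpp
// Rough sketch of the filtering technique (hypothetical interface, not
// the sql_serializer's actual one): keep an operation only if its name
// matches one of the user-supplied regexes. An empty filter list keeps
// everything, matching the behavior of an unfiltered "full" HAF server.
#include <regex>
#include <string>
#include <vector>

bool keep_operation( const std::string& op_name,
                     const std::vector<std::regex>& filters )
{
  if( filters.empty() )
    return true; // no filters configured: store all operations
  for( const auto& re : filters )
    if( std::regex_match( op_name, re ) )
      return true;
  return false;
}

// Example: a "small" HAF server that stores only vote and comment
// operations would configure something like:
//   std::vector<std::regex> filters{ std::regex( "vote_operation" ),
//                                    std::regex( "comment_operation" ) };
//   keep_operation( "transfer_operation", filters ); // -> false
```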
### Benchmarking of alternative file systems for HAF servers
We’ve continued to benchmark HAF running in various hardware and software configurations to determine optimal configurations for HAF servers in terms of performance and cost effectiveness. Among other things, we discovered that the location of PostgreSQL’s write-ahead logs (written to /var/lib/postgresql by default) can have a significant impact on the time it takes to reindex a HAF database; for example, placing the write-ahead logs on a fast drive separate from the database itself can help. Note that the write-ahead log location is specified separately from the location of the HAF database.
# HAF account history app (aka HAfAH)
We implemented and tested the changes I mentioned last week to create a new index in the HAF database to speed up get_account_history calls (and probably similar calls that may be needed by other HAF apps in the future).
We’re now looking at whether we can speed up the next biggest bottleneck among the API calls (get_ops_in_block), but this call’s performance is already acceptable even if we can’t speed it up further.
In order to speed up our optimization work, we also made a higher-level script to eliminate some of the manual steps that were previously required to perform a benchmark of HAfAH and get useful analytical data from the benchmark (this will probably be committed tomorrow). This script may serve as a useful starting point for other HAF apps looking to benchmark the performance of the app's API calls.
# Hivemind (social media middleware server used by web sites)
We continued work on converting Hivemind to a HAF-based app, with a slight detour: the code is currently being reviewed for possible coding-style improvements to meet Python best practices.
We also updated the ujson package used by Hivemind because of security concerns about an older version of the package.
And finally we merged in an old optimization we made to processing of custom_json operations.
# HAF-based block explorer
We're in the early stages of developing a HAF-based block explorer (open-source, of course). Two of our newer developers are getting introduced to the associated concepts and also reviewing HAF documentation as part of this work.
# Condenser (source code for hive.blog and several other Hive-based web sites)
We’ve also been reviewing and merging in updates from @quochuy for Condenser, and we have a few more to merge in during the coming days (some fixes for Hive Authenticate and improvements to deter phishing attempts).
# What’s next?
* Modify the one-step script for installing HAF to optionally download a trusted block_log and block_log.index file (or maybe just allow an option for fast-syncing using a checkpoint to reduce block processing time, now that the peer syncing process is faster and may actually perform better than downloading a block_log and replaying it). This task is on hold until we have someone free to work on it.
* Test filtering of operations by sql_serializer using regexes and account names to allow for smaller HAF server databases.
* Collect benchmarks for HAfAH operating in “irreversible block mode” and compare them to HAfAH operating in “normal” mode. This task is on hold until we’ve finished basic optimizations of the HAfAH API.
* Further testing of HAfAH on production servers (api.hive.blog).
* Finish conversion of hivemind to a HAF-based app.
* More testing of the new P2P code under forking conditions, in various live-mode scenarios, and on a mirrornet testnet using only hived servers running the new P2P code.
* Complete work on improving block finality time.
* Complete work on resource credit rationalization.
* Continue benchmarking of HAF and HAfAH on ZFS and EXT4 file systems with various hardware and software configurations.
I’m pushing the expected date for the next hardfork to May, given the large number of testing and performance benchmarking tasks still facing us and a couple of key functional tasks still to be completed (RC rationalization and block finality improvement).