30th update of 2021 on BlockTrades work on Hive software
![blocktrades update.png](https://images.hive.blog/DQmSihw8Kz4U7TuCQa98DDdCzqbqPFRumuVWAbareiYZW1Z/blocktrades%20update.png)
Below is a list of some of the Hive-related programming issues worked on by BlockTrades team during the past week:
# Hived work (blockchain node software)
### Improving resource credit (RC) plugin
We’re continuing to experiment with tweaking the RC plugin to more accurately account for real-world costs. Because a lot of this work involves long-running reindexing tests whenever we make a change, I expect this task to continue for at least a couple more weeks.
### Miscellaneous work on hived
We upgraded the CI build/runtime/test images to Ubuntu 20.04 as part of our general move of Hive development from Ubuntu 18.04 to Ubuntu 20.04.
We completed an analysis of some strange log messages that appeared during periods of high blockchain activity. These messages claimed that some transactions were being included in a series of consecutive blocks even though the transacting account didn’t have enough RC to pay for them. However, we could see on block explorers that these transactions were not actually included, so we decided to dig in and find out what was going on, in case there was a real problem. The ultimate cause was fairly complicated, but not problematic. For those who want to see the kinds of problems programmers have to deal with, you can read the full results of this investigation here:
https://gitlab.syncad.com/hive/hive/-/issues/197
We also continued work on improving testtools and tests for hived. This change impacts anyone who is writing tests for hived:
https://gitlab.syncad.com/hive/hive/-/merge_requests/313
# Hive Application Framework: framework for building robust and scalable Hive apps
Most of our work this week continues to be HAF-related, and we’ve had a lot of good news on the performance front in the past few days.
### New tables generated by sql_serializer for HAF apps reduced Hafah sync time by 45%
We finished optimizing the latest version of sql_serializer that writes two new tables to the database to indicate what accounts exist (hive.accounts table) and what accounts are affected by which blockchain operations (hive.account_operations).
After optimization, writing these tables only added 10% to the time previously required to generate a HAF database (in prototype form it took twice as long), which was really good news for us, because these two tables eliminate the need for a separate reindex/sync step in the HAF account history app (Hafah).
To put this in perspective, previously reindexing Hafah required two steps:
* Run sql_serializer to generate HAF database to the head block (24499 seconds)
* Run hafah reindex to generate tables needed by Hafah (23942 seconds)
So the total time to process 49M blocks was 48441 seconds = 13.5 hours
With the new version of sql_serializer, there is no need for the second step. The new version of sql_serializer runs a little longer (26738 seconds), but since there is no separate Hafah sync step, the total time was reduced from 13.5 hours to about 7.4 hours (a 45% reduction).
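To give a rough idea of why these tables eliminate the separate sync step, here is a minimal sketch (not the actual Hafah code) of how an app could answer an account history query directly from the serializer-generated tables. The table names come from above; the column names (id, name, account_id, operation_id, block_num, body) and the database name haf_block_log are assumptions for illustration only:

```python
# Minimal sketch: query account history straight from the HAF tables that
# sql_serializer now fills in during its sync (column names are assumed here).
import psycopg2

def account_history(account_name, limit=100):
    with psycopg2.connect("dbname=haf_block_log") as conn:
        with conn.cursor() as cur:
            cur.execute(
                """
                SELECT o.block_num, o.body
                FROM hive.accounts a
                JOIN hive.account_operations ao ON ao.account_id = a.id
                JOIN hive.operations o ON o.id = ao.operation_id
                WHERE a.name = %s
                ORDER BY ao.operation_id DESC
                LIMIT %s
                """,
                (account_name, limit),
            )
            return cur.fetchall()

if __name__ == "__main__":
    for block_num, body in account_history("blocktrades", limit=10):
        print(block_num, body)
```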
### Space consumed by HAF database tables at 59M blocks
| Table | Size |
| --- | --- |
| hive.blocks | 7135 MB |
| hive.operations | 969 GB |
| hive.transactions | 408 GB |
| hive.transactions_multisig | 438 MB |
| hive.accounts | 108 MB |
| hive.account_operations | 190 GB |
The HAF database above occupies 2180 GB after adding in table indexes and other overhead.
The last two tables above are the new ones that were just added; they increased the HAF database size by about 14%. But it is safe to assume that these tables will be useful to most HAF apps.
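If you want to check the equivalent figures on your own HAF server, they can be reproduced with PostgreSQL’s built-in size functions. Here’s a small sketch (the database name is an example; pg_table_size excludes indexes, pg_total_relation_size includes them):

```python
# Minimal sketch: report HAF table sizes using PostgreSQL's size functions.
import psycopg2

TABLES = [
    "hive.blocks",
    "hive.operations",
    "hive.transactions",
    "hive.transactions_multisig",
    "hive.accounts",
    "hive.account_operations",
]

with psycopg2.connect("dbname=haf_block_log") as conn, conn.cursor() as cur:
    for table in TABLES:
        cur.execute(
            "SELECT pg_size_pretty(pg_table_size(%s::regclass)), "
            "pg_size_pretty(pg_total_relation_size(%s::regclass))",
            (table, table),
        )
        data_size, total_size = cur.fetchone()
        print(f"{table}: {data_size} data, {total_size} including indexes")
```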
For apps that don’t require all this data, we will be adding an option to filter what data is stored by sql_serializer, allowing standalone HAF apps to dramatically lower their storage requirements (probably to as low as 50GB or so).
Note that the above figure does not account for the normal storage requirements of a hived node (450GB for the block_log and block_log.index files), but it does eliminate the need for the 580GB database used by the rocksdb account history plugin, which is made obsolete by the Hafah app. So a hived node plus a HAF server with full blockchain data would require about 2.7TB of space. I suspect the sweet spot now for performance vs cost for most API servers will be to get two 2TB NVMe drives and configure them as a RAID0 stripe.
### Confirmed that Hafah results match results from account history plugin
We completed the code that compares results of Hafah against the account history plugin that it replaces and verified that the output was the same. We also developed scripts to benchmark the various phases of computation performed by Hafah when processing an api call (SQL query time vs serialization of query output into json form).
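For anyone curious what such a comparison looks like in practice, below is a simplified sketch of the idea (not our actual comparison scripts): issue the same get_account_history call to a hived account history node and to a Hafah server, then check that the responses match. The endpoint URLs and account name are placeholders:

```python
# Illustrative sketch only: compare account history results from a hived node
# (account history plugin) against a Hafah API server for the same request.
import requests

HIVED_URL = "http://127.0.0.1:8091"   # hived with account history plugin (example)
HAFAH_URL = "http://127.0.0.1:8095"   # Hafah API server (example)

def get_account_history(url, account, start=-1, limit=1000):
    payload = {
        "jsonrpc": "2.0",
        "method": "account_history_api.get_account_history",
        "params": {"account": account, "start": start, "limit": limit},
        "id": 1,
    }
    return requests.post(url, json=payload).json()["result"]

reference = get_account_history(HIVED_URL, "blocktrades")
candidate = get_account_history(HAFAH_URL, "blocktrades")
assert reference == candidate, "Hafah output differs from account history plugin"
```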
### Hafah server benchmarks (more good news)
After more optimization work on the Hafah API server, we benchmarked it under various loads and compared the results against the hived account history plugin that it replaces. Benchmarks completed so far are here: https://gitlab.syncad.com/hive/HAfAH/-/issues/6
The results were in line with what I hoped for when we initiated this project, but still extremely gratifying: on average, Hafah serves up data 20 times faster than the account history plugin, and for worst-case API calls, it is as much as 40 times faster!
As an example, the worst-case time for a Hafah API call on our fast system was 3.7s, whereas the same call to the rocksdb plugin would take 108s (except that on any public node this API call would simply time out).
Another way to look at this is that a single server running the new code can handle the account history workload of 20 of our current servers.
And even under light loads, the new servers will feel much more responsive to users. The average time for an account history API call to complete is now under 100ms, so the dominant factor for most API calls will be the latency between the client app and the API server.
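As a simple illustration of how such response times can be measured from the client side, here’s a minimal timing sketch (the real benchmark setup is documented in the issue linked above; the URL and account below are placeholders):

```python
# Minimal sketch: time repeated account history calls against an API endpoint.
import time
import requests

URL = "http://127.0.0.1:8095"  # example Hafah endpoint

def timed_call(account):
    payload = {
        "jsonrpc": "2.0",
        "method": "account_history_api.get_account_history",
        "params": {"account": account, "start": -1, "limit": 1000},
        "id": 1,
    }
    start = time.time()
    response = requests.post(URL, json=payload)
    response.raise_for_status()
    return time.time() - start

times = [timed_call("blocktrades") for _ in range(100)]
print(f"average: {1000 * sum(times) / len(times):.1f} ms, "
      f"worst: {1000 * max(times):.1f} ms")
```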
One final observation is that the new HAF-based solution seems to scale better as the size of the blockchain grows: the performance benefits were more substantial at 50M blocks than at 5M blocks, for example. Since Hive is already one of the most active blockchains and signs point to rapid growth, I believe this aspect will become increasingly critical to Hive’s scalability.
### Added 2 new programmers to finish up balance_tracker app
We’ve introduced two new programmers to our blockchain team who will focus on development of HAF-based apps. Having fresh eyes on the project will help us identify deficiencies in the current documentation for HAF.
As a starter project, they are implementing the API for the new HAF-based balance tracker app I described in my post last week. They will also be creating an associated web-based API that will allow users to graph their Hive and HBD balances over time (i.e., like the portfolio tracker on a trading site that shows how your account balances grow or shrink with time).
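To give a sense of the kind of data such an API would serve, here’s a purely hypothetical sketch: a query returning (block, balance) points that a web front end could chart. The btracker.account_balance_history table and its columns are invented for this example and are not the app’s actual schema:

```python
# Hypothetical sketch only: fetch a balance time series for charting.
import psycopg2

def balance_history(account, symbol="HIVE"):
    with psycopg2.connect("dbname=haf_block_log") as conn, conn.cursor() as cur:
        cur.execute(
            """
            SELECT block_num, balance
            FROM btracker.account_balance_history  -- invented table name
            WHERE account = %s AND symbol = %s
            ORDER BY block_num
            """,
            (account, symbol),
        )
        return cur.fetchall()  # (block_num, balance) points ready for charting
```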
# Work in progress and upcoming work
* Cleanup documentation for HAF and HAF apps
* Fix the slowdown that occurs when sql_serializer switches from massive sync to live sync mode, caused by sql_index_threshold being used improperly
* Add operation filtering option to sql_serializer
* Finish up balance_tracker application
* Repo and branch cleanup
* Finish conversion of hivemind to a HAF-based app. Once we’re further along with HAF-based hivemind, we’ll test it using the fork-inducing tool.
* Continue RC related work.