24th update of 2021 on BlockTrades work on Hive software
![blocktrades update.png](https://images.hive.blog/DQmSihw8Kz4U7TuCQa98DDdCzqbqPFRumuVWAbareiYZW1Z/blocktrades%20update.png)
Below is a list of Hive-related programming issues worked on by BlockTrades team during last week:
# Hived work (blockchain node software)
### Voting changes that will take effect with HF26
Eliminated the rule that prevented an account from voting more than once per block (i.e. only one vote every 3 seconds). Vote edits are no longer penalized with a loss of curation rewards: an edited vote now behaves mostly as if the original/previous vote had never been cast. Dust votes are now fully counted as votes (so it will no longer be possible to delete a comment that has received dust votes).
https://gitlab.syncad.com/hive/hive/-/merge_requests/258
### CLI wallet
Wallet tests were rewritten to use the new, faster Test Tools library: https://gitlab.syncad.com/hive/hive/-/merge_requests/251
We’re also working on supporting offline operation of the cli wallet:
https://gitlab.syncad.com/hive/hive/-/merge_requests/265
### Code cleanup
We’ve made websocketpp into a submodule (previously the entire websocketpp codebase was copied directly into the hived code repo via the embedded fc library). This should make it easier to update the websocketpp library from its source repo in the future:
https://gitlab.syncad.com/hive/hive/-/merge_requests/235
### sql_serializer (Hived plugin that streams data to HAF database)
During testing of HAF, we found and fixed some bugs in the sql_serializer plugin.
When a hived node is started, it drops all the reversible data it has, so the reversible data in the associated HAF database also has to be removed, and the data of HAF-based apps has to be rewound to the last irreversible block. To handle this, the sql_serializer now generates an artificial BACK_FROM_FORK HAF event, so that all the reversible data gets removed from the HAF database:
https://gitlab.syncad.com/hive/hive/-/merge_requests/266
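Conceptually, the effect of a BACK_FROM_FORK-style event on an app's state is a rewind of everything above the last irreversible block. The sketch below is illustrative only: the table and column names are hypothetical and do not reflect HAF's actual schema or event mechanism.

```sql
-- Hypothetical app table: discard rows above the last irreversible block,
-- since reversible blocks streamed before the restart can no longer be trusted.
DELETE FROM app_post_counts
 WHERE block_num > (SELECT max(block_num) FROM irreversible_blocks);
```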
As mentioned in a previous report, there was a bug if the sql_serializer lost connection to the postgres database it is writing to (for example, if the postgres database was temporarily shutdown for maintenance then restarted). So we enhanced the sql_serializer to automatically try to reconnect in this case:
https://gitlab.syncad.com/hive/hive/-/merge_requests/263
# Hivemind (2nd layer applications + social media middleware)
As mentioned previously, we’re planning to migrate to Ubuntu 20 as the recommended deployment environment for hived and hivemind. As part of this change, we’re planning to move to postgres 12 for hivemind, because this is the default version of postgres shipped with Ubuntu 20.
### Final postgres 10 release for production servers, moving development releases to postgres 12
During our performance testing of postgres 12 in the last couple of weeks, we’ve found numerous places where we will need to inject postgres 12-specific syntax to achieve good performance.
So we’re planning to release a final postgres 10-compatible version of hivemind in the coming week containing all current fixes and performance enhancements, then move the develop branch to be postgres 12-only. This means that future releases of hivemind will require postgres 12 (but production servers should stick with postgres 10 until those releases are made available, as the final postgres 10-compatible release will perform much better on postgres 10).
### Analyzing Postgres 12’s “just-in-time” compilation of queries
We’ve determined that just-in-time (jit) compiling of queries (first introduced in postgres 11) has a detrimental effect on several hivemind-based queries and no observed benefits so far. This should not be surprising as jit is mostly beneficial for speeding up execution of long-running queries, and hivemind was designed to avoid such queries in general. Also, hivemind queries are often complex, so they take longer to compile.
We’re planning to disable jit during hivemind live sync. We’ll be performing some benchmarks in the coming week to see if jit benefits any queries during massive sync (these are the only queries in hivemind that might plausibly benefit). We’ll also need to check if the move to a HAF-based hivemind changes the performance profile of jit during massive sync.
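For reference, jit can be controlled per-session or in postgresql.conf via standard postgres settings; a minimal sketch of how it can be switched off for benchmarking:

```sql
-- Disable just-in-time compilation entirely for the current session:
SET jit = off;

-- Alternatively, raise the planner cost threshold at which jit kicks in
-- (jit_above_cost defaults to 100000; setting it to -1 also disables jit):
SET jit_above_cost = -1;

-- EXPLAIN (ANALYZE) reports a "JIT:" section in its output whenever
-- jit was actually used for the query:
EXPLAIN (ANALYZE) SELECT count(*) FROM pg_class;
```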
### Postgres 12 breaks ring-fencing of Common Table Expressions (CTEs) by default
Another issue we found during the port to postgres 12 is that postgres 12 tries to globally optimize queries that contain CTEs (i.e. WITH clauses in a query). Previously, sql code inside a CTE was always optimized separately from other parts of the query that depend on the data generated by the CTE. This separate optimization step is often referred to as “ring-fencing” of the CTE.
In many hivemind queries, we’ve taken advantage of ring-fencing to force the SQL query planner to use a beneficial ordering of joins within the queries. By breaking ring-fencing, postgres 12 was breaking these optimizations.
Fortunately, postgres 12 updated the syntax for WITH statements to enable the old ring-fencing behavior on specific queries via the MATERIALIZED keyword. For example, under postgres 12 `WITH results AS MATERIALIZED` achieves the same behavior as `WITH results AS` under postgres 10. Unfortunately, this syntax isn’t accepted by postgres 10, and we need to make this change in a number of queries, so this is one of the driving reasons we decided to move future development to postgres 12 (though I suspect we’ll find other reasons as we go along).
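To illustrate the syntax change (the table names below are illustrative, not hivemind's actual schema):

```sql
-- Postgres 10: a CTE is always an optimization barrier ("ring-fenced"),
-- so the planner optimizes the inner SELECT separately from the outer query:
WITH results AS (
    SELECT author, permlink FROM posts WHERE category = 'hive'
)
SELECT * FROM results JOIN votes USING (author, permlink);

-- Postgres 12 may inline the CTE above into the outer query and optimize
-- them together; adding MATERIALIZED restores the old ring-fenced behavior:
WITH results AS MATERIALIZED (
    SELECT author, permlink FROM posts WHERE category = 'hive'
)
SELECT * FROM results JOIN votes USING (author, permlink);
```

Postgres 12 also accepts `NOT MATERIALIZED` for the opposite case, forcing inlining even when the planner would otherwise materialize the CTE.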
# Hive Application Framework: framework for building robust and scalable Hive apps
### Fixing/Optimizing HAF-based account history app (Hafah)
We’re currently optimizing and testing our first HAF-based app (code-named Hafah) that emulates the functionality of hived’s account history plugin (and ultimately will replace it). During the past week we’ve been running full tests to the current headblock of the mainnet (i.e. 57M+ blocks) and making sure it enters live sync mode properly (after some fixes, it does).
We’ve also been testing with our “fork-inducing” tool on a testnet, which helped us identify some bugs in HAF (now fixed). We’re also doing further work on this tool to eliminate some random aspects of its operation so that its tests are repeatable.
Some bugs identified and fixed by recent testing of HAF include:
https://gitlab.syncad.com/hive/psql_tools/-/merge_requests/13
https://gitlab.syncad.com/hive/psql_tools/-/merge_requests/14
https://gitlab.syncad.com/hive/psql_tools/-/merge_requests/16
### Benchmarking concurrent operation of sql_serializer and Hafah
We tested running the sql_serializer replaying from block 0 to headblock while at the same time concurrently running the Hafah app on the same postgres database. Unfortunately, this unexpectedly resulted in a slowdown as compared to separately running the sql_serializer in massive sync mode, followed by running Hafah on the resulting data, so we’re investigating potential causes for this.
The slowdown manifested as the sql_serializer taking longer to reach the headblock. Hafah initially trailed the sql_serializer, but it was eventually able to catch up with the data streamed by the sql_serializer while still in massive sync mode, before the serializer reached the headblock and entered live sync mode.
Currently I suspect the issue is either the introduction of indexes and foreign keys into the HAF tables (required for Hafah to run) or autovacuums on the HAF tables (these make Hafah perform better), and we’ll be investigating selectively disabling some of these to see if any are the root cause of the slowdown.
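A sketch of how such selective disabling could be done (the table and index names are hypothetical, not HAF's actual schema; the statements themselves are standard postgres syntax):

```sql
-- Turn off autovacuum for a single table to test whether vacuum activity
-- contributes to the slowdown:
ALTER TABLE haf_operations SET (autovacuum_enabled = false);

-- Drop an index before the benchmark and recreate it afterwards to
-- measure its maintenance cost during massive sync:
DROP INDEX IF EXISTS haf_operations_block_num_idx;
-- ... run the concurrent replay benchmark here ...
CREATE INDEX haf_operations_block_num_idx ON haf_operations (block_num);

-- Re-enable autovacuum on the table when done:
ALTER TABLE haf_operations SET (autovacuum_enabled = true);
```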
### Investigating multi-threading the jsonrpc server used by HAF
We’ve assigned a dev to investigate possible ways to multi-thread the jsonrpc server used by HAF (and traditional hivemind). As mentioned in my previous report, we discovered that this becomes a bottleneck for API traffic at high loads when the API calls themselves are fast. As this is a research project, it will likely take several weeks before we have something more to report on this issue.
### Conversion of hivemind to HAF-based app
We didn’t have a chance to work on HAF-based hivemind during the previous week as we were tied up with HAF and the HAF account history app, but I think we’ll be able to resume work on it during the upcoming week.
### Condenser (source code for hive.blog and a number of other Hive frontend web sites)
We reviewed and deployed a number of enhancements and bug fixes by @quochuy.
While investigating another issue with hive.blog recently, I saw some malformed URL requests in the hive.blog server logs which I suspect are being programmatically generated by condenser itself. This is a relatively minor problem and is still under investigation.
# Upcoming work for next week
* Release a final official version of hivemind with postgres 10 support, then update hivemind CI to start testing using postgres 12 instead of 10.
* Finish testing fixes and optimizations to HAF base-level components (sql_serializer and forkmanager).
* For Hafah, we’ll be 1) continuing to research multithreading of the jsonrpc bottleneck, 2) further benchmarking API performance, 3) verifying results against a hived account history node, 4) analyzing the causes of the slowdown when hived replay and Hafah run concurrently during massive sync, and 5) continuing to set up continuous integration testing for Hafah.
* Resume work on HAF-based hivemind. For HAF-based hivemind, we plan to restructure its massive sync process to simplify and optimize performance by taking advantage of HAF-based design. Next we’ll modify live sync operation to only use HAF data (currently it still makes some calls to hived during live sync). Once we’re further along with HAF-based hivemind, we’ll test it using the fork-inducing tool.