31st update of 2021 on BlockTrades work on Hive software
![blocktrades update.png](https://images.hive.blog/DQmSihw8Kz4U7TuCQa98DDdCzqbqPFRumuVWAbareiYZW1Z/blocktrades%20update.png)
Below is a list of some of the Hive-related programming issues worked on by the BlockTrades team during the past work period. I haven’t had time to check in with all our Hive developers this week, so this report will focus on the devs I’ve been working with directly; I’ll report on the work done by the other devs in my next report.
# Hive Application Framework: framework for building robust and scalable Hive apps
While running benchmarks of the HAF sql_serializer plugin over the past few weeks, I noticed that although the sql_serializer indexed most of the blocks very quickly, its performance was considerably slower once it dropped out of “massive sync” mode (where many blocks are processed simultaneously) into “single block” live sync mode. It also appeared that the sql_serializer was dropping out of massive sync mode earlier than it should have, which could result in hundreds of thousands or even a million or more blocks being processed in this slower single-block mode.
Benchmarks this week showed that the serializer can only process about 6.7 blocks per second on average in live sync mode, whereas in massive sync mode it can process about 2300 blocks per second (roughly 340 times faster).
#### Live sync will always be slower than massive sync, but probably we can do better
Slower performance was expected in live sync mode, because this mode requires the SQL tables to maintain several indexes, and these indexes slow down the rate at which data can be added to the tables. But the magnitude of the slowdown was surprising (to me at least).
So one of the first things we’re looking at to improve this situation is optimizing live sync performance. I believe there’s a good chance we can still improve the speed of live sync mode considerably, because the live sync code written so far was focused only on proper functional behavior.
But even if we get a 4x speedup in live sync performance, it will still be much slower than massive sync mode, so we also need to reduce the number of old blocks that get processed in live sync mode.
### Performance impacts of swapping to live sync mode too early
As a real-world measurement of the impact of this, we replayed a hived node with a block_log that was 6.5 days old, so it was missing about 189K blocks. It took only 7.15 hours to process the 59M blocks in the block_log, but it took another 10 hours to process the remaining 189K blocks that it received from the peer-to-peer network because they were missing from the block_log (plus the new blocks that got created while it was processing the old ones).
We can envision an impractical “worst case” scenario where we sync a hived node with an empty block_log and the sql_serializer enabled. This would currently take 59M blocks / 6.7 blocks/s / 3600 seconds/hr / 24 hrs/day = 101 days! I say this is an impractical scenario because it can be worked around by first performing a normal sync of the hived node without the sql_serializer (which takes about 14 hours), then replaying with the sql_serializer against the now fully-populated block_log (add another 7.5 hours). But it does demonstrate one motivation for improving the live sync performance of the serializer.
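As a quick back-of-the-envelope check of those numbers (just restating the arithmetic above in Python):

```python
# Back-of-the-envelope check of the replay-time estimates above.
BLOCKS = 59_000_000              # approximate number of blocks in the chain
LIVE_SYNC_RATE = 6.7             # blocks/second in live sync mode
SECONDS_PER_DAY = 3600 * 24

# Worst case: sync from an empty block_log with the sql_serializer enabled,
# so every block is processed at the slow live-sync rate.
worst_case_days = BLOCKS / LIVE_SYNC_RATE / SECONDS_PER_DAY
print(f"{worst_case_days:.1f} days")     # ~101.9, i.e. the ~101 days quoted above

# Practical workaround: a normal hived sync first (~14 hours), then a replay
# with the sql_serializer against the full block_log (~7.5 hours).
print(f"{14 + 7.5} hours")               # ~21.5 hours total
```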
### Why does sql_serializer switch from massive sync to live sync so early?
Originally I believed that this transition from massive sync to live sync was accidentally happening too early because of an incorrect implementation of the `psql_index_threshold` setting. This setting helps hived decide whether it should start in massive sync mode or live sync mode, but I thought it might also be telling hived when to change from massive sync mode to live sync mode, so I expected we could fix the problem very easily, and this issue was created to address it: https://gitlab.syncad.com/hive/haf/-/issues/9
But after further discussions this week with the devs working on the sql_serializer, it turned out that this setting wasn’t the problem.
The real reason the sql_serializer was switching from massive sync to live sync mode is that massive sync mode can only be used to process irreversible blocks (blocks that can’t be removed from the blockchain by a fork), so as soon as the serializer could no longer be sure that a block was irreversible, it switched to live sync mode.
The only way that the serializer knows a block is irreversible is if the block is in the block_log. So it first processes all the blocks in the block_log in massive sync mode, then switches to live sync mode to process all blocks it receives from the P2P network (this includes old blocks that were generated since hived was last running, plus new blocks that get generated while hived is processing old blocks).
So, ultimately the problem is that the serializer is processing new blocks as soon as it receives them from the P2P network, but these blocks only get marked as irreversible and added to the block_log after they are confirmed by later blocks received via the P2P network.
### How to stay in massive sync mode longer?
I’ve proposed a tentative solution to this problem that we’ll be trying to implement in the coming week: the serializer will continue to process blocks as they are received from the P2P network (this is important because the serializer makes use of information that must be computed at that time), but the resulting data will not be sent to the SQL database immediately; instead, the data for these blocks will be stored in a queue inside hived.
During massive sync mode, a task will step through the blocks in the block_log and dispatch the associated SQL statements from the serializer queue. The serializer will stay in massive sync mode until it determines that the hived node has synced to within one minute of the head block of the chain AND the serializer data for all blocks in the block_log has been sent to the SQL database; only then will it switch to live sync mode. Switching to live sync mode currently takes about 30 minutes, most of which is spent building the associated table indexes. The serializer will then need to flush the queue of all the blocks that built up while the table indexes were being built. Finally, once the queue is flushed, new blocks from the P2P network can be sent immediately to the database, with no need to first store them in the local hived queue.
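hived and the sql_serializer are written in C++, but to make the proposed flow a bit more concrete, here is a rough Python sketch of the idea; all names and details here are hypothetical, not the actual implementation:

```python
# Rough sketch of the proposed massive-sync/live-sync handoff; hypothetical
# names only, not the real (C++) sql_serializer implementation.
import queue
import time

serializer_queue = queue.Queue()   # per-block SQL data queued inside hived

def on_p2p_block(block_data):
    # Blocks are still processed as they arrive from the P2P network (the
    # serializer needs data that can only be computed at that point), but the
    # result is queued instead of being written to the database immediately.
    serializer_queue.put(block_data)

def run_massive_sync(block_log, db, seconds_behind_head):
    # 1) A task streams the irreversible blocks from the block_log to the
    #    database using fast massive-sync writes (no table indexes yet).
    for block_data in block_log:
        db.massive_sync_write(block_data)

    # 2) Stay in massive sync until the node is within ~1 minute of the head
    #    block (the block_log data has already been fully dispatched above).
    while seconds_behind_head() > 60:
        time.sleep(1)

    # 3) Switch to live sync: build the table indexes (~30 minutes today),
    #    then flush everything that accumulated in the queue meanwhile.
    db.build_table_indexes()
    while not serializer_queue.empty():
        db.live_sync_write(serializer_queue.get())
    # From here on, new P2P blocks can be sent straight to the database.
```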
### Another possible improvement: make sql_serializer use non-blocking writes
Based on the benchmarks I ran, I believe that the sql_serializer currently writes data to the database as a blocking call (in other words, it waits for the SQL server to confirm that the block data has been written to the database before it allows hived to process another block).
Using a blocking call ensures that hived’s state stays in sync with the state of the HAF database, but it comes at the cost of less parallelism, which means processing each block takes longer. It also places additional strain on hived in a critical execution path: while hived is processing a block, it can’t safely do much else, so the block writing time should be kept as short as possible to lower the hardware requirements for operating a hived node.
To avoid this problem, we will look at converting the database write to a non-blocking call that only blocks if and when the queue of unprocessed writes gets too large. This will make replays with the sql_serializer faster, and it will also reduce the amount of time that hived is blocked and unable to process new blocks from the P2P network.
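One common way to implement this kind of bounded write-behind is a background writer draining a size-limited queue; the sketch below is only illustrative (the real sql_serializer is C++), but it shows how the “only block when too many writes are pending” behavior falls out naturally:

```python
# Illustrative sketch of "non-blocking writes that only block when the queue
# of pending writes gets too large"; not the actual sql_serializer code, and
# all names here are hypothetical.
import queue
import threading

MAX_PENDING_WRITES = 1000                       # hypothetical limit
pending = queue.Queue(maxsize=MAX_PENDING_WRITES)

def sql_writer(db_write):
    # Background thread: drains the queue and performs the slow SQL writes,
    # so hived's block-processing path doesn't wait on the database.
    while True:
        block_data = pending.get()
        if block_data is None:                  # shutdown sentinel
            break
        db_write(block_data)

def submit_block(block_data):
    # Called from hived's critical block-processing path. Normally returns
    # immediately; it only blocks when MAX_PENDING_WRITES writes are already
    # outstanding, which keeps hived from drifting too far ahead of the
    # HAF database.
    pending.put(block_data)

# Example wiring: start one writer thread with some database write function.
threading.Thread(target=sql_writer, args=(print,), daemon=True).start()
```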
The only potential scenario I can think of at the moment where using a non-blocking call could cause a problem is the case where the postgres server fails to write a block to the database (for example, if the database storage device fills up). With a non-blocking call, hived would continue to process blocks for a while instead of immediately shutting down, and hived’s state would become out of sync with the state of the HAF database.
Postgres servers tend to be very reliable, so this is an unlikely scenario, but even if it happened, the worst case would only require clearing the state of hived and the HAF database and replaying from block 0. More likely, a system admin would just re-initialize hived and the HAF database using a relatively recent snapshot and a HAF database restore file, and they could then quickly sync back up to the head block.
### Two new programmers working on balance_tracker app
We have two new Hive devs (a python programmer and a web developer) working on creating an API and web interface for balance_tracker (an example HAF app) as an introduction to the Hive programming environment.
### Investigating defining HAF APIs with PostgREST web server
In hivemind and in our previous HAF apps, we’ve been using a Python-based jsonrpc server that translates the RPC calls into manually-written Python functions, which then make SQL queries to the HAF database. This works pretty well, and we recently optimized it to perform better when heavily loaded, but this methodology requires a certain amount of boilerplate Python code, which means someone familiar with Python must write it (not a difficult requirement) and it also increases the chance of errors during the intermediate translation of data back and forth between Python and SQL.
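To give a feel for the kind of boilerplate involved, here is a simplified, hypothetical example of the pattern (not actual hivemind or HAF app code): every API method needs a hand-written Python wrapper of roughly this shape.

```python
# Simplified, hypothetical illustration of the current Python jsonrpc pattern;
# the table and method names are made up for this example.
async def get_balance(db, account: str, symbol: str = "HIVE"):
    """One jsonrpc method -> one hand-written Python wrapper -> one SQL query."""
    if not account:
        raise ValueError("account is required")
    sql = """SELECT balance
               FROM btracker_app.account_balances   -- hypothetical table
              WHERE account = $1 AND symbol = $2"""
    row = await db.fetchrow(sql, account, symbol)    # asyncpg-style call
    return {"account": account, "symbol": symbol,
            "balance": row["balance"] if row else 0}

# The jsonrpc server maps method names to wrappers like this one; every new
# API endpoint means another Python function translating between JSON and SQL.
API_METHODS = {"btracker_api.get_balance": get_balance}
```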
To make this task a little more interesting, Bartek suggested we investigate an alternative way to define the API interface of a HAF app, using a web server called [PostgREST](https://postgrest.org) to replace the Python-based server.
This approach could be very interesting for experienced SQL developers, I think. Instead of writing Python code, APIs are defined directly in SQL as table views and/or stored procedures. API clients can then make REST calls to the PostgREST server, and it converts those REST calls into equivalent SQL queries. So far this looks like a promising alternative for SQL programmers, and it will also be interesting to compare the performance of the PostgREST web server against the Python-based one.
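For example, if a HAF app exposed a SQL function for account balances (`get_balance` here is a made-up name), a client could call it through PostgREST roughly like this; PostgREST maps SQL functions to `/rpc/` endpoints and translates the JSON body into the function’s named arguments:

```python
# Hypothetical client-side call to a HAF app API served by PostgREST.
# The function name, parameter names, and URL are illustrative only.
import requests

POSTGREST_URL = "http://localhost:3000"      # PostgREST's default port

resp = requests.post(
    f"{POSTGREST_URL}/rpc/get_balance",      # maps to the SQL function get_balance(...)
    json={"_account": "blocktrades", "_symbol": "HIVE"},  # named SQL arguments
    timeout=10,
)
resp.raise_for_status()
print(resp.json())                           # whatever the SQL function returned, as JSON
```

The appeal is that there is no intermediate Python layer to maintain: the query logic lives entirely in the database, and PostgREST handles the HTTP and JSON plumbing.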