nixd becomes unresponsive after some time when nix-cli/RPC calls are made. The time to failure varies: it can happen as soon as 20 minutes after starting nixd, in most cases it fails 1 to 3 times in a 24-hour period, and in some cases it runs for up to two days without issues. The following calls are made:
Every 15 minutes: nix-cli ghostnode list full
Every hour: nix-cli -getinfo (can also be substituted with one or a combination of getblockchaininfo and getnetworkinfo)
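For anyone trying to reproduce this, the call schedule above can be sketched as a small Python helper that also puts a timeout on each call, so a hang is recorded instead of blocking the caller. This is only an illustrative sketch: the names `SCHEDULE`, `due_calls` and `classify_call` and the 30-second timeout are assumptions, not part of nixd or nix-cli, and it assumes nix-cli exits with a non-zero status when it prints an error.

```python
import subprocess

# Schedule taken from the report: nix-cli arguments and interval in minutes.
SCHEDULE = [
    (["ghostnode", "list", "full"], 15),
    (["-getinfo"], 60),
]


def due_calls(elapsed_minutes):
    """Return the nix-cli argument lists that are due at this elapsed minute."""
    return [args for args, every in SCHEDULE if elapsed_minutes % every == 0]


def classify_call(cmd, timeout_s=30):
    """Run cmd and return 'ok', 'error' (non-zero exit) or 'hang' (timed out)."""
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # The child is killed by subprocess.run before this is raised.
        return "hang"
    return "ok" if result.returncode == 0 else "error"
```

A driver loop would call `classify_call(["nix-cli"] + args)` for each entry returned by `due_calls(minute)` and log the time of the first "hang" result.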
The number of calls being made is neither unreasonable nor excessive. The following behaviour has been noted:
- When the problem starts, nix-cli commands get no response. When run from the console (Linux), a flashing cursor is displayed as if nix-cli is waiting for nixd to return the results, but the results never arrive.
- After a while, the behaviour from point 1 changes: instead of a flashing cursor waiting for a response, the following error appears:
nix-cli -getinfo
error: couldn't parse reply from server
nix-cli getblockchaininfo
error: couldn't parse reply from server
nix-cli getnetworkinfo
error: couldn't parse reply from server
- In debug.log the following errors start to appear:
2018-09-14 02:20:02 socket sending timeout: 1201s
2018-09-14 02:20:02 socket sending timeout: 1201s
2018-09-14 02:20:02 socket sending timeout: 1201s
2018-09-14 02:20:02 socket sending timeout: 1201s
2018-09-14 02:20:02 socket sending timeout: 1201s
2018-09-14 02:20:02 socket sending timeout: 1201s
2018-09-14 02:20:02 socket sending timeout: 1201s
2018-09-14 02:20:02 socket sending timeout: 1201s
which then causes the following messages to appear when nix-cli/rpc calls are made:
2018-09-14 09:30:01 WARNING: request rejected because http work queue depth exceeded, it can be increased with the -rpcworkqueue= setting
2018-09-14 09:30:01 WARNING: request rejected because http work queue depth exceeded, it can be increased with the -rpcworkqueue= setting
- At this point, there is no way to shut down nixd safely; it can only be killed with kill -9.
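To line up when each symptom first appears, the two debug.log messages quoted above can be counted with a short Python sketch. The function name `summarize` and the pattern labels are made up for illustration; the sketch assumes every log line starts with a 19-character `YYYY-MM-DD HH:MM:SS` timestamp, as in the excerpts above.

```python
import re
from collections import Counter

# Message fragments copied from the debug.log excerpts in this report.
PATTERNS = {
    "socket_timeout": re.compile(r"socket sending timeout: \d+s"),
    "queue_full": re.compile(r"http work queue depth exceeded"),
}


def summarize(lines):
    """Count each symptom and record the timestamp of its first occurrence."""
    counts = Counter()
    first_seen = {}
    for line in lines:
        timestamp = line[:19]  # "YYYY-MM-DD HH:MM:SS"
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
                first_seen.setdefault(name, timestamp)
    return counts, first_seen
```

Passing an open file handle for debug.log to `summarize` returns the counts and the first timestamp for each message, which shows how long the socket timeouts precede the work queue warnings.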
The rpcworkqueue warnings in debug.log are misleading: increasing this parameter in nix.conf does not resolve the problem. The problem is actually a locking issue in nixd. This exact behaviour has been seen before with both Fixed Trade Coin and Syscoin, and was reported to them through their Discord/Slack channels. Both fixed it by addressing the locking problem. The Syscoin fix can be found on Syscoin's GitHub, in commits from around May 5. The following commits are related to this issue and contain the fix:
syscoin/syscoin@dbe0afd
syscoin/syscoin@6f1e10f
Attached debug.log.
Platform: Ubuntu 16.04 x86_64. Compiled as per Nix Platform instructions.
debug.log