w3f / 1k-validators-be
Thousand Validators Program backend.
License: Apache License 2.0
When a client updates, it isn't registered as running the latest version, and the backend requires a restart. I think the node registration logic is missing the field that updates the client version.
Add some documentation to explain how everything works.
Right now, faults are not always given for behaviour that should induce a fault. These fault events should be enforced more strictly.
Alert the Riot room when a validator is reported offline, and decide on the tolerance of offline reports before docking the validator's rank.
Hi,
In my understanding, only valid candidates should be nominated by the system. However, some invalid candidates have also been nominated. The following data was retrieved from the '/nominators' endpoint in era 1616 and shows what appear to be false nominations. Here is the API to check false nominations: https://onekv.herokuapp.com/falseNominations
[
{
"stash": "HhcrzHdB5iBx823XNfBUukjj4TUGzS9oXS8brwLm4ovMuVp",
"name": "KIRA Staking",
"elected": false,
"nominatorAddress": "5C8ZU7zugMubgENdcyiZouHcVYSoeWbF8TpXSdWjStzYbFZW",
"reason": "KIRA Staking has an identity but is not verified by registrar."
},
{
"stash": "EtJ4HxHYEDvYWRJAdmV4hYpTbGMJCmEgnLC8zAf6u5ZyT7C",
"name": "WolfEdge-Capital",
"elected": false,
"nominatorAddress": "5C8ZU7zugMubgENdcyiZouHcVYSoeWbF8TpXSdWjStzYbFZW",
"reason": "WolfEdge-Capital does not have an identity set."
},
{
"stash": "Dcw5vVBmon1PCERJXkYLvvMVmAE8xdqytUwNQLE8p1Hm33J",
"name": "robonomics_team-01",
"elected": false,
"nominatorAddress": "5DZN69GLFZbm7cF65QBSHC7Ndeqwgjsq7XptnvYbSHHxe7aa",
"reason": "robonomics_team-01 has an identity but is not verified by registrar."
},
{
"stash": "J7Z1bxUB7qhxjqT5js6yAkCZoU1VYNxPvTdg9mtyNNbU845",
"name": "Cube3-KSM-Val1-ValidatorA",
"elected": false,
"nominatorAddress": "5DZN69GLFZbm7cF65QBSHC7Ndeqwgjsq7XptnvYbSHHxe7aa",
"reason": "Cube3-KSM-Val1-ValidatorA does not have an identity set."
},
{
"stash": "CgpV58FSvuzGmfZXfiAQfkdDMVcFtpMq91ahk2zNYZdjdR9",
"name": "LunaNova-KSM-Val1-ValidatorA",
"elected": false,
"nominatorAddress": "5GgyyiDPHNKSoE2sWCn5dMuJmAgoXnM4dzrmPmBucakiqPYh",
"reason": "LunaNova-KSM-Val1-ValidatorA does not have an identity set."
},
{
"stash": "FrQ4W8Bo6wgXzkaGHLzVFSsfbWWHvqGGNP1YkRmTPSkN17J",
"name": "otter-sv-validator-1",
"elected": false,
"nominatorAddress": "5GgyyiDPHNKSoE2sWCn5dMuJmAgoXnM4dzrmPmBucakiqPYh",
"reason": "otter-sv-validator-1 offline. Offline since 0."
},
{
"stash": "HRYTEruAjwDD46kkgaTYpGHQC6uea3AkeLJg4iterSmmjo2",
"name": "Tornado-V1",
"elected": false,
"nominatorAddress": "5GgyyiDPHNKSoE2sWCn5dMuJmAgoXnM4dzrmPmBucakiqPYh",
"reason": "Tornado-V1 offline. Offline since 0."
},
{
"stash": "DAexrmQxJ8TKiqpcU2QSn2QiGppGCpWZkJ9p7Nyhm7DW6nB",
"name": "liberty-sv-validator-0",
"elected": false,
"nominatorAddress": "5GgyyiDPHNKSoE2sWCn5dMuJmAgoXnM4dzrmPmBucakiqPYh",
"reason": "liberty-sv-validator-0 does not have an identity set."
}
]
Transcript from Riot:
@will My validator has been up continuously since 9th May 2am BST but nominations have been inconsistent as of late i.e. on 1 day, then off 1 day, then on 1 day, then off 4 days, then on 3 days, then off since a day ago.
So I've dug through https://github.com/w3f/1k-validators-be, queried the backend URL mentioned above, and can see a seemingly erroneous "offlineAccumulated" value of 76104897 (ms), and then the text in "/invalid" that my node has been offline 1268 minutes this week.
As checkSingleCandidate() imposes a 98% weekly uptime requirement, this would explain why nominations were apparently pulled yesterday, and perhaps some of the other occasions as well.
As there happen to be 18 other nodes who also appear in the "/invalid" list, all with 1268 minutes of offline time this week, this would appear to be a problem on the 1K backend side, i.e.:
$ curl -s 'https://otv-backend.w3f.community/invalid' | grep -c 'has been offline 1268\.'
19
I did notice this in the validator logs:
2020-05-14 12:15:40 Disconnected from /dns4/telemetry-backend.w3f.community/tcp/443/x-parity-wss/%2Fsubmit: Sink(Custom { kind: Other, error: B(Custom { kind: Other, error: Io(Os { code: 104, kind: ConnectionReset, message: "Connection reset by peer" }) }) })
2020-05-14 12:22:30 Pre-sealed block for proposal at 2304220. Hash now 0x37bf872e064f3a1523dce3390a50c4a93256697106215d3c860a896ffc436b95, previously 0x4b645e02cc97fab6db108603d2c7bcff2a802405fe939e1
Also running lsof on the validator process I don't see any connections to the w3f telemetry server, so it looks like once the telemetry connection goes down, polkadot never tries to bring it up again.
shadewolf
@will Obviously it is up to W3F and Parity as to how they nominate their stake but in this instance I would suggest they consider resetting the weekly offline accumulated time of affected validators.
sebytza05
shadewolf: yup, I also found this in the validator logs: 2020-05-14 13:15:40 Disconnected from /dns4/telemetry-backend.w3f.community/tcp/443/x-parity-wss/%2Fsubmit%2F: Sink(Custom { kind: Other, error: B(Custom { kind: Other, error: Io(Os { code: 104, kind: ConnectionReset, message: "Connection reset by peer" }) }) })
offlineSince and "has been offline n minutes this week" are so useless right now.
It looks like the problem is, as mentioned above, that telemetry is kicking off validators and polkadot does not reconnect.
All API calls should be batched in order to reduce the number of "over the air" calls we make, as well as to reduce the room for async failures and bugs.
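The batching idea can be sketched generically; the helper below is a hypothetical illustration, not the actual backend code (with polkadot-js, api.queryMulti would serve a similar purpose for storage reads):

```typescript
// Illustrative batching helper: instead of firing one request per item,
// resolve calls in fixed-size groups. `fetchOne` stands in for any
// single async API call (e.g. a per-stash storage query).
async function batched<T, R>(
  items: T[],
  size: number,
  fetchOne: (item: T) => Promise<R>
): Promise<R[]> {
  const results: R[] = [];
  for (let i = 0; i < items.length; i += size) {
    const group = items.slice(i, i + size);
    // One awaited group at a time keeps failures localized to a batch
    // instead of scattering async errors across many in-flight calls.
    results.push(...(await Promise.all(group.map(fetchOne))));
  }
  return results;
}
```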
At the moment, a lot of the nominator accounts distribute stake unevenly. _doNominations should be revised so that each nominator account nominates account_balance / (lowest_staked_validator * 1.05) candidates.
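A minimal sketch of that sizing rule, assuming the intended reading is a 5% buffer over the lowest active stake per candidate (the function name is illustrative, not the actual _doNominations code):

```typescript
// Sketch of the proposed sizing rule (hypothetical helper). Each
// candidate is backed at 105% of the lowest stake that made it into
// the active set, so the count is the balance divided by that bond.
function targetCount(
  accountBalance: number,
  lowestStakedValidator: number
): number {
  // 5% buffer over the minimum active stake, so the bond still elects
  // a validator if the threshold drifts upward between eras.
  const perCandidateBond = lowestStakedValidator * 1.05;
  return Math.floor(accountBalance / perCandidateBond);
}
```

For example, a nominator holding 10,000 KSM against a 450 KSM minimum active stake would back floor(10000 / 472.5) = 21 candidates.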
Currently we use a custom-built "fast substrate" in order to do the testing. Ideally we could use a mocked substrate, but that's probably a whole project in itself.
The least we can do is dockerize the fast substrate so that tests can run reliably on different architectures and in CI.
If the API is inconsistent when the CronJob goes to endRound or startRound, then the transactions will not be made. The script should have a way to recover from an inconsistent API connection.
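One way to recover, sketched as a hypothetical retry wrapper around the round transactions (the name and backoff policy are illustrative, not existing backend code):

```typescript
// Hypothetical retry wrapper: wrap the startRound / endRound logic so
// a transient API failure is retried with linear backoff instead of
// the round's transactions being skipped entirely.
async function withRetry<T>(
  fn: () => Promise<T>,
  attempts = 3,
  delayMs = 1000
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err; // API inconsistent; back off and try again
      await new Promise((resolve) => setTimeout(resolve, delayMs * attempt));
    }
  }
  throw lastError;
}
```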
In order to limit a single identity to running only a specified number of validator candidates in the program, we need a new parameter and a check ensuring that at most a maximum number of candidates of the same identity (including sub-identities) are registered in the program.
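A sketch of such a check (the Candidate shape and function name are hypothetical; sub-identities are assumed to already be resolved to their parent identity):

```typescript
// Illustrative identity-limit check, not the actual backend code.
interface Candidate {
  stash: string;
  identity: string; // parent identity, or the identity itself
}

function exceedsIdentityLimit(
  registered: Candidate[],
  incoming: Candidate,
  maxPerIdentity: number
): boolean {
  const sameIdentity = registered.filter(
    (c) => c.identity === incoming.identity
  ).length;
  // Registering this candidate must not push the identity past the limit.
  return sameIdentity + 1 > maxPerIdentity;
}
```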
Hitting the /candidates endpoint (and others) often takes a few seconds to resolve.
Right now the backend is pretty specific to the 1KV use case. However, if we abstracted the validator requirements into their own constraints.js file and allowed this to be passed in as an option, the backend could be used by other nominator services.
scorekeeper.ts:
nodes = nodes.filter((node: any) => node.offlineAccumulated / WEEK <= 0.02);
The 0.02 threshold should be extracted into a constant variable for the UP_TIME requirement.
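A minimal sketch of the requested change (constant names are suggestions; WEEK is assumed to be the week length in milliseconds, matching the units of offlineAccumulated in the existing filter):

```typescript
// Name the magic number instead of inlining 0.02 in the filter.
const WEEK = 7 * 24 * 60 * 60 * 1000;
// Maximum fraction of the week a node may be offline, i.e. 98% uptime.
const MAX_OFFLINE_RATIO = 0.02;

const meetsUptime = (offlineAccumulated: number): boolean =>
  offlineAccumulated / WEEK <= MAX_OFFLINE_RATIO;

// The existing filter would then become:
// nodes = nodes.filter((node: any) => meetsUptime(node.offlineAccumulated));
```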
index.ts & scorekeeper.ts
const scorekeeperFrequency = Config.global.test ? '0 0-59/3 * * * *' : '0 0 0 * * *';
scorekeeper.begin(scorekeeperFrequency);
the CronJob will re-run every day, and the nomination transaction logic in nominator.ts could possibly fail due to the RPC node getting stuck, the tx failing, or something like that.
Say we have a situation where we would like to nominate 5 validators (A, B, C, D, E):
A - Success
B - Fail
C - Success
How could we handle validator B's nomination? (Based on the current behaviour, the validator might need to wait 1 day.) I suggest adding logic to handle that.
Also:
await nominator.nominate(toNominate);
this.db.newTargets(nominator.address, toNominate);
The this.db.newTargets(nominator.address, toNominate); call should only run when the nomination call succeeds.
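A sketch of that fix as a hypothetical wrapper (not the actual scorekeeper code): the targets are written to the database only after the nomination call succeeds, and failures are reported to the caller so a retry can be scheduled before the next daily round.

```typescript
// Illustrative interfaces; the real Nominator and Db classes differ.
interface NominatorLike {
  address: string;
  nominate: (targets: string[]) => Promise<boolean>;
}

interface DbLike {
  newTargets: (address: string, targets: string[]) => void;
}

async function nominateAndRecord(
  nominator: NominatorLike,
  db: DbLike,
  toNominate: string[]
): Promise<boolean> {
  try {
    const ok = await nominator.nominate(toNominate);
    if (!ok) return false; // tx failed: keep the previously stored targets
    db.newTargets(nominator.address, toNominate); // only on success
    return true;
  } catch {
    return false; // RPC stuck or threw: caller can queue a retry
  }
}
```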
To make the points more useful, it would be great to design the game accordingly.
An even better design would also consider era points, which would ensure the validator has actually done some work.
The above design would require having multiple accounts holding different amounts, since we cannot change the amounts immediately. So it would be something like:
Basic nomination amount: 20 addresses containing 3000 KSM each
Medium nomination amount: 20 addresses containing 6000 KSM each
and so on.
Move these to the config so they can be adjusted for testing
Every 4 eras on Kusama and every era on Polkadot (i.e. an eraPeriod, or floor(currentEra / 4) on Kusama), we should create historical Rank events indicating that an address has gone up a rank for that period of time.
It may look something like the following:
(say the current era is 1000)
{
  address: "<address>",
  eraPeriod: 250,
  erasActive: [996, 997, 999],
  newRank: 27
}
This would make it easier to keep track of previous events, and would also compensate for times when the backend misses a rank increment; it can then backfill the missed ranks appropriately by looking at the last rank event.
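The backfill could be sketched like this (types and names are illustrative, assuming 4 eras per period as on Kusama and one rank increment per missed period):

```typescript
// Hypothetical Rank event shape and backfill, not existing code.
interface RankEvent {
  address: string;
  eraPeriod: number;
  newRank: number;
}

const ERAS_PER_PERIOD = 4; // Kusama; this would be 1 on Polkadot

function backfillRankEvents(
  address: string,
  lastEvent: RankEvent,
  currentEra: number
): RankEvent[] {
  const currentPeriod = Math.floor(currentEra / ERAS_PER_PERIOD);
  const events: RankEvent[] = [];
  // Emit one event for every period missed since the last stored event.
  for (let period = lastEvent.eraPeriod + 1; period <= currentPeriod; period++) {
    events.push({
      address,
      eraPeriod: period,
      newRank: lastEvent.newRank + (period - lastEvent.eraPeriod),
    });
  }
  return events;
}
```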
Hi, I am running a validator on the latest client, which is 0.8.24-5cbc418a-x86_64-linux-gnu right now. However, I keep getting the "xxx is not running the latest client code" message.
I think the reason is that the networkId (i.e., the Sentry Node Network ID) of these validators is null; therefore, the ensureUpgrades process bypasses these validators (see:
1k-validators-be/src/monitor.ts, line 65 in 220bade
1k-validators-be/src/db/index.ts, line 437 in 220bade).
Right now the service expects to have access to the controller account's seeds, we should also support having access to the proxy account's seeds.
Not long after running yarn docker, I get the following error:
1kv_1 | (node:28) UnhandledPromiseRejectionWarning: TypeError: this.api.query.staking.activeEra is not a function
1kv_1 | at ChainData.<anonymous> (/code/src/chaindata.ts:13:52)
1kv_1 | at Generator.next (<anonymous>)
1kv_1 | at /code/src/chaindata.ts:8:71
1kv_1 | at new Promise (<anonymous>)
1kv_1 | at __awaiter (/code/src/chaindata.ts:4:12)
1kv_1 | at ChainData.getActiveEraIndex (/code/src/chaindata.ts:12:51)
1kv_1 | at ScoreKeeper.<anonymous> (/code/src/scorekeeper.ts:198:27)
1kv_1 | at Generator.next (<anonymous>)
1kv_1 | at /code/src/scorekeeper.ts:8:71
1kv_1 | at new Promise (<anonymous>)
1kv_1 | (node:28) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag `--unhandled-rejections=strict` (see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 2)
1kv_1 | (node:28) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
This is with:
node v13.10.1 (npm v6.14.2),
yarn 1.22.0
This is on a machine running Ubuntu 19.10 with a fresh clone of this repo, and ranks don't end up increasing.
Strangely, I don't get the error on another machine (also Ubuntu 19.10) with similar versions of node and yarn. Not really sure what to make of it.
We should relax the requirement that stash != controller as long as there is a staking proxy instead.
The README should be updated with any new information and the differences between the Kusama program and the Polkadot program.
Right now the docker-compose setup for testing things locally doesn't quite work; the docker images are a bit out of date.
These should be updated, and the telemetry frontend should be added for double-checking things. Perhaps it would also be helpful to include another node or two.
https://github.com/wpank/polkadot-local-network/tree/master/scripts/testing
One thing I've also added, in a similar approach in the above repo, is a bunch of scripts for driving the docker containers. Having some of these might be a good way to test things out as well.
In creating a details page, it would be nice to have an endpoint, queried by address, that returns only the individual candidate's data.
So something like /candidates/<validator_address>.
Ranks update inconsistently on their own. As a stopgap, retroactive ranks have been introduced. Retroactive ranks should be removed, and regular rank increases should be fixed.
Right now validators will accumulate faults, but there aren't many clear indicators as to what those faults were for.
It would be nice to have an endpoint exposing each fault's validator, time, and fault reason.
This can ideally be listed under an individual candidate endpoint.
related: #460
Right now, all the data is exposed via the /nodes and /nominators endpoints. One thing that would be helpful is a /rounds endpoint with historical data of nominators and their targets, and whether those targets ended up performing well or badly. It should expose this for all prior rounds.
The service will be occasionally or routinely restarted in order to add more validators to the configuration. This means that all state that is held in the program will be lost unless it's persisted in the database. Currently, we only persist node data in the database and keep nominator state in memory. We should add additional methods on the database to allow for saving nominator data.
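A hypothetical shape for those methods (the names and fields are illustrative; the real service would back this with its database rather than the in-memory Map used here so the sketch stays self-contained):

```typescript
// Illustrative nominator persistence, not the actual Db class.
interface NominatorState {
  address: string;
  currentTargets: string[]; // stashes nominated this round
  lastNominationAt: number; // unix ms of the last nomination tx
}

class NominatorStore {
  private nominators = new Map<string, NominatorState>();

  // On restart the service would rehydrate from here instead of
  // starting with empty in-memory nominator state.
  setNominator(state: NominatorState): void {
    this.nominators.set(state.address, state);
  }

  getNominator(address: string): NominatorState | undefined {
    return this.nominators.get(address);
  }
}
```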
... instead of 24 hour intervals.
Right now the script gives the validators the benefit of the doubt - i.e. if they weren't slashed, then they must have done a good job. However, it should also check that our nominator was actually the one nominating them during the prior eras and that the validator was producing blocks.
Add a cron job that resets offline accumulated time to 0.
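The reset itself can be a pure update over the stored node records (the NodeRecord shape below is hypothetical, not the actual schema), which could then be scheduled with the same cron package the scorekeeper uses (e.g. a weekly CronJob):

```typescript
// Illustrative reset: zero out each node's accumulated offline time so
// stale telemetry disconnects don't invalidate nodes indefinitely.
interface NodeRecord {
  name: string;
  offlineAccumulated: number; // ms offline this period
}

function resetOfflineAccumulated(nodes: NodeRecord[]): NodeRecord[] {
  return nodes.map((node) => ({ ...node, offlineAccumulated: 0 }));
}
```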
InvalidityReasons for Polkadot:
Right now, checkSingleCandidate is very Kusama-centric. This should be changed to be more general.
Adding PM2 would be great for monitoring the process:
https://www.npmjs.com/package/pm2
These aren't needed anymore and can be removed.
1k-validators-be/src/scorekeeper.ts
Line 216 in 62b8e08
As this will use the controller seeds to nominate, some precautions should be taken so that the code can't exploit them. Adding Static Code Analysis tooling as part of the CI pipeline can help mitigate this.
Some possible solutions:
We need a tool to track whether rewards are being paid out by all validators nominated as part of this programme.
We should probably expect validators to handle calling their own payouts, so if we detect that a validator does not do this, we can give a warning.