w3f / 1k-validators-be
Thousand Validators Program backend.
License: Apache License 2.0
When a client updates, it isn't registered as running the latest version, and the backend requires a restart. I think the node registration logic is missing the field that updates the client version.
Add some documentation to explain how everything works.
Right now, faults are not always given for behaviour that should induce a fault. These fault events should be enforced more strictly.
Alert the Riot room when a validator is reported offline, and decide on the tolerance of offline reports before docking the validator's rank.
Hi,
In my understanding, only valid candidates should be nominated by the system. However, some invalid candidates have also been nominated. The following data was retrieved from the '/nominators' endpoint in era 1616 and shows what appear to be false nominations. Here is the API to check false nominations: https://onekv.herokuapp.com/falseNominations
[
{
"stash": "HhcrzHdB5iBx823XNfBUukjj4TUGzS9oXS8brwLm4ovMuVp",
"name": "KIRA Staking",
"elected": false,
"nominatorAddress": "5C8ZU7zugMubgENdcyiZouHcVYSoeWbF8TpXSdWjStzYbFZW",
"reason": "KIRA Staking has an identity but is not verified by registrar."
},
{
"stash": "EtJ4HxHYEDvYWRJAdmV4hYpTbGMJCmEgnLC8zAf6u5ZyT7C",
"name": "WolfEdge-Capital",
"elected": false,
"nominatorAddress": "5C8ZU7zugMubgENdcyiZouHcVYSoeWbF8TpXSdWjStzYbFZW",
"reason": "WolfEdge-Capital does not have an identity set."
},
{
"stash": "Dcw5vVBmon1PCERJXkYLvvMVmAE8xdqytUwNQLE8p1Hm33J",
"name": "robonomics_team-01",
"elected": false,
"nominatorAddress": "5DZN69GLFZbm7cF65QBSHC7Ndeqwgjsq7XptnvYbSHHxe7aa",
"reason": "robonomics_team-01 has an identity but is not verified by registrar."
},
{
"stash": "J7Z1bxUB7qhxjqT5js6yAkCZoU1VYNxPvTdg9mtyNNbU845",
"name": "Cube3-KSM-Val1-ValidatorA",
"elected": false,
"nominatorAddress": "5DZN69GLFZbm7cF65QBSHC7Ndeqwgjsq7XptnvYbSHHxe7aa",
"reason": "Cube3-KSM-Val1-ValidatorA does not have an identity set."
},
{
"stash": "CgpV58FSvuzGmfZXfiAQfkdDMVcFtpMq91ahk2zNYZdjdR9",
"name": "LunaNova-KSM-Val1-ValidatorA",
"elected": false,
"nominatorAddress": "5GgyyiDPHNKSoE2sWCn5dMuJmAgoXnM4dzrmPmBucakiqPYh",
"reason": "LunaNova-KSM-Val1-ValidatorA does not have an identity set."
},
{
"stash": "FrQ4W8Bo6wgXzkaGHLzVFSsfbWWHvqGGNP1YkRmTPSkN17J",
"name": "otter-sv-validator-1",
"elected": false,
"nominatorAddress": "5GgyyiDPHNKSoE2sWCn5dMuJmAgoXnM4dzrmPmBucakiqPYh",
"reason": "otter-sv-validator-1 offline. Offline since 0."
},
{
"stash": "HRYTEruAjwDD46kkgaTYpGHQC6uea3AkeLJg4iterSmmjo2",
"name": "Tornado-V1",
"elected": false,
"nominatorAddress": "5GgyyiDPHNKSoE2sWCn5dMuJmAgoXnM4dzrmPmBucakiqPYh",
"reason": "Tornado-V1 offline. Offline since 0."
},
{
"stash": "DAexrmQxJ8TKiqpcU2QSn2QiGppGCpWZkJ9p7Nyhm7DW6nB",
"name": "liberty-sv-validator-0",
"elected": false,
"nominatorAddress": "5GgyyiDPHNKSoE2sWCn5dMuJmAgoXnM4dzrmPmBucakiqPYh",
"reason": "liberty-sv-validator-0 does not have an identity set."
}
]
Transcript from Riot:
@will My validator has been up continuously since 9th May 2am BST but nominations have been inconsistent as of late i.e. on 1 day, then off 1 day, then on 1 day, then off 4 days, then on 3 days, then off since a day ago.
So I've dug through https://github.com/w3f/1k-validators-be, queried the backend URL mentioned above, and can see a seemingly erroneous "offlineAccumulated" value of 76104897 (ms), and then the text in "/invalid" that my node has been offline 1268 minutes this week.
As checkSingleCandidate() imposes a 98% weekly uptime requirement, this would explain why nominations were apparently pulled yesterday, and perhaps some of the other occasions as well.
As there happen to be 18 other nodes who also appear in the "/invalid" list, all with 1268 minutes of offline time this week, this would appear to be a problem on the 1K backend side, i.e.:
$ curl -s 'https://otv-backend.w3f.community/invalid' | grep -c 'has been offline 1268\.'
19
I did notice this in the validator logs:
2020-05-14 12:15:40 Disconnected from /dns4/telemetry-backend.w3f.community/tcp/443/x-parity-wss/%2Fsubmit: Sink(Custom { kind: Other, error: B(Custom { kind: Other, error: Io(Os { code: 104, kind: ConnectionReset, message: "Connection reset by peer" }) }) })
2020-05-14 12:22:30 Pre-sealed block for proposal at 2304220. Hash now 0x37bf872e064f3a1523dce3390a50c4a93256697106215d3c860a896ffc436b95, previously 0x4b645e02cc97fab6db108603d2c7bcff2a802405fe939e1
Also running lsof on the validator process I don't see any connections to the w3f telemetry server, so it looks like once the telemetry connection goes down, polkadot never tries to bring it up again.
shadewolf
@will Obviously it is up to W3F and Parity as to how they nominate their stake but in this instance I would suggest they consider resetting the weekly offline accumulated time of affected validators.
sebytza05
shadewolf: yup, I also found this in the validator logs: 2020-05-14 13:15:40 Disconnected from /dns4/telemetry-backend.w3f.community/tcp/443/x-parity-wss/%2Fsubmit%2F: Sink(Custom { kind: Other, error: B(Custom { kind: Other, error: Io(Os { code: 104, kind: ConnectionReset, message: "Connection reset by peer" }) }) })
offlineSince and "has been offline n minutes this week" are so useless right now.
It looks like the problem is, as mentioned above, that telemetry is kicking off validators and polkadot does not reconnect.
All API calls should be batched in order to reduce the number of "over the air" calls we make, as well as to reduce the room for async failures and bugs.
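The batching idea can be sketched generically; the helper below is a hypothetical illustration, not the actual backend code (with polkadot-js, api.queryMulti would serve a similar purpose for storage reads):

```typescript
// Illustrative batching helper: instead of firing one request per item,
// resolve calls in fixed-size groups. `fetchOne` stands in for any
// single async API call (e.g. a per-stash storage query).
async function batched<T, R>(
  items: T[],
  size: number,
  fetchOne: (item: T) => Promise<R>
): Promise<R[]> {
  const results: R[] = [];
  for (let i = 0; i < items.length; i += size) {
    const group = items.slice(i, i + size);
    // One awaited group at a time keeps failures localized to a batch
    // instead of scattering async errors across many in-flight calls.
    results.push(...(await Promise.all(group.map(fetchOne))));
  }
  return results;
}
```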
At the moment, a lot of the nominator accounts distribute stake unevenly. _doNominations should be revised so that each nominator account nominates account_balance / (lowest_staked_validator * 1.05) candidates.
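A minimal sketch of that sizing rule, assuming the intended reading is a 5% buffer over the lowest active stake per candidate (the function name is illustrative, not the actual _doNominations code):

```typescript
// Sketch of the proposed sizing rule (hypothetical helper). Each
// candidate is backed at 105% of the lowest stake that made it into
// the active set, so the count is the balance divided by that bond.
function targetCount(
  accountBalance: number,
  lowestStakedValidator: number
): number {
  // 5% buffer over the minimum active stake, so the bond still elects
  // a validator if the threshold drifts upward between eras.
  const perCandidateBond = lowestStakedValidator * 1.05;
  return Math.floor(accountBalance / perCandidateBond);
}
```

For example, a nominator holding 10,000 KSM against a 450 KSM minimum active stake would back floor(10000 / 472.5) = 21 candidates.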
Currently we use a custom-built "fast substrate" in order to do the testing. Ideally we could use a mocked substrate, but that's probably a whole project in itself.
The least we can do is dockerize the fast substrate so that tests can run reliably on different architectures and in CI.
If the API is inconsistent when the CronJob goes to endRound or startRound, then the transactions will not be made. The script should have a way to recover from an inconsistent API connection.
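One way to recover, sketched as a hypothetical retry wrapper around the round transactions (the name and backoff policy are illustrative, not existing backend code):

```typescript
// Hypothetical retry wrapper: wrap the startRound / endRound logic so
// a transient API failure is retried with linear backoff instead of
// the round's transactions being skipped entirely.
async function withRetry<T>(
  fn: () => Promise<T>,
  attempts = 3,
  delayMs = 1000
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err; // API inconsistent; back off and try again
      await new Promise((resolve) => setTimeout(resolve, delayMs * attempt));
    }
  }
  throw lastError;
}
```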
In order to limit a single identity to running only a specified number of validator candidates in the program, we need a new parameter and a check ensuring that at most a maximum number of candidates of the same identity (including sub-identities) are registered in the program.
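A sketch of such a check (the Candidate shape and function name are hypothetical; sub-identities are assumed to already be resolved to their parent identity):

```typescript
// Illustrative identity-limit check, not the actual backend code.
interface Candidate {
  stash: string;
  identity: string; // parent identity, or the identity itself
}

function exceedsIdentityLimit(
  registered: Candidate[],
  incoming: Candidate,
  maxPerIdentity: number
): boolean {
  const sameIdentity = registered.filter(
    (c) => c.identity === incoming.identity
  ).length;
  // Registering this candidate must not push the identity past the limit.
  return sameIdentity + 1 > maxPerIdentity;
}
```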
Hitting the /candidates endpoint (and others) often takes a few seconds to resolve.
Right now the backend is pretty specific to the 1KV use case. However, if we abstracted the validator requirements into their own constraints.js file and allowed this to be passed in as an option, the backend could be used by other nominator services.
scorekeeper.ts:
nodes = nodes.filter((node: any) => node.offlineAccumulated / WEEK <= 0.02);
The 0.02 threshold should be extracted into a constant variable for the UP_TIME requirement.
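A minimal sketch of the requested change (constant names are suggestions; WEEK is assumed to be the week length in milliseconds, matching the units of offlineAccumulated in the existing filter):

```typescript
// Name the magic number instead of inlining 0.02 in the filter.
const WEEK = 7 * 24 * 60 * 60 * 1000;
// Maximum fraction of the week a node may be offline, i.e. 98% uptime.
const MAX_OFFLINE_RATIO = 0.02;

const meetsUptime = (offlineAccumulated: number): boolean =>
  offlineAccumulated / WEEK <= MAX_OFFLINE_RATIO;

// The existing filter would then become:
// nodes = nodes.filter((node: any) => meetsUptime(node.offlineAccumulated));
```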
index.ts & scorekeeper.ts
const scorekeeperFrequency = Config.global.test ? '0 0-59/3 * * * *' : '0 0 0 * * *';
scorekeeper.begin(scorekeeperFrequency);
the CronJob will re-run every day, and the nomination transaction logic in nominator.ts could possibly fail due to the RPC node getting stuck, the tx failing, or something like that.
Say we have a situation where we would like to nominate 5 validators (A, B, C, D, E):
A - Success
B - Fail
C - Success
How could we handle validator B's nomination? (Based on the current behaviour, the validator might need to wait 1 day.) I suggest adding logic to handle that.
Also:
await nominator.nominate(toNominate);
this.db.newTargets(nominator.address, toNominate);
The this.db.newTargets(nominator.address, toNominate); call should only run when the nomination call succeeds.
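A sketch of that fix as a hypothetical wrapper (not the actual scorekeeper code): the targets are written to the database only after the nomination call succeeds, and failures are reported to the caller so a retry can be scheduled before the next daily round.

```typescript
// Illustrative interfaces; the real Nominator and Db classes differ.
interface NominatorLike {
  address: string;
  nominate: (targets: string[]) => Promise<boolean>;
}

interface DbLike {
  newTargets: (address: string, targets: string[]) => void;
}

async function nominateAndRecord(
  nominator: NominatorLike,
  db: DbLike,
  toNominate: string[]
): Promise<boolean> {
  try {
    const ok = await nominator.nominate(toNominate);
    if (!ok) return false; // tx failed: keep the previously stored targets
    db.newTargets(nominator.address, toNominate); // only on success
    return true;
  } catch {
    return false; // RPC stuck or threw: caller can queue a retry
  }
}
```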
To make the points more useful, it would be great to design the game accordingly.
An even better design would also consider era points, which would ensure the validator has actually done some work.
The above design would require having multiple accounts holding different amounts, since we cannot change the amounts immediately. So it would be something like:
Basic nomination amount: 20 addresses containing 3000 KSM each
Medium nomination amount: 20 addresses containing 6000 KSM each
and so on.
Move these to the config so they can be adjusted for testing
Every 4 eras on Kusama and every era on Polkadot (i.e. an eraPeriod, or floor(currentEra / 4) on Kusama), we should create historical Rank events indicating that an address has gone up a rank for that period of time.
It may look something like the following:
(say the current era is 1000)
{
  address: "<address>",
  eraPeriod: 250,
  erasActive: [996, 997, 999],
  newRank: 27
}
This would make it easier to keep track of previous events, and would also compensate for times when the backend misses a rank increment; it can then backfill the missed ranks appropriately by looking at the last rank event.
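The backfill could be sketched like this (types and names are illustrative, assuming 4 eras per period as on Kusama and one rank increment per missed period):

```typescript
// Hypothetical Rank event shape and backfill, not existing code.
interface RankEvent {
  address: string;
  eraPeriod: number;
  newRank: number;
}

const ERAS_PER_PERIOD = 4; // Kusama; this would be 1 on Polkadot

function backfillRankEvents(
  address: string,
  lastEvent: RankEvent,
  currentEra: number
): RankEvent[] {
  const currentPeriod = Math.floor(currentEra / ERAS_PER_PERIOD);
  const events: RankEvent[] = [];
  // Emit one event for every period missed since the last stored event.
  for (let period = lastEvent.eraPeriod + 1; period <= currentPeriod; period++) {
    events.push({
      address,
      eraPeriod: period,
      newRank: lastEvent.newRank + (period - lastEvent.eraPeriod),
    });
  }
  return events;
}
```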
Hi, I am running a validator on the latest client, which is 0.8.24-5cbc418a-x86_64-linux-gnu right now. However, I keep getting the "xxx is not running the latest client code" message.
I think the reason is that the networkId (i.e., the Sentry Node Network ID) of these validators is null; therefore, the ensureUpgrades process bypasses these validators (see:
1k-validators-be/src/monitor.ts, line 65 in 220bade
1k-validators-be/src/db/index.ts, line 437 in 220bade).
Right now the service expects to have access to the controller account's seeds, we should also support having access to the proxy account's seeds.
Not long after running yarn docker, I get the following error:
1kv_1 | (node:28) UnhandledPromiseRejectionWarning: TypeError: this.api.query.staking.activeEra is not a function
1kv_1 | at ChainData.<anonymous> (/code/src/chaindata.ts:13:52)
1kv_1 | at Generator.next (<anonymous>)
1kv_1 | at /code/src/chaindata.ts:8:71
1kv_1 | at new Promise (<anonymous>)
1kv_1 | at __awaiter (/code/src/chaindata.ts:4:12)
1kv_1 | at ChainData.getActiveEraIndex (/code/src/chaindata.ts:12:51)
1kv_1 | at ScoreKeeper.<anonymous> (/code/src/scorekeeper.ts:198:27)
1kv_1 | at Generator.next (<anonymous>)
1kv_1 | at /code/src/scorekeeper.ts:8:71
1kv_1 | at new Promise (<anonymous>)
1kv_1 | (node:28) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag `--unhandled-rejections=strict` (see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 2)
1kv_1 | (node:28) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
This is with:
node v13.10.1 (npm v6.14.2),
yarn 1.22.0
This is on a machine running Ubuntu 19.10 with a fresh clone of this repo, and ranks don't end up increasing.
Strangely, I don't get the error on another machine (also Ubuntu 19.10) with similar versions of node and yarn. Not really sure what to make of it.
We should relax the requirement that stash != controller as long as there is a staking proxy instead.
The README should be updated with any new information and the differences between the Kusama program and the Polkadot program.
Right now the docker-compose setup for testing things locally doesn't quite work; the docker images are a bit out of date.
These should be updated, and the telemetry frontend should be added for double-checking things. Perhaps it would also be helpful to include another node or two.
https://github.com/wpank/polkadot-local-network/tree/master/scripts/testing
One thing I've also added, in a similar approach in the above repo, is a bunch of scripts for driving the docker containers. Having some of these might be a good way to test things out as well.
In creating a details page, it would be nice to have an endpoint, queried by address, that returns only the individual candidate's data.
So something like /candidates/<validator_address>.
Ranks update inconsistently on their own. As a stopgap, retroactive ranks have been introduced. Retroactive ranks should be removed, and regular rank increases should be fixed.
Right now validators will accumulate faults, but there aren't many clear indicators as to what those faults were for.
It would be nice to have an endpoint exposing each fault's validator, time, and fault reason.
This can ideally be listed under an individual candidate endpoint.
related: #460
Right now, all the data is exposed via the /nodes and /nominators endpoints. One thing that would be helpful is a /rounds endpoint with historical data of nominators and their targets, and whether those targets ended up performing well or badly. It should expose this for all prior rounds.
The service will be occasionally or routinely restarted in order to add more validators to the configuration. This means that all state that is held in the program will be lost unless it's persisted in the database. Currently, we only persist node data in the database and keep nominator state in memory. We should add additional methods on the database to allow for saving nominator data.
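A hypothetical shape for those methods (the names and fields are illustrative; the real service would back this with its database rather than the in-memory Map used here so the sketch stays self-contained):

```typescript
// Illustrative nominator persistence, not the actual Db class.
interface NominatorState {
  address: string;
  currentTargets: string[]; // stashes nominated this round
  lastNominationAt: number; // unix ms of the last nomination tx
}

class NominatorStore {
  private nominators = new Map<string, NominatorState>();

  // On restart the service would rehydrate from here instead of
  // starting with empty in-memory nominator state.
  setNominator(state: NominatorState): void {
    this.nominators.set(state.address, state);
  }

  getNominator(address: string): NominatorState | undefined {
    return this.nominators.get(address);
  }
}
```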
... instead of 24 hour intervals.
Right now the script gives the validators the benefit of the doubt - i.e. if they weren't slashed, then they must have done a good job. However, it should also check that our nominator was actually the one nominating them during the prior eras and that the validator was producing blocks.
Add a cron job that resets offline accumulated time to 0.
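The reset itself can be a pure update over the stored node records (the NodeRecord shape below is hypothetical, not the actual schema), which could then be scheduled with the same cron package the scorekeeper uses (e.g. a weekly CronJob):

```typescript
// Illustrative reset: zero out each node's accumulated offline time so
// stale telemetry disconnects don't invalidate nodes indefinitely.
interface NodeRecord {
  name: string;
  offlineAccumulated: number; // ms offline this period
}

function resetOfflineAccumulated(nodes: NodeRecord[]): NodeRecord[] {
  return nodes.map((node) => ({ ...node, offlineAccumulated: 0 }));
}
```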
InvalidityReasons for Polkadot:
Right now, checkSingleCandidate is very Kusama-centric. This should be changed to be more general.
Adding PM2 would be great for monitoring the process:
https://www.npmjs.com/package/pm2
These aren't needed anymore and can be removed.
1k-validators-be/src/scorekeeper.ts
Line 216 in 62b8e08
As this will use the controller seeds to nominate, some precautions should be taken so that the code can't exploit them. Adding Static Code Analysis tooling as part of the CI pipeline can help mitigate this.
Some possible solutions:
We need a tool to track whether rewards are being paid out by all validators nominated as part of this programme.
We should probably expect validators to handle calling their own payouts, so if we detect that a validator does not do this, we can give a warning.