jamescowens / gridcoin-research
This project is forked from gridcoin-community/gridcoin-research
Gridcoin-Research
License: MIT License
Each superblock contains a ZERO field which holds a number of "0,15;" entries. This is a legacy compatibility artifact whose sole purpose was to make the number of CPIDs sent to the VB scraper match the number of CPIDs returned in the contract.
In the new scraper we should see whether we can remove the ZERO field entirely or just replace it with an empty <ZERO></ZERO>. The former would be better, while the latter might be required for compatibility.
We need to implement signals for state changes in the scraper/NN to drive updates of the new NN UI screen (when it is built) without needing a timer-based poll from the UI side.
The new scraper/NN is designed to provide the maximum possible compliance with GDPR requirements, within the limits imposed by blockchain technology. In particular, here are some salient points:
The statistics data indexed by CPID, once the account names are filtered out and discarded, becomes pseudonymized data for the purposes of the GDPR. For stats not preserved in a superblock, this pseudonymized data is deleted within 48 hours, as stated above. For stats recorded in a superblock, we face the same challenge we have today: once that information is in the blockchain, it is immutable.
Perhaps we should consider requiring an acknowledgment of the inability to delete SB statistics data as part of the process of advertising a beacon, which is the starting point for Gridcoin statistics collection.
Currently, while testing, anyone can add or remove a key in the SCRAPER section, which holds the list of authorized scrapers (by address).
This should remain the case during the middle stages of testing the new scraper/NN.
As we move toward final testing and then inclusion in master, we need to place these entries under the masterkey requirement.
Need to implement the capability to filter statistics by team, in case we need to retain team filtering.
Magnitude values above 32k are incorrectly handled by PackBinarySuperblock(std::string sBlock).
struct BinaryResearcher
{
std::array<unsigned char, 16> cpid;
int16_t magnitude;
};
The magnitude field is a signed 16-bit integer, but htobe16() operates on an unsigned value. This means that magnitudes above 32767 wrap around and are stored as negative values in BinaryResearcher::magnitude. The round trip through unpack restores the original value, but relying on that wraparound is still a bad idea.
To maintain protocol compatibility between the new NN and the old one, allowing a smoother switchover, even a "soft fork" of the SB formation, the MD5 neural-hash calculation over the contract data (SB core data) has been retained by having ScraperGetNeuralHash() wrap the older hash function. This is intended only as a temporary measure while the old and new NN co-operate. It must be changed to a more secure hash function, preferably the native Hash() function as indicated in the code comments, as part of the next mandatory after the rollout of the new NN.
Alternatives or improvements to this approach should be documented here, and implementation changes should reference this issue.
We currently use a homebrew, Gridcoin-specific format for superblocks which mixes tagged syntax with binary data. This format is cumbersome to use and does not leverage existing serialization mechanisms.
We should replace all the contents of Pack/UnpackBinarySuperblock with serialization of classes using the functions in serialize.h. This has several benefits, chief among them reuse of the client's existing, well-tested serialization framework.
This may be a future issue, as it changes the format of the superblocks and will therefore require a mandatory.
This may lead to a slow memory leak. Should this do more? Destructor?
The non-scraper mode is actually an all in-memory operation. During testing, the stats map is saved to disk in the Scraper directory even in non-scraper mode; however, the Scraper directory is not checked for existence or created. Review the directory sanity workflow and correct as required.
The following are the constants defined in the scraper...
// Define 48 hour retention time for stats files, current or not.
static int64_t SCRAPER_FILE_RETENTION_TIME = 48 * 3600;
// Define whether prior CScraperManifests are kept.
static bool SCRAPER_CMANIFEST_RETAIN_NONCURRENT = true;
// Define CManifest scraper object retention time.
static int64_t SCRAPER_CMANIFEST_RETENTION_TIME = 48 * 3600;
static bool SCRAPER_CMANIFEST_INCLUDE_NONCURRENT_PROJ_FILES = false;
static const double MAG_ROUND = 0.01;
static const double NEURALNETWORKMULTIPLIER = 115000;
static const double CPID_MAG_LIMIT = 32767;
// The settings below are important. This sets the minimum number of scrapers
// that must be available to form a convergence. Above this minimum, the ratio
// is followed. For example, if there are 4 scrapers, a ratio of 0.6 would require
// CEILING(0.6 * 4) = 3. See NumScrapersForSupermajority below.
// If there is only 1 scraper available, and the minimum is 2, then a convergence
// will not happen. Setting this below 2 will allow convergence to happen without
// cross checking, and is undesirable, because the scrapers are not supposed to be
// trusted entities.
static const unsigned int SCRAPER_SUPERMAJORITY_MINIMUM = 2;
// 0.6 seems like a reasonable standard for agreement. It will require...
// 2 out of 3, 3 out of 4, 3 out of 5, 4 out of 6, 5 out of 7, 5 out of 8, etc.
static const double SCRAPER_SUPERMAJORITY_RATIO = 0.6;
// By Project Fallback convergence rule as a ratio of projects converged vs whitelist.
// For 20 whitelisted projects this means up to five can be excluded and a contract formed.
static const double CONVERGENCE_BY_PROJECT_RATIO = 0.75;
Some of these probably deserve not to be static and should have appcache entry overrides. Thoughts?
The IsScraperAuthorized() and IsScraperAuthorizedToBroadcastManifests() functions are currently stubbed out to return true for final development and early testing. These will have to be fleshed out during middle and final testing on testnet, before production. Some comments from the code...
// The idea here is that there are two levels of authorization. The first level is whether any
// node can operate as a "scraper", in other words, download the stats files themselves.
// The second level, which is the IsScraperAuthorizedToBroadcastManifests() function,
// is to authorize a particular node to actually be able to publish manifests.
// The second function is intended to override the first, with the first being a network wide
// policy. So to be clear, if the network wide policy has IsScraperAuthorized() set to false
// then ONLY nodes that have IsScraperAuthorizedToBroadcastManifests() can download stats at all.
// If IsScraperAuthorized() is set to true, then you have two levels of operation allowed.
// Nodes can run -scraper and download stats for themselves. They will only be able to publish
// manifests if for that node IsScraperAuthorizedToBroadcastManifests() evaluates to true.
// This allows flexibility in network policy, and will allow us to convert from a scraper based
// approach to convergence back to individual node stats download and convergence without a lot of
// headaches.
I think we need to identify the scraper via a public key, tied either to the default address or to a specific address/key generated for the purpose of scraper authorization.
For the IsScraperAuthorizedToBroadcastManifests() function, the public keys for the authorized scrapers should be injected into the appcache via signed admin message.
For IsScraperAuthorized(), a policy value (flag) should be injected into the appcache via signed administrative message to allow or disallow, by network policy, individual nodes downloading stats.