jamescowens / gridcoin-research
This project is forked from gridcoin-community/gridcoin-research
Gridcoin-Research
License: MIT License
Each superblock contains a ZERO field which holds a number of "0,15;" entries. This is a legacy compatibility artifact whose sole purpose was to make the number of CPIDs sent to the VB scraper match the number of CPIDs returned in the contract.
In the new scraper we should see whether we can remove the ZERO field entirely or just replace it with an empty <ZERO></ZERO>. The former would be better, while the latter might be required for compatibility.
We need to implement signals for state changes in the scraper/NN to drive updates of the new NN UI screen (when it is built) without needing a timer-based poll from the UI side.
The new scraper/NN is designed to provide the maximum possible compliance with GDPR requirements, within the limits imposed by blockchain technology. In particular, here are some salient points:
The statistics data indexed by CPID, once the account names are filtered out and discarded, becomes pseudonymized data for the purposes of the GDPR. For stats not preserved in a superblock, this pseudonymized data is deleted within 48 hours, as stated above. For stats recorded in a superblock, we face the same challenge we have today: once that information is in the blockchain, it is immutable.
Perhaps we should consider requiring an acknowledgment of the inability to delete SB statistics data as part of the process of advertising a beacon, which is the starting point for Gridcoin statistics collection.
Currently, while testing, anyone can add or remove a key in the SCRAPER section, which holds the list of authorized scrapers (by address).
This should remain the case during the middle stages of testing the new scraper/NN.
As we move toward final testing and then inclusion in master, we need to place these entries under the masterkey requirement.
Need to implement the capability to filter statistics by team, in case we need to retain team filtering.
Magnitude values above 32k are incorrectly handled by PackBinarySuperblock(std::string sBlock).
struct BinaryResearcher
{
std::array<unsigned char, 16> cpid;
int16_t magnitude;
};
The magnitude field is a signed 16-bit integer, but htobe16() operates on an unsigned value. This means that magnitudes above 32767 wrap around and are stored as negative values in BinaryResearcher::magnitude. The round trip through unpack restores the original value, but relying on that wraparound is still a bad idea.
To maintain protocol compatibility between the new NN and the old one, allowing a smoother switchover, even a "soft fork" of the SB formation, the MD5 neural-hash calculation over the contract data (SB core data) has been retained by having ScraperGetNeuralHash() wrap the older hash function. This is intended only as a temporary measure while the old and new NN co-operate. It must be changed to a more secure hash function, preferably the native Hash() function as indicated in the code comments, as part of the next mandatory after the rollout of the new NN.
Alternatives or improvements to this approach should be documented here, and implementation changes should reference this issue.
We currently use a homebrew, Gridcoin-specific format for superblocks which mixes tagged syntax with binary data. This format is cumbersome to use and does not leverage existing serialization mechanisms.
We should replace all the contents of Pack/UnpackBinarySuperblock with serialization of classes using the functions in serialize.h. This has several benefits, chief among them reuse of the client's existing, well-tested serialization framework.
This may be a future issue, as it changes the format of the superblocks and will therefore require a mandatory.
This may lead to a slow memory leak. Should this do more? Destructor?
The non-scraper mode is actually an all in-memory operation. During testing, the stats map is saved to disk in the Scraper directory even in non-scraper mode; however, the Scraper directory is not checked for existence or created. Review the directory sanity workflow and correct as required.
The following are the constants defined in the scraper...
// Define 48 hour retention time for stats files, current or not.
static int64_t SCRAPER_FILE_RETENTION_TIME = 48 * 3600;
// Define whether prior CScraperManifests are kept.
static bool SCRAPER_CMANIFEST_RETAIN_NONCURRENT = true;
// Define CManifest scraper object retention time.
static int64_t SCRAPER_CMANIFEST_RETENTION_TIME = 48 * 3600;
static bool SCRAPER_CMANIFEST_INCLUDE_NONCURRENT_PROJ_FILES = false;
static const double MAG_ROUND = 0.01;
static const double NEURALNETWORKMULTIPLIER = 115000;
static const double CPID_MAG_LIMIT = 32767;
// The settings below are important. This sets the minimum number of scrapers
// that must be available to form a convergence. Above this minimum, the ratio
// is followed. For example, if there are 4 scrapers, a ratio of 0.6 would require
// CEILING(0.6 * 4) = 3. See NumScrapersForSupermajority below.
// If there is only 1 scraper available, and the minimum is 2, then a convergence
// will not happen. Setting this below 2 will allow convergence to happen without
// cross checking, and is undesirable, because the scrapers are not supposed to be
// trusted entities.
static const unsigned int SCRAPER_SUPERMAJORITY_MINIMUM = 2;
// 0.6 seems like a reasonable standard for agreement. It will require...
// 2 out of 3, 3 out of 4, 3 out of 5, 4 out of 6, 5 out of 7, 5 out of 8, etc.
static const double SCRAPER_SUPERMAJORITY_RATIO = 0.6;
// By Project Fallback convergence rule as a ratio of projects converged vs whitelist.
// For 20 whitelisted projects this means up to five can be excluded and a contract formed.
static const double CONVERGENCE_BY_PROJECT_RATIO = 0.75;
Some of these probably deserve not to be static and should have appcache entry overrides. Thoughts?
The IsScraperAuthorized() and IsScraperAuthorizedToBroadcastManifests() functions are currently stubbed out to return true for final development and early testing. These will have to be fleshed out during middle and final testing on testnet, before production. Some comments from the code...
// The idea here is that there are two levels of authorization. The first level is whether any
// node can operate as a "scraper", in other words, download the stats files themselves.
// The second level, which is the IsScraperAuthorizedToBroadcastManifests() function,
// is to authorize a particular node to actually be able to publish manifests.
// The second function is intended to override the first, with the first being a network wide
// policy. So to be clear, if the network wide policy has IsScraperAuthorized() set to false
// then ONLY nodes that have IsScraperAuthorizedToBroadcastManifests() can download stats at all.
// If IsScraperAuthorized() is set to true, then you have two levels of operation allowed.
// Nodes can run -scraper and download stats for themselves. They will only be able to publish
// manifests if for that node IsScraperAuthorizedToBroadcastManifests() evaluates to true.
// This allows flexibility in network policy, and will allow us to convert from a scraper based
// approach to convergence back to individual node stats download and convergence without a lot of
// headaches.
I think we need to identify the scraper via a public key, tied either to the default address or to a specific address/key generated for the purpose of scraper authorization.
For the IsScraperAuthorizedToBroadcastManifests() function, the public keys for the authorized scrapers should be injected into the appcache via signed admin message.
For IsScraperAuthorized(), a policy value (flag) should be injected into the appcache via signed administrative message to allow or disallow, by network policy, individual nodes downloading stats.