jamescowens / gridcoin-research

This project is a fork of gridcoin-community/gridcoin-research

Gridcoin-Research

License: MIT License

C 22.67% HTML 60.74% Python 0.11% Shell 0.91% C++ 8.11% Makefile 0.19% M4 0.25% Objective-C 0.01% Objective-C++ 0.01% Assembly 0.12% Perl 0.59% CMake 0.12% Dockerfile 0.01% Sage 0.04% Roff 0.22% Tcl 3.49% CSS 0.07% ASP.NET 0.02% JavaScript 0.08% C# 2.24%

gridcoin-research's People

Contributors

a123b, acey1, barton2526, cybertailor, cyrossignol, denravonska, div72, erkan-yilmaz, fanquake, git-jiro, gridcoin, ifoggz, jamescowens, laanwj, lederstrumpf, letol, michalkania, minafarhan, nathanielcwm, opsinphark, peppernrino, personthingman2, pythonix, roboticmind, scribblemaniac, sitiom, skcin, thecharlatan, themarix, tomasbrod

gridcoin-research's Issues

Remove ZERO from superblocks.

Each superblock contains a ZERO field which holds a number of "0,15;" entries. This is a legacy compatibility artifact whose sole purpose was to make the number of CPIDs sent to the VB scraper match the number of CPIDs returned in the contract.

In the new scraper we should see if we can either remove the ZERO field completely or just replace it with <ZERO></ZERO>. The former would be better while the latter might be required for compatibility.

Implement UI signals

We need to implement signals for state changes in the scraper/NN to facilitate updating the new UI screen for the NN (when it is built), without needing a timer-based poll from the UI side.
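A hedged sketch of one way this could be wired up, assuming the boost::signals2 core-to-UI notification pattern used in Bitcoin-derived clients; the struct and signal names below (CScraperSignals, NotifyScraperEvent) are hypothetical, not existing code:

#include <boost/signals2.hpp>
#include <string>

// Core-side signal container the scraper/NN code fires on state changes,
// so the Qt layer can connect a slot instead of polling on a timer.
struct CScraperSignals
{
    // Fired on scraper/NN state changes (new manifest, new convergence,
    // stats refresh). The string carries a short event tag.
    boost::signals2::signal<void (const std::string& sEvent)> NotifyScraperEvent;
};

CScraperSignals scraperSignals; // hypothetical global, analogous to uiInterface

// Core side: emit the signal where the state actually changes, e.g.
//     scraperSignals.NotifyScraperEvent("manifest_received");
//
// UI side (Qt): connect once at startup, marshal to the GUI thread
// (e.g. via QMetaObject::invokeMethod), then update the NN screen.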

Scraper/new NN audit for GDPR compliance

The new scraper/NN is designed to provide the maximum possible compliance with GDPR requirements, within the limits imposed by blockchain technology. In particular, here are some salient points:

  1. Once the user stats are downloaded, they are immediately filtered and only the minimum fields required for stats computation are retained. The original files are deleted. In particular, the account name is eliminated and only the CPIDs and pure stats are retained.
  2. For the scraper nodes, the system provides for a defined retention period for statistics on disk, nominally set to 48 hours. Files aging beyond the retention period are deleted automatically.
  3. For non-scraper nodes not directly downloading stats, the current stats retention is in memory only, with no on-disk storage, and the in-memory retention period is also nominally set to 48 hours.
  4. The superblock production is similar to today, with only CPIDs and magnitudes recorded in the superblock.

The statistics data indexed by CPID, once the account names are filtered out and discarded, becomes pseudo-anonymized data for the purposes of the GDPR. For stats not preserved in a superblock, this pseudo-anonymized data is deleted within 48 hours as stated above. For stats recorded in a superblock, we have the same challenge we have today regarding the immutability of that information once it is in the blockchain.

Perhaps we should consider requiring an acknowledgment of the inability to delete SB statistics data as part of the process of advertising a beacon, which is the starting point for Gridcoin statistics collection.

Implement signing requirement for addkey into AppCache SCRAPER section

Currently, while testing, anyone can add or remove a key in the SCRAPER section, which holds the list of authorized scrapers (by address).

This should remain the case during the middle stages of testing the new scraper/NN.

As we move towards final testing and then inclusion in master, we need to add this to the masterkey requirement.
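A minimal sketch of the intended gate, assuming the Bitcoin-style CPubKey::Verify API available in the codebase; the function below is an illustrative stand-in for wherever addkey messages for the SCRAPER section are actually processed:

#include <vector>
#include "key.h"      // Bitcoin-style CPubKey
#include "uint256.h"

// Reject SCRAPER appcache changes unless the message hash is signed by the
// network master key (passed in by the caller).
bool AcceptScraperAppCacheEntry(const CPubKey& masterPubKey,
                                const uint256& hashMessage,
                                const std::vector<unsigned char>& vchSig)
{
    // CPubKey::Verify checks a DER signature over the 256-bit message hash.
    return masterPubKey.Verify(hashMessage, vchSig);
}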

Allowing magnitude entries above 32k blows up SB binary pack

Magnitude values above 32k are incorrectly handled by PackBinarySuperblock(std::string sBlock).

struct BinaryResearcher
{
    std::array<unsigned char, 16> cpid;
    int16_t magnitude;
};

This struct uses a signed 16-bit integer, but htobe16 operates on unsigned values. This means that magnitudes above 32767 will actually be stored as negative BinaryResearcher::magnitude values. The error corrects itself on unpack, but it is still not a good idea.
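One possible fix, sketched under the assumption that magnitudes are clamped to the 16-bit range before packing; the struct mirrors the existing one, but PackMagnitude() is an illustrative helper, not existing code:

#include <algorithm>
#include <array>
#include <cstdint>

struct BinaryResearcher
{
    std::array<unsigned char, 16> cpid;
    uint16_t magnitude;   // unsigned, to match the htobe16/be16toh conversions
};

// Clamp before packing so an oversized magnitude can never wrap negative.
static uint16_t PackMagnitude(double dMagnitude)
{
    const double dClamped = std::min(std::max(dMagnitude, 0.0), 32767.0);
    return static_cast<uint16_t>(dClamped + 0.5); // round to nearest
}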

ScraperGetNeuralHash() is currently a wrapper around GetNeuralHash()

To maintain protocol compatibility of the new NN with the old one and allow for a smoother switchover, even a "soft fork" of the SB formation, the MD5 neural hash calculation over the contract data (SB core data) has been maintained by having ScraperGetNeuralHash() wrap the older hash function. This is only intended to be temporary while the old and new NN co-operate, and it must be changed to a more secure hash function, preferably the native Hash() function as indicated in the code comments, as part of the next mandatory upgrade after the rollout of the new NN.

Alternatives or improvements to this approach should be documented here, and implementation changes should reference this issue.
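A minimal sketch of the eventual replacement, assuming the Bitcoin-style double-SHA256 Hash() template available in the codebase; the wrapper function name is illustrative only:

#include <string>
#include "hash.h"     // Bitcoin-style Hash() (double SHA-256)
#include "uint256.h"

// Hash the superblock contract data with the native hash instead of MD5.
uint256 GetContractHashNative(const std::string& sContract)
{
    return Hash(sContract.begin(), sContract.end());
}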

Superblock (de)serialization using internal mechanisms.

We currently use a homebrew, Gridcoin-specific format for superblocks which mixes tagged syntax with binary data. This format is cumbersome to use and does not leverage existing serialization mechanisms.

We should replace all the contents in Pack/UnpackBinarySuperblock with serialization of classes using the functions in serialize.h. This has several benefits:

  • Less code
  • More readable
  • Easier to maintain
  • Faster
  • Builtin endian handling
  • Smaller contracts

This may be a future issue, as it changes the format of the superblocks and will require a mandatory upgrade.
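A rough illustration of the direction, assuming the ADD_SERIALIZE_METHODS/READWRITE pattern from serialize.h; the class and field names are illustrative and do not reflect the final superblock layout:

#include <cstdint>
#include <vector>
#include "serialize.h"

// Illustrative only: one researcher entry plus a container the framework can
// (de)serialize directly, with endian handling and compact sizes for free.
class CSuperblockResearcher
{
public:
    std::vector<unsigned char> vchCpid; // 16-byte CPID
    uint16_t nMagnitude;

    ADD_SERIALIZE_METHODS;

    template <typename Stream, typename Operation>
    inline void SerializationOp(Stream& s, Operation ser_action)
    {
        READWRITE(vchCpid);
        READWRITE(nMagnitude);
    }
};

class CSuperblock
{
public:
    std::vector<CSuperblockResearcher> vResearchers;

    ADD_SERIALIZE_METHODS;

    template <typename Stream, typename Operation>
    inline void SerializationOp(Stream& s, Operation ser_action)
    {
        READWRITE(vResearchers);
    }
};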

Scraper fails to create Scraper directory in non-scraper mode on Windows

Non-scraper mode is actually an all in-memory operation. During testing, the stats map is saved to disk in the Scraper directory even in non-scraper mode; however, the Scraper directory is not checked for existence or created. Review the directory sanity workflow and correct as required.
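A hedged sketch of the missing check, assuming the boost::filesystem usage found elsewhere in the codebase; the function name and the pathScraper parameter are illustrative:

#include <boost/filesystem.hpp>
#include <boost/system/error_code.hpp>

namespace fs = boost::filesystem;

// Ensure the Scraper directory exists before any on-disk save, in both
// scraper and non-scraper mode.
static bool EnsureScraperDirectoryExists(const fs::path& pathScraper)
{
    if (fs::exists(pathScraper) && fs::is_directory(pathScraper))
        return true;

    boost::system::error_code ec;
    fs::create_directories(pathScraper, ec);
    return !ec;
}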

Consideration of making certain constants appcache entries instead.

The following are the constants defined in the scraper...

// Define 48 hour retention time for stats files, current or not.
static int64_t SCRAPER_FILE_RETENTION_TIME = 48 * 3600;
// Define whether prior CScraperManifests are kept.
static bool SCRAPER_CMANIFEST_RETAIN_NONCURRENT = true;
// Define CManifest scraper object retention time.
static int64_t SCRAPER_CMANIFEST_RETENTION_TIME = 48 * 3600;
static bool SCRAPER_CMANIFEST_INCLUDE_NONCURRENT_PROJ_FILES = false;
static const double MAG_ROUND = 0.01;
static const double NEURALNETWORKMULTIPLIER = 115000;
static const double CPID_MAG_LIMIT = 32767;
// The settings below are important. This sets the minimum number of scrapers
// that must be available to form a convergence. Above this minimum, the ratio
// is followed. For example, if there are 4 scrapers, a ratio of 0.6 would require
// CEILING(0.6 * 4) = 3. See NumScrapersForSupermajority below.
// If there is only 1 scraper available, and the minimum is 2, then a convergence
// will not happen. Setting this below 2 will allow convergence to happen without
// cross checking, and is undesirable, because the scrapers are not supposed to be
// trusted entities.
static const unsigned int SCRAPER_SUPERMAJORITY_MINIMUM = 2;
// 0.6 seems like a reasonable standard for agreement. It will require...
// 2 out of 3, 3 out of 4, 3 out of 5, 4 out of 6, 5 out of 7, 5 out of 8, etc.
static const double SCRAPER_SUPERMAJORITY_RATIO = 0.6;
// By Project Fallback convergence rule as a ratio of projects converged vs whitelist.
// For 20 whitelisted projects this means up to five can be excluded and a contract formed.
static const double CONVERGENCE_BY_PROJECT_RATIO = 0.75;

Some of these probably deserve not to be static and should instead have appcache entry overrides. Thoughts?
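A sketch of one way an override could work, with GetAppCacheEntry() standing in as a hypothetical helper for whatever the real appcache lookup API ends up being:

#include <cstdint>
#include <string>

// Hypothetical stand-in for the real appcache lookup: returns true and fills
// sValue when the section/key pair has an override entry.
static bool GetAppCacheEntry(const std::string& sSection,
                             const std::string& sKey, std::string& sValue)
{
    (void)sSection; (void)sKey; (void)sValue;
    return false; // stub: pretend no override was found
}

// Fall back to the compiled-in default when no appcache override is present.
int64_t GetScraperFileRetentionTime()
{
    std::string sValue;

    if (GetAppCacheEntry("scraper", "SCRAPER_FILE_RETENTION_TIME", sValue))
        return std::stoll(sValue);

    return 48 * 3600; // default: 48 hours
}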

Scraper authorization code is currently a stub

The IsScraperAuthorized() and IsScraperAuthorizedToBroadcastManifests() functions are currently stubbed out to return true for final development and early testing. These will have to be fleshed out during middle and final testing on testnet before production. Some comments from the code...

// The idea here is that there are two levels of authorization. The first level is whether any
// node can operate as a "scraper", in other words, download the stats files themselves.
// The second level, which is the IsScraperAuthorizedToBroadcastManifests() function,
// is to authorize a particular node to actually be able to publish manifests.
// The second function is intended to override the first, with the first being a network wide
// policy. So to be clear, if the network wide policy has IsScraperAuthorized() set to false
// then ONLY nodes that have IsScraperAuthorizedToBroadcastManifests() can download stats at all.
// If IsScraperAuthorized() is set to true, then you have two levels of operation allowed.
// Nodes can run -scraper and download stats for themselves. They will only be able to publish
// manifests if for that node IsScraperAuthorizedToBroadcastManifests() evaluates to true.
// This allows flexibility in network policy, and will allow us to convert from a scraper based
// approach to convergence back to individual node stats download and convergence without a lot of
// headaches.

I think we need to identify the scraper via a public key tied to either the default address or a specific address/key generated for the purpose of scraper authorization.

For the IsScraperAuthorizedToBroadcastManifests() function, the public keys for the authorized scrapers should be injected into the appcache via a signed admin message.

For IsScraperAuthorized(), a policy value (flag) should be injected into the appcache via a signed administrative message to allow or disallow, by network policy, individual nodes from downloading stats.
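As a rough sketch of the second-level check, assuming manifests are signed with a key whose public key is listed in the appcache SCRAPER section; GetAuthorizedScraperKeys() is a hypothetical helper, the parameter list is illustrative, and the CPubKey/HexStr calls assume the Bitcoin-style utility code already in the tree:

#include <set>
#include <string>
#include <vector>
#include "key.h"      // Bitcoin-style CPubKey
#include "uint256.h"
#include "util.h"     // HexStr

// Hypothetical stand-in: the set of authorized scraper public keys (hex)
// currently stored in the appcache SCRAPER section.
static std::set<std::string> GetAuthorizedScraperKeys()
{
    return {}; // stub: would be populated from signed admin messages
}

// A node may broadcast manifests only if its manifest signature verifies
// against one of the authorized public keys.
bool IsScraperAuthorizedToBroadcastManifests(const CPubKey& pubkey,
                                             const uint256& hashManifest,
                                             const std::vector<unsigned char>& vchSig)
{
    const std::set<std::string> setAuthorized = GetAuthorizedScraperKeys();

    if (!setAuthorized.count(HexStr(pubkey.begin(), pubkey.end())))
        return false;

    return pubkey.Verify(hashManifest, vchSig);
}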
