brave-intl / bat-publisher

Routines to identify publishers for the BAT. (deprecated)

License: Mozilla Public License 2.0

JavaScript 100.00%


bat-publisher's Issues

the first occurrence of provider-prefix should be provider-identity

in https://github.com/brave-intl/bat-publisher#syntax

Document stochastic voting procedure

The procedure implemented by Synopsis.prototype.weights in index.js determines how Brave distributes its votes to publishers according to the caller-supplied attention weights w_i and publisher pin weights p_i.

The procedure to distribute n votes is currently simply to draw a multinomial sample of publishers weighted by the w_i, ignoring the p_i. (Presumably the caller just sets w_i = p_i where p_i is specified, but I don't know; the caller is somewhere in browser-laptop or something.)

After #24 as it stands right now, the procedure to distribute n votes will become something like this:

1. Permute the list of publishers uniformly at random.
2. For each publisher i in the permuted order of step (1), distribute round(n*p_i) votes to the ith publisher (where round is IEEE 754 round-to-nearest/ties-to-even?), or stop if we have run out.
3. If any ballots remain after step (2), which happens if the sum of the p_i is below (one floating-point epsilon beneath) 1 - 1/n, randomly draw a multinomial sample of the publishers weighted by the w_i.

The attention weights and the pin weights are independent: a pinned publisher can have votes from step (2), arising from pinning, and from step (3), arising from attention.

(An n-way multinomial sample with k prescribed weights is an array of k counts adding up to n, equivalent to counting up a sequence of n independent categorical samples with the k prescribed weights -- i.e., an n-way multinomial sample is an array of counts from rolling a k-sided die n times.)
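The three steps and the multinomial sample described above can be sketched as follows. This is only an illustration under stated assumptions, not the code in index.js: the function names (`roundTiesToEven`, `multinomial`, `shuffledIndices`, `distributeVotes`) and the `pin` field are hypothetical, `Math.random` stands in for whatever sampler the library actually uses, and ties-to-even rounding is assumed because the issue itself leaves that as an open question.

```javascript
// Hypothetical sketch of the proposed procedure; names are illustrative only.

// Round to nearest, ties to even (for non-negative x).
// Math.round alone rounds exact ties upward.
function roundTiesToEven (x) {
  const r = Math.round(x)
  return (r - x === 0.5 && r % 2 !== 0) ? r - 1 : r
}

// n-way multinomial sample: roll a weighted k-sided die n times and
// count how often each side comes up. Returns all zeros if no weight
// is positive, so the counts only add up to n when some weight is > 0.
function multinomial (n, weights, rand = Math.random) {
  const counts = new Array(weights.length).fill(0)
  const total = weights.reduce((acc, w) => acc + Math.max(w, 0), 0)
  if (!(total > 0)) return counts
  for (let roll = 0; roll < n; roll++) {
    let u = rand() * total
    let chosen = 0
    for (let i = 0; i < weights.length; i++) {
      if (!(weights[i] > 0)) continue // zero-weight sides never win
      chosen = i
      if (u < weights[i]) break
      u -= weights[i]
    }
    counts[chosen]++
  }
  return counts
}

// Step 1: a uniformly random permutation of 0..k-1 (Fisher-Yates).
function shuffledIndices (k, rand = Math.random) {
  const idx = Array.from({ length: k }, (_, i) => i)
  for (let i = k - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1))
    const tmp = idx[i]
    idx[i] = idx[j]
    idx[j] = tmp
  }
  return idx
}

// publishers: array of { weight: w_i, pin: p_i }, where pin may be absent.
function distributeVotes (n, publishers, rand = Math.random) {
  const votes = new Array(publishers.length).fill(0)
  let remaining = n
  // Step 2: hand out round(n * p_i) pinned votes in permuted order.
  for (const i of shuffledIndices(publishers.length, rand)) {
    if (remaining === 0) break
    const wanted = roundTiesToEven(n * (publishers[i].pin || 0))
    const pinned = Math.min(remaining, wanted)
    votes[i] += pinned
    remaining -= pinned
  }
  // Step 3: leftover ballots follow the attention weights.
  const extra = multinomial(remaining, publishers.map((p) => p.weight), rand)
  extra.forEach((count, i) => { votes[i] += count })
  return votes
}

const example = distributeVotes(10, [
  { weight: 0, pin: 0.5 },
  { weight: 1 },
  { weight: 3 }
])
console.log(example) // the first entry is always 5; the rest vary per run
```

In this sketch a pinned publisher with a nonzero attention weight can also collect votes in step 3, matching the independence of pin weights and attention weights noted above.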

From what I have gathered, there are a few criteria constraining the procedure:

1. Votes are expensive: each one requires an independent round-trip to the server. So we want to minimize the number of votes to maintain low communication cost.
2. Votes are the granularity available to us. So we want to maximize the number of votes to maintain high granularity for user allocation of contributions.
3. We want the expected fraction of non-pinned votes for the ith publisher, integrated over all users over time, to be the caller's attention weight w_i.
   - This should allow even infrequently attended sites to have the opportunity to profit from attention, even if w_i is so small it might be rounded away by some procedures.
   - Limit? If ten billion users used Brave every second of the day for a year, and a site was used for a single second in that time, and the weights were the duration spent, that site's weight would be about 1/(1e10 * 60 * 60 * 24 * 365) ~= 2^-58. If we cut the weights off at 2^-64, as I expect we do right now with the uniform [0,1] sampler we're using, that's more than adequate for this extreme case, so we don't have to worry about very small numbers in floating-point uniform [0,1] sampling (although it's not that hard to drive the limit below anything anyone, including cryptographers, should care about -- JavaScript code).
4. We want the expected fraction of pinned votes for the ith publisher to be the caller's pin weight p_i.
5. We want the pinned votes for a single user each settlement to be close to the user's pin weights times the number of ballots the user exchanged BAT for. Otherwise they may look at their receipts and be dismayed.
   For example, if there are a million users each pinning half their contribution to a single publisher of their choice, and each user gets 10 votes, and if the votes are sampled by a naive categorical as in the first procedure, the expected number of users who will see none of their votes go to their 50%-pinned publisher is about 1000. This likely means some nontrivial fraction of 1000 unhappy users posting flames about Brave on the internet.
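The two back-of-the-envelope figures in the criteria above (about 1000 users with no pinned votes, and a smallest weight near 2^-58) can be checked with a few lines of arithmetic. The function and variable names are only for illustration.

```javascript
// Probability that none of `votes` independent categorical draws lands on
// a publisher whose weight is `pin`.
function probNoPinnedVotes (votes, pin) {
  return Math.pow(1 - pin, votes)
}

// A million users, 10 votes each, half of the contribution pinned.
const users = 1e6
const expectedUnhappy = users * probNoPinnedVotes(10, 0.5)
console.log(expectedUnhappy) // 976.5625, i.e. "about 1000"

// Ten billion users, every second of a year, one second on one site.
const secondsPerYear = 60 * 60 * 24 * 365
const smallestWeight = 1 / (1e10 * secondsPerYear)
console.log(Math.log2(smallestWeight)) // roughly -58.1
```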

All this should be written down in a living document that we can use to assess (a) whether Brave is intended to do something sensible and (b) whether Brave actually implements what we intend.

Response to publisher/identity should be normalized

Requests to identity which return the same publisher should also return the same properties.

https://ledger.mercury.basicattentiontoken.org/v3/publisher/identity?publisher=washingtonpost.com returns publisher: "washingtonpost.com" and the property verified: true.

https://ledger.mercury.basicattentiontoken.org/v3/publisher/identity?publisher=www.washingtonpost.com also returns publisher: "washingtonpost.com", but the property verified is missing.

publisher voting is slightly skewed

random-lib doesn't generate a perfectly uniform float distribution, see brave/browser-laptop#6944 for details.

this has been discussed before; opening here so that it doesn't get lost.

https://github.com/brave-intl/bat-publisher/blob/master/index.js#L309

upgrade deps to clear npm audit

add a new synopsisOption "longtail"

it should be used to tailor the behavior of Synopsis.winners ... it should be a number from 0 to 100 (if outside that range, it is ignored). it specifies where the long tail starts when switching from proportional to statistical voting. the default value is 0, indicating statistical voting throughout.
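One way to read the validation rule above is sketched below. The helper name `normalizeLongtail` is hypothetical, and treating "ignored" as "fall back to the default of 0" is an assumption; how Synopsis.winners would then use the value is not covered here.

```javascript
// Hypothetical helper: accept a number from 0 to 100, otherwise fall back
// to the default of 0 (statistical voting throughout).
function normalizeLongtail (value) {
  const isValid = typeof value === 'number' && value >= 0 && value <= 100
  return isValid ? value : 0
}

console.log(normalizeLongtail(25)) // 25
console.log(normalizeLongtail(150)) // 0
console.log(normalizeLongtail('25')) // 0
```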

Be less strict about the version of node and npm

See:

update random-lib -> 3.0.0

See brave/browser-laptop#6944 for context; random-lib<3.0.0 has bias in its random integer generation.
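For context, a common way to avoid bias when turning random bytes into a bounded integer is rejection sampling. The sketch below uses Node's built-in crypto module and a hypothetical `uniformInt` helper; it shows the general technique only and makes no claim about how random-lib is implemented or what exactly caused its bias.

```javascript
const crypto = require('crypto')

// Return an integer drawn uniformly from 0..bound-1 by rejection sampling:
// discard 32-bit draws from the incomplete top range so that every
// residue class modulo `bound` is equally likely.
function uniformInt (bound) {
  const range = 0x100000000 // 2^32 possible 32-bit values
  if (!Number.isInteger(bound) || bound <= 0 || bound > range) {
    throw new RangeError('bound must be an integer in 1..2^32')
  }
  const limit = range - (range % bound)
  let x
  do {
    x = crypto.randomBytes(4).readUInt32BE(0)
  } while (x >= limit)
  return x % bound
}

console.log(uniformInt(6)) // an integer from 0 to 5
```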

Sum weights in ascending order, not random or descending order

bat-publisher/index.js

Line 376 in 13686d8

upper += results[i].weight

All of the weights presumably have the same magnitude. The current logic sums them in descending order, adding highest weights first, which maximizes the approximation error of floating-point arithmetic. With #24 as it stands now, they will be summed in randomized order. To minimize the approximation error, the smallest terms should be added first.
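The effect of summation order can be seen in a small constructed example. The weights below are made up to exaggerate the problem (one large term and many tiny ones) and are not taken from real data.

```javascript
// Sum the values left to right, as a running accumulator would.
function sumInOrder (values) {
  return values.reduce((acc, v) => acc + v, 0)
}

// One large weight and 10000 tiny ones, each below half an ulp of 1.
const weights = [1, ...new Array(10000).fill(1e-16)]

const descending = sumInOrder([...weights].sort((a, b) => b - a))
const ascending = sumInOrder([...weights].sort((a, b) => a - b))

console.log(descending) // 1: every tiny term is rounded away
console.log(ascending > descending) // true: the tiny terms accumulate first
```

Adding the smallest terms first lets them build up to a value large enough to survive the final addition, whereas adding them one at a time to a large accumulator loses each of them to rounding.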

oembed info may return user URL rather than channel URL

the scraper finds the channel URL, which is what the creator has authenticated with.

with https://www.youtube.com/watch?v=7CFlMfkEYsg we get youtube#channel:alexwykoff when we should get youtube#channel:UCOHbXC47OvFe-BCU2J0HerA which is the actual YouTube identifier.
