Web Discovery Project

This repository contains the client (extension) code for Web Discovery Project which runs in the Brave browser.

Setup

Linux

If you don't have Brave browser installed on your system, run:

$ npm install
$ ./update-brave.sh
$ BRAVE_PATH=./brave/brave npm run start

You can also point the BRAVE_PATH environment variable at your globally installed Brave binary, for example $(which brave). The last command builds the extension and starts Brave with the extension loaded. Everything should work locally with this setup. By default, the extension relies on the sandbox environment deployed on AWS.

Mac

$ BRAVE_PATH="/Applications/Brave Browser.app/Contents/MacOS/Brave Browser" npm run start
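
The two setups above differ only in where the Brave binary lives. As a rough illustration, the lookup could be sketched as below. resolveBravePath is a hypothetical helper, not something the repository provides; the two default paths are simply the ones used in the commands above.

```javascript
"use strict";

// Default binary locations, copied from the Linux and Mac commands above.
const KNOWN_PATHS = {
  linux: "./brave/brave",
  darwin: "/Applications/Brave Browser.app/Contents/MacOS/Brave Browser",
};

// Hypothetical helper: an explicit BRAVE_PATH wins, otherwise fall back
// to the per-platform default, and fail loudly when none is known.
function resolveBravePath(env = process.env, platform = process.platform) {
  if (env.BRAVE_PATH) return env.BRAVE_PATH;
  const fallback = KNOWN_PATHS[platform];
  if (!fallback) {
    throw new Error(`No default Brave path for "${platform}"; set BRAVE_PATH`);
  }
  return fallback;
}
```

Calling resolveBravePath() with no arguments reads the current process environment and platform.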

Documentation

For more information about the Web Discovery methodology, privacy and security guarantees as well as examples of messages sent, visit this README.

Manual setup

Yarn

$ yarn install --frozen-lockfile
$ yarn start:build # build extension
$ yarn start:brave # start Brave with extension loaded

Npm

$ npm ci
$ npm run start:build # build extension
$ npm run start:brave # start Brave with extension loaded

Patterns

There are prod and test versions of the patterns file. Test patterns are used for tests only. Prod patterns are fetched from the CDN (https://patterns.hpn.brave.com/patterns.gz). If you have to change patterns during development, you need to:

  1. Serve a gzipped patterns file locally using an HTTP server.
  2. Update patterns URL for your environment in the config file to point to your locally served file.
  3. Disable the signature verification of the patterns file by setting the WDP_PATTERNS_SIGNING option to true in the config file for your environment. For the sandbox environment, this file is /configs/sandbox.js.
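
Step 1 can be done with any static HTTP server. The following is a minimal Node.js sketch, assuming the patterns file is gzipped JSON. The example payload, the /patterns.gz route, the CORS header, and port 8080 are all assumptions chosen for illustration; the real patterns schema is not shown here.

```javascript
"use strict";
const http = require("node:http");
const zlib = require("node:zlib");

// Placeholder content (assumption): replace with your edited patterns.
const examplePatterns = { note: "replace with your edited patterns" };

// Gzip a JSON-serializable patterns object into a Buffer.
function gzipPatterns(patterns) {
  return zlib.gzipSync(Buffer.from(JSON.stringify(patterns), "utf8"));
}

// Build (but do not start) a server that returns the payload at /patterns.gz.
function createPatternsServer(gzipped) {
  return http.createServer((req, res) => {
    if (req.url !== "/patterns.gz") {
      res.writeHead(404);
      res.end();
      return;
    }
    res.writeHead(200, {
      "Content-Type": "application/gzip",
      "Content-Length": gzipped.length,
      // Assumption: permissive CORS is acceptable on a local dev server.
      "Access-Control-Allow-Origin": "*",
    });
    res.end(gzipped);
  });
}

// To serve locally (the port is an arbitrary choice):
// createPatternsServer(gzipPatterns(examplePatterns)).listen(8080);
```

With the server listening, the patterns URL in step 2 would point at http://localhost:8080/patterns.gz.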

Useful commands

Open the extension dev tools (burger menu > extensions > developer mode toggle > background page), then switch to the Console tab.

For query messages

Force updating WebDiscoveryProject patterns:

WDP.app.modules['web-discovery-project'].background.webDiscoveryProject.patternsLoader.resourceWatcher.forceUpdate()

After visiting a SERP page, force double-fetch to happen:

WDP.app.modules['web-discovery-project'].background.webDiscoveryProject.strictQueries.map(x=>x.tDiff=0)

For page messages

Open a new tab and visit https://www.marca.com/ (or another URL, and replace the occurrences in the following commands).

Force an active page (its tab is still open) into the database so that it can be double-fetched:

WDP.app.modules['web-discovery-project'].background.webDiscoveryProject._debugRemoveFromActivePages('https://www.marca.com/')

After forcing this, https://www.marca.com/ will no longer be in the dictionary at:

WDP.app.modules['web-discovery-project'].background.webDiscoveryProject.state['v']

List the URLs in the database that are waiting to be double-fetched:

WDP.app.modules['web-discovery-project'].background.webDiscoveryProject.listOfUnchecked(1000000000000, 0, null, function(x) {console.log(x)})

Force a double-fetch of a single URL (use the URL as it appears in the list above; it might have been canonicalized):

WDP.app.modules['web-discovery-project'].background.webDiscoveryProject.forceDoubleFetch("https://www.marca.com/")

Tests

There are two kinds of tests in WDP: unit and integration. All of them run in CI and you can run them on your computer too.

Unit tests

$ ./fern.js test configs/ci/unit-tests.js

You should now get live feedback about the running tests. If you change the code, a rebuild will be triggered and tests will restart.

Fixtures

Some unit tests rely on fixtures, which are directories containing page HTML and the expected extracted data. A fixture name contains the search query and the date on which the fixture was created. For example, to add a new Google fixture for the query george washington, run the following commands:

cd ./modules/web-discovery-project/tests/unit
mkdir ./fixtures/content-extractor/go/george-washington-2023-10-04
./generate-fixtures.sh
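
The naming convention can be sketched as a small helper. fixtureDirName is hypothetical and not part of the repository; lowercasing the query and joining words with dashes are assumptions inferred from the george-washington-2023-10-04 example above.

```javascript
"use strict";

// Hypothetical helper: builds a fixture directory name such as
// "george-washington-2023-10-04" from a query and a creation date.
function fixtureDirName(query, date = new Date()) {
  const slug = query
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse non-alphanumeric runs into one dash
    .replace(/^-+|-+$/g, ""); // drop leading and trailing dashes
  const day = date.toISOString().slice(0, 10); // YYYY-MM-DD, in UTC
  return `${slug}-${day}`;
}
```

The date is formatted in UTC, so a fixture created close to midnight local time may carry the neighbouring day.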

Integration tests

Integration tests (in Brave):

./fern.js test configs/ci/integration-tests.js -l brave-web-ext --brave /opt/brave.com/brave/brave-browser

Regression tests

Regression tests (in Brave):

./fern.js test configs/ci/integration-tests.js -l brave-web-ext --grep UtilityRegression --brave /opt/brave.com/brave/brave-browser

Note that you should replace the path to Brave in the command above.

You can also use the --keep-open flag so that the test runner keeps watching for code changes and restarts the tests whenever they occur.

Another useful flag is --grep, which allows you to select a subset of tests to run based on their names. For example:

./fern.js test configs/ci/integration-tests.js -l brave-web-ext --brave /opt/brave.com/brave/brave-browser --keep-open --grep registerContentScript

Integration tests in Docker:

./run_tests_in_docker.sh "configs/ci/integration-tests.js -l brave-web-ext --brave /opt/brave.com/brave/brave-browser"

Copyright

Copyright © 2021 Brave Software. All rights reserved. Copyright © 2014 Cliqz GmbH. All rights reserved.

This Source Code Form is subject to the terms of the Mozilla Public License, v. 2.0. If a copy of the MPL was not distributed with this file, You can obtain one at https://mozilla.org/MPL/2.0/.

web-discovery-project's People

Contributors

artyuum, dependabot[bot], diracdeltas, fmarier, jonathansampson, kdenhartog, lorenzominto, mihaiplesa, petemill, remusao, renovate[bot], solso, thypon, yshym

web-discovery-project's Issues

Build-module issue: Function.prototype.apply was called on undefined

Hi,

while building brave from source on an Ubuntu 22.04 server, the process stopped with an error in the web-discovery-project. I basically followed the steps described in the Linux Development Environment Guide. You can find below the error that build-module command throws.

Do you know what goes wrong here? Is it an issue in my setup or a (dependency) bug?

Thank you!

# in web-discovery-project
 npm run build-module

> [email protected] build-module
> node fern.js build configs/module.js --environment prod

Starting build
(node:3305921) [DEP0040] DeprecationWarning: The `punycode` module is deprecated. Please use a userland alternative instead.
(Use `node --trace-deprecation ...` to show where the warning was created)
/path/brave-browser/src/brave/vendor/web-discovery-project/node_modules/esm/esm.js:1

[...]

TypeError: Function.prototype.apply was called on undefined, which is a undefined and not a function
    at $o (/path/brave-browser/src/brave/vendor/web-discovery-project/node_modules/esm/esm.js:1:224377)
    at wu (/path/brave-browser/src/brave/vendor/web-discovery-project/node_modules/esm/esm.js:1:227324)
    at Eu (/path/brave-browser/src/brave/vendor/web-discovery-project/node_modules/esm/esm.js:1:227999)
    at Module.<anonymous> (/path/brave-browser/src/brave/vendor/web-discovery-project/node_modules/esm/esm.js:1:295976)
    at n (/path/brave-browser/src/brave/vendor/web-discovery-project/node_modules/esm/esm.js:1:279589)
    at requireBrocfile (/path/brave-browser/src/brave/vendor/web-discovery-project/node_modules/broccoli/dist/load_brocfile.js:37:20)
    at Object.loadBrocfile (/path/brave-browser/src/brave/vendor/web-discovery-project/node_modules/broccoli/dist/load_brocfile.js:63:22)
    at getBroccoliBuilder (/path/brave-browser/src/brave/vendor/web-discovery-project/fern/common.js:80:29)
    at Command.<anonymous> (/path/brave-browser/src/brave/vendor/web-discovery-project/fern/commands/build.js:70:23)

Node.js v21.4.0

`npm run build-module` is generating different output every time it's run

The output of npm run build-module should be identical on every run. It seems it isn't, because temp directories are used to perform some compilation and then bundling. Ideally we should either not use temp directories (and keep the path constant) or tell the bundler to ignore the temp directory part of each module's name when bundling, as we can with webpack (by providing it a context path).

This issue causes a build problem when the Intel and ARM resources for Brave on macOS need to be identical: brave/brave-browser#38435
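
One way to check whether two builds are identical is to hash the output tree by relative path and file content, so that the location of the tree on disk does not affect the digest. The sketch below is an illustration only; hashDirectory and listFiles are hypothetical helpers, and the output directory name in the usage comment is an assumption.

```javascript
"use strict";
const crypto = require("node:crypto");
const fs = require("node:fs");
const path = require("node:path");

// Recursively list regular files under `dir` as sorted relative paths.
function listFiles(dir, base = dir) {
  const out = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      out.push(...listFiles(full, base));
    } else if (entry.isFile()) {
      // Normalize separators so the digest is the same on every platform.
      out.push(path.relative(base, full).split(path.sep).join("/"));
    }
  }
  return out.sort();
}

// SHA-256 over every relative path and file content, in sorted order.
function hashDirectory(dir) {
  const hash = crypto.createHash("sha256");
  for (const rel of listFiles(dir)) {
    hash.update(rel);
    hash.update("\0");
    hash.update(fs.readFileSync(path.join(dir, rel)));
    hash.update("\0");
  }
  return hash.digest("hex");
}

// Usage (directory name is an assumption): build, call
// hashDirectory("build"), rebuild, hash again, and compare the digests.
```

If the two digests differ, hashing each file separately narrows down which bundled file embeds the temp directory path.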

Donate crawl data to the Internet Archive

Hello, this is more related to Brave Search itself, but could you get in contact with the Internet Archive and donate crawl data to the Wayback Machine? Alexa Internet did that until its disintegration by Amazon in 2020. The Wayback Machine is an extremely useful resource that is used all across the world by researchers, journalists, and basically anyone on YouTube investigating something online, like the origin of an urban legend. Since you already have the Wayback Machine integrated into the browser, the chance of a link being completely lost to time should decrease if you donate the crawl data. The crawl data donated by Brave would be extremely helpful. You could also ask the Archive staff for a list of all archived URLs on the Wayback Machine, deduplicate them, and add the links that Brave has not crawled and that are still up to the search results, to make a third search engine to rival Google and Bing. Other good sources of links could be https://ODCrawler.xyz and many AI image datasets.

CI should be running npm/yarn audit on PRs

We are seeing an increasing number of builds fail in brave-core due to npm audit issues introduced by this repo. To surface these more easily, PRs here should fail whenever npm audit reports an error.
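
npm audit already exits with a non-zero code when it finds vulnerabilities, so a CI step can simply run it. If a severity threshold is wanted, a check over the JSON report could look roughly like the sketch below. countAtOrAbove is a hypothetical helper, and it assumes that the report produced by npm audit --json carries per-severity counts under metadata.vulnerabilities.

```javascript
"use strict";

// Severities in increasing order of importance.
const SEVERITIES = ["info", "low", "moderate", "high", "critical"];

// Count vulnerabilities at or above `minSeverity` in a parsed audit report.
// Assumption: counts live under report.metadata.vulnerabilities.
function countAtOrAbove(report, minSeverity = "high") {
  const start = SEVERITIES.indexOf(minSeverity);
  if (start === -1) throw new Error(`Unknown severity: ${minSeverity}`);
  const counts = (report.metadata && report.metadata.vulnerabilities) || {};
  return SEVERITIES.slice(start).reduce((sum, s) => sum + (counts[s] || 0), 0);
}

// In CI, the parsed report would come from the output of `npm audit --json`,
// and the job would fail when countAtOrAbove(report) is greater than zero.
```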

Dependency Dashboard

This issue lists Renovate updates and detected dependencies. Read the Dependency Dashboard docs to learn more.

Rate-Limited

These updates are currently rate-limited. Click on a checkbox below to force their creation now.

  • chore(deps): update actions/checkout action to v4.1.6
  • chore(deps): update dependency @types/punycode to v2.1.4
  • chore(deps): update dependency chai-as-promised to v7.1.2
  • chore(deps): update dependency eslint_d to v13.1.2
  • chore(deps): update dependency fake-indexeddb to v5.0.2
  • chore(deps): update dependency tough-cookie to v4.1.4
  • chore(deps): update dependency eslint-config-prettier to v9.1.0
  • chore(deps): update dependency mocha to v10.4.0 (mocha, @types/mocha)
  • chore(deps): update dependency ramda to v0.30.1
  • chore(deps): update dependency testem to v3.14.0
  • chore(deps): update dependency typescript to v5.4.5
  • chore(deps): update dependency web-ext to v7.12.0
  • chore(deps): update dependency webpack to v5.91.0
  • chore(deps): update dependency ws to v8.17.0
  • fix(deps): update dependency tldts-experimental to v6.1.24
  • fix(deps): update dependency webextension-polyfill to v0.12.0
  • chore(deps): update actions/cache action to v4
  • 🔐 Create all rate-limited PRs at once 🔐

Open

These updates have all been created already. Click a checkbox below to force a retry/rebase of any.

Detected dependencies

dockerfile
Dockerfile.ci
github-actions
.github/workflows/build.yml
  • actions/checkout v4.1.1@b4ffde65f46336ab88eb53be808477a3936bae11
  • actions/setup-node v4.0.1@b39b52d1213e96004bfcb1c61a8a6fa8ab84f3e8
  • actions/cache v3.3.2@704facf57e6136b1bc63b828d79edcd491f0ee84
  • ubuntu 22.04
.github/workflows/codeql-analysis.yml
  • actions/checkout v4.1.1@b4ffde65f46336ab88eb53be808477a3936bae11
  • github/codeql-action v3.23.0@e5f05b81d5b6ff8cfa111c80c22c5fd02a384118
  • github/codeql-action v3.23.0@e5f05b81d5b6ff8cfa111c80c22c5fd02a384118
  • github/codeql-action v3.23.0@e5f05b81d5b6ff8cfa111c80c22c5fd02a384118
.github/workflows/integration-tests.yml
  • actions/checkout v4.1.1@b4ffde65f46336ab88eb53be808477a3936bae11
  • ubuntu 22.04
.github/workflows/regression-tests.yml
  • actions/checkout v4.1.1@b4ffde65f46336ab88eb53be808477a3936bae11
  • ubuntu 22.04
.github/workflows/unit-tests.yml
  • actions/checkout v4.1.1@b4ffde65f46336ab88eb53be808477a3936bae11
  • actions/setup-node v4.0.1@b39b52d1213e96004bfcb1c61a8a6fa8ab84f3e8
  • actions/cache v3.3.2@704facf57e6136b1bc63b828d79edcd491f0ee84
  • ubuntu 22.04
npm
package.json
  • @cliqz/url-parser 1.1.5
  • abortcontroller-polyfill 1.7.5
  • dexie 3.2.4
  • linkedom 0.14.12
  • pako 2.1.0
  • punycode 2.3.1
  • tldts-experimental 6.0.19
  • webextension-polyfill 0.9.0
  • @babel/core 7.23.3
  • @babel/eslint-parser 7.23.3
  • @babel/plugin-proposal-class-properties 7.18.6
  • @babel/plugin-proposal-dynamic-import 7.18.6
  • @babel/plugin-transform-modules-commonjs 7.23.3
  • @babel/plugin-transform-modules-systemjs 7.23.3
  • @babel/plugin-transform-optional-chaining 7.23.3
  • @babel/plugin-transform-regenerator 7.23.3
  • @babel/plugin-transform-template-literals 7.23.3
  • @babel/preset-env 7.23.3
  • @babel/preset-typescript 7.23.3
  • @types/chrome 0.0.251
  • @types/mocha 10.0.4
  • @types/node 20.9.0
  • @types/punycode 2.1.2
  • @typescript-eslint/eslint-plugin 6.11.0
  • @typescript-eslint/parser 6.11.0
  • @xmldom/xmldom 0.8.10
  • body-parser 1.20.2
  • broccoli 3.5.2
  • broccoli-babel-transpiler 8.0.0
  • broccoli-concat 4.2.5
  • broccoli-file-creator 2.1.1
  • broccoli-funnel 3.0.8
  • broccoli-imagemin 2.0.1
  • broccoli-merge-trees 4.2.0
  • broccoli-plugin 4.0.7
  • broccoli-source 3.0.1
  • camelcase 6.3.0
  • chai 4.3.10
  • chai-as-promised 7.1.1
  • chai-dom 1.12.0
  • colors 1.4.0
  • commander 11.1.0
  • concurrently 8.2.2
  • console-ui 3.1.2
  • cookie-parser 1.4.6
  • eslint_d 13.1.0
  • eslint-config-prettier 9.0.0
  • eslint-plugin-compat 4.2.0
  • express 4.19.2
  • fake-indexeddb 5.0.1
  • filehound 1.17.6
  • full-icu 1.5.0
  • git-describe 4.1.1
  • git-rev 0.2.1
  • glob 8.1.0
  • jsdom 22.1.0
  • mocha 10.2.0
  • mockdate 3.0.5
  • node-notifier 10.0.1
  • path-browserify 1.0.1
  • ramda 0.29.1
  • sinon 17.0.1
  • sinon-chai 3.7.0
  • strip-json-comments 3.1.1
  • systemjs-plugin-json 0.3.0
  • testem 3.10.1
  • tree-sync 2.1.0
  • typescript 5.2.2
  • watch-detector 1.0.2
  • web-ext 7.11.0
  • webpack 5.89.0
  • ws 8.14.2
  • ansi-html 0.0.9
  • express 4.19.2
  • tough-cookie 4.1.3

  • Check this box to trigger a request for Renovate to run again on this repository
