isabella232 / slim-list-lambda

This project forked from brave/slim-list-lambda

Lambda function for reducing EasyList + EasyPrivacy for use in iOS clients

License: Mozilla Public License 2.0

Makefile 5.85% JavaScript 93.69% Shell 0.47%

slim-list-lambda's Introduction

Slim List System

Slim List is an AWS Lambda-based crawling system for evaluating which EasyList and EasyPrivacy rules are the most useful. The main goal of the system is to shrink EasyList and EasyPrivacy so that they can be shipped in the iOS client.

Slim List consists of several AWS components: S3 for scratch and final results, SQS for job queueing, and multiple lambdas included in this repo.

How the Lambdas in the system interact

This lambda function is the entry point to the whole system. While it's implemented as a single lambda function, it performs five distinct tasks. In order:

  1. brave/lambda_actions/crawl-dispatch.js fetches a new Alexa 10k list and then queues up the sites to crawl in SQS. This function is called once per crawl.
  2. brave/lambda_actions/crawl.js is called per page that needs to be crawled. It triggers a Chrome instance to crawl a page, records everything that's fetched, writes a description of it to S3, and possibly kicks off more brave/lambda_actions/crawl.js instances to crawl child pages.
  3. brave/lambda_actions/record.js is also called once for each page that is measured. This invocation reads all the serialized data from the crawl.js invocation and writes it to Postgres. (This is a separate step to reduce the number of parallel jobs triggered in 1.iii, to avoid sinking the DB.)
  4. brave/lambda_actions/build.js does the DB-side analysis to determine which filter list rules are popular enough to be included in “slim list”. It is also called once per crawl.
  5. brave/lambda_actions/assemble.js combines the slim list data with Brave-owned/authored lists and produces an iOS content blocking rule file, as well as a corresponding DAT file to be loaded by adblock-rust browser-side. It does this for each regional list as well. All of the outputs are stored in S3.
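
The single-entry-point design described above can be pictured as a small routing table. The following is only a minimal sketch: the action names, the `action` event field, and the `resolveAction` helper are assumptions made for illustration, not the repository's actual interface. Only the module paths come from the list above.

```javascript
// Hypothetical routing table: action name -> module path from the list above.
// The action names and the `action` event field are assumptions for illustration.
const ACTION_MODULES = new Map([
  ['crawl-dispatch', 'brave/lambda_actions/crawl-dispatch.js'],
  ['crawl', 'brave/lambda_actions/crawl.js'],
  ['record', 'brave/lambda_actions/record.js'],
  ['build', 'brave/lambda_actions/build.js'],
  ['assemble', 'brave/lambda_actions/assemble.js']
])

// Return the module path that should handle this event, or throw if the
// event names an action that is not one of the five tasks.
function resolveAction (event) {
  const modulePath = ACTION_MODULES.get(event.action)
  if (modulePath === undefined) {
    throw new Error(`Unknown action: ${event.action}`)
  }
  return modulePath
}

console.log(resolveAction({ action: 'record' }))
// brave/lambda_actions/record.js
```

Using a `Map` rather than a plain object means that an unexpected action name such as `constructor` cannot resolve to an inherited property.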

Structure of S3 Crawl Data

    <batch>
      domains.json
      rules.dat
      manifest.json
      data
        <domain>
          <depth-breath>.json
            {url: url crawled,
            data: urls requested,
            depth: depth of this report,
            breath: breadth of this report,
            timestamp: ISO timestamp}
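
As a rough illustration of the layout above, the sketch below builds the S3 key and JSON body for one crawl report. The `buildReport` helper, its parameter names, and the sample values are hypothetical. The `-` separator in the file name is read off the `<depth-breath>.json` placeholder and is an assumption, and the `breath` field name is kept exactly as it is spelled in the layout.

```javascript
// Hypothetical helper: build the S3 key and JSON body for one crawl report.
// The key follows the <batch>/data/<domain>/<depth-breath>.json layout; the
// "-" separator between depth and breath is an assumption.
function buildReport ({ batch, domain, url, requestedUrls, depth, breath, now = new Date() }) {
  const key = `${batch}/data/${domain}/${depth}-${breath}.json`
  const body = JSON.stringify({
    url, // url crawled
    data: requestedUrls, // urls requested while loading the page
    depth,
    breath,
    timestamp: now.toISOString() // ISO timestamp
  })
  return { key, body }
}

// Sample values are made up for the example.
const report = buildReport({
  batch: 'example-batch',
  domain: 'example.com',
  url: 'https://example.com/',
  requestedUrls: ['https://example.com/app.js'],
  depth: 1,
  breath: 0,
  now: new Date('2020-01-01T00:00:00.000Z')
})

console.log(report.key)
// example-batch/data/example.com/1-0.json
```

Passing `now` as a parameter keeps the helper deterministic, which makes the report format easy to check without touching S3.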

Deployment

Slim List lambdas are deployed into a staging account and a production account. To deploy to the staging environment, merge or push to the staging branch. To deploy to production, merge or push to the main branch. To gain access to these AWS environments, please ping the DevOps team in the #devops Brave Slack channel.

slim-list-lambda's People

Contributors

antonok-edm, hspencer77, linhkikuchi, pes10k