serverless-dns / blocklists

An opinionated collection of blocklists for RethinkDNS.

Home Page: https://rethinkdns.com/configure

License: Mozilla Public License 2.0

Python 43.19% JavaScript 52.46% Dockerfile 0.89% Shell 3.46%
doh privacy blocklists trackers hostfile adblock ads pi-hole

blocklists's Introduction

It's a bird, it's a plane, it's... a self-hosted, pi-hole esque, DNS resolver

serverless-dns is a Pi-Hole esque content-blocking, serverless, stub DNS-over-HTTPS (DoH) and DNS-over-TLS (DoT) resolver. Runs out-of-the-box on Cloudflare Workers, Deno Deploy, Fastly Compute@Edge, and Fly.io. Free tiers of all these services should be enough to cover 10 to 20 devices worth of DNS traffic per month.

The RethinkDNS resolver

RethinkDNS runs serverless-dns in production at these endpoints:

Cloud platform | Server locations | Protocol | Domain | Usage
⛅ Cloudflare Workers | 280+ (ping) | DoH | sky.rethinkdns.com | configure
🦕 Deno Deploy | 30+ (ping) | DoH | private beta
⏱️ Fastly Compute@Edge | 80+ (ping) | DoH | private beta
🪂 Fly.io | 30+ (ping) | DoH and DoT | max.rethinkdns.com | configure

Server-side processing takes between 0 and 2 milliseconds (ms) (median), and end-to-end latency, which varies across regions and networks, is between 10ms and 30ms (median).

Self-host

Cloudflare Workers is the easiest platform on which to set up serverless-dns:

Deploy to Cloudflare Workers

Deploy to Fastly

For step-by-step instructions, refer to:

Platform | Difficulty | Runtime | Doc
⛅ Cloudflare | Easy | v8 Isolates | Hosting on Cloudflare Workers
🦕 Deno.com | Moderate | Deno Isolates | Hosting on Deno.com
⏱️ Fastly Compute@Edge | Easy | Fastly JS | Hosting on Fastly Compute@Edge
🪂 Fly.io | Hard | Node MicroVM | Hosting on Fly.io

To set up blocklists, visit https://<my-domain>.tld/configure from your browser (it should load something similar to RethinkDNS' configure page).

For help or assistance, feel free to open an issue or submit a patch.


Development

Setup

Code:

# navigate to work dir
cd /my/work/dir

# clone this repository
git clone https://github.com/serverless-dns/serverless-dns.git

# navigate to serverless-dns
cd ./serverless-dns

Node:

# install node v19+ via nvm, if required
# https://github.com/nvm-sh/nvm#installing-and-updating
wget -qO- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.1/install.sh | bash
nvm install --lts

# download dependencies
npm i

# (optional) update dependencies
npm update

# run serverless-dns on node
./run n

# run a clinicjs.org profiler
./run n [cpu|fn|mem]

Deno:

# install deno.land v1.22+
# https://github.com/denoland/deno/#install
curl -fsSL https://deno.land/install.sh | sh

# run serverless-dns on deno
./run d

Fastly:

# install node v18+ via nvm, if required
# install the Fastly CLI
# https://developer.fastly.com/learning/tools/cli

# run serverless-dns on Fastly Compute@Edge
./run f

Wrangler:

# install Cloudflare Workers (cli) aka Wrangler
# https://developers.cloudflare.com/workers/cli-wrangler/install-update
npm i wrangler --save-dev

# run serverless-dns on Cloudflare Workers (cli)
# Make sure to setup Wrangler first:
# https://developers.cloudflare.com/workers/cli-wrangler/authentication
./run w

# profile wrangler with Chrome DevTools
# blog.cloudflare.com/profiling-your-workers-with-wrangler

Code style

Commits on this repository are checked against the Google JavaScript style guide (ref: .eslintrc.cjs). A git pre-commit hook runs the linter (eslint) and the formatter (prettier) on .js files. Use git commit --no-verify to bypass this hook.

Pull requests are also checked for code style violations and fixed automatically where possible.

Env vars

Configure env.js if you need to tweak the defaults. For Cloudflare Workers, set up env vars in wrangler.toml instead. For Fastly Compute@Edge, set up env vars in fastly.toml instead.

Request flow

  1. The request/response flow: client <-> src/server-[node|workers|deno] <-> doh.js <-> plugin.js
  2. The plugin.js flow: user-op.js -> cache-resolver.js -> cc.js -> resolver.js

Auth

serverless-dns supports authentication with an alpha-numeric bearer token for both DoH and DoT. For a token, msg-key (secret), append the output of hex(hmac-sha256(msg-key|domain.tld), msg) to ACCESS_KEYS env var in csv format. Note: msg is currently fixed to sdns-public-auth-info.

  1. DoH: place the msg-key at the end of the blockstamp, like so: 1:1:4AIggAABEGAgAA:<msg-key> (here, 1 is the version, 1:4AIggAABEGAgAA is the blockstamp, <msg-key> is the auth secret, and : is the delimiter).
  2. DoT: place the msg-key at the end of the SNI (domain-name) containing the blockstamp: 1-4abcbaaaaeigaiaa-<msg-key> (here 1 is the version, 4abcbaaaaeigaiaa is the blockstamp, <msg-key> is the auth secret, and - is the delimiter).

If the intention is to use auth with DoT too, keep msg-key shorter (8 to 24 chars), since subdomains may only be 63 chars long in total.
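
As a rough illustration of the formula above, the following Python sketch derives an access key locally and checks the DoT label length. It assumes that msg-key|domain.tld is the HMAC key and that the fixed msg is the message being signed. That reading of the formula, the domain|hex output layout (copied from the sample genaccesskey response), and the helper names gen_access_key and dot_label are all assumptions, so compare the result against the /genaccesskey endpoint before relying on it.

```python
import hashlib
import hmac

# Fixed context string, as stated in the Auth section.
MSG = "sdns-public-auth-info"


def gen_access_key(msg_key: str, domain: str, msg: str = MSG) -> str:
    """Return 'domain|hex(hmac-sha256(key=msg_key|domain, message=msg))'.

    Assumption: the HMAC key is the UTF-8 encoding of 'msg_key|domain'.
    """
    key = f"{msg_key}|{domain}".encode("utf-8")
    digest = hmac.new(key, msg.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{domain}|{digest}"


def dot_label(stamp: str, msg_key: str, version: str = "1") -> str:
    """Build the DoT SNI label '<version>-<stamp>-<msg-key>' and check its length."""
    label = f"{version}-{stamp}-{msg_key}"
    if len(label) > 63:  # a single DNS label may not exceed 63 chars
        raise ValueError(f"label is {len(label)} chars; shorten the msg-key")
    return label


if __name__ == "__main__":
    print(gen_access_key("ShortAlphanumericSecret", "my-serverless-dns-domain.tld"))
    print(dot_label("4abcbaaaaeigaiaa", "ShortAlphanumericSecret"))
```

The length check is the reason for the 8 to 24 character guidance: the version, the blockstamp, the msg-key, and the two delimiters must all fit in one 63-character label.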

You can generate the access keys for your fork from max.rethinkdns.com, like so:

msgkey="ShortAlphanumericSecret"
domain="my-serverless-dns-domain.tld"
curl 'https://max.rethinkdns.com/genaccesskey?key='"$msgkey"'&dom='"$domain"
# output
# {"accesskey":["my-serverless-dns-domain.tld|deadbeefd3adb33fa2bb33fd3eadf084beef3b152beefdead49bbb2b33fdead83d3adbeefdeadb33f"],"context":"sdns-public-auth-info"}

Logs and Analytics

serverless-dns can be set up to upload logs via Cloudflare Logpush.

  1. Setup a Logpush job:
    CF_ACCOUNT_ID=<hex-cloudflare-account-id>
    CF_API_KEY=<api-key-with-logs-edit-permission-at-account-level>
    R2_BUCKET=<r2-bucket-name>
    R2_ACCESS_KEY=<r2-access-key-for-the-bucket>
    R2_SECRET_KEY=<r2-secret-key-with-read-write-permissions>
    # optional, set up a filter such that only logs from this worker end up being pushed; but if you
    # do not need a filter on Worker name (script-name), edit the "filter" field below accordingly.
    SCRIPT_NAME=<name-of-the-worker-as-in-wrangler-toml>
    # for more options, ref: developers.cloudflare.com/logs/get-started/api-configuration
    # Logpush API with cURL: developers.cloudflare.com/logs/tutorials/examples/example-logpush-curl
    # Available Logpull fields: developers.cloudflare.com/logs/reference/log-fields/account/workers_trace_events
    curl -s -X POST "https://api.cloudflare.com/client/v4/accounts/${CF_ACCOUNT_ID}/logpush/jobs" \
        -H "Authorization: Bearer ${CF_API_KEY}" \
        -H 'Content-Type: application/json' \
        -d '{
            "name": "dns-logpush",
            "logpull_options": "fields=EventTimestampMs,Outcome,Logs,ScriptName&timestamps=rfc3339",
            "destination_conf": "r2://'"$R2_BUCKET"'/{DATE}?access-key-id='"${R2_ACCESS_KEY}"'&secret-access-key='"${R2_SECRET_KEY}"'&account-id='"${CF_ACCOUNT_ID}"'",
            "dataset": "workers_trace_events",
            "filter": "{\"where\":{\"and\":[{\"key\":\"ScriptName\",\"operator\":\"contains\",\"value\":\"'"${SCRIPT_NAME}"'\"},{\"key\":\"Outcome\",\"operator\":\"eq\",\"value\":\"ok\"}]}}",
            "enabled": true,
            "frequency": "low"
        }'
  2. Set wrangler.toml property logpush = true, which enables Logpush.
  3. (Optional) Set env var LOG_LEVEL = "logpush", which raises the log-level such that only request and error logs are emitted.
  4. (Optional) Set env var LOGPUSH_SRC = "csv,of,subdomains", which makes log-pusher.js emit request logs only if Workers hostname contains one of the subdomains.
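
The hand-escaped "filter" string in step 1 is easy to get wrong, since it is a JSON document embedded as a string inside another JSON document. A minimal Python sketch that builds the same request body with json.dumps is shown below. The function name logpush_job and the sample argument values are hypothetical, and the sketch only constructs the payload; it does not call the Cloudflare API.

```python
import json


def logpush_job(script_name, r2_bucket, r2_access_key, r2_secret_key, cf_account_id):
    """Build the request body that the cURL command in step 1 sends."""
    # The "filter" field is itself a JSON document serialized to a string.
    log_filter = {
        "where": {
            "and": [
                {"key": "ScriptName", "operator": "contains", "value": script_name},
                {"key": "Outcome", "operator": "eq", "value": "ok"},
            ]
        }
    }
    # {{DATE}} renders as the literal {DATE} placeholder in the R2 path.
    destination = (
        f"r2://{r2_bucket}/{{DATE}}"
        f"?access-key-id={r2_access_key}"
        f"&secret-access-key={r2_secret_key}"
        f"&account-id={cf_account_id}"
    )
    return {
        "name": "dns-logpush",
        "logpull_options": "fields=EventTimestampMs,Outcome,Logs,ScriptName&timestamps=rfc3339",
        "destination_conf": destination,
        "dataset": "workers_trace_events",
        "filter": json.dumps(log_filter, separators=(",", ":")),
        "enabled": True,
        "frequency": "low",
    }


if __name__ == "__main__":
    # Placeholder values; substitute your own before sending the body anywhere.
    job = logpush_job("my-worker", "my-bucket", "AKID", "SECRET", "0123abcd")
    print(json.dumps(job, indent=2))
```

Letting the serializer produce the nested string avoids the manual backslash escaping, and dropping the ScriptName clause is then a one-line change to the dictionary.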

Logs published to R2 can be retrieved either using R2 Workers, the R2 API, or the Logpush API.

Workers Analytics, if enabled, is pushed against a log-key, lid, which, if unspecified, is set to the hostname of the serverless deployment with periods (.) replaced by underscores (_). Auth must be set up when querying Analytics via the API, which returns JSON; ex: https://max.rethinkdns.com/1:<optional-stamp>:<msg-key>/analytics?t=<time-interval-in-mins>&f=<field-name>. Possible fields are ip (client ip), qname (dns query name), region (resolver region), qtype (dns query type), dom (top-level domains), ansip (dns answer ips), and cc (ans ip country codes).
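
To make the URL layout concrete, here is a small Python sketch that derives the default log-key and assembles an Analytics query URL from the pattern above. The helper names default_lid and analytics_url are hypothetical, and leaving the stamp empty (which yields 1::<msg-key>) is an assumption about how the optional stamp is omitted.

```python
from urllib.parse import urlencode

# Field names listed in the Analytics paragraph.
FIELDS = {"ip", "qname", "region", "qtype", "dom", "ansip", "cc"}


def default_lid(hostname: str) -> str:
    """Default log-key: the deployment hostname with '.' replaced by '_'."""
    return hostname.replace(".", "_")


def analytics_url(host: str, msg_key: str, field: str, minutes: int, stamp: str = "") -> str:
    """Build https://<host>/1:<optional-stamp>:<msg-key>/analytics?t=<mins>&f=<field>."""
    if field not in FIELDS:
        raise ValueError(f"unknown field: {field}")
    query = urlencode({"t": minutes, "f": field})
    return f"https://{host}/1:{stamp}:{msg_key}/analytics?{query}"


if __name__ == "__main__":
    print(default_lid("max.rethinkdns.com"))
    print(analytics_url("max.rethinkdns.com", "ShortAlphanumericSecret", "qname", 60))
```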

Log capture and analytics isn't yet implemented for Fly and Deno Deploy.


A note about runtimes

Deno Deploy (cloud) and Deno (the runtime) do not expose the same API surface (for example, Deno Deploy only supports HTTP/S server-listeners; whereas, Deno supports raw TCP/UDP/TLS in addition to plain HTTP and HTTP/S).

Except on Node, serverless-dns uses DoH upstreams defined by env vars, CF_DNS_RESOLVER_URL / CF_DNS_RESOLVER_URL_2. On Node, the default DNS upstream is 1.1.1.2 (ref) or the recursive DNS resolver at fdaa::3 when running on Fly.io.

The entrypoints for Node and Deno are src/server-node.js and src/server-deno.ts respectively, and both listen for TCP-over-TLS and HTTP/S connections; whereas, the entrypoint for Cloudflare Workers, which only listens over HTTP (cli) or over HTTP/S (prod), is src/server-workers.js; and for Fastly, it is src/server-fastly.js.

For local (non-prod) setups on Node, the key (private) and cert (public chain) files are, by default, read from paths defined in env vars TLS_KEY_PATH and TLS_CRT_PATH.

Whilst for prod setup on Node (on Fly.io), either TLS_OFFLOAD must be set to true or key and cert must be base64 encoded in env var TLS_CERTKEY (ref), like so:

# EITHER: offload tls to fly.io and set tls_offload to true
TLS_OFFLOAD="true"
# OR: base64 representation of both key (private) and cert (public chain)
TLS_CERTKEY="KEY=b64_key_content\nCRT=b64_cert_content"
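
For illustration, a Python sketch that assembles a TLS_CERTKEY value in the shape shown above. It assumes the separator is the literal two-character sequence backslash-n, exactly as printed in the example, and that each part is standard base64 of the file contents. Both points, and the helper names, are assumptions to verify against the referenced code.

```python
import base64


def make_tls_certkey(key_pem: bytes, crt_pem: bytes) -> str:
    """Return 'KEY=<b64 key>\\nCRT=<b64 cert>' with a literal backslash-n separator."""
    key_b64 = base64.b64encode(key_pem).decode("ascii")
    crt_b64 = base64.b64encode(crt_pem).decode("ascii")
    return f"KEY={key_b64}\\nCRT={crt_b64}"


def parse_tls_certkey(value: str):
    """Inverse of make_tls_certkey; used here only to check the round trip."""
    key_part, crt_part = value.split("\\n", 1)
    key = base64.b64decode(key_part[len("KEY="):])
    crt = base64.b64decode(crt_part[len("CRT="):])
    return key, crt


if __name__ == "__main__":
    # Placeholder bytes; in practice, read the PEM files from disk instead.
    value = make_tls_certkey(b"dummy private key", b"dummy cert chain")
    print(value)
    assert parse_tls_certkey(value) == (b"dummy private key", b"dummy cert chain")
```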

For Deno, key and cert files are read from paths defined in env vars, TLS_KEY_PATH and TLS_CRT_PATH (ref).

Process bringup is different for each of these runtimes: For Node, src/core/node/config.js governs the bringup; while for Deno, it is src/core/deno/config.ts, and for Workers it is src/core/workers/config.js. src/system.js pub-sub co-ordinates the bringup phase among various modules.

On Node and Deno, in-process DNS caching is backed by @serverless-dns/lfu-cache; on Cloudflare Workers, it is backed by both the Cache Web API and in-process lfu caches. To disable caching altogether on all three platforms, set env var PROFILE_DNS_RESOLVES=true.

Cloud

Cloudflare Workers and Deno Deploy are ephemeral, as in, the "process" that serves client requests is not long-lived, and in fact, two back-to-back requests may be served by two different isolates ("processes"). Fastly Compute@Edge is also ephemeral but does not use isolates; instead, Fastly creates and destroys a wasmtime sandbox for each request. The resolver on Fly.io, running Node, is backed by persistent VMs and is hence longer-lived, like traditional "serverfull" environments.

For Deno Deploy, the code-base is bundled up in a single javascript file with deno bundle and then handed off to Deno.com.

Cloudflare Workers build-time and runtime configurations are defined in wrangler.toml. Webpack5 bundles the files in an ESM module which is then uploaded to Cloudflare by Wrangler.

Fastly Compute@Edge build-time and runtime configurations are defined in fastly.toml. Webpack5 bundles the files in an ESM module which is then compiled to WASM by npx js-compute-runtime and subsequently packaged and published to Fastly Compute@Edge with the Fastly CLI.

For Fly.io, which runs Node, the runtime directives are defined in fly.toml (used by dev and live deployment-types), while deploy directives are in node.Dockerfile. flyctl accordingly sets up serverless-dns on Fly.io's infrastructure.

# build and deploy for cloudflare workers.dev
npm run build
# usually, env-name is prod
npx wrangler publish [-e <env-name>]

# bundle, build, and deploy for fastly compute@edge
# developer.fastly.com/reference/cli/compute/publish
fastly compute publish

# build and deploy to fly.io
npm run build:fly
flyctl deploy --dockerfile node.Dockerfile --config <fly.toml> [-a <app-name>] [--image-label <some-uniq-label>]

For deploys offloading TLS termination to Fly.io (B1 deployment-type), the runtime directives are instead defined in fly.tls.toml, which sets up HTTP2 Cleartext and HTTP/1.1 on port 443, and DNS over TCP on port 853.

Ref: github/workflows.

Blocklists

190+ blocklists are compressed in a Succinct Radix Trie (based on Steve Hanov's impl) with modifications to speed up string search (lookup) at the expense of "succinctness". The blocklists are versioned with a unix timestamp (defined in src/basicconfig.json, downloaded by pre.sh) that is generated once every week (we'd like to generate them daily or hourly, if possible), and are hosted on Cloudflare R2 (env var: CF_BLOCKLIST_URL).

serverless-dns downloads the 3 blocklist files required to set up the radix-trie either during runtime bring-up or lazily, when serving a DNS request.

serverless-dns compiles roughly 13M entries (as of Jan 2023) from 190+ blocklists. These are defined in the serverless-dns/blocklists repository.
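
To give a feel for trie-based lookup, here is a deliberately simple Python sketch. It is not the succinct, bit-packed structure described above: it is a plain in-memory trie keyed on reversed domain labels, and the class names, the integer list ids, and the choice to also match parent domains are all assumptions made for illustration.

```python
class Node:
    """One trie node: child labels plus ids of blocklists ending here."""

    def __init__(self):
        self.children = {}
        self.lists = set()


class DomainTrie:
    """Trie keyed on reversed labels, e.g. com -> example -> ads."""

    def __init__(self):
        self.root = Node()

    @staticmethod
    def _labels(domain):
        # Normalize case, drop a trailing dot, and walk from the TLD inward.
        return reversed(domain.lower().rstrip(".").split("."))

    def add(self, domain, list_id):
        node = self.root
        for label in self._labels(domain):
            node = node.children.setdefault(label, Node())
        node.lists.add(list_id)

    def lookup(self, domain):
        """Return ids of lists containing the domain or any parent domain."""
        node, hits = self.root, set()
        for label in self._labels(domain):
            node = node.children.get(label)
            if node is None:
                break
            hits |= node.lists
        return hits


if __name__ == "__main__":
    trie = DomainTrie()
    trie.add("ads.example.com", 1)
    trie.add("example.net", 2)
    print(trie.lookup("x.ads.example.com"))
    print(trie.lookup("example.com"))
```

Shared suffixes such as com are stored once, which is the property a radix trie exploits; the succinct encoding goes further by replacing the per-node dictionaries with compact bit vectors.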

blocklists's People

Contributors

ahmed-tasaly, anuraag488, arfshl, badmojr, bongochong, elliotwutingfeng, hub8888, ignoramous, masterkia, mtxadmin, peterdavehello, santhosh-ponnusamy, shuvashish76, signed-log, spencerisgiddy, tim-hub

blocklists's Issues

Update readme with some minor details

Please mention following details on README page

  1. Double-check whether a blocklist already exists in the list before submitting a PR
    (a few duplicate examples: #13, #26, #30)
  2. Details about "pack": [ ]

Incorrect behavior from ABP regex

When using the EasyPrivacy list, the resulting domain set includes soundcloud.com, which is incorrect.

This is the result of the rule: ||soundcloud.com^$ping

Test case:

import re

# https://github.com/serverless-dns/blocklists/blob/f57b1d/download.py#L206
rexpr = re.compile(r"^(\|\||[a-zA-Z0-9])([a-zA-Z0-9][a-zA-Z0-9-_.]+)((\^[a-zA-Z0-9\-\|\$\.\*]*)|(\$[a-zA-Z0-9\-\|\.])*|(\\[a-zA-Z0-9\-\||\^\.]*))$", re.M)

txt = "||soundcloud.com^$ping"

print(re.search(rexpr, txt).groups())

Result:

('||', 'soundcloud.com', '^$ping', '^$ping', None, None)
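
One possible way to avoid this, sketched below, is to accept only rules of the exact form ||domain^ and skip any rule that carries $options, since an option such as $ping limits the rule to one request type rather than the whole domain. The pattern and the helper name domain_from_rule are hypothetical; this is not the fix adopted in download.py, and it is stricter than some blockers, which accept certain options.

```python
import re

# Accept only ||domain^ with nothing after the separator.
DOMAIN_ONLY = re.compile(r"^\|\|([a-zA-Z0-9][a-zA-Z0-9_.-]+)\^$")


def domain_from_rule(rule: str):
    """Return the domain for a plain ||domain^ rule, else None."""
    match = DOMAIN_ONLY.match(rule.strip())
    return match.group(1) if match else None


if __name__ == "__main__":
    print(domain_from_rule("||soundcloud.com^$ping"))  # None: rule has options
    print(domain_from_rule("||example.com^"))          # example.com
```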

Remove "malwaredomains.com" list and replace it with Urlhaus Filterlists

Hello

Malwaredomains.com is dead.

If you want to support a cached version of it, it is your choice.

But to replace its protection, most adblockers including uBo have switched to Urlhaus Filterlists.

uBlockOrigin/uBlock-issues#1116

https://gitlab.com/curben/urlhaus-filter

There are also other malware lists to consider in that uBo issue (I haven't checked whether you already have them or not, but you did not have urlhaus, so I wanted to suggest it at least)

More links to the matter:

StevenBlack/hosts#1455

notracking/hosts-blocklists#398

AdguardTeam/FiltersRegistry#364

Best wishes.

ThreatIntelligence: TBLP, UrlHaus, PupFilter, MalwareFilter, HostsVN, Red.Flag

Remove Energized Ultimate & Unified for the time being.

I don’t see the point of having Energized Ultimate and Energized Unified, since they have plenty of false positives and rethinkdns doesn’t supplement them with a whitelist. Currently, some of the energized lists are conflicting with nextdns’s service, and I think the inconsistency of these 2 lists isn’t worth it for a service like rethinkdns, which is limited by cloudflare’s limit. Another reason is that there are currently 230 open issues for the energized lists which are seemingly being ignored. Anyways, that’s my suggestion.

Delta updates

Update more frequently (once every hour?), and see if delta updates are possible? Ship only bytes that have changed since the previous version. Like a git-patch.

https://archive.is/RiUEl

blocklistConfig.json in today's reality

Now that the blocklists cat is out of the bag, time to get rid of an unused field and introduce a new one.

  • Remove unused field: uname. It was included for book-keeping purposes from when we tracked nextdns/metadata for reasons that don't exist anymore.

  • Add new optional string field: status, which can assume dead, alive, unknown to depict the state of the blocklist. Since we can't recycle values once assigned (unless version is bumped up), we'd have to keep those entries around for book-keeping purposes.

  • Add fields for homepage and a short description of the blocklist.

  • Overall, though, blocklistConfig.json must also capture its own version, so that's another field to introduce at the top-level.

Introduce packs in v2

Today, packs like Gambling, Porn, Dating, Social Networks are implicit, maintained apart from the configuration (shown in the simple view of the RethinkDNS configure webpage).

In v2 of the json, introduce a new field viz. packs which is a set of all packs a particular list belongs to. This supersedes the current field subg, which is to be removed from version 2.

ex:

"packs": ["facebook", "social-network"]

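The proposals in this issue and in "blocklistConfig.json in today's reality" could be checked mechanically. The Python sketch below validates one hypothetical v2 entry: it flags the retired uname and subg fields, restricts status to dead, alive, or unknown, and requires packs to be a duplicate-free list of strings. The function name, the default of unknown for a missing status, and the duplicate check are assumptions, not part of the proposals.

```python
ALLOWED_STATUS = {"dead", "alive", "unknown"}


def validate_entry(entry: dict) -> list:
    """Return a list of problems found in one hypothetical v2 blocklist entry."""
    problems = []
    if "uname" in entry:
        problems.append("uname is unused and should be removed")
    if "subg" in entry:
        problems.append("subg is superseded by packs in v2")
    # status is optional; treating a missing value as "unknown" is an assumption.
    status = entry.get("status", "unknown")
    if status not in ALLOWED_STATUS:
        problems.append(f"invalid status: {status!r}")
    packs = entry.get("packs", [])
    if not isinstance(packs, list) or not all(isinstance(p, str) for p in packs):
        problems.append("packs must be a list of strings")
    elif len(set(packs)) != len(packs):
        problems.append("packs must not contain duplicates")
    return problems


if __name__ == "__main__":
    print(validate_entry({"status": "alive", "packs": ["facebook", "social-network"]}))
    print(validate_entry({"uname": "x", "status": "zombie", "packs": "facebook"}))
```
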
Separate out s3 upload and radix-trie generation

Today, createTrie.js does both s3 upload and creating the compact, compressed radix-trie from the downloaded blocklists defined in blocklistConfig.json.

Split the upload part into its own file (uploadTrie.js or upload.py and rename downloadFromBlocklistConfig.py to download.py?) and creation can remain with createTrie.js as the name makes it apparent.

Attempt to move as many lists from original source to Ultimate Hosts Blocklist

I was wondering if I can try to move a bunch of lists from their original source to Ultimate Hosts Blocklist. What this does is cuts down on the amount of domains/hosts that are part of the lists by a light amount. An example of what I mean is Wally3k's list has 350 lines of domains in the original domains form and then has 313 lines of domains when cleaning it. I can probably move around 15 lists over and save a few thousand rules hopefully

PR #36

Fix PR #36 and merge it in as a separate commit.

Unblock instagram/make a whitelist for instagram

So I’ve noticed that almost any Adblock list blocks instagram. I’ve used the search tool to see what lists includes graph.instagram.com but it seems to be misleading as other lists have been breaking instagram for me. If there is some way/whitelist to unblock specifically instagram and Facebook. That would be awesome for my use case

OISD Extra

A user says,

What about DNS Rebinding protection?

Also please add OISD Extra to your blocklists please.

A double please. Surely, we are including it? Must get confirmation from Stefan (OISD), first.

Energized gone for good?

So it seems energized.pro is just showing a blank page now the domain files are like what the host files were doing. Should we find replacements to slot into where the energized lists were? Additionally, replace the newly added energized adult lite😭

More blocklist tags, easier configure ui

A user says,

  • advanced view is overwhelming the lists should be categorized.

  • using the simple view i see no ads section for lists, it's conclusive with privacy. i think should be split into ads and trackers or be one lists "ads&privacy"

  • some sort of description should be included for the user why or why not he wants this list and what makes it different. seeing a list of : khost, yhost, lightswitch, oisd.... user has no idea on what basis to choose

The configure webpage is due an overhaul...

  1. Packs will address lists that can be tagged with multiple labels (ads, privacy, mobile-ads etc): #8
  2. Also, description and homepage may also be important additions to the metadata, for v2 #2

Blocklist validation broken

hosts files were marked as domains, while another file was neither hosts nor domains. The validation run succeeded anyway when it shouldn't have.

Exhibit A: Every blocklist file in the serverless-dns/blocklists/runs/3372744250 should have been a validation failure, but isn't:

177 : Downloading From : https://ngosang.github.io/trackerslist/trackers_all.txt
Download Location : ./blocklistfiles/privacy/176.txt
178 : Downloading From : https://blocklistproject.github.io/Lists/torrent.txt
Download Location : ./blocklistfiles/parentalcontrol/177.txt
179 : Downloading From : https://blocklistproject.github.io/Lists/drugs.txt
Download Location : ./blocklistfiles/parentalcontrol/178.txt
180 : Downloading From : https://blocklistproject.github.io/Lists/ransomware.txt
Download Location : ./blocklistfiles/security/179.txt
181 : Downloading From : https://blocklistproject.github.io/Lists/tiktok.txt
Download Location : ./blocklistfiles/parentalcontrol/social-networks/180.txt
182 : Downloading From : https://blocklistproject.github.io/Lists/malware.txt
Download Location : ./blocklistfiles/security/181.txt
183 : Downloading From : https://blocklistproject.github.io/Lists/phishing.txt
Download Location : ./blocklistfiles/security/182.txt
184 : Downloading From : https://blocklistproject.github.io/Lists/ads.txt
Download Location : ./blocklistfiles/privacy/183.txt
185 : Downloading From : https://blocklistproject.github.io/Lists/crypto.txt
Download Location : ./blocklistfiles/security/cryptojacking/184.txt
186 : Downloading From : https://blocklistproject.github.io/Lists/porn.txt
Download Location : ./blocklistfiles/parentalcontrol/porn/185.txt
187 : Downloading From : https://blocklistproject.github.io/Lists/whatsapp.txt
Download Location : ./blocklistfiles/parentalcontrol/social-networks/186.txt
188 : Downloading From : https://blocklistproject.github.io/Lists/vaping.txt
Download Location : ./blocklistfiles/parentalcontrol/187.txt
189 : Downloading From : https://blocklistproject.github.io/Lists/piracy.txt
Download Location : ./blocklistfiles/parentalcontrol/piracy/188.txt
190 : Downloading From : https://blocklistproject.github.io/Lists/facebook.txt
Download Location : ./blocklistfiles/parentalcontrol/social-networks/189.txt
191 : Downloading From : https://blocklistproject.github.io/Lists/gambling.txt
Download Location : ./blocklistfiles/parentalcontrol/gambling/190.txt
192 : Downloading From : https://blocklistproject.github.io/Lists/basic.txt
Download Location : ./blocklistfiles/privacy/191.txt
193 : Downloading From : https://blocklistproject.github.io/Lists/everything.txt
Download Location : ./blocklistfiles/privacy/192.txt
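
A validation step along the following lines would catch the mismatch described in this issue. The Python sketch below classifies a downloaded file as hosts, domains, or unknown using simple heuristics and compares that with the declared format. The regular expressions, the 90% threshold, and the function names are assumptions, not the project's actual validator.

```python
import re

# Heuristic patterns; real-world lists have more variants than these.
HOSTS_LINE = re.compile(r"^(?:0\.0\.0\.0|127\.0\.0\.1|::1?)\s+[A-Za-z0-9_.-]+$")
DOMAIN_LINE = re.compile(r"^[A-Za-z0-9_-]+(?:\.[A-Za-z0-9_-]+)+$")


def detect_format(text: str, threshold: float = 0.9) -> str:
    """Guess whether a blocklist file is in 'hosts' or 'domains' format."""
    counts = {"hosts": 0, "domains": 0, "other": 0}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if HOSTS_LINE.match(line):
            counts["hosts"] += 1
        elif DOMAIN_LINE.match(line):
            counts["domains"] += 1
        else:
            counts["other"] += 1
    total = sum(counts.values())
    if total == 0:
        return "empty"
    for fmt in ("hosts", "domains"):
        if counts[fmt] / total >= threshold:
            return fmt
    return "unknown"


def validate(declared: str, text: str) -> bool:
    """True only if the detected format matches the declared one."""
    return detect_format(text) == declared


if __name__ == "__main__":
    hosts_file = "# comment\n0.0.0.0 ads.example.com\n0.0.0.0 t.example.org\n"
    print(detect_format(hosts_file), validate("domains", hosts_file))
```

With a check like this, a hosts file declared as domains fails validation, and a file of tracker URLs is reported as unknown instead of passing silently.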
