mmotti / pihole-regex

Custom regex filter list for use with Pi-hole.

Python 100.00%
pi-hole pi-hole-blocklists pi-hole-ftl hostsfile blocking regex filter adblocking

pihole-regex's People

Contributors

armujahid, dwo, kyle95wm, mbrydak, mmotti, wally3k

pihole-regex's Issues

Python bug?

root@raspberrypi:~# curl -sSl https://raw.githubusercontent.com/mmotti/pihole-regex/master/install.py | sudo python3
  File "<stdin>", line 65
    print(f'{path_pihole} was not found')
                                       ^
SyntaxError: invalid

This is what I get when I try to use your script (all of them, by the way). Could you tell me why they are not working? Thank you and Merry Christmas. :3
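A likely cause, assuming the `python3` on this system is older than 3.6: f-strings such as `print(f'{path_pihole} was not found')` were introduced in Python 3.6, and older interpreters reject them with exactly this SyntaxError. The sketch below is not taken from install.py; `supports_fstrings` and the `/etc/pihole` value are illustrative only.

```python
import sys


def supports_fstrings(version_info=sys.version_info):
    """Return True if this interpreter can parse f-strings (Python 3.6+)."""
    return tuple(version_info[:2]) >= (3, 6)


path_pihole = "/etc/pihole"  # illustrative value, not read from install.py

# str.format() is accepted by every Python 3 release, so a message built
# this way still prints on an interpreter that predates f-strings.
message = "{} was not found".format(path_pihole)

if not supports_fstrings():
    print("Python 3.6 or newer is needed to parse f-strings")
print(message)
```

Running `python3 --version` on the affected machine would confirm or rule out this explanation.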

i need help

I'm not very familiar with Regex yet. :/

Can anyone tell me how to create regex for the following pages?

Adware https:// sites

lp.c1vk.online

lp.searchdimension.com/12/?v=399#sdapp93

lp.searchmulty.com/9/?p=2733&ver=399#!

lp.powerapp.download/ready/?p=91142&v=400#spalp2020ejgohilkbhndmaacckgpghjbhnpgpamd
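Pi-hole filters DNS lookups, so it only ever sees the host name; the path, query string and fragment of these URLs cannot be matched. A minimal sketch, assuming the `(\.|^)domain$` pattern style used elsewhere in these issues is what is wanted; `host_of` and `exact_domain_regex` are hypothetical helper names, and Python's `re` module stands in for Pi-hole's own regex engine.

```python
import re
from urllib.parse import urlsplit

urls = [
    "lp.c1vk.online",
    "lp.searchdimension.com/12/?v=399#sdapp93",
    "lp.searchmulty.com/9/?p=2733&ver=399#!",
    "lp.powerapp.download/ready/?p=91142&v=400#spalp2020ejgohilkbhndmaacckgpghjbhnpgpamd",
]


def host_of(url):
    """Extract the host name; a scheme is prepended so urlsplit can find it."""
    return urlsplit("https://" + url).hostname


def exact_domain_regex(host):
    """Build an anchored regex matching the host and any of its subdomains."""
    return r"(\.|^)" + re.escape(host) + "$"


for url in urls:
    print(exact_domain_regex(host_of(url)))
```

For single hosts like these, an exact blacklist entry is usually enough; a regex only pays off when many related names share a pattern.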

Lack of license

I would like to use this regex list in my blocklist generation project, but since no license is currently specified, it isn't clear whether I can.

False positive: ad.st (Redirect to angeldu.st)

Matched by: ^(.+[_.-])?ad([sxv]?[0-9]*|system)[_.-]
Suggested fix: ^(.+[_.-])?ad([sxv]?[0-9]*|system)[_.-]*\..*\. (this also fixes the ad.nl issue.)

Credits go to the CEO of Metagaming B.V. for finding this FP and contacting me about it.
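The two patterns can be compared side by side. This is a rough check using Python's `re` module as a stand-in for Pi-hole's regex engine; `ads.example.com` and `ad-server.example.net` are made-up sample names, not domains from the report.

```python
import re

current = re.compile(r"^(.+[_.-])?ad([sxv]?[0-9]*|system)[_.-]")
suggested = re.compile(r"^(.+[_.-])?ad([sxv]?[0-9]*|system)[_.-]*\..*\.")

samples = [
    "ad.st",
    "ad.nl",
    "www.ad.nl",
    "ads.example.com",        # hypothetical ad host
    "ad-server.example.net",  # hypothetical ad host with a hyphen
]

# Map each name to (blocked by current rule, blocked by suggested rule).
verdicts = {
    name: (bool(current.search(name)), bool(suggested.search(name)))
    for name in samples
}

for name, (old, new) in verdicts.items():
    print(f"{name:24} current={old!s:5} suggested={new}")
```

Under these assumptions the suggested pattern releases `ad.st`, `ad.nl` and `www.ad.nl` and still catches `ads.example.com`, but it also stops matching names where `ad` is followed by a hyphenated label, such as `ad-server.example.net`. That trade-off may be worth checking before adopting the fix.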

Maybe "traff(ic)" is a bit broad?

I found that "^traff(ic)?[.-]" blocked some feeds (e.g. traffic.libsyn.com and traffic.omny.fm) in my podcast app.
I didn't realize that Pi-hole was the culprit until I noticed those podcasts suddenly updating when connected to a different network.

Sure, it's easy enough to whitelist, but maybe "traffic" causes too many false positives?
What is the rationale for blocking it?

I'd accept if this issue is closed without changes, just thought I'd bring this to your attention.

Regex filters are unnecessarily complicated

I see you have many lines like

^((.*)(1stok\.com))$

Why don't you write this filter in the much simpler form?

1stok\.com$

Similarly,

^((ad\.)(.*))$

could just be

^ad\.

to match anything that starts in ad.

Although the regex compiler in FTLDNS will simplify filters as much as possible, it can only help to provide simple filters up front rather than rely on the simplifier where that is not necessary. It would also make your list much more readable.
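The equivalence of the long and short forms can be spot-checked on a handful of names. This is a sample-based check, not a proof, and it uses Python's `re.search` as a stand-in for the FTLDNS matcher; `same_verdicts` and the sample domains are illustrative.

```python
import re


def same_verdicts(verbose, simple, domains):
    """True if both patterns block exactly the same names from `domains`."""
    return all(
        bool(re.search(verbose, d)) == bool(re.search(simple, d))
        for d in domains
    )


domains = [
    "1stok.com",
    "www.1stok.com",
    "1stok.com.example.org",
    "ad.example.com",
    "bad.example.com",
    "ad",
    "example.com",
]

suffix_ok = same_verdicts(r"^((.*)(1stok\.com))$", r"1stok\.com$", domains)
prefix_ok = same_verdicts(r"^((ad\.)(.*))$", r"^ad\.", domains)
print(suffix_ok, prefix_ok)
```

The short forms work because a search-style match already allows arbitrary text on the unanchored side, so the explicit `(.*)` groups add nothing.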

sqlite3.OperationalError

Hello!

Looks like DB structure has changed.

Anyway, thank you for this script and the hard work you have done. :)

root@pihole:~# curl -sSl https://raw.githubusercontent.com/mmotti/pihole-regex/master/install.py | sudo python3
[i] Root user detected
[i] Pi-hole path exists
[i] DB detected
[i] Fetching: https://raw.githubusercontent.com/mmotti/pihole-regex/master/regex.list
[i] 16 regexps collected from https://raw.githubusercontent.com/mmotti/pihole-regex/master/regex.list
[i] Connecting to /etc/pihole/gravity.db
[i] Adding / updating regexps in the DB
Traceback (most recent call last):
  File "<stdin>", line 108, in <module>
sqlite3.OperationalError: near "ON": syntax error
root@pihole:~#
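One possible explanation, which the report does not confirm: SQLite only accepts the `INSERT ... ON CONFLICT ... DO UPDATE` (upsert) clause from version 3.24.0 onward, so an older library stops parsing at `ON`. The sketch below runs against an in-memory database with a hypothetical `regex_demo` table, not Pi-hole's real gravity.db schema, and `add_regexps` is an illustrative name rather than a function from install.py.

```python
import sqlite3

UPSERT_MIN_VERSION = (3, 24, 0)  # first SQLite release with ON CONFLICT ... DO UPDATE


def add_regexps(conn, patterns, comment):
    """Insert patterns, refreshing the comment of ones that already exist."""
    if sqlite3.sqlite_version_info >= UPSERT_MIN_VERSION:
        sql = (
            "INSERT INTO regex_demo (pattern, comment) VALUES (?, ?) "
            "ON CONFLICT(pattern) DO UPDATE SET comment = excluded.comment"
        )
        conn.executemany(sql, [(p, comment) for p in patterns])
    else:
        # Fallback for older SQLite: two plain statements instead of an upsert.
        conn.executemany(
            "INSERT OR IGNORE INTO regex_demo (pattern, comment) VALUES (?, ?)",
            [(p, comment) for p in patterns],
        )
        conn.executemany(
            "UPDATE regex_demo SET comment = ? WHERE pattern = ?",
            [(comment, p) for p in patterns],
        )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE regex_demo (pattern TEXT PRIMARY KEY, comment TEXT)")
add_regexps(conn, [r"^ad\.", r"^telemetry[-.]"], "first run")
add_regexps(conn, [r"^ad\."], "second run")
rows = conn.execute("SELECT pattern, comment FROM regex_demo ORDER BY pattern").fetchall()
print(sqlite3.sqlite_version, rows)
```

Printing `sqlite3.sqlite_version` on the affected host shows which branch would apply there.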

Write access is not available for /etc/pihole. Please run as root or other privileged user

When I run the command from the crontab, the cron job fails with this error.

pi@raspberrypi4:~/pijarr $ /usr/bin/curl -sSl https://raw.githubusercontent.com/mmotti/pihole-regex/master/install.py | /usr/bin/python3
[i] Checking for "pihole" docker container
[i] Running in physical installation mode
[i] Pi-hole path exists
[e] Write access is not available for /etc/pihole. Please run as root or other privileged user

pi@raspberrypi4:~/pijarr $ sudo /usr/bin/curl -sSl https://raw.githubusercontent.com/mmotti/pihole-regex/master/install.py | /usr/bin/python3
[i] Checking for "pihole" docker container
[i] Running in physical installation mode
[i] Pi-hole path exists
[e] Write access is not available for /etc/pihole. Please run as root or other privileged user
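In the second attempt `sudo` sits in front of `curl`, so only the download runs as root while `/usr/bin/python3` on the other side of the pipe still runs as `pi`; the install command quoted in the other reports puts `sudo` in front of `python3` instead. How install.py performs its check is not shown here, so the `can_write` helper below is only an assumption about what such a check might look like.

```python
import os
import tempfile


def can_write(path):
    """Return True if the current user may create files inside `path`."""
    return os.access(path, os.W_OK | os.X_OK)


# Each process in a shell pipeline keeps its own user: `sudo curl ... | python3`
# elevates curl only, so the interpreter still runs this check unprivileged.
demo_dir = tempfile.mkdtemp()  # stand-in directory owned by the current user

print(can_write(demo_dir))
print(can_write("/etc/pihole"))  # typically False for a non-root user
```

For cron, the same reasoning applies: the job needs to run from a crontab whose user has write access to /etc/pihole.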

How to use this with adguard?

I was using pihole for a long time and have switched to adguard. My favorite feature of pihole was this script. Is there a way to adapt it for use with adguard?

Thank you!

Some Windows Azure cloud URLs are blocked

Hello,
Please consider white listing

support.iam.ad.azure.com
and
main.iam.ad.ext.azure.com

In the host names above, IAM means Identity and Access Management, and AD is Active Directory, not an advertisement. These two hosts are needed to connect to Windows Azure and see the complete Active Directory blade.

Thx!

Missing some FB non-US FQDN

$ host facebook.co.uk 8.8.8.8
Using domain server:
Name: 8.8.8.8
Address: 8.8.8.8#53
Aliases: 

facebook.co.uk has address 216.239.34.21
facebook.co.uk has address 216.239.32.21
facebook.co.uk has address 216.239.38.21
facebook.co.uk has address 216.239.36.21

Facebook has DNS for more countries than just the UK.

Probably:

^(.+[_.-])?(facebook|fb(cdn|sbx)?|tfbnw)\.(co\.)?[^.]+$

Doesn't work - syntax error

When I run the command specified:
curl -sSl https://raw.githubusercontent.com/mmotti/pihole-regex/master/install.py | sudo python3

This is the result:

  File "<stdin>", line 100
    print(f'[e] {path_pihole} was not found')
SyntaxError: invalid syntax

I'm running Pi-Hole 5.2.1 in a docker container.

still false positive for www.ad.nl

This is mostly in reply to: 8eab5f8

Exact match found in regex blacklist
^ad([sxv]?[0-9]*|system)[_.-]([^.[:space:]]+\.){1,}|^.+[_.-]ad([sxv]?[0-9]*|system)[_.-]

I worked around this by whitelisting www.ad.nl, but I thought I'd mention it. I guess it probably can't be solved any other way, so the options are to keep the rule as it is or add an exception to it, right?

With kind regards

Regex list breaks downloads from Windows Update Catalog, add "scdn28557.wpc.ad629.nucdn.net" to whitelist

After trying to manually download Windows updates from the Windows Update Catalog and getting "page not found" errors, I found out the download is blocked by your list. At least I think it is, because Pi-hole's log status says "Blocked (regex blacklist, CNAME)" and yours is the only regex list I use. The whole log domain entry is "catalog.s.download.windowsupdate.com (blocked scdn28557.wpc.ad629.nucdn.net)".

Anyway, the domain that needs to be whitelisted is
scdn28557.wpc.ad629.nucdn.net

Detrimental performance

Need to look at really optimising this list. Response times consistently ~80ms+ compared to ~10-20ms

Breaks Tidal streaming service

Tidal uses a range of subdomains to provide their content. The first regex in this project prevents Tidal applications from working (domain sp-ad-fa.audio.tidal.com). Adding this regex to whitelist solves the issue: (\.|^)audio\.tidal\.com$

The offending regex:

^ad([sxv]?[0-9]*|system)[_.-]([^.[:space:]]+\.){1,}|[_.-]ad([sxv]?[0-9]*|system)[_.-]

False positive ad.nl

^(.+[.-])?ad[sxv]?[0-9]*[.-] gives a false positive on the Dutch newspaper website ad.nl

sxv RegEx blocks shows from Hulu

The ^ad([sxv]?[0-9]*|system)[_.-]([^.[:space:]]+\.){1,}|[_.-]ad([sxv]?[0-9]*|system)[_.-] regex blocks shows on Hulu (for me it was a handful of episodes of Lost, not all of them). Not sure why Lost; the Finals worked fine. I'm not 100% sure how to fully debug this, but I was able to get Lost working again by disabling the above regex after some googling.

I did try playing around with the regex to filter out Hulu requests, but ended up just disabling it.

SyntaxError: invalid syntax

Issue: After running curl -sSl https://raw.githubusercontent.com/mmotti/pihole-regex/master/install.py | sudo python3 I got an error:

  File "<stdin>", line 65
    print(f'{path_pihole} was not found')
                                       ^
SyntaxError: invalid syntax

Pihole Version:
Pi-hole version is v4.3.2 (Latest: v4.3.2)
AdminLTE version is v4.3.3 (Latest: v4.3.3)
FTL version is v4.3.1 (Latest: v4.3.1)

white-list?

I followed a link from Reddit to here. I don't know whether Pi-hole has a whitelist, but regex.list looks like it will block

tracking-protection.cdn.mozilla.net
tracker.debian.org

which most people probably don't want blocked.

Restart not required

FTLDNS reloads the regex filters (and all other files) on receipt of SIGHUP. You may want to replace the hard restart command you suggest in the README as handling a reload through SIGHUP will always be much faster.

Try

killall -SIGHUP pihole-FTL

False positive on all IDN domains

The blacklist regex entry ^(www[0-9]*\.)?xn-- will falsely block all IDN domains as their internal Punycode representation in the DNS starts with the xn-- LDH label (see RFC 5890).

I know IDN domains are used for homograph attacks, but it seems quite harsh to blacklist all IDNs as many of them are legit and registered in good faith.

Besides, this would result in all domains registered in languages that do not use a Latin alphabet being blacklisted by the mentioned regex.
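The effect is easy to reproduce. A minimal sketch using Python's built-in `idna` codec to produce the ASCII form that appears in DNS, with Python's `re` module standing in for Pi-hole's regex engine; the sample domains and the `to_dns_name` helper are illustrative.

```python
import re

xn_rule = re.compile(r"^(www[0-9]*\.)?xn--")


def to_dns_name(unicode_domain):
    """Encode an internationalized domain into the ASCII form used in DNS."""
    return unicode_domain.encode("idna").decode("ascii")


results = {}
for name in ["münchen.de", "www.пример.рф", "example.com"]:
    ascii_name = to_dns_name(name)
    results[name] = bool(xn_rule.search(ascii_name))
    print(ascii_name, results[name])
```

Both internationalized names are caught by the rule regardless of who registered them, while the plain ASCII name is not.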

possible suggested improvement

the list says: ^adtrack(er|ing)?[0-9]*[_.-]

maybe this could be changed to ^(ad)?track(er|ing)?[0-9]*[_.-]

which would also cover tracking.somedomain.com
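A quick comparison of the current and proposed patterns, using Python's `re` module as a stand-in for Pi-hole's regex engine. `adtracker.example.com` is a made-up name; the last two hosts are the ones raised in the "white-list?" issue.

```python
import re

current = re.compile(r"^adtrack(er|ing)?[0-9]*[_.-]")
proposed = re.compile(r"^(ad)?track(er|ing)?[0-9]*[_.-]")

samples = [
    "adtracker.example.com",  # hypothetical
    "tracking.somedomain.com",
    "tracker.debian.org",
    "tracking-protection.cdn.mozilla.net",
]

# Map each name to (blocked by current rule, blocked by proposed rule).
verdicts = {
    name: (bool(current.search(name)), bool(proposed.search(name)))
    for name in samples
}

for name, (old, new) in verdicts.items():
    print(f"{name:38} current={old!s:5} proposed={new}")
```

The broader pattern does pick up `tracking.somedomain.com`, but it also matches `tracker.debian.org` and `tracking-protection.cdn.mozilla.net`, so it would probably need whitelist entries alongside it.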

ad.nl is blocked

A news website in the Netherlands is blocked by the first regex. It is ad.nl / www.ad.nl.

^(.+[_.-])?ad[sxv]?[0-9]*[_.-]

I tried whitelisting, but then the website loads only partially. Is there anything you can do to change the regex so that this website is not blocked?

Blocks Twitch.tv stream servers

^(.+[-_.])??m?ad[sxv]?[0-9]*[-_.] blocks servers, especially video-edge-c2a2dc.mad01.abs.hls.ttvnw.net.
It is impossible to whitelist, as there is currently no wildcard whitelisting and many different strings appear before mad01.

Twitch streams mistakenly blocked

The domain video-edge-6ea608.mrs01.abs.hls.ttvnw.net, used to deliver TwitchTV livestreams was blocked, which caused the livestream I was watching to not load. Whitelisting it fixed the issue.

Regex will block Azure Active Directory management portal

The regex ^ad([sxv]?[0-9]*|system)[_.-]([^.[:space:]]+\.){1,}|[_.-]ad([sxv]?[0-9]*|system)[_.-] blocks Microsoft Azure Active Directory management portal (portal.azure.com), because it calls (valid!) APIs located at support.iam.ad.azure.com etc.

^telemetry[-.] doesn't block all telemetry sites

The option ^telemetry[-.] doesn't block sites like:

reports.wes.df.telemetry.microsoft.com
services.wes.df.telemetry.microsoft.com
sqm.df.telemetry.microsoft.com
sqm.telemetry.microsoft.com
sqm.telemetry.microsoft.com.nsatc.net
telecommand.telemetry.microsoft.com
telecommand.telemetry.microsoft.com.nsatc.net
watson.telemetry.microsoft.com
watson.telemetry.microsoft.com.nsatc.net
etc.

It is better to use
(^|\.)telemetry[-.]
which also covers the examples above.
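The difference can be checked against the hosts listed in this issue. The sketch writes the suggestion with the dot escaped, `(^|\.)`, since an unescaped `.` would match any character; Python's `re` module stands in for Pi-hole's regex engine.

```python
import re

current = re.compile(r"^telemetry[-.]")
proposed = re.compile(r"(^|\.)telemetry[-.]")

hosts = [
    "reports.wes.df.telemetry.microsoft.com",
    "services.wes.df.telemetry.microsoft.com",
    "sqm.df.telemetry.microsoft.com",
    "sqm.telemetry.microsoft.com",
    "sqm.telemetry.microsoft.com.nsatc.net",
    "telecommand.telemetry.microsoft.com",
    "telecommand.telemetry.microsoft.com.nsatc.net",
    "watson.telemetry.microsoft.com",
    "watson.telemetry.microsoft.com.nsatc.net",
]

missed_by_current = [h for h in hosts if not current.search(h)]
caught_by_proposed = [h for h in hosts if proposed.search(h)]

print(len(missed_by_current), "missed by the current rule")
print(len(caught_by_proposed), "caught by the proposed rule")
```

The proposed form matches `telemetry` as a label at any depth, not only as the first label.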

Regex doesn't capture all FB sites

^([^.]+\.)?(facebook|fb(cdn|sbx)?|tfbnw)\.[^.]+$
This doesn't capture sites like 0.channel15.facebook.com and 1000068744-facebook.com, and there are more.
Solution: ^(.+[_.-])?(facebook|fb(cdn|sbx)?|tfbnw)\.[^.]+$

The same problem applies to both user-suggested regexes.
Replace ^([^.]+\.)? with ^(.+[_.-])?
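The two prefixes can be compared on the names from this issue. The sketch assumes the dots in both patterns are meant to be escaped (`\.`), and uses Python's `re` module as a stand-in for Pi-hole's regex engine; `example.com` is only a control.

```python
import re

old_rule = re.compile(r"^([^.]+\.)?(facebook|fb(cdn|sbx)?|tfbnw)\.[^.]+$")
new_rule = re.compile(r"^(.+[_.-])?(facebook|fb(cdn|sbx)?|tfbnw)\.[^.]+$")

samples = [
    "facebook.com",
    "www.facebook.com",
    "0.channel15.facebook.com",
    "1000068744-facebook.com",
    "example.com",  # control: should never match
]

# Map each name to (blocked by old rule, blocked by new rule).
verdicts = {
    name: (bool(old_rule.search(name)), bool(new_rule.search(name)))
    for name in samples
}

for name, (old, new) in verdicts.items():
    print(f"{name:28} old={old!s:5} new={new}")
```

The old prefix allows at most one label before the keyword, which is why deeper subdomains and hyphen-joined names slip through.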

Regex for blocking all of twitter

^(.+[_.-])?(twitter|twimg|cms-twdigitalassets)\.(co\.)?[^.]+$

Handles

twitter.co.uk
twitter.co.in
cdn.cms-twdigitalassets.com
twitter.com
pbs.twimg.com
anything.twitter.com
any.thing.twitter.com
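The list above can be verified mechanically. A small check using Python's `re` module as a stand-in for Pi-hole's regex engine; the two names in `unrelated` are made-up controls that should not match.

```python
import re

twitter_rule = re.compile(
    r"^(.+[_.-])?(twitter|twimg|cms-twdigitalassets)\.(co\.)?[^.]+$"
)

handled = [
    "twitter.co.uk",
    "twitter.co.in",
    "cdn.cms-twdigitalassets.com",
    "twitter.com",
    "pbs.twimg.com",
    "anything.twitter.com",
    "any.thing.twitter.com",
]
unrelated = ["example.com", "nottwitter.com"]  # hypothetical controls

all_handled = all(twitter_rule.search(name) for name in handled)
none_unrelated = not any(twitter_rule.search(name) for name in unrelated)
print(all_handled, none_unrelated)
```

The `[_.-]` separator before the keyword is what keeps a name like `nottwitter.com` from matching.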
