mmotti / pihole-regex
Custom regex filter list for use with Pi-hole.
root@raspberrypi:~# curl -sSl https://raw.githubusercontent.com/mmotti/pihole-regex/master/install.py | sudo python3
File "<stdin>", line 65
print(f'{path_pihole} was not found')
^
SyntaxError: invalid syntax
This is what I get when I try to use your script (all of them, by the way). Could you tell me why they are not working? :3 Thank you and Merry Christmas. :3
I'm not very familiar with Regex yet. :/
Can anyone tell me how to create regex for the following pages?
Adware https:// sites
lp.c1vk.online
lp.searchdimension.com/12/?v=399#sdapp93
lp.searchmulty.com/9/?p=2733&ver=399#!
lp.powerapp.download/ready/?p=91142&v=400#spalp2020ejgohilkbhndmaacckgpghjbhnpgpamd
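Pi-hole filters at the DNS level, so a regex only ever sees the hostname; the path, query string and fragment in the URLs above never reach it. A minimal sketch of how one might reduce those entries to hostnames and build a single exact-match regex from them (the helper name `hostname` and the choice to match only these exact hosts are assumptions for illustration, not something from the list):

```python
import re
from urllib.parse import urlsplit

urls = [
    "lp.c1vk.online",
    "lp.searchdimension.com/12/?v=399#sdapp93",
    "lp.searchmulty.com/9/?p=2733&ver=399#!",
    "lp.powerapp.download/ready/?p=91142&v=400#spalp2020ejgohilkbhndmaacckgpghjbhnpgpamd",
]

def hostname(url):
    # The entries carry no scheme, so prefix "//" to make urlsplit
    # treat the leading part as the network location.
    return urlsplit("//" + url).hostname

hosts = [hostname(u) for u in urls]

# One anchored alternation; re.escape turns each "." into a literal dot.
pattern = "^(" + "|".join(re.escape(h) for h in hosts) + ")$"

for h in hosts:
    print(h, bool(re.search(pattern, h)))
```

If the `lp.` prefix itself is the common trait, a broader rule such as `^lp\.` would be shorter, but it would also catch unrelated sites, so it would need checking for false positives first.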
I would like to use this regex list in my blocklist generation project, but since no license is currently specified, it isn't clear whether I can.
Matched by: ^(.+[_.-])?ad([sxv]?[0-9]*|system)[_.-]
Suggested fix: ^(.+[_.-])?ad([sxv]?[0-9]*|system)[_.-]*\..*\.
(this also fixes the ad.nl issue.)
Credits go to the CEO of Metagaming B.V. for finding this FP and contacting me about it.
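A quick way to sanity-check the suggested fix is to run both patterns over a few hostnames. This is only a sketch using Python's `re` module rather than Pi-hole's own regex engine; `ads.example.com` and `ad-server.example.com` are made-up test names, not domains from the report:

```python
import re

current = re.compile(r"^(.+[_.-])?ad([sxv]?[0-9]*|system)[_.-]")
suggested = re.compile(r"^(.+[_.-])?ad([sxv]?[0-9]*|system)[_.-]*\..*\.")

# Hypothetical sample hostnames, apart from ad.nl / www.ad.nl.
domains = ["ad.nl", "www.ad.nl", "ads.example.com", "ad-server.example.com"]

results = {}
for d in domains:
    results[d] = (bool(current.search(d)), bool(suggested.search(d)))
    print(d, results[d])
```

In this check the suggested pattern stops matching ad.nl and www.ad.nl and still matches ads.example.com, but it also stops matching ad-server.example.com, so hyphen- or underscore-separated prefixes may be worth testing before adopting it.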
Found that "^traff(ic)?[.-]" blocked some feeds (e.g. traffic.libsyn.com and traffic.omny.fm) in my podcast app.
Didn't realize that Pi-hole was the culprit until I noticed those podcasts suddenly updating when connected to a different network.
Sure, it's easy enough to whitelist, but maybe "traffic" causes too many false positives?
What is the rationale of blocking it?
I'd accept if this issue is closed without changes, just thought I'd bring this to your attention.
I see you're having many lines like
^((.*)(1stok\.com))$
Why don't you write this filter in the much simpler form?
1stok\.com$
Similarly,
^((ad\.)(.*))$
could just be
^ad\.
to match anything that starts with ad.
Although the regex compiler in FTLDNS will take care of simplifying filters as much as possible, it can only be beneficial to provide simple filters up front and not rely on the simplifier where this is not necessary. It would also make your list much more readable.
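For what it's worth, the equivalence is easy to check by running the long and short forms over the same hostnames. A small sketch with Python's `re` (not the FTLDNS engine; the sample hostnames other than 1stok.com are invented for the test):

```python
import re

pairs = [
    (r"^((.*)(1stok\.com))$", r"1stok\.com$"),
    (r"^((ad\.)(.*))$", r"^ad\."),
]

# Hypothetical hostnames chosen to hit both matching and non-matching cases.
samples = [
    "1stok.com",
    "www.1stok.com",
    "1stok.com.example.org",
    "ad.example.com",
    "bad.example.com",
    "example.com",
]

all_equal = True
for verbose, simple in pairs:
    for host in samples:
        a = bool(re.search(verbose, host))
        b = bool(re.search(simple, host))
        print(verbose, simple, host, a, b)
        all_equal = all_equal and (a == b)
```

Both forms agree on every sample here, which is what one would expect: the outer groups and the explicit `.*` add nothing when the pattern is searched rather than full-matched.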
Hello!
Looks like DB structure has changed.
Anyway, thank you for this script and the hard work you have done. :)
root@pihole:~# curl -sSl https://raw.githubusercontent.com/mmotti/pihole-regex/master/install.py | sudo python3
[i] Root user detected
[i] Pi-hole path exists
[i] DB detected
[i] Fetching: https://raw.githubusercontent.com/mmotti/pihole-regex/master/regex.list
[i] 16 regexps collected from https://raw.githubusercontent.com/mmotti/pihole-regex/master/regex.list
[i] Connecting to /etc/pihole/gravity.db
[i] Adding / updating regexps in the DB
Traceback (most recent call last):
File "<stdin>", line 108, in <module>
sqlite3.OperationalError: near "ON": syntax error
root@pihole:~#
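The output above does not say which statement failed, so the following is a guess rather than a diagnosis. One known way to get `near "ON": syntax error` is an `INSERT ... ON CONFLICT ... DO UPDATE` (UPSERT) statement running against an SQLite library older than 3.24.0, where that clause does not exist. A sketch that checks the library version and falls back to `INSERT OR REPLACE`; the table `demo_regex` and the helper names are hypothetical and are not taken from install.py or gravity.db:

```python
import sqlite3

def upsert_supported(version_info=sqlite3.sqlite_version_info):
    # The UPSERT clause was added to SQLite in version 3.24.0.
    return tuple(version_info) >= (3, 24, 0)

def add_or_update(conn, pattern, comment):
    if upsert_supported():
        conn.execute(
            "INSERT INTO demo_regex (pattern, comment) VALUES (?, ?) "
            "ON CONFLICT(pattern) DO UPDATE SET comment = excluded.comment",
            (pattern, comment),
        )
    else:
        # Older libraries: replace the whole row instead.
        conn.execute(
            "INSERT OR REPLACE INTO demo_regex (pattern, comment) VALUES (?, ?)",
            (pattern, comment),
        )

conn = sqlite3.connect(":memory:")  # throwaway DB, not /etc/pihole/gravity.db
conn.execute("CREATE TABLE demo_regex (pattern TEXT PRIMARY KEY, comment TEXT)")
add_or_update(conn, r"^ad\.", "first run")
add_or_update(conn, r"^ad\.", "second run")
rows = conn.execute("SELECT pattern, comment FROM demo_regex").fetchall()
print(sqlite3.sqlite_version, rows)
```

Printing `sqlite3.sqlite_version` on the affected machine would at least confirm or rule out this explanation.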
When I run the command from the crontab, it results in this error:
pi@raspberrypi4:~/pijarr $ /usr/bin/curl -sSl https://raw.githubusercontent.com/mmotti/pihole-regex/master/install.py | /usr/bin/python3
[i] Checking for "pihole" docker container
[i] Running in physical installation mode
[i] Pi-hole path exists
[e] Write access is not available for /etc/pihole. Please run as root or other privileged user
pi@raspberrypi4:~/pijarr $ sudo /usr/bin/curl -sSl https://raw.githubusercontent.com/mmotti/pihole-regex/master/install.py | /usr/bin/python3
[i] Checking for "pihole" docker container
[i] Running in physical installation mode
[i] Pi-hole path exists
[e] Write access is not available for /etc/pihole. Please run as root or other privileged user
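Note that in the second attempt `sudo` sits in front of `curl`, so only the download runs with elevated rights; the `python3` at the end of the pipe still runs as the `pi` user, which would explain the identical message. A minimal sketch of the kind of check that produces such a message (the helper names are made up and this is not the script's actual code):

```python
import os

def running_as_root():
    # Effective UID 0 means root on Unix-like systems.
    return os.geteuid() == 0

def can_write(path):
    # True if the current user may write to the given path.
    return os.access(path, os.W_OK)

path_pihole = "/etc/pihole"
print("root:", running_as_root())
print("write access to", path_pihole, ":", can_write(path_pihole))
```

Running this once as a normal user and once under `sudo python3` should show the difference, assuming /etc/pihole is only writable by privileged users on that system.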
I used Pi-hole for a long time and have switched to AdGuard. My favorite feature of Pi-hole was this script. Is there a way to adapt it for use with AdGuard?
Thank you!
Hello,
Please consider whitelisting
support.iam.ad.azure.com
and
main.iam.ad.ext.azure.com
In the URLs above, IAM means Identity and Access Management. AD is Active Directory, not an advertisement. These two URLs are needed to connect to Windows Azure and see the complete Active Directory blade.
Thx!
$ host facebook.co.uk 8.8.8.8
Using domain server:
Name: 8.8.8.8
Address: 8.8.8.8#53
Aliases:
facebook.co.uk has address 216.239.34.21
facebook.co.uk has address 216.239.32.21
facebook.co.uk has address 216.239.38.21
facebook.co.uk has address 216.239.36.21
Facebook has DNS for more countries than just the UK.
Probably:
^(.+[_.-])?(facebook|fb(cdn|sbx)?|tfbnw)\.(co\.)?[^.]+$
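A small sketch to check what the proposed pattern covers, using Python's `re` as a stand-in for Pi-hole's engine. Apart from facebook.co.uk, the hostnames are illustrative picks, not domains named in this report:

```python
import re

proposed = re.compile(r"^(.+[_.-])?(facebook|fb(cdn|sbx)?|tfbnw)\.(co\.)?[^.]+$")

# Illustrative hostnames; only facebook.co.uk comes from the report above.
checks = {
    "facebook.co.uk": None,
    "facebook.com": None,
    "www.facebook.de": None,
    "static.fbcdn.net": None,
    "facebook.example.org": None,
    "example.org": None,
}

for host in checks:
    checks[host] = bool(proposed.search(host))
    print(host, checks[host])
```

The optional `(co\.)?` part is what lets two-level suffixes such as `.co.uk` through; other second-level labels (for example `.com.au`) would not be covered by it.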
When I run the command specified:
curl -sSl https://raw.githubusercontent.com/mmotti/pihole-regex/master/install.py | sudo python3
This is the result:
File "<stdin>", line 100
print(f'[e] {path_pihole} was not found')
SyntaxError: invalid syntax
I'm running Pi-Hole 5.2.1 in a docker container.
The command doesn't work in Ubuntu 20.10.
tiktok not working
Line 39 of install.py reads: path_pihole = r'/etc/pihole'.
The "r" causes an error and breaks the installation.
I removed the "r" and ran the script locally without any error.
This is mostly in reply to: 8eab5f8
Exact match found in regex blacklist
^ad([sxv]?[0-9]*|system)[_.-]([^.[:space:]]+\.){1,}|^.+[_.-]ad([sxv]?[0-9]*|system)[_.-]
I worked around this by whitelisting www.ad.nl, but I thought I'd mention it. I guess it probably can't be solved any other way, so the options are to keep the rule as it is or add an exception to it, right?
With kind regards
After trying to manually download Windows updates from the Windows Update Catalog and getting page not found errors, I found out it's blocked by your list. At least I think it is, because Pi-hole's log status says "Blocked (regex blacklist, CNAME)" and yours is the only regex list I use. The whole log domain entry is "catalog.s.download.windowsupdate.com (blocked scdn28557.wpc.ad629.nucdn.net)".
Anyway, the domain that needs to be whitelisted is
scdn28557.wpc.ad629.nucdn.net
This regex:
^track(ers?|ing)?[0-9]*[_.-]
breaks a lot of bittorrent trackers.
"tracker.hostname.com"
Need to look at really optimising this list. Response times consistently ~80ms+ compared to ~10-20ms
Tidal uses a range of subdomains to provide their content. The first regex in this project prevents Tidal applications from working (domain sp-ad-fa.audio.tidal.com). Adding this regex to the whitelist solves the issue: (\.|^)audio\.tidal\.com$
The offending regex:
Line 18 in 22c7d62
^(.+[.-])?ad[sxv]?[0-9]*[.-] - gives a false positive on the Dutch newspaper website ad.nl
The ^ad([sxv]?[0-9]*|system)[_.-]([^.[:space:]]+\.){1,}|[_.-]ad([sxv]?[0-9]*|system)[_.-] regex blocks shows on Hulu (for me it was a handful of episodes of Lost, not all of them). Not sure why Lost; the Finals worked fine. I'm not 100% sure how to fully debug this, but I was able to get Lost working again by disabling the above regex after some googling.
I did try playing around with the regex to filter out Hulu requests, but ended up just disabling it.
Hey,
The host ad-client.mediafe-prd.s.joyn.de
is necessary for watching Joyn on smart TVs.
Please add this domain to your whitelist.
Issue: After running curl -sSl https://raw.githubusercontent.com/mmotti/pihole-regex/master/install.py | sudo python3
I got an error:
File "<stdin>", line 65
print(f'{path_pihole} was not found')
^
SyntaxError: invalid syntax
Pihole Version:
Pi-hole version is v4.3.2 (Latest: v4.3.2)
AdminLTE version is v4.3.3 (Latest: v4.3.3)
FTL version is v4.3.1 (Latest: v4.3.1)
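The failing line uses an f-string, a feature that was added in Python 3.6, so one plausible cause is that `python3` on this machine resolves to an older interpreter. That is an assumption, since the report does not include the Python version. A tiny check that can be run on its own (the function name is hypothetical):

```python
import sys

def supports_f_strings(version_info=sys.version_info):
    # f-strings (f'...') are only valid syntax from Python 3.6 onwards.
    return tuple(version_info[:2]) >= (3, 6)

print(sys.version.split()[0], "supports f-strings:", supports_f_strings())
```

If this prints `False`, the script would fail with exactly this kind of SyntaxError before executing any of its own code.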
I saw a link from reddit to here.. I don't know if pihole has a white list or not, but regex.list looks like it will block
tracking-protection.cdn.mozilla.net
tracker.debian.org
that most people probably don't want blocked.
Looks like the error is tracking back on this...
Line 91 in 22c7d62
Is there supposed to be an 'r' before the '/etc/pihole'?
/etc/pihole/ is the location when not running docker/etc.
Having started to use and review this, I think it would be a great enhancement if the script accepted command-line arguments for the paths to feeds, as well as for a whitelist feed.
FTLDNS reloads the regex filters (and all other files) on receipt of SIGHUP. You may want to replace the hard restart command you suggest in the README, as handling a reload through SIGHUP will always be much faster.
Try
killall -SIGHUP pihole-FTL
Blocklist Whitelist
I propose the following change to capture mads (mobile ads), e.g. mads.amazon.com:
^(.+[-_.])??[m]?ad[sxv]?[0-9]*[-_.]
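A rough way to see what the extra `[m]?` changes is to compare the proposed pattern with the `^(.+[_.-])?ad[sxv]?[0-9]*[_.-]` form quoted elsewhere in these reports. This sketch uses Python's `re`, which accepts the lazy `??` quantifier; whether a hostname matches at all does not depend on laziness:

```python
import re

without_m = re.compile(r"^(.+[_.-])?ad[sxv]?[0-9]*[_.-]")
with_m = re.compile(r"^(.+[-_.])??[m]?ad[sxv]?[0-9]*[-_.]")

hosts = [
    "mads.amazon.com",
    # Twitch edge server reported in another issue on this page.
    "video-edge-c2a2dc.mad01.abs.hls.ttvnw.net",
]

matches = {}
for host in hosts:
    matches[host] = (bool(without_m.search(host)), bool(with_m.search(host)))
    print(host, matches[host])
```

So the change does pick up mads.amazon.com, but the same optional `m` is also what makes a label like `mad01` match.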
Pi-hole DB structure has changed a little since I last used it so need to accommodate for this
The blacklist regex entry ^(www[0-9]*\.)?xn-- will falsely block all IDN domains, as their internal Punycode representation in the DNS starts with the xn-- LDH label (see RFC 5890).
I know IDN domains are used for homograph attacks, but it seems quite harsh to blacklist all IDNs as many of them are legit and registered in good faith.
Besides, this would result in all domains registered in languages not using a latin alphabet to be blacklisted by the mentioned regex.
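To make the effect concrete, here is a small sketch that converts an internationalized name to its ASCII (Punycode) form with Python's built-in `idna` codec and tests it against the rule. The name `münchen.de` is just an example chosen for illustration:

```python
import re

xn_rule = re.compile(r"^(www[0-9]*\.)?xn--")

# Example IDN, written with an escape so the source stays ASCII.
unicode_name = "m\u00fcnchen.de"

# The DNS only ever sees the ASCII-compatible encoding of each label.
ascii_name = unicode_name.encode("idna").decode("ascii")

blocked = bool(xn_rule.search(ascii_name))
blocked_www = bool(xn_rule.search("www." + ascii_name))
print(ascii_name, blocked, blocked_www)
```

Any name whose first label contains a non-ASCII character ends up with the `xn--` prefix, so the rule catches it regardless of who registered it or why.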
The regex ^stat(s|istics)?[0-9]*[_.-] blocks stats.stackexchange.com.
This could be added to the whitelist.
Please consider creating an AdGuard Home Syntax version for regex.list.
(see https://github.com/AdguardTeam/AdGuardHome/wiki/Hosts-Blocklists#regular-expressions)
Essentially it needs each regex to start and end with a / character.
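Going by the description above, the conversion could be as simple as wrapping each line in slashes. A minimal sketch; the function name is made up, and skipping blank lines and `#` comment lines is an assumption about the input rather than something the request states:

```python
def to_adguard(lines):
    """Wrap each regex in '/' so it reads as an AdGuard Home regex rule."""
    rules = []
    for line in lines:
        rule = line.strip()
        # Assumption: blank lines and '#' comments should not become rules.
        if not rule or rule.startswith("#"):
            continue
        rules.append("/" + rule + "/")
    return rules

sample = [r"^ad\.", "", "# comment", r"1stok\.com$"]
converted = to_adguard(sample)
print(converted)
```

Reading regex.list line by line and writing `to_adguard(...)` back out would give a second file in the other syntax without changing the original.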
the list says: ^adtrack(er|ing)?[0-9]*[_.-]
maybe this could be changed to ^(ad)?track(er|ing)?[0-9]*[_.-]
which would also cover tracking.somedomain.com
A news website in the Netherlands is blocked by the first regex. It is ad.nl / www.ad.nl.
^(.+[_.-])?ad[sxv]?[0-9]*[_.-]
I tried whitelisting, but then the website only loads partially. Is there anything you can do to change the regex so that this website is not blocked?
The script will obviously not work in this case, as the /etc/pihole. How would one be able to use this script in this case?
^stat(s|istics)?[0-9]*[_.-]
blocks https://stats.gallery/, which is why I think it should be added to the whitelist.
Can you add stats.foldingathome.org to the whitelist?
From https://public.dns.iij.jp/. (DoH/DoT service)
The link to their privacy policy is blocked, i.e. https://www.iij.ad.jp/privacy/
^(.+[-_.])??m?ad[sxv]?[0-9]*[-_.]
blocks servers, especially video-edge-c2a2dc.mad01.abs.hls.ttvnw.net.
Impossible to whitelist, as there is currently no wildcard whitelisting and there are many different strings before mad01.
The domain video-edge-6ea608.mrs01.abs.hls.ttvnw.net, used to deliver TwitchTV livestreams, was blocked, which caused the livestream I was watching to not load. Whitelisting it fixed the issue.
When pasting this into the shell, it returns an error.
SHELL=/bin/bash
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
30 2 * * 1 /usr/bin/curl -sSl https://raw.githubusercontent.com/mmotti/pihole-regex/master/install.py | /usr/bin/python3
error is:
-bash: 30: command not found
I presume this is user error, or is this an error in the script? On the instructions located at
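The message `-bash: 30: command not found` is what bash prints when it treats `30` as a command name. The three lines look like crontab content (two environment assignments plus a schedule line), so they are presumably meant to go into a crontab, for example via `crontab -e`, rather than being pasted at the prompt. A short sketch that splits the last line into its five schedule fields and the command, just to show how cron would read it (the field names are descriptive labels chosen here):

```python
entry = (
    "30 2 * * 1 /usr/bin/curl -sSl "
    "https://raw.githubusercontent.com/mmotti/pihole-regex/master/install.py "
    "| /usr/bin/python3"
)

# The first five whitespace-separated fields are the schedule;
# everything after them is the command cron hands to the shell.
fields = entry.split(None, 5)
names = ["minute", "hour", "day_of_month", "month", "day_of_week"]
schedule = dict(zip(names, fields[:5]))
command = fields[5]

print(schedule)
print(command)
```

Read that way, the entry runs the command at 02:30 on day-of-week 1 (Monday), which is why the shell has no idea what to do with a leading `30`.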
The regex ^ad([sxv]?[0-9]*|system)[_.-]([^.[:space:]]+\.){1,}|[_.-]ad([sxv]?[0-9]*|system)[_.-] blocks the Microsoft Azure Active Directory management portal (portal.azure.com), because it calls (valid!) APIs located at support.iam.ad.azure.com etc.
The option ^telemetry[-.] doesn't block sites like:
reports.wes.df.telemetry.microsoft.com
services.wes.df.telemetry.microsoft.com
sqm.df.telemetry.microsoft.com
sqm.telemetry.microsoft.com
sqm.telemetry.microsoft.com.nsatc.net
telecommand.telemetry.microsoft.com
telecommand.telemetry.microsoft.com.nsatc.net
watson.telemetry.microsoft.com
watson.telemetry.microsoft.com.nsatc.net
etc.
It would be better to use
(^|.)telemetry[-.]
which also covers the examples above.
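A small sketch to compare the two patterns on a few of the hostnames listed above, using Python's `re`. The third pattern, with the dot escaped, is an assumption added here for comparison: as written, the bare `.` matches any character, not only a literal dot. `mytelemetry.example.com` is an invented name to show the difference:

```python
import re

current = re.compile(r"^telemetry[-.]")
proposed = re.compile(r"(^|.)telemetry[-.]")
escaped = re.compile(r"(^|\.)telemetry[-.]")  # assumed stricter variant

hosts = [
    "reports.wes.df.telemetry.microsoft.com",
    "sqm.telemetry.microsoft.com",
    "watson.telemetry.microsoft.com.nsatc.net",
    "mytelemetry.example.com",  # hypothetical, not from the report
]

table = {}
for host in hosts:
    table[host] = tuple(bool(p.search(host)) for p in (current, proposed, escaped))
    print(host, table[host])
```

Both the proposed and the escaped variants cover the Microsoft hostnames; they only differ on names where `telemetry` is the tail of a longer label.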
^([^.]+\.)?(facebook|fb(cdn|sbx)?|tfbnw)\.[^.]+$
This doesn't capture sites like 0.channel15.facebook.com and 1000068744-facebook.com, and there are more.
Solution: ^(.+[_.-])?(facebook|fb(cdn|sbx)?|tfbnw)\.[^.]+$
The same kind of problem applies to both user-suggested regexes:
Replace ^([^.]+\.)? by ^(.+[_.-])?
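The difference between the two prefixes can be checked directly. This sketch assumes the dots in both patterns are meant as literal dots (escaped as `\.`), as they are in the other patterns on this page, and uses Python's `re` instead of Pi-hole's engine:

```python
import re

# Assumption: the "." after each group is a literal dot.
old = re.compile(r"^([^.]+\.)?(facebook|fb(cdn|sbx)?|tfbnw)\.[^.]+$")
new = re.compile(r"^(.+[_.-])?(facebook|fb(cdn|sbx)?|tfbnw)\.[^.]+$")

hosts = [
    "www.facebook.com",
    "0.channel15.facebook.com",
    "1000068744-facebook.com",
]

outcome = {}
for host in hosts:
    outcome[host] = (bool(old.search(host)), bool(new.search(host)))
    print(host, outcome[host])
```

The old prefix allows at most one label before the keyword, while `(.+[_.-])?` accepts any number of labels as well as a hyphen or underscore directly in front of it.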
^(.+[_.-])?(twitter|twimg|cms-twdigitalassets)\.(co\.)?[^.]+$
Handles
twitter.co.uk
twitter.co.in
cdn.cms-twdigitalassets.com
twitter.com
pbs.twimg.com
anything.twitter.com
any.thing.twitter.com