everywall / ladder
Self-hosted alternative to 12ft.io and 1ft.io: bypass paywalls with a proxy ladder and remove CORS headers from any URL.
License: GNU General Public License v3.0
Try visiting https://medium.com/@asad_5112/top-10-kubernetes-ci-cd-tools-ede05a55ffd0 with ladder. You'll be greeted by a "You have been blocked" page.
> You can use mine If you want (mind the basic auth): ladder.nerdvpn.de
Thank you! I don't seem to be able to even load the webpage on my phone. Is that the "basic auth" you're talking about?
We should probably discuss this elsewhere so as not to pollute this issue.
Originally posted by @Cwpute in #79 (comment)
I've installed ladder with docker compose as mentioned in the readme, and only changed the port for my nginx reverse proxy. It starts fine, but for every URL I try, the response is the same:
Get "https://www.exampledomain.tld": http: no Host in request URL
Thank you for your help in advance
Hi!!
First of all, thanks for this project!
I'm getting a 403 error from a website that I can connect to using curl. I added the user-agent and all the required headers to my custom ruleset.yaml and it's still failing.
My rule:
- domain: mydomain.com
  headers:
    user-agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0
    accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8
    accept-language: en-US;q=0.7,en;q=0.3
My curl command:
curl https://mywebsite.com/xyz --compressed -H 'User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0' -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8' -H 'Accept-Language: en-US;q=0.7,en;q=0.3'
If I remove --compressed from the curl command, I get the 403 error as well. Might it be some kind of anti-bot measure? Does that make sense? If yes, how can I enable "compressed" in Ladder?
Thanks!!
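For context on the question above: curl's --compressed flag sends an Accept-Encoding request header and transparently decompresses the response body. The sketch below illustrates only that mechanism in Python. It is not Ladder's code, the function names are hypothetical, and it assumes gzip only for simplicity (curl may advertise more encodings depending on how it was built).

```python
import gzip
import urllib.request
from typing import Optional


def build_request(url: str, headers: dict) -> urllib.request.Request:
    """Build a request that advertises gzip support, similar in spirit to
    `curl --compressed` (hypothetical helper, gzip only)."""
    merged = dict(headers)
    merged["Accept-Encoding"] = "gzip"
    return urllib.request.Request(url, headers=merged)


def decode_body(body: bytes, content_encoding: Optional[str]) -> bytes:
    """Decompress the body only when the server reports gzip encoding."""
    if content_encoding and content_encoding.strip().lower() == "gzip":
        return gzip.decompress(body)
    return body
```

If the site really does reject clients that omit Accept-Encoding, adding an accept-encoding entry to the rule's headers might be worth trying, but whether Ladder then decompresses the response correctly is not confirmed anywhere in this thread.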
The title says it all, really. Would that be possible?
We used corsproxy.io for a while to import RSS feeds into Grafana; but that isn't working anymore unfortunately... So, we decided to try hosting a solution ourselves. And thus, I found this project.
I even got it to work with our firewall's CA, but we now get no CORS back at all, which Grafana is mad about o.o
Any idea?
Thanks and kind regards!
When I try something like:
I get:
Sorry, you have been blocked
You are unable to access medium.com
Why have I been blocked?
This website is using a security service to protect itself from online attacks. The action you just performed triggered the security solution. There are several actions that could trigger this block including submitting a certain word or phrase, a SQL command or malformed data.
What can I do to resolve this?
You can email the site owner to let them know you were blocked. Please include what you were doing when this page came up and the Cloudflare Ray ID found at the bottom of this page.
From Cloudflare.
I'm using the following docker-compose config:
version: '3'
services:
  ladder:
    image: ghcr.io/kubero-dev/ladder:latest
    container_name: ladder
    build: .
    #restart: always
    #command: sh -c ./ladder
    environment:
      - PORT=8080
      #- PREFORK=true
      #- X_FORWARDED_FOR=66.249.66.1
      #- USER_AGENT=Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
      #- USERPASS=foo:bar
      #- LOG_URLS=true
      #- GODEBUG=netdns=go
    ports:
      - "8080:8080"
    deploy:
      resources:
        limits:
          cpus: "0.50"
          memory: 512M
        reservations:
          cpus: "0.25"
          memory: 128M
root@ladder01:/opt# docker-compose up -d
Do you know of a workaround for Cloudflare?
It's common for Golang packages to list URL in the 'package' line, for example see here.
This allows Golang servers to pick up and list the package under this URL name.
Using an article from New Yorker, I am noticing in the terminal many errors reported in the form:
2023/11/06 16:08:36 GET /verso/static/assets/fonts/TNYAdobeCaslonPro-SemiBoldItalic.woff2
2023/11/06 16:08:36 ERROR: Get "verso/static/assets/fonts/TNYAdobeCaslonPro-SemiBoldItalic.woff2": unsupported protocol scheme ""
I haven't looked at the code yet and wondering what the error means.
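One possible reading of the error, offered without having checked Ladder's code: Go's HTTP client reports unsupported protocol scheme "" when it is handed a URL that has no scheme, and the logged path verso/static/... is a relative reference with no scheme or host. Such a reference has to be resolved against the page it came from before it can be fetched. A minimal Python sketch of that resolution step, where the base URL is a hypothetical article address:

```python
from urllib.parse import urljoin, urlsplit


def has_http_scheme(url: str) -> bool:
    """True when the URL already carries an http or https scheme."""
    return urlsplit(url).scheme in ("http", "https")


def resolve(base_url: str, ref: str) -> str:
    """Resolve a possibly relative reference against the page that linked it."""
    if has_http_scheme(ref):
        return ref
    return urljoin(base_url, ref)


# Hypothetical base URL standing in for the article being proxied.
base = "https://www.newyorker.com/magazine/some-article"
font = "/verso/static/assets/fonts/TNYAdobeCaslonPro-SemiBoldItalic.woff2"
print(resolve(base, font))
```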
Hi, I am running ladder behind a traefik reverse proxy with an Authentik ForwardAuth middleware.
When I open the app to the base url, I get the following error: Request Header Fields Too Large
None of my other forwardauth applications experience this. Is it something in my configuration possibly?
Chrome console returns this: Failed to load resource: the server responded with a status of 431 ()
I am running ladder in a container
The docker container (created via docker-compose) ignores at least the following env variables:
LOG_URLS=false
PREFORK=true
docker log:
docker-compose up
[+] Building 0.0s (0/0)
[+] Running 1/0
✔ Container ladder Created 0.0s
Attaching to ladder
ladder | 2023/11/08 11:08:36 Loading rules
ladder | 2023/11/08 11:08:36 Loaded rules for 3 Domains
ladder |
ladder | ┌───────────────────────────────────────────────────┐
ladder | │ Fiber v2.50.0 │
ladder | │ http://127.0.0.1:8080 │
ladder | │ (bound on host 0.0.0.0 and port 8080) │
ladder | │ │
ladder | │ Handlers ............ 14 Processes ........... 1 │
ladder | │ Prefork ....... Disabled PID ................. 7 │
ladder | └───────────────────────────────────────────────────┘
ladder |
ladder | 2023/11/08 11:08:44 GET /
ladder | 2023/11/08 11:08:48 GET /https://test.de
Are there public instances of Ladder available in the wild? If so, could we have a list of them available somewhere?
I'm interested in using Ladder but I don't have the means to run it myself :/
The Tailwind Play CDN isn't recommended for production since it includes the whole Tailwind class library (large bundle). Proposal is to use the Tailwind CLI instead so that there is a css build step resulting in minified css of only the classes that are used (small bundle).
I think the list of ruleset modifications will grow very large over time. Just check out what happened to magnolia1234's bypass-paywalls-firefox-clean ruleset. There are over 1000 individual domains.
A single file would be difficult to maintain.
I propose the ability to fetch rulesets from an open directory, so that they could be organized better.
For example:
rulesets
├── de
│ └── tagesspiegel-de.yaml
├── it
└── us
└── nytimes.yaml
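A rough sketch of how such a directory could be discovered, assuming every ruleset is a file ending in .yaml somewhere below a single root folder. The function name is hypothetical and parsing of the files is left out:

```python
from pathlib import Path


def find_ruleset_files(root: str) -> list:
    """Return the paths of all *.yaml files under root, relative to root.

    Empty folders (like `it` in the tree above) simply contribute nothing.
    """
    base = Path(root)
    return sorted(
        p.relative_to(base).as_posix()
        for p in base.rglob("*.yaml")
        if p.is_file()
    )
```

Sorting keeps the load order stable across runs, which matters if a later file is allowed to override an earlier rule for the same domain.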
Ladder should never return resources from reserved IP addresses. This is a security risk.
I have deployed Ladder behind a reverse proxy (in my case Caddy), both inside Docker. I have other services deployed on the same instance that are accessed via Caddy. The problem is that Ladder is able to bypass the reverse proxy and make requests directly on the local machine (e.g. https://ladder.example.com/http://192.168.0.1, where Ladder is hosted behind the reverse proxy at ladder.example.com and a different service, which is normally accessed through the reverse proxy, is hosted at 192.168.0.1). This should never be allowed, as the internal connection does not use SSL/TLS and bypasses the reverse proxy where the certs are deployed. Of course there may be certain edge cases where this function is needed, in which case it should be explicitly allowed from a ruleset.
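The kind of check being asked for can be sketched as follows. This is an illustration in Python using the standard ipaddress module, not Ladder's implementation; the function name and the allow parameter (standing in for an explicit ruleset exception) are assumptions.

```python
import ipaddress


def is_forbidden_target(ip: str, allow: frozenset = frozenset()) -> bool:
    """Return True when a resolved IP address should not be proxied.

    The check must run on the address the hostname resolves to, not on the
    hostname itself, otherwise a public DNS name pointing at 192.168.0.1
    would slip through.
    """
    if ip in allow:  # hypothetical explicit exception, e.g. from a ruleset
        return False
    addr = ipaddress.ip_address(ip)
    return (
        addr.is_private
        or addr.is_loopback
        or addr.is_link_local
        or addr.is_reserved
        or addr.is_multicast
        or addr.is_unspecified
    )
```

With this sketch, is_forbidden_target("192.168.0.1") is true, while passing allow=frozenset({"192.168.0.1"}) models the explicitly allowed edge case.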
Trying to run the service on Ubuntu 21.10, using Docker version 20.10.12, build e91ed57.
I get this error:
☁ ~/dev curl https://raw.githubusercontent.com/kubero-dev/ladder/main/docker-compose.yaml --output docker-compose.yaml
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 707 100 707 0 0 3399 0 --:--:-- --:--:-- --:--:-- 3432
☁ ~/dev docker-compose up
WARNING: The following deploy sub-keys are not supported and have been ignored: resources.reservations.cpus
Creating network "dev_default" with the default driver
Building ladder
[+] Building 0.1s (2/2) FINISHED
=> [internal] load build definition from Dockerfile 0.1s
=> => transferring dockerfile: 2B 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
failed to solve with frontend dockerfile.v0: failed to read dockerfile: open /var/lib/docker/tmp/buildkit-mount669939737/Dockerfile: no such file or directory
ERROR: Service 'ladder' failed to build
As per everywall/ladder-rules#3, we'll need to implement some robust testing.
The main challenge is to test after client-side JS rendering happens, which will probably mean we'll need a headless browser.
A test could look like this: https://github.com/everywall/ladder/blob/ladder_tests/tests/tests/www-wellandtribune-ca.spec.ts
And the results like this.
Perhaps we'll need some codegen in order to go from ruleset to test?
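As a starting point for that codegen, the mapping from a ruleset entry to a test file name could look like the sketch below. It assumes the naming convention visible in the linked example (dots in the domain replaced by dashes, plus a .spec.ts suffix) and that each rule carries a domain key; both function names are hypothetical, and generating the test body itself is left open.

```python
def spec_filename(domain: str) -> str:
    """Map a ruleset domain to a spec file name, e.g.
    www.wellandtribune.ca -> www-wellandtribune-ca.spec.ts (assumed convention)."""
    return domain.strip().lower().replace(".", "-") + ".spec.ts"


def plan_tests(rules: list) -> dict:
    """Return {domain: spec file name} for every rule that names a domain."""
    return {
        rule["domain"]: spec_filename(rule["domain"])
        for rule in rules
        if "domain" in rule
    }
```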
Do you have a good ruleset you want to share?
Leave a link here. I'll add it to the README.md.
I am looking for a person to host the default ruleset (here on GitHub under GPLv3 or MIT). Is anyone interested in maintaining it?
Hey y'all, I've designed a pretty simple Tailwind CSS theme that I think might fit better as a default theme for ladder compared to the current one. It's fairly simple like the original, so adding any buttons or text boxes is just as easy as before, but now it uses all Tailwind classes.
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>ladder</title>
<script src="https://cdn.tailwindcss.com"></script>
</head>
<body class="antialiased text-slate-500 dark:text-slate-400 bg-white dark:bg-slate-900">
<div class="grid grid-cols-1 gap-4 max-w-3xl mx-auto pt-10">
<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="100%" height="250" viewBox="0 0 512 512">
<path fill="#7AA7D1" d="M262.074 485.246C254.809 485.265 247.407 485.534 240.165 484.99L226.178 483.306C119.737 468.826 34.1354 383.43 25.3176 274.714C24.3655 262.975 23.5876 253.161 24.3295 241.148C31.4284 126.212 123.985 31.919 238.633 24.1259L250.022 23.8366C258.02 23.8001 266.212 23.491 274.183 24.1306C320.519 27.8489 366.348 45.9743 402.232 75.4548L416.996 88.2751C444.342 114.373 464.257 146.819 475.911 182.72L480.415 197.211C486.174 219.054 488.67 242.773 487.436 265.259L486.416 275.75C478.783 352.041 436.405 418.1 369.36 455.394L355.463 462.875C326.247 477.031 294.517 484.631 262.074 485.246ZM253.547 72.4475C161.905 73.0454 83.5901 144.289 73.0095 234.5C69.9101 260.926 74.7763 292.594 83.9003 317.156C104.53 372.691 153.9 416.616 211.281 430.903C226.663 434.733 242.223 436.307 258.044 436.227C353.394 435.507 430.296 361.835 438.445 267.978C439.794 252.442 438.591 236.759 435.59 221.5C419.554 139.955 353.067 79.4187 269.856 72.7052C264.479 72.2714 258.981 72.423 253.586 72.4127L253.547 72.4475Z"/>
<path fill="#7AA7D1" d="M153.196 310.121L133.153 285.021C140.83 283.798 148.978 285.092 156.741 284.353L156.637 277.725L124.406 278.002C123.298 277.325 122.856 276.187 122.058 275.193L116.089 267.862C110.469 260.975 103.827 254.843 98.6026 247.669C103.918 246.839 105.248 246.537 111.14 246.523L129.093 246.327C130.152 238.785 128.62 240.843 122.138 240.758C111.929 240.623 110.659 242.014 105.004 234.661L97.9953 225.654C94.8172 221.729 91.2219 218.104 88.2631 214.005C84.1351 208.286 90.1658 209.504 94.601 209.489L236.752 209.545C257.761 209.569 268.184 211.009 285.766 221.678L285.835 206.051C285.837 197.542 286.201 189.141 284.549 180.748C280.22 158.757 260.541 143.877 240.897 135.739C238.055 134.561 232.259 133.654 235.575 129.851C244.784 119.288 263.680 111.990 277.085 111.105C288.697 109.828 301.096 113.537 311.75 117.703C360.649 136.827 393.225 183.042 398.561 234.866C402.204 270.253 391.733 308.356 367.999 335.1C332.832 374.727 269.877 384.883 223.294 360.397C206.156 351.388 183.673 333.299 175.08 316.6C173.511 313.551 174.005 313.555 170.443 313.52L160.641 313.449C158.957 313.435 156.263 314.031 155.122 312.487L153.196 310.121Z"/>
</svg>
<header>
<h1 class="text-center text-3xl sm:text-4xl font-extrabold text-slate-900 tracking-tight dark:text-slate-200">ladddddddder</h1>
</header>
<form id="inputForm" method="get" class="mx-4">
<div>
<input type="text" id="inputField" placeholder="Proxy Search" name="inputField" class="w-full text-sm leading-6 text-slate-400 rounded-md ring-1 ring-slate-900/10 shadow-sm py-1.5 pl-2 pr-3 hover:ring-slate-300 dark:bg-slate-800 dark:highlight-white/5 dark:hover:bg-slate-700" required>
</div>
</form>
<footer class="mt-10 text-center text-slate-600 dark:text-slate-400">
<p>
Code Licensed Under GPL v3.0 |
<a href="https://github.com/kubero-dev/ladder" class="hover:text-blue-500 hover:underline underline-offset-2 transition-colors duration-300">View Source</a> |
<a href="https://www.kubero.dev/" class="hover:text-blue-500 hover:underline underline-offset-2 transition-colors duration-300">kubero.dev</a>
</p>
</footer>
</div>
<script src="https://code.jquery.com/jquery-3.6.0.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/materialize/1.0.0/js/materialize.min.js"></script>
<script>
document.addEventListener('DOMContentLoaded', function () {
M.AutoInit();
});
document.getElementById('inputForm').addEventListener('submit', function (e) {
e.preventDefault();
let url = document.getElementById('inputField').value;
if (!/^https?:\/\//i.test(url)) {
url = 'https://' + url;
}
window.location.href = '/' + url;
return false;
});
</script>
<style>
@media (prefers-color-scheme: light) {
body {
background-color: #ffffff;
color: #333333;
}
}
@media (prefers-color-scheme: dark) {
body {
background-color: #1a202c;
color: #ffffff;
}
}
</style>
</body>
</html>
Eventually, we’ll want to test on realistic data for benchmarking and finding edge cases in the code. I’m thinking we could use a warc player and the common crawl C4 database (realnewslike) which is about 34GB.
Hello, I've just installed ladder on my synology, ladder launches correctly https://mysyno.dsmynas.com.
But as soon as I add a link to a journal I get a timeout.
Get "https://www.lavoixdunord.fr/": dial tcp: lookup www.lavoixdunord.fr on 127.0.0.11:53: read udp 127.0.0.1:59926->127.0.0.11:53: i/o timeout
It does not work with the column service called "longblack".
https://1ft.io/proxy?q=https%3A%2F%2Fwww.longblack.co%2Fnote%2F875
As an example, here is what the AdGuardHome macOS menu bar looks like:
Turning the app "on" would:
ladder
ladder
Turning the app "off" would:
ladder
Thoughts? 😄
[Oh just noticed you have Discussions - feel free to move this there, sorry.]
If you use Ladder on a mobile device, it's a bit fiddly to paste your desired URL into the field. At least, on my device you have to work out how to invoke the keyboard after pasting the URL into the form so that you can submit it.
Might it be an idea to show a button on the screen (when in a viewport less than a certain size) that pastes what's in the clipboard into the field, and submits the form?
There may of course be other approaches. BTW I note that the bookmarklet trick doesn't seem to work on mobile as far as I can tell.
Hey everyone
I moved ladder to its own GitHub organization to avoid conflicts with my other side project, Kubero.
Pulling docker images from the old organization is still possible, but it is better to use the new organization:
docker pull ghcr.io/everywall/ladder:latest
I also opened the Discussions to have a more convenient place for topics and general discussions.
It would be great if there was a browser extension that could redirect sites to your locally running ladder instance!
I tried to deploy on my own VPS and changed the port number to 8211 (and I have already opened the firewall). But after the deployment was successful (the Docker logs show normal), I cannot open the work page by entering http://vpsaddress:8211 in the browser's address bar. When deploying with Docker, is it only possible to use port 8080?
Thx in advance
I currently self-host ladder on my powerful dedicated server.
When I disable PREFORK, only 1 worker is spawned.
When I enable PREFORK, 32 workers (due to having 32 cpu threads) are spawned and the memory usage also sky-rockets.
Please implement a way to precisely set the number of threads ladder uses.
Cloudflare is blocking again. When I change the user agent, I get the paywall. With the default settings I get blocked.
ladder: 0.0.17
Test-URL: tagesspiegel.de
Screenshot:
Could someone explain how the problem can be solved with the ruleset? Then I can try to find a solution myself.
Using some “reading mode” algorithm (such as DOM-distiller) I think the API could return a json blob representing just the source URL, title, author, date and text content of the article, without the extra HTML.
This would make it feasible for web scraping tasks, for non-JS heavy sites.
In addition, this would open up the possibility of an endpoint that returned the cleaned content of the site, much like the old outline.org.
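To make the proposal concrete, the JSON blob could have roughly the shape below. This is only a sketch of the fields listed above; the class name and key names are assumptions, and the extraction step (the "reading mode" algorithm) is not shown.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Article:
    """Hypothetical response shape for a cleaned-article endpoint."""
    url: str
    title: str
    author: str
    date: str  # kept as a plain string, e.g. ISO 8601
    text: str

    def to_json(self) -> str:
        """Serialize the article as a JSON object with one key per field."""
        return json.dumps(asdict(self), ensure_ascii=False)
```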
To reproduce:
You will notice that links on the page to techcrunch are malformed like below with '/' placed before the scheme and after the top level domain.
<a href="/https://techcrunch.com//author/aria-alamalhodaei/">
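The doubled slash after the domain is what plain string concatenation of an origin ending in / with a path starting with / would produce. Whether that is the actual cause in Ladder is an assumption. The sketch below shows a rewrite that avoids it by resolving the link against the origin first and only then adding the leading / of the proxy path (the form seen in the request logs, e.g. GET /https://test.de). The function name is hypothetical.

```python
from urllib.parse import urljoin


def rewrite_href(origin: str, href: str) -> str:
    """Rewrite a link found on a proxied page so it routes through the proxy.

    urljoin handles the slash between origin and path, so an origin with a
    trailing slash and an absolute path do not produce `//`.
    """
    return "/" + urljoin(origin, href)


print(rewrite_href("https://techcrunch.com/", "/author/aria-alamalhodaei/"))
# prints /https://techcrunch.com/author/aria-alamalhodaei/
```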
The environment var LOG_URLS does not apply anymore:
cat .env
LOG_URLS=false
docker logs ladder
2024/05/25 09:17:27 GET /https://www.tagesschau.de/inland/innenpolitik/hausarztpraxen-versorgung-gesetz-100.html
I want the users of my ladder instance to be able to use the service fully privately, so I want ladder to not log URLs.
Thanks :)
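For reference, a boolean environment variable is usually expected to behave as in the sketch below. This is not Ladder's code; the function name and the accepted spellings are assumptions. A common pitfall it avoids is treating any non-empty string, including "false", as true.

```python
import os

TRUE_VALUES = {"1", "true", "yes", "on"}
FALSE_VALUES = {"0", "false", "no", "off"}


def env_flag(name: str, default: bool, environ=None) -> bool:
    """Read a boolean flag such as LOG_URLS from the environment.

    Unset or unrecognized values fall back to the default.
    """
    env = os.environ if environ is None else environ
    raw = env.get(name)
    if raw is None:
        return default
    value = raw.strip().lower()
    if value in TRUE_VALUES:
        return True
    if value in FALSE_VALUES:
        return False
    return default
```

Separately, Docker Compose reads a project-level .env file for variable substitution in the compose file; its values reach the container only if they are referenced under environment: or loaded with env_file:. That may be worth ruling out before looking at the application itself.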
I consistently encounter the following error message while attempting to access content, and I suspect it may be due to a network firewall. I am utilizing a binary program for this purpose.
2023/11/13 16:54:28 ERROR: Get "https://medium.com/mapresearch/does-the-fourth-industrial-revolution-have-a-new-path-818723d06030": dial tcp 104.244.46.165:443: connectex: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
I'm not going to fork and open a PR just for a little typo, but in your README, it should be "consider" instead of "concider".
I wrote a Kubernetes deployment if you want to add it!
https://github.com/RamboRogers/kubernetes/blob/master/ladder.yml