richardg867 / waybackproxy
HTTP proxy for tunneling requests through the Internet Archive Wayback Machine
License: GNU General Public License v3.0
I'd love to use this for browsing the NHL's website as it looked in the late 1990s and early 2000s... if config.py even worked! As you can see in the screenshot I have put in this issue, config won't even open because of a syntax error in line 1, resulting in me not really being able to change any sort of settings. What can I do in this situation?
How can I set up WaybackProxy so that my VMs send their requests to another VM running the proxy?
I'm using VirtualBox on Windows 11
The proxy worked for about 20 seconds and then failed with "ERR_TUNNEL_CONNECTION_FAILED".
config.py doesn't actually do anything. You change values, rebuild, and run it in Docker, and it still tries to run on the default port.
WaybackProxy is an awesome project, I love it!
The people of ProtoWeb maintain a proxy service for manually restored websites from around 1996 to the early 2000s. I tested it and it seems like a nice service, but I don't know much more about the project.
Can you please consider adding an option to use ProtoWeb in WaybackProxy? It would be off by default, but fun to turn on to navigate a selection of more functional and better-looking websites from that time. It could also be useful for getting files from old FTP sites that ProtoWeb has archived and provides access to.
I tried to disable the date tolerance in the program, but unless I put it back to 365 the program won't start due to various errors.
Right, now it's broken.
Nothing from the dev, but now it's not working at all.
Been at this crap for 2 straight days.
2022-09-22
Raspberry Pi, 32-bit Bullseye
Every time I load any page, I get this:
Any help at all?
Wops! Error opening config.json
Traceback (most recent call last):
File "c:\Users\User\Downloads\WaybackProxy-master\WaybackProxy-master\waybackproxy.py", line 26, in <module>
shared_state = SharedState()
^^^^^^^^^^^^^
File "c:\Users\User\Downloads\WaybackProxy-master\WaybackProxy-master\waybackproxy.py", line 17, in __init__
self.availability_cache = lrudict.LRUDict(maxduration=86400, maxsize=1024) if WAYBACK_API else None
^^^^^^^^^^^
NameError: name 'WAYBACK_API' is not defined
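The traceback suggests that a setting was never defined after config.json failed to open. As a rough illustration only, and not WaybackProxy's actual loader, here is a sketch of reading a JSON config while falling back to defaults. `load_config` and `DEFAULTS` are hypothetical names, and only the `WAYBACK_API` key comes from the traceback above.

```python
import json

# Hypothetical defaults; only WAYBACK_API appears in the traceback above.
DEFAULTS = {"WAYBACK_API": True}

def load_config(path):
    """Return DEFAULTS overlaid with whatever the JSON file at `path` provides."""
    settings = dict(DEFAULTS)
    try:
        with open(path, encoding="utf-8") as f:
            loaded = json.load(f)
    except (OSError, json.JSONDecodeError) as exc:
        # A missing or malformed file leaves every setting at its default,
        # so later code never hits an undefined name.
        print(f"Could not read {path}: {exc}; using defaults")
        return settings
    if isinstance(loaded, dict):
        settings.update(loaded)
    return settings
```

With something like this, a broken config file produces a warning instead of a NameError later on. Validating the file with a JSON linter first is also a quick way to find the syntax error.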
It seems the archive.org headers have changed and a new subdomain is now used. I will submit a fix once I manage to get it working. In the meantime I disabled JavaScript in my Netscape 4.0 settings :)
<html>
<head><script src="//archive.org/includes/analytics.js?v=cf34f82" type="text/javascript"></script>
<script type="text/javascript">window.addEventListener('DOMContentLoaded',function(){var v=archive_analytics.values;v.service='wb';v.server_name='wwwb-app228.us.archive.org';v.server_ms=536;archive_analytics.send_pageview({});});</script>
<script type="text/javascript" src="//web-static.archive.org/_static/js/bundle-playback.js?v=6XRi73ky" charset="utf-8"></script>
<script type="text/javascript" src="//web-static.archive.org/_static/js/wombat.js?v=txqj7nKC" charset="utf-8"></script>
<script>window.RufflePlayer=window.RufflePlayer||{};window.RufflePlayer.config={"autoplay":"on","unmuteOverlay":"hidden"};</script>
<script type="text/javascript" src="//web-static.archive.org/_static/js/ruffle.js"></script>
<script type="text/javascript">
__wm.init("https://web.archive.org/web");
__wm.wombat("http://www.arnes.si:80/","19970131060208","https://web.archive.org/","web","//web-static.archive.org/_static/",
"854690528");
</script>
<link rel="stylesheet" type="text/css" href="//web-static.archive.org/_static/css/banner-styles.css?v=S1zqJCYt" />
<link rel="stylesheet" type="text/css" href="//web-static.archive.org/_static/css/iconochive.css?v=qtvMKcIJ" />
<!-- End Wayback Rewrite JS Include -->
I'm really not sure if it's just me doing something wrong, but every time I attempt to access a webpage using WaybackProxy on Win 95, it simply throws an error like "The connection with webpage could not initiated".
It works flawlessly on Windows 98, but refuses to work on Windows 95.
Hi, this is about a YouTube video that has been deleted and whose link was not saved in the past.
Here it is.
I am using waybackproxy to crawl pages saved on the wayback machine (because it was the easiest and fastest thing to set up)
However, I've noticed some random "leakage" from newer dates.
The proxy is set to January 1st, 2003. However some pages from years after are randomly appearing.
For example:
As we all know, YouTube launched in 2005 and was bought by Google in 2006, but here it is in my data, showing up on a Google support page (which I doubt even existed in 2003!)
...
{
"url": "https://support.google.com/",
"title": "Google Help",
"tags": [
"center",
"search",
"youtube"
]
},
...
(The tags are based on the most common words on a page)
This isn't just a one off thing, as a bit further down...
...
{
"url": "https://accounts.google.com/ServiceLogin?hl=en&passive=true&continue=http://support.google.com/&ec=GAZAdQ",
"title": "Sign in - Google Accounts",
"tags": [
"use",
"account",
"email"
]
},
...
... we see a Google accounts page, which definitely was NOT a thing in 2003.
There are 41 occurrences of this after running the proxy and crawler for just two-ish minutes.
I don't see any other pages experiencing this "leakage", only Google pages. Is there any way to fix this?
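One crawler-side workaround would be to discard pages whose snapshot falls outside the wanted window. The sketch below assumes the crawler can see the archived URL, which carries a 14-digit timestamp as in web.archive.org/web/19970131060208/...; whether the proxy exposes that URL to the client is not stated here. `snapshot_time` and `within_tolerance` are hypothetical helper names, not part of WaybackProxy.

```python
import re
from datetime import datetime, timedelta

# Wayback URLs embed the capture time as 14 digits: /web/YYYYMMDDhhmmss/
_TIMESTAMP = re.compile(r"/web/(\d{14})")

def snapshot_time(wayback_url):
    """Return the capture time embedded in a Wayback URL, or None if absent."""
    match = _TIMESTAMP.search(wayback_url)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%Y%m%d%H%M%S")

def within_tolerance(wayback_url, target, tolerance_days):
    """True if the snapshot was captured within tolerance_days of target."""
    taken = snapshot_time(wayback_url)
    if taken is None:
        # No timestamp means the page cannot be dated, so reject it.
        return False
    return abs(taken - target) <= timedelta(days=tolerance_days)
```

A crawler could call `within_tolerance(url, datetime(2003, 1, 1), 365)` before storing a record and skip anything that returns False.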
Today whitelists only allow specific domains, but not all the subdirectories and subdomains are included, so it would be very nice if one could include a domain wildcard like:
anything.apple.com or apple.com/anything
and not just apple.com
So it would bypass everything for that domain.
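To make the request concrete, here is a minimal sketch of the kind of matching being asked for: a URL counts as whitelisted when its host is the listed domain or any subdomain of it, regardless of path. `host_matches` is a hypothetical helper, not an existing WaybackProxy function.

```python
from urllib.parse import urlsplit

def host_matches(url, domains):
    """True if the URL's host equals, or is a subdomain of, any listed domain."""
    host = (urlsplit(url).hostname or "").lower()
    for domain in domains:
        domain = domain.lower().lstrip(".")
        # The leading dot keeps notapple.com from matching apple.com.
        if host == domain or host.endswith("." + domain):
            return True
    return False
```

Since only the host is compared, apple.com/anything and anything.apple.com would both be covered by a single apple.com entry.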
So, as of today, any time I try to connect to a website it immediately throws a bunch of errors ending in "urllib.error.URLError: <urlopen error [WinError 10061] No connection could be made because the target machine actively refused it>", loading 2 or 3 images before everything errors out. It apparently also has errors with request_url.
Update: here's the whole error.
[>] http://twitter.com/
[!] Failed to fetch Wayback availability data
[!] Fetch exception:
Traceback (most recent call last):
File "C:\Users\theci\AppData\Local\Programs\Python\Python312\Lib\urllib\request.py", line 1344, in do_open
h.request(req.get_method(), req.selector, req.data, headers,
File "C:\Users\theci\AppData\Local\Programs\Python\Python312\Lib\http\client.py", line 1319, in request
self._send_request(method, url, body, headers, encode_chunked)
File "C:\Users\theci\AppData\Local\Programs\Python\Python312\Lib\http\client.py", line 1365, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "C:\Users\theci\AppData\Local\Programs\Python\Python312\Lib\http\client.py", line 1314, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "C:\Users\theci\AppData\Local\Programs\Python\Python312\Lib\http\client.py", line 1074, in _send_output
self.send(msg)
File "C:\Users\theci\AppData\Local\Programs\Python\Python312\Lib\http\client.py", line 1018, in send
self.connect()
File "C:\Users\theci\AppData\Local\Programs\Python\Python312\Lib\http\client.py", line 984, in connect
self.sock = self._create_connection(
^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\theci\AppData\Local\Programs\Python\Python312\Lib\socket.py", line 852, in create_connection
raise exceptions[0]
File "C:\Users\theci\AppData\Local\Programs\Python\Python312\Lib\socket.py", line 837, in create_connection
sock.connect(sa)
ConnectionRefusedError: [WinError 10061] No connection could be made because the target machine actively refused it
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\WaybackProxy-master\waybackproxy.py", line 197, in handle
conn = urllib.request.urlopen(request_url)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\theci\AppData\Local\Programs\Python\Python312\Lib\urllib\request.py", line 215, in urlopen
return opener.open(url, data, timeout)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\theci\AppData\Local\Programs\Python\Python312\Lib\urllib\request.py", line 515, in open
response = self._open(req, data)
^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\theci\AppData\Local\Programs\Python\Python312\Lib\urllib\request.py", line 532, in _open
result = self._call_chain(self.handle_open, protocol, protocol +
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\theci\AppData\Local\Programs\Python\Python312\Lib\urllib\request.py", line 492, in _call_chain
result = func(*args)
^^^^^^^^^^^
File "C:\Users\theci\AppData\Local\Programs\Python\Python312\Lib\urllib\request.py", line 1373, in http_open
return self.do_open(http.client.HTTPConnection, req)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\theci\AppData\Local\Programs\Python\Python312\Lib\urllib\request.py", line 1347, in do_open
raise URLError(err)
urllib.error.URLError: <urlopen error [WinError 10061] No connection could be made because the target machine actively refused it>
I was trying to use WaybackProxy with IE4, but got an error about an unknown HTTP response. So I tried curl and got this:
* Unsupported response code in HTTP response
* Closing connection 0
curl: (1) Unsupported response code in HTTP response
Using netcat, I found that it was answering "HTTP/1.1200 OK" instead of "HTTP/1.1 200 OK" (that is, it was missing a space before the 200).
After adding the space, in waybackproxy.py#L415, it works fine.
I created pull request #5
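For illustration, an HTTP status line has the form version, space, three-digit code, space, reason phrase. The sketch below builds one and checks whether a received line has that shape. Both `status_line` and `looks_valid` are hypothetical helpers and do not reproduce the code at waybackproxy.py#L415.

```python
import re

def status_line(code, reason, version="HTTP/1.1"):
    """Build a status line with the required single spaces between fields."""
    return f"{version} {code} {reason}\r\n"

# version SP three-digit-code SP reason-phrase
_STATUS_LINE = re.compile(r"^HTTP/\d\.\d \d{3} .*$")

def looks_valid(line):
    """True if the line has the version/code/reason shape of a status line."""
    return bool(_STATUS_LINE.match(line.rstrip("\r\n")))
```

Feeding the first line captured with netcat into `looks_valid` would flag the "HTTP/1.1200 OK" response, which strict clients such as curl reject.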
Some JavaScript scripts only work on browsers about as new as Internet Explorer 11 on Windows 8.1. They don't work at all in any browser older than IE 11. Is there a fix for this or will this be fixed in a future update?
For example, go to WWE.com in the year 2002 on IE 6 on Windows XP. When you hover over a menu, no options appear. Go to the same website on Pale Moon on Windows 11. The menus bring out options as you hover over them.
I know that for some reason the WaybackProxy deals only with GET requests and that this is not compatible with POST requests.
However, it would be interesting to support POST requests on the Bypassed URLs (I host some old sites on my local cache and in some of them I often restore search mechanisms for example, and they do not work with waybackproxy bypass, so I always need to disable and then re-enable).
Does it sound doable?
Hi, I set up WaybackProxy in a Docker container, and while it's serving up pages from the date specified, it seems to insert some JS at the start which is breaking things on IE 5 and Netscape 3 (the two browsers I tried, on Windows 98 and Mac OS 7.5.3 respectively).
I looked through the docs and don't immediately see a way to stop this JS insertion. Am I missing a config option?
Thanks!
I set up the proxy on Windows and ran py -m pip install --user -r requirements.txt
Then I launch the proxy and get:
"WaybackProxy now requires urllib3 to be installed. Follow setup step 3 on the readme to fix this."
urllib3 IS installed, the proxy just can't use it.
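A common cause of this symptom is that pip installed the package for one Python interpreter while the proxy runs under another, though that is only a guess for this report. The sketch below reports which interpreter is running and whether it can locate a module. `describe_module` is a hypothetical helper name.

```python
import importlib.util
import sys

def describe_module(name):
    """Report which interpreter is running and whether it can find `name`."""
    spec = importlib.util.find_spec(name)
    return {
        "interpreter": sys.executable,
        "found": spec is not None,
        # origin is the file the module would be loaded from, if any
        "location": getattr(spec, "origin", None),
    }

if __name__ == "__main__":
    print(describe_module("urllib3"))
```

Running this with the same command used to start the proxy shows whether that interpreter sees urllib3. If "found" is False, installing with that exact interpreter's pip is the usual next step.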
Hi,
First, thanks for the effort, did not know somebody went with making this awesome thing :D
Found it on that yt video, pretty cool :D
Anyway, I tried running it on my Pi Zero and am now running it in a Debian VM, but I still have the issue of images not being loaded.
It seems the URLs for images are not being forwarded to the browser properly.
This is what my output looks like when the script is running and while loading apple.com
[>] http://statse.webtrendslive.com/S139226/button6.asp?tagver=6&si=139226&offset=-800&fw=0&js=No&
[>] http://www.apple.com/main/css/fonts.css
[>] http://statse.webtrendslive.com/S130376/button6.asp?tagver=6&si=130376&offset=-800&fw=0&js=No&
[>] http://images.apple.com/t/2002/us/en/i/1bg.gif
[f] http://www.apple.com/main/css/fonts.css
[f] http://images.apple.com/t/2002/us/en/i/1bg.gif
[f] http://statse.webtrendslive.com/S139226/button6.asp?tagver=6&si=139226&offset=-800&fw=0&js=No&
[f] http://statse.webtrendslive.com/S130376/button6.asp?tagver=6&si=130376&offset=-800&fw=0&js=No&
[>] http://images.apple.com/t/2003/us/en/i/3.gif
[>] http://images.apple.com/t/2002/us/en/i/4.gif
[f] http://images.apple.com/t/2003/us/en/i/3.gif
[f] http://images.apple.com/t/2002/us/en/i/4.gif
[>] http://images.apple.com/t/2002/us/en/i/2.gif
[>] http://images.apple.com/t/2002/us/en/i/5.gif
[>] http://images.apple.com/t/2002/us/en/i/6.gif
[f] http://images.apple.com/t/2002/us/en/i/5.gif
Seems it does not use the original Wayback Machine URL but direct to the original source...same goes for google or yahoo...could you provide a little assistance? :)
I am using Safari on my PB G4 :)
I don't understand how to get this running on Windows. Whenever I open the proxy, it just closes instantly. What moronic thing am I doing wrong?
Hello,
I would like to know how to install this proxy on a Raspberry Pi 4.
Thanks !
I have the date set to October 31 2005.
I type in http://www.youtube.com and it immediately doesn't load, citing "HTTP 501 Not Implemented/HTTP 505 Version Not Supported".
When trying to load a page like RuneScape, it will load a page from years earlier.
When loading Google.com it works just fine, but when I type YouTube into the search box and press Search, it says there are no snapshots within the date range in my config. I had it set to 365, and tried updating it to 500 but that didn't fix the problem.
I'm using the proxy on Internet Explorer 6 running under Windows XP Service Pack 2. The proxy is running on Windows 10 21H2 and Python 3.6.8. My config is here: https://gist.github.com/wertercatt/c575c8cbf389eb7a6f8baac859c91457
When attempting to download a binary file through the proxy, such as http://download.microsoft.com:80/download/9/f/f/9ffc346d-55e9-420a-89fd-22d10d8f803f/ZooCardFlip.msi for example, the proxy kills the connection early and prevents the browser from actually completing the download.
I'm willing to provide any more details you need.
I don't know if it's only me, but most links on web.archive.org sites use www (for example, YouTube in 2005-2012), and when I open a website with HTTPS or WWW, it doesn't work. Could you please fix this?
K, that's all
UPDATE: Never mind, it was an issue with Chrome
Unable to determine where this is originating, but I'm not able to use the current build due to it.
colin@Colins-MacBook-Air~> curl -v http://apple.com -x http://192.168.1.5:8888
* Trying 192.168.1.5:8888...
* Connected to 192.168.1.5 (192.168.1.5) port 8888 (#0)
> GET http://apple.com/ HTTP/1.1
> Host: apple.com
> User-Agent: curl/7.79.1
> Accept: */*
> Proxy-Connection: Keep-Alive
>
* Unsupported HTTP version in response
* Closing connection 0
curl: (1) Unsupported HTTP version in response