Comments (8)
Some relevant lines from logs etc. for one of the affected files (vsg-small.png):
hts-log.txt
22:50:25 Warning: Unexpected 412/416 error (Requested Range Not Satisfiable) for https://www.openscenegraph.com/images/vsg-small.png, 'C:/Source/OpenSceneGraphDotCom/OpenSceneGraph/www.openscenegraph.com/images/vsg-small.html' could not be found on disk
new.lst
[www.openscenegraph.com/images/vsg-small.png]
[www.openscenegraph.com/images/vsg-small.html]
new.txt
17:38:50 314/314 ------ 416 error ('Requested%20Range%20Not%20Satisfiable') text/html date:Wed,%2022%20Nov%202023%2016:57:05%20GMT https://www.openscenegraph.com/images/vsg-small.png C:/Source/OpenSceneGraphDotCom/OpenSceneGraph/www.openscenegraph.com/images/vsg-small.png (from https://www.openscenegraph.com/)
23:00:37 6895/6895 ---M-- 200 added ('OK') image/png etag:%221aef-57b663281bf20%22 https://www.openscenegraph.com/images/vsg-small.png C:/Source/OpenSceneGraphDotCom/OpenSceneGraph/www.openscenegraph.com/images/vsg-small.html (from https://www.openscenegraph.com/images/vsg-small.png)
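To find every file hit by this, something like the following could scan new.txt for entries that came back as a 200 with a non-HTML MIME type but were saved with a .html extension. This is only a rough sketch: the field positions are inferred from the two lines quoted above, not from any documented HTTrack format, and `parse_entry` and `misnamed_as_html` are names made up for illustration. It also assumes no field contains a literal space.

```python
# Two new.txt lines copied from the excerpt above.
SAMPLE_416 = (
    "17:38:50 314/314 ------ 416 error ('Requested%20Range%20Not%20Satisfiable') "
    "text/html date:Wed,%2022%20Nov%202023%2016:57:05%20GMT "
    "https://www.openscenegraph.com/images/vsg-small.png "
    "C:/Source/OpenSceneGraphDotCom/OpenSceneGraph/www.openscenegraph.com/images/vsg-small.png "
    "(from https://www.openscenegraph.com/)"
)
SAMPLE_200 = (
    "23:00:37 6895/6895 ---M-- 200 added ('OK') image/png etag:%221aef-57b663281bf20%22 "
    "https://www.openscenegraph.com/images/vsg-small.png "
    "C:/Source/OpenSceneGraphDotCom/OpenSceneGraph/www.openscenegraph.com/images/vsg-small.html "
    "(from https://www.openscenegraph.com/images/vsg-small.png)"
)


def parse_entry(line):
    """Split one new.txt line into the fields this sketch cares about.

    Assumed layout (inferred from the samples, not from HTTrack docs):
    time size flags code status ('msg') mime header url path (from referrer)
    """
    parts = line.split()
    return {
        "time": parts[0],
        "status": int(parts[3]),
        "mime": parts[6],
        "url": parts[8],
        "path": parts[9],
    }


def misnamed_as_html(entry):
    """True for a successful non-HTML download that was saved as .html."""
    return (
        entry["status"] == 200
        and entry["mime"] != "text/html"
        and entry["path"].lower().endswith(".html")
    )


# Collect the local paths of the entries that show the symptom.
flagged = [
    parse_entry(line)["path"]
    for line in (SAMPLE_416, SAMPLE_200)
    if misnamed_as_html(parse_entry(line))
]
```

Run over the whole of new.txt instead of the two samples, `flagged` would list the local paths that need renaming or refetching.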
You can view everything from the earlier run where this didn't happen (a run in which I included a bunch of stuff from domains I didn't need) at https://github.com/AnyOldName3/OpenSceneGraphDotComBackup/tree/main/OpenSceneGraph/hts-cache
from httrack.
I've done some digging, and haven't determined whether the 416 was caused by HTTrack submitting a dodgy request or by the server doing something wrong. As the file has the right contents and the log mentions the right MIME type, I think it's plausible that the response body was still the correct file and response header still had the right MIME type, and HTTrack automatically changed the file extension because the status code represented an error.
Actually, that's not right - the line that mentions the 416 error mentions text/html and it's only the later one that has a 200 status that mentions image/png. That would suggest to me that the server's setting the wrong MIME type when it generates an error. I still don't know whether the error is the server's fault or HTTrack's.
Definitely down to an intermittent fault, as a different set of files is affected when rerunning the mirroring process. I've still not managed to access the server logs, so I have no more information about whether it's down to malformed requests or incorrect handling of well-formed requests.
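For reference, under HTTP semantics a 416 is what a server should send when none of the requested byte ranges overlap the resource, for example when a range starts at or beyond the end of the file. The sketch below checks a single-range `Range` header value against a known length. Nothing in this thread shows what HTTrack actually sent, so the header values used here are purely hypothetical, and `is_satisfiable` is a name made up for illustration.

```python
def is_satisfiable(range_header, length):
    """Return True if a single-range 'bytes=...' header overlaps `length` bytes.

    Only the forms 'bytes=N-', 'bytes=N-M' and 'bytes=-N' are handled;
    multi-range headers are out of scope for this sketch.
    """
    unit, _, spec = range_header.partition("=")
    if unit.strip() != "bytes" or "," in spec:
        raise ValueError("only a single bytes range is handled in this sketch")
    first, _, last = spec.strip().partition("-")
    if first == "":
        # Suffix range ("last N bytes"): satisfiable whenever N is non-zero.
        return int(last) > 0
    # Otherwise the first byte position must fall inside the resource.
    return int(first) < length


# Hypothetical requests against a 76352-byte file.
resume_at_end = is_satisfiable("bytes=76352-", 76352)   # starts past the last byte
from_start = is_satisfiable("bytes=0-", 76352)          # ordinary full fetch
```

If the server's idea of the file length and the client's resume offset ever disagreed, this is the kind of check that would decide between a 206 and a 416, which is why the server logs would be useful here.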
I've looked a bit more, and it's apparently also affecting loads of gzips that didn't have the problem the first time I attempted the mirroring process. As far as I can tell (I committed to a Git repo after the first attempt, so it should be accurate), the only things that changed on HTTrack's end are the following winprofile.ini changes:
- HTMLFirst=1 (was HTMLFirst=0, which the GUI says is slower)
- WildCardFilters=+*.css +*.js -ad.doubleclick.net/* -mime:application/foobar (was WildCardFilters=+*.png +*.gif +*.jpg +*.jpeg +*.css +*.js -ad.doubleclick.net/* -mime:application/foobar, which caused images from other domains to be included)
- CurrentUrl=https://www.openscenegraph.com/%0d%0alists.openscenegraph.org%0d%0alists.openscenegraph.org/pipermail/osg-users-openscenegraph.org%0d%0awww.openscenegraph.com%0d%0awww.openscenegraph.org (was CurrentUrl=https://www.openscenegraph.com/, which required me to manually include more URLs in interactive mode).
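For anyone wanting to double-check a settings comparison like this without relying on Git, a minimal sketch of diffing two winprofile.ini snapshots follows. It assumes the file is plain `key=value` lines, as in the excerpts above; `parse_profile` and `changed_settings` are illustrative names, and the sample text only uses keys and values quoted in this thread.

```python
def parse_profile(text):
    """Read key=value lines into a dict, ignoring lines without '='."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def changed_settings(old_text, new_text):
    """Map each key whose value differs to an (old, new) pair."""
    old = parse_profile(old_text)
    new = parse_profile(new_text)
    return {
        key: (old.get(key), new.get(key))
        for key in sorted(old.keys() | new.keys())
        if old.get(key) != new.get(key)
    }


# Snapshots built from values quoted above; real files contain many more keys.
OLD = "HTMLFirst=0\nCurrentUrl=https://www.openscenegraph.com/\n"
NEW = "HTMLFirst=1\nCurrentUrl=https://www.openscenegraph.com/\n"
diff = changed_settings(OLD, NEW)
```

A key missing from one snapshot shows up with `None` on that side, so added and removed settings are reported as well as changed ones.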
I should probably clarify that the old.txt and old.zip mentioned below are for a previous partially successful run of the process with the newer settings, not the run with the old settings that I linked to a few posts ago.
Some of these tie in with lines in new.txt like:
20:20:30 76352/76352 ---M-- 200 added ('OK') application/gzip etag:%2212a40-574b8b6a9f35d%22 http://lists.openscenegraph.org/pipermail/osg-users-openscenegraph.org/2018-August.txt.gz C:/Source/OpenSceneGraphDotCom/OpenSceneGraph/lists.openscenegraph.org/pipermail/osg-users-openscenegraph.org/2018-August.txt.html (from http://lists.openscenegraph.org/pipermail/osg-users-openscenegraph.org/)
In old.txt, the same URL is mentioned in these two lines:
17:40:31 76352/76352 ---M-- 200 added ('OK') application/gzip etag:%2212a40-574b8b6a9f35d%22 http://lists.openscenegraph.org/pipermail/osg-users-openscenegraph.org/2018-August.txt.gz C:/Source/OpenSceneGraphDotCom/OpenSceneGraph/lists.openscenegraph.org/pipermail/osg-users-openscenegraph.org/2018-August.txt.gzip (from http://lists.openscenegraph.org/pipermail/osg-users-openscenegraph.org/)
22:50:33 314/314 UR-MC- 416 error ('Requested%20Range%20Not%20Satisfiable') text/html date:Wed,%2022%20Nov%202023%2022:50:33%20GMT http://lists.openscenegraph.org/pipermail/osg-users-openscenegraph.org/2018-August.txt.gz C:/Source/OpenSceneGraphDotCom/OpenSceneGraph/lists.openscenegraph.org/pipermail/osg-users-openscenegraph.org/2018-August.txt.gzip (from http://lists.openscenegraph.org/pipermail/osg-users-openscenegraph.org/)
Looking in new.zip, there's an empty file with the same name, but in old.zip, it's there twice (yay, weird quirks of the underspecified zip format), once as an empty file, and once containing:
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>416 Requested Range Not Satisfiable</title>
</head><body>
<h1>Requested Range Not Satisfiable</h1>
<p>None of the range-specifier values in the Range
request-header field overlap the current extent
of the selected resource.</p>
</body></html>
I guess that debunks the theory that the 416 errors still led to the same file contents being served. It also makes it look like the presence of these files in the cache zip from the previous run poisons the next run, even if the file becomes available again.
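Most archive tools only show one of the entries when a name appears twice, so a quick way to spot this is to walk the archive's entry list directly. Here's a small sketch using Python's standard zipfile module; `duplicate_entries` is a name made up for illustration, and nothing here depends on how HTTrack names its cache entries.

```python
import zipfile
from collections import defaultdict


def duplicate_entries(zip_file):
    """Return {name: [uncompressed sizes]} for names stored more than once.

    `zip_file` can be a path or a binary file-like object.
    """
    sizes_by_name = defaultdict(list)
    with zipfile.ZipFile(zip_file) as archive:
        # infolist() yields every central directory entry, including
        # repeated names, in the order they were written.
        for info in archive.infolist():
            sizes_by_name[info.filename].append(info.file_size)
    return {
        name: sizes
        for name, sizes in sizes_by_name.items()
        if len(sizes) > 1
    }
```

Calling `duplicate_entries("old.zip")` would list every name stored more than once, along with the size of each copy, which makes an empty entry sitting next to a small error page easy to pick out.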
I've just noticed that I've been using Download web site(s) + questions instead of * Update existing download, which I imagine might not have been the best idea for the runs where I didn't delete everything first to start with a clean slate.
Doing a fresh run with the same settings generated no 416 errors at first, then towards the end of the process, the first number in the Links scanned: 12345/12345 (+1234) bit reached the same value as the second number, and started counting again from zero. During this phase, a significant percentage of the files fetched generated 416 errors.
As it was a clean run, this can't have been caused by the cache from a previous run poisoning the next run. I don't think this was the setting to deal with HTML files first as there are plenty of non-HTML files before this point in new.txt.
Today I tried running this again with the option to fetch HTML files first disabled, like it had been for my initial, successful run. I hit no HTTP 416 errors, and all files were given the correct extension. I still don't know whether the 416 errors were caused by malformed requests from HTTrack or by the server misbehaving, but at least this is no longer giving me grief. I also found it ran in about half the time with the option disabled, which isn't what the tooltip or documentation suggested.