Comments (8)
I wrapped the whole statement in download_book in an if-statement (`if req.headers['Content-Type'] == 'application/pdf':`) and got the following warnings:
- Warning: content type of download url is text/html;charset=utf-8; please download manually: https://link.springer.com/content/pdf/10.1007/978-3-319-75771-1.pdf
- Warning: content type of download url is text/html;charset=utf-8; please download manually: https://link.springer.com/content/pdf/10.1007/978-3-319-75502-1.pdf
- Warning: content type of download url is text/html;charset=utf-8; please download manually: https://link.springer.com/content/pdf/10.1007/978-3-030-25943-3.pdf
So those are indeed the only three books that have been revoked.
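For anyone who wants to try the same check, here is a minimal sketch of it. The helper name `is_pdf_response` is made up for illustration and is not part of helper.py; it takes any header mapping, so it can be tried without a network connection.

```python
def is_pdf_response(headers):
    """Return True if the Content-Type header announces a PDF.

    `headers` is any mapping of header names to values, for example
    the `headers` attribute of a response object (assumption).
    """
    # HTTP header names are case-insensitive, so normalize the keys
    # in case a plain dict is passed in.
    normalized = {key.lower(): value for key, value in headers.items()}
    content_type = normalized.get("content-type", "")
    # Drop parameters such as ";charset=utf-8" before comparing.
    media_type = content_type.split(";")[0].strip().lower()
    return media_type == "application/pdf"


if __name__ == "__main__":
    print(is_pdf_response({"Content-Type": "application/pdf"}))
    print(is_pdf_response({"Content-Type": "text/html;charset=utf-8"}))
```

Comparing only the media type, without its parameters, keeps the check working if the server ever appends something like a charset to the PDF content type.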
from springer_free_books.
I solved this by adding KeyError to the errors caught in the download_books(books, folder, patches) function, at line 134 of helper.py, changing
except (OSError, IOError) as e:
to
except (OSError, IOError, KeyError) as e:
This way when the KeyError is encountered it is caught and I get
`Overall Progress: 85%|█████████████████████████████████████████████▋ | 329/389 [1:02:36<1:29:43, 89.72s/it]'content-length'
- Problem downloading: Introduction to Programming with Fortran, so skipping it.`
and the download continues with the next book.
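A rough sketch of that skip-and-continue behaviour, not the actual code from helper.py: the names `download_all` and `download_one` are placeholders, and the real function has a different signature.

```python
def download_all(titles, download_one):
    """Call download_one(title) for every title and skip the ones that fail.

    Returns the list of titles that were skipped.
    """
    skipped = []
    for title in titles:
        try:
            download_one(title)
        except (OSError, IOError, KeyError) as e:
            # A missing 'Content-Length' header surfaces as a KeyError,
            # so catching it lets the loop move on to the next book.
            print(e)
            print("- Problem downloading: {}, so skipping it.".format(title))
            skipped.append(title)
    return skipped


if __name__ == "__main__":
    def fake_download(title):
        # Stand-in for the real download; fails for one title only.
        if "Fortran" in title:
            raise KeyError("content-length")

    print(download_all(["Book A", "Introduction to Programming with Fortran"],
                       fake_download))
```

Collecting the skipped titles in a list makes it easy to print a summary at the end and download those books manually.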
Replaced
chunk_size = 1024
file_size = int(req.headers['Content-Length'])
num_bars = file_size // chunk_size
with
chunk_size = 1024
if 'Content-Length' in req.headers:
    file_size = int(req.headers['Content-Length'])
    num_bars = file_size // chunk_size
else:
    print("Warning: missing key 'Content-Length' in request headers; taking default length of 100 for progress bar.")
    num_bars = 100
but I got security errors when trying to push my local branch (it would be my first time contributing).
Just ran into that error too; not sure which book it was trying to download at the time.
Traceback (most recent call last):
  File "main.py", line 88, in <module>
    download_books(books, folder, patches)
  File "/usr/home/pokui/code/springer_free_books/helper.py", line 133, in download_books
libunwind: EHHeaderParser::decodeTableEntry: bad fde: CIE ID is not zero
    download_book(request, output_file, patch)
  File "/usr/home/pokui/code/springer_free_books/helper.py", line 87, in download_book
    file_size = int(req.headers['Content-Length'])
  File "/home/pokui/.local/lib/python3.7/site-packages/requests/structures.py", line 54, in __getitem__
    return self._store[key.lower()][1]
KeyError: 'content-length'
Investigated a little further, but the hack will not work. In these cases the book has been split across multiple PDFs (or, actually, still seems to be behind the paywall), so the download link does not work. The Content-Type in that case is 'text/html;charset=utf-8' instead of 'application/pdf'.
Files impacted are (using the new indexing method):
295 "A Beginner's Guide to Scala, Object Orientation and Functional Programming"
331 "Introduction to Programming with Fortran"
388 "Advanced Guide to Python 3 Programming"
Ok, then it means it's a file that is in the process of being removed from the list. The answer would then be to emit an error that the file is no longer available for download.
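One possible shape for such an error, as a sketch only: `BookUnavailableError` and `ensure_pdf` are hypothetical names, not something that exists in the repository, and the message wording is just a suggestion.

```python
class BookUnavailableError(Exception):
    """Raised when a download URL no longer returns a PDF (hypothetical)."""


def ensure_pdf(url, headers):
    """Raise BookUnavailableError unless the headers announce a PDF."""
    # Normalize header names so a plain dict works as well as a
    # case-insensitive header mapping.
    normalized = {key.lower(): value for key, value in headers.items()}
    content_type = normalized.get("content-type", "")
    if not content_type.lower().startswith("application/pdf"):
        raise BookUnavailableError(
            "No longer available for download: {} (content type: {!r})".format(
                url, content_type
            )
        )


if __name__ == "__main__":
    try:
        ensure_pdf(
            "https://link.springer.com/content/pdf/10.1007/978-3-319-75771-1.pdf",
            {"Content-Type": "text/html;charset=utf-8"},
        )
    except BookUnavailableError as e:
        print(e)
```

A dedicated exception type lets the calling loop tell "this book was withdrawn" apart from a network or disk problem and report it accordingly.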
Also getting this error at the following index. Initially I thought it was due to a dropped internet/VPN connection, but restarting always gives the same result.
This is a straight dump without any filters.
:~/springer_free_books$ python3 main.py
389 titles ready to be downloaded...
Overall Progress: 75%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████▉ | 293/389 [02:22<00:46, 2.05it/s]
Traceback (most recent call last):
  File "main.py", line 88, in <module>
    download_books(books, folder, patches)
  File "/home/uduo/springer_free_books/helper.py", line 133, in download_books
    download_book(request, output_file, patch)
  File "/home/uduo/springer_free_books/helper.py", line 87, in download_book
    file_size = int(req.headers['Content-Length'])
  File "/home/uduo/springer_free_books/.venv/lib/python3.6/site-packages/requests/structures.py", line 54, in __getitem__
    return self._store[key.lower()][1]
KeyError: 'content-length'
I retrieved the latest code version, and it contains this line, which defeats the exception catch: `file_size = int(req.headers['Content-Length']) if req.headers.get('Content-Length') else 30000`.
After removing it, everything goes on as planned. Many kudos both for this solution of just catching the exception and for adding the retry when any exception occurs.
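For readers curious what a retry of this kind can look like, here is a generic sketch. It is not the project's actual retry code; `with_retries`, `attempts` and `delay` are illustrative names and defaults.

```python
import time


def with_retries(action, attempts=3, delay=0.0):
    """Call action() until it succeeds or `attempts` tries are used up.

    Returns the result of the first successful call and re-raises the
    last exception if every try fails.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as e:
            last_error = e
            # Wait before the next try, but not after the final one.
            if attempt < attempts:
                time.sleep(delay)
    raise last_error


if __name__ == "__main__":
    calls = {"count": 0}

    def flaky():
        # Fails twice, then succeeds, to exercise the retry loop.
        calls["count"] += 1
        if calls["count"] < 3:
            raise OSError("temporary failure")
        return "done"

    print(with_retries(flaky))
```

Note that retrying only helps with transient failures; a book that has been withdrawn will fail on every attempt and should still be skipped afterwards.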