tishacy / scidownl
An unofficial api for downloading papers from SciHub via DOI, PMID, title
License: MIT License
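Hello, most of the SciHub URLs no longer work. Could you add https://libgen.wikicn.top/, which still works?
Alternatively, could you let us choose http://gen.lib.rus.ec/scimag/ directly?
Many thanks.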
I am trying to download the paper with DOI 10.1021/nn100856y, titled "Preparation and Characterization of Flexible Asymmetric Supercapacitors Based on Transition-Metal-Oxide Nanowire Single-Walled Carbon Nanotube Hybrid Thin-Film Electrodes".
Then I get this:
FileNotFoundError Traceback (most recent call last)
in
10 for doi in DOIs:
11 print(doi)
---> 12 SciHub(doi, out).download(choose_scihub_url_index=3)
C:\ProgramData\Anaconda3\lib\site-packages\scidownl\scihub.py in download(self, choose_scihub_url_index)
88 pdf = self.find_pdf_in_html(res.text)
89
---> 90 self.download_pdf(pdf)
91 # try:
92 # pdf = self.find_pdf_in_html(res.text)
C:\ProgramData\Anaconda3\lib\site-packages\scidownl\scihub.py in download_pdf(self, pdf)
168 out_file_path = os.path.join(self.out, pdf['title']+'.pdf')
169 downl_size = 0
--> 170 with open(out_file_path, 'wb') as f:
171 for data in res.iter_content(chunk_size=1024, decode_unicode=False):
172 f.write(data)
FileNotFoundError: [Errno 2] No such file or directory: 'paper\Preparation and Characterization of Flexible Asymmetric Supercapacitors Based on Transition-Metal-Oxide Nanowire Single-Walled Carbon Nanotube Hybrid Thin-Film Electrodes.pdf'
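One plausible reading of this traceback (an assumption, not confirmed by the maintainer) is that either the `paper` output directory did not exist yet, or the full Windows path exceeded the classic 260-character `MAX_PATH` limit, because the filename is built from the whole paper title. The sketch below shows a defensive way to build such a path with the standard library: it creates the directory first, replaces characters Windows forbids in filenames, and truncates long titles. `safe_pdf_path` and its `max_len` default are hypothetical and not part of SciDownl.

```python
import re
from pathlib import Path

# Characters not allowed in Windows filenames, plus ASCII control characters.
_FORBIDDEN = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def safe_pdf_path(out_dir: str, title: str, max_len: int = 120) -> Path:
    """Return out_dir/<cleaned title>.pdf, creating out_dir if it is missing.

    Hypothetical helper: max_len keeps the filename well below the
    260-character MAX_PATH limit of classic Windows APIs.
    """
    directory = Path(out_dir)
    # Create the output directory (and parents) so open(..., "wb") cannot
    # fail with FileNotFoundError because the folder is missing.
    directory.mkdir(parents=True, exist_ok=True)

    name = _FORBIDDEN.sub("_", title)
    # Truncate, then drop trailing dots/spaces, which Windows also rejects.
    name = name[:max_len].rstrip(" .") or "paper"
    return directory / f"{name}.pdf"
```

The returned path can then be passed to `open(path, "wb")` in place of a raw `os.path.join(out, title + '.pdf')`.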
How can I stop it from downloading twice?
➜ temp scidownl download --doi https://doi.org/10.1145/3375633
[INFO] | 2023/12/11 16:17:57 | Run scihub tasks. Tasks information:
[INFO] | 2023/12/11 16:17:57 | DOI(s): ['https://doi.org/10.1145/3375633']
[INFO] | 2023/12/11 16:17:57 | Output: /Users/ma/temp
[INFO] | 2023/12/11 16:17:57 | SciHub Url: <auto.availability_first>
[INFO] | 2023/12/11 16:17:58 | Found 8 valid SciHub domains in total: ['http://sci-hub.ru', 'https://sci-hub.st', 'https://sci-hub.ru', 'https://sci-hub.mobi', 'http://sci-hub.mobi', 'https://sci-hub.se', 'http://sci-hub.se', 'http://sci-hub.st']
[INFO] | 2023/12/11 16:17:58 | Saved 8 SciHub domains to local db.
[INFO] | 2023/12/11 16:17:58 | Choose scihub url [0]: http://sci-hub.ru
[INFO] | 2023/12/11 16:17:58 | <- Request: scihub_url=http://sci-hub.ru, source=DoiSource[type=doi, id=doi.org/10.1145/3375633], proxies={}
[INFO] | 2023/12/11 16:18:00 | -> Response: status_code=200, content_length=7554
[INFO] | 2023/12/11 16:18:00 | * Extracted information: {'url': 'http://sci-hub.ru/downloads/2021-06-09/4a/beschastnikh2020.pdf', 'title': 'Visualizing Distributed System Executions. ACM Transactions on Software Engineering and Methodology, 29(2), 1–38'}
100% [==================================================] 2522115/2522115
[INFO] | 2023/12/11 16:18:03 | ↓ Successfully download the url to: Visualizing Distributed System Executions. ACM Transactions on Software Engineering and Methodology, 29(2), 1–38.pdf
[INFO] | 2023/12/11 16:18:03 | Choose scihub url [0]: http://sci-hub.ru
[INFO] | 2023/12/11 16:18:03 | <- Request: scihub_url=http://sci-hub.ru, source=DoiSource[type=doi, id=doi.org/10.1145/3375633], proxies={}
[INFO] | 2023/12/11 16:18:05 | -> Response: status_code=200, content_length=7554
[INFO] | 2023/12/11 16:18:05 | * Extracted information: {'url': 'http://sci-hub.ru/downloads/2021-06-09/4a/beschastnikh2020.pdf', 'title': 'Visualizing Distributed System Executions. ACM Transactions on Software Engineering and Methodology, 29(2), 1–38'}
100% [==================================================] 2522115/2522115
[INFO] | 2023/12/11 16:18:06 | ↓ Successfully download the url to: Visualizing Distributed System Executions. ACM Transactions on Software Engineering and Methodology, 29(2), 1–38.pdf
➜ temp n
➜ temp scidownl domain.list
+----------------------+----------------+---------------+
| Url | SuccessTimes | FailedTimes |
|----------------------+----------------+---------------|
| http://sci-hub.ru | 2 | 0 |
| https://sci-hub.st | 0 | 0 |
| https://sci-hub.ru | 0 | 0 |
| https://sci-hub.mobi | 0 | 0 |
| http://sci-hub.mobi | 0 | 0 |
| https://sci-hub.se | 0 | 0 |
| http://sci-hub.se | 0 | 0 |
| http://sci-hub.st | 0 | 0 |
+----------------------+----------------+---------------+
➜ temp scidownl download --doi https://doi.org/10.1145/3375633 --out 10.1145_3375633.pdf
[INFO] | 2023/12/11 16:20:50 | Run scihub tasks. Tasks information:
[INFO] | 2023/12/11 16:20:50 | DOI(s): ['https://doi.org/10.1145/3375633']
[INFO] | 2023/12/11 16:20:50 | Output: 10.1145_3375633.pdf
[INFO] | 2023/12/11 16:20:50 | SciHub Url: <auto.availability_first>
[INFO] | 2023/12/11 16:20:50 | Choose scihub url [0]: http://sci-hub.ru
[INFO] | 2023/12/11 16:20:50 | <- Request: scihub_url=http://sci-hub.ru, source=DoiSource[type=doi, id=doi.org/10.1145/3375633], proxies={}
[INFO] | 2023/12/11 16:20:52 | -> Response: status_code=200, content_length=7554
[INFO] | 2023/12/11 16:20:52 | * Extracted information: {'url': 'http://sci-hub.ru/downloads/2021-06-09/4a/beschastnikh2020.pdf', 'title': 'Visualizing Distributed System Executions. ACM Transactions on Software Engineering and Methodology, 29(2), 1–38'}
100% [==================================================] 2522115/2522115
[INFO] | 2023/12/11 16:20:54 | ↓ Successfully download the url to: 10.1145_3375633.pdf
[INFO] | 2023/12/11 16:20:54 | Choose scihub url [0]: http://sci-hub.ru
[INFO] | 2023/12/11 16:20:54 | <- Request: scihub_url=http://sci-hub.ru, source=DoiSource[type=doi, id=doi.org/10.1145/3375633], proxies={}
[INFO] | 2023/12/11 16:20:55 | -> Response: status_code=200, content_length=7554
[INFO] | 2023/12/11 16:20:55 | * Extracted information: {'url': 'http://sci-hub.ru/downloads/2021-06-09/4a/beschastnikh2020.pdf', 'title': 'Visualizing Distributed System Executions. ACM Transactions on Software Engineering and Methodology, 29(2), 1–38'}
100% [==================================================] 2522115/2522115
[INFO] | 2023/12/11 16:20:57 | ↓ Successfully download the url to: 10.1145_3375633.pdf
➜ temp n
➜ temp scidownl domain.list
+----------------------+----------------+---------------+
| Url | SuccessTimes | FailedTimes |
|----------------------+----------------+---------------|
| http://sci-hub.ru | 4 | 0 |
| https://sci-hub.st | 0 | 0 |
| https://sci-hub.ru | 0 | 0 |
| https://sci-hub.mobi | 0 | 0 |
| http://sci-hub.mobi | 0 | 0 |
| https://sci-hub.se | 0 | 0 |
| http://sci-hub.se | 0 | 0 |
| http://sci-hub.st | 0 | 0 |
+----------------------+----------------+---------------+
➜ temp scidownl download --doi 10.1016/j.vascn.2018.01.499 --out test.pdf
[INFO] | 2023/12/11 16:22:19 | Run scihub tasks. Tasks information:
[INFO] | 2023/12/11 16:22:19 | DOI(s): ['10.1016/j.vascn.2018.01.499']
[INFO] | 2023/12/11 16:22:19 | Output: test.pdf
[INFO] | 2023/12/11 16:22:19 | SciHub Url: <auto.availability_first>
[INFO] | 2023/12/11 16:22:19 | Choose scihub url [0]: http://sci-hub.ru
[INFO] | 2023/12/11 16:22:19 | <- Request: scihub_url=http://sci-hub.ru, source=DoiSource[type=doi, id=10.1016/j.vascn.2018.01.499], proxies={}
[INFO] | 2023/12/11 16:22:21 | -> Response: status_code=200, content_length=7786
[INFO] | 2023/12/11 16:22:21 | * Extracted information: {'url': 'http://sci-hub.ru/tree/39/25/3925104e34c91e178a498a5ed59f4dba.pdf', 'title': 'Echocardiography and contractility indices simultaneously evaluated in telemetered beagle dogs A HESI sponsored cross company evaluation. Journal of Pharmacological and Toxicological Methods, 93, 15'}
100% [==================================================] 70282/70282
[INFO] | 2023/12/11 16:22:22 | ↓ Successfully download the url to: test.pdf
[INFO] | 2023/12/11 16:22:22 | Choose scihub url [0]: http://sci-hub.ru
[INFO] | 2023/12/11 16:22:22 | <- Request: scihub_url=http://sci-hub.ru, source=DoiSource[type=doi, id=10.1016/j.vascn.2018.01.499], proxies={}
[INFO] | 2023/12/11 16:22:23 | -> Response: status_code=200, content_length=7786
[INFO] | 2023/12/11 16:22:23 | * Extracted information: {'url': 'http://sci-hub.ru/tree/39/25/3925104e34c91e178a498a5ed59f4dba.pdf', 'title': 'Echocardiography and contractility indices simultaneously evaluated in telemetered beagle dogs A HESI sponsored cross company evaluation. Journal of Pharmacological and Toxicological Methods, 93, 15'}
100% [==================================================] 70282/70282
[INFO] | 2023/12/11 16:22:24 | ↓ Successfully download the url to: test.pdf
➜ temp scidownl domain.list
+----------------------+----------------+---------------+
| Url | SuccessTimes | FailedTimes |
|----------------------+----------------+---------------|
| http://sci-hub.ru | 6 | 0 |
| https://sci-hub.st | 0 | 0 |
| https://sci-hub.ru | 0 | 0 |
| https://sci-hub.mobi | 0 | 0 |
| http://sci-hub.mobi | 0 | 0 |
| https://sci-hub.se | 0 | 0 |
| http://sci-hub.se | 0 | 0 |
| http://sci-hub.st | 0 | 0 |
+----------------------+----------------+---------------+
Hi, I am a beginner in Python. I would like to take a set of PubMed articles by PMID and download them from Sci-Hub using SciDownl. However, many articles (more than 50% of a given set of PMIDs) fail to download when I run them in a for loop, yet they download successfully when I run them individually. Can you help me resolve this issue?
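A common cause of downloads that fail in a loop but succeed one at a time is throttling by the mirror; that is an assumption, not something confirmed in this report. The sketch below wraps any per-item download function with a pause between items and a few retries with exponential backoff. `download_all`, `fetch`, and the delay values are hypothetical; `fetch` could be a small wrapper around SciDownl that raises an exception when the expected output file was not created.

```python
import time
from typing import Callable, Dict, Iterable


def download_all(
    ids: Iterable[str],
    fetch: Callable[[str], None],
    retries: int = 3,
    pause: float = 5.0,
    backoff: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, bool]:
    """Call fetch(id) for each id, retrying failures with exponential backoff.

    Returns a mapping id -> True (downloaded) or False (gave up).
    """
    results: Dict[str, bool] = {}
    for item in ids:
        delay = pause
        for attempt in range(1, retries + 1):
            try:
                fetch(item)
                results[item] = True
                break
            except Exception as exc:  # broad on purpose: network errors vary
                print(f"{item}: attempt {attempt} failed: {exc}")
                if attempt < retries:
                    sleep(delay)
                    delay *= backoff
        else:
            # The loop never hit `break`, so every attempt failed.
            results[item] = False
        sleep(pause)  # spacing between items lowers the chance of being throttled
    return results
```

The returned mapping makes it easy to re-run only the PMIDs that ended up as `False`.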
There is a bug in scihub.py at line 114: when I try to run it, this error appears:
'NoneType' object has no attribute 'attrs'
When I try to download an article using
scidownl -D <doi>
I get the following error:
File "/home/jgarcia/.local/lib/python3.9/site-packages/scidownl/scihub.py", line 112, in find_pdf_in_html
pdf_url = soup.find('iframe', {'id': 'pdf'}).attrs['src'].split('#')[0]
AttributeError: 'NoneType' object has no attribute 'attrs'
This didn't happen before. I am using Arch Linux, but I also tried in a virtual machine with Linux Mint.
Accessing SciHub manually and downloading the article works.
$ scidownl -D 10.1021/ol9910114
[INFO] Reading available links of Scihub...
[INFO] Successfully read available links of Scihub.
[INFO] Choose the available link 0: https://sci-hub.ren
[ERROR] Failed to access the article.
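The traceback shows `soup.find('iframe', {'id': 'pdf'})` returning `None`, meaning the mirror's HTML no longer contained an `<iframe id="pdf">`. One plausible explanation (an assumption) is that the page layout changed, for example to an `<embed id="pdf">` element. This standard-library sketch looks for either tag and returns `None` instead of crashing, so the caller can report "no PDF found" or try another mirror. `PdfLinkFinder` and `find_pdf_url` are hypothetical names, not SciDownl's code.

```python
from html.parser import HTMLParser
from typing import Optional


class PdfLinkFinder(HTMLParser):
    """Record the src of the first <iframe> or <embed> whose id is "pdf"."""

    def __init__(self) -> None:
        super().__init__()
        self.src: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        if self.src is not None or tag not in ("iframe", "embed"):
            return
        attr_map = dict(attrs)
        if attr_map.get("id") == "pdf" and attr_map.get("src"):
            # Drop the viewer fragment (e.g. "#navpanes=0"), like split('#')[0].
            self.src = attr_map["src"].split("#")[0]

    # The default handle_startendtag calls handle_starttag, so <embed ... /> works too.


def find_pdf_url(html: str) -> Optional[str]:
    """Return the PDF link from a mirror page, or None if none was found."""
    finder = PdfLinkFinder()
    finder.feed(html)
    return finder.src
```

A `None` result may also mean the mirror served a captcha or "article not found" page rather than the article.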
Hi there,
I can run everything smoothly, but unfortunately I get the error below for DOIs containing "(":
scidownl download --doi 10.1016/0003-3472(95)80204-5
-bash: syntax error near unexpected token `('
Is there any way to overcome this issue? I can of course filter out the DOIs containing "(" and download them manually, but it would be great if there were a way to download this type of DOI!
Many thanks!
Nicolay
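The `-bash: syntax error` comes from the shell, not from SciDownl: unquoted `(` and `)` are Bash metacharacters, so the command never reaches the tool. Single-quoting the DOI (`scidownl download --doi '10.1016/0003-3472(95)80204-5'`) gets it past the shell intact; whether SciDownl then resolves that DOI is a separate question. When commands are generated from a script, the sketch shows two standard-library options: `shlex.quote` for building a shell command string, or an argument list for `subprocess`, which involves no shell parsing at all. The command is only built here, not run.

```python
import shlex
import subprocess
from typing import List

doi = "10.1016/0003-3472(95)80204-5"

# Option 1: a quoted command string, safe to paste into Bash or run with shell=True.
command_line = "scidownl download --doi " + shlex.quote(doi)

# Option 2: an argument list; subprocess passes it straight to the program
# without a shell, so the parentheses need no escaping.
argv: List[str] = ["scidownl", "download", "--doi", doi]


def run_download(arguments: List[str]) -> int:
    """Run the CLI and return its exit status (requires scidownl on PATH)."""
    return subprocess.run(arguments, check=False).returncode


print(command_line)  # scidownl download --doi '10.1016/0003-3472(95)80204-5'
```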
Hi, the error messages are listed below, and I don't know what I should do.
It works on a Windows 11 system.
Error occurs, task status: downloading_failed, error: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1108)>
Error occurs, task status: crawling_failed, error: Error occurs when crawling source: DoiSource[type=doi, id=doi.org/10.1145/3375633]
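`CERTIFICATE_VERIFY_FAILED ... certificate has expired` is raised by Python's `ssl` module while `urlopen` verifies the server certificate. Because the same command works on Windows 11, one plausible cause (an assumption) is an outdated CA bundle on the failing machine rather than the mirror itself. The sketch builds a verifying SSL context from the third-party `certifi` bundle when it is installed and falls back to the system defaults otherwise; verification stays on. `make_ssl_context` and `fetch` are hypothetical helpers, and whether SciDownl lets you pass a custom context is not confirmed here.

```python
import ssl
import urllib.request


def make_ssl_context() -> ssl.SSLContext:
    """Return a certificate-verifying context, preferring certifi's CA bundle."""
    try:
        import certifi  # third-party; often more current than an old OS CA store
        return ssl.create_default_context(cafile=certifi.where())
    except ImportError:
        return ssl.create_default_context()


def fetch(url: str, timeout: float = 30.0) -> bytes:
    """Download url with certificate verification still enabled."""
    with urllib.request.urlopen(url, timeout=timeout, context=make_ssl_context()) as resp:
        return resp.read()
```

Disabling verification (`ssl.CERT_NONE`) would also silence the error, but it removes protection against tampered downloads, so refreshing the CA store is the safer first step.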
Hi there,
I hope this message finds you well. I wanted to express my gratitude for the fantastic software you've developed. Overall, it runs smoothly, but I've encountered a minor issue that I wanted to bring to your attention.
Specifically, I've noticed that the software doesn't allow me to download papers with DOIs that contain parentheses within the DOI numbers. I was wondering if there is a way to modify the code to enable downloading papers with DOIs that have "(" or ")" characters.
Thank you for your time and attention to this matter.
Nicolay
Can you add automatic captcha recognition to SciDownl?
Hi! Is there an easy way to use proxies?
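The request logs on this page print `proxies={}`, so the tool does carry a proxy mapping per request. Assuming `scihub_download` accepts a `requests`-style `proxies` dictionary such as `{'http': 'socks5://127.0.0.1:7890'}`, as shown in the project README (verify against your installed version), the sketch below builds that dictionary from the conventional `http_proxy`/`https_proxy` environment variables so the shell and Python share one setting. `proxies_from_env` is a hypothetical helper.

```python
import os
from typing import Dict, Mapping, Optional


def proxies_from_env(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Build a requests-style proxies dict from http(s)_proxy variables."""
    env = os.environ if env is None else env
    proxies: Dict[str, str] = {}
    for scheme in ("http", "https"):
        # Lower-case variable names are checked first, then upper-case.
        value = env.get(f"{scheme}_proxy") or env.get(f"{scheme.upper()}_PROXY")
        if value:
            proxies[scheme] = value
    return proxies


# Assumed usage, per the project README (check your installed version):
# from scidownl import scihub_download
# scihub_download("10.1145/3375633", paper_type="doi", out="./paper/", proxies=proxies_from_env())
```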
Hi, I've been trying the tool in multiple ways and have found it quite interesting. I was curious to know whether it's possible to crawl papers based on their titles, because some older papers (from before 2000) do not have associated DOIs on Scopus. They are nonetheless highly cited papers in their journals, and they can be found on Sci-Hub through a title search. I was wondering whether the same can be achieved through the CLI tool.
Thank you for this awesome tool, I am trying to get the download URL rather than saving the pdf to a folder.
from scidownl import scihub_download
paper = '32467232'
paper_type = "pmid"
out = "./paper/one_paper.pdf"
scihub_download(paper, paper_type=paper_type, out=out)
How can I get the download link here?
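The logs on this page show SciDownl printing the resolved PDF link in an `* Extracted information: {'url': ..., 'title': ...}` line before it downloads. Without relying on any internal API, one workaround (assuming that log format stays stable) is to capture the log text and parse the dictionary out of it. `extract_pdf_info` is a hypothetical helper; how you capture the output (for example, redirecting the CLI's output to a file) depends on your setup.

```python
import ast
import re
from typing import Dict, List

# Matches the dict literal printed after "Extracted information:" on one log line.
_EXTRACTED = re.compile(r"Extracted information: (\{.*\})")


def extract_pdf_info(log_text: str) -> List[Dict[str, str]]:
    """Return every {'url': ..., 'title': ...} dict found in the log text."""
    found = []
    for match in _EXTRACTED.finditer(log_text):
        # literal_eval parses the printed dict without executing any code.
        found.append(ast.literal_eval(match.group(1)))
    return found
```

Since the logs above show each paper being fetched twice, de-duplicating the results by `url` may be worthwhile.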
I am being prompted to fill in the captcha manually, and images keep opening in ImageMagick.
Thank you so much for this tool! It is very powerful when you need to download a bunch of articles or papers.
I usually collect article names and DOIs in a CSV file. The problem is that SciDownl creates a new folder, named with the desired filename, for each article. After taking a look at the download method of the SciHub class, I found that this is the expected behavior. The fact that you can set the folder name but not the downloaded filename is a bit annoying.
We could expect the user to pass outputs like my_paper/filename.pdf or just filename.pdf, and create a new directory only if required. I could implement this feature if you are interested 🚀
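The proposed behavior can be sketched as follows: treat an `out` value ending in `.pdf` as a file path and anything else as a directory, creating the parent directory only when it is missing. `resolve_output_path` and `default_name` are hypothetical names for illustration, not SciDownl's API. Note that the later log with `--out 10.1145_3375633.pdf` shows the file written directly to that name, so newer releases may already behave this way.

```python
from pathlib import Path


def resolve_output_path(out: str, default_name: str, create_dirs: bool = True) -> Path:
    """Map a user-supplied `out` value to the final PDF path.

    "my_paper/filename.pdf" -> my_paper/filename.pdf (parent created if needed)
    "filename.pdf"          -> filename.pdf in the current directory
    "my_papers"             -> my_papers/<default_name>
    """
    target = Path(out)
    if target.suffix.lower() != ".pdf":
        # No .pdf suffix: interpret `out` as a directory.
        target = target / default_name
    if create_dirs:
        target.parent.mkdir(parents=True, exist_ok=True)
    return target
```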
I am confused: if I try to download with a list, which types are supported?
Hi there. I used the script a few times, until it stopped working and gave me this message:
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.7/bin/scidownl", line 11, in <module>
    sys.exit(main())
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/scidownl/scidownl.py", line 27, in main
    sci.download(choose_scihub_url_index=SCIHUB_URL_INDEX)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/scidownl/scihub.py", line 76, in download
    self.use_scihub_url(choose_scihub_url_index)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/scidownl/scihub.py", line 38, in use_scihub_url
    self.scihub_url = self.scihub_url_list[index]
IndexError: list index out of range
I reinstalled it, but nothing changed. Any ideas?
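The last frame, `self.scihub_url = self.scihub_url_list[index]`, raises `IndexError` when the requested index is not smaller than the number of mirror URLs the tool managed to load, which includes the case where it loaded none. That points at the mirror list (for instance, its source being unreachable) rather than the installation, which would explain why reinstalling did not help; this is an inference from the traceback only. A guarded selection with a clearer failure could look like this; `choose_mirror` is a hypothetical helper.

```python
from typing import Sequence


def choose_mirror(urls: Sequence[str], index: int = 0) -> str:
    """Return urls[index], falling back to the first URL; fail clearly if none exist."""
    if not urls:
        raise RuntimeError(
            "No Sci-Hub mirror URLs are available; the mirror list could not be loaded."
        )
    if not 0 <= index < len(urls):
        # Out-of-range choice: fall back to the first mirror instead of crashing.
        return urls[0]
    return urls[index]
```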
One other thing: is it possible to give the script a list of DOIs and download them in one step? I have several and have been doing them one by one.
Thank you so much! I'm a complete beginner at this, and this is the only tool that works out of the few I tried.
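For the list-of-DOIs question, one approach that does not depend on any particular SciDownl option is to read the identifiers from a file yourself and loop over them. The sketch reads a CSV with a `doi` column (the column name is an assumption), strips a `doi.org` resolver prefix, and skips blanks and duplicates. Each DOI could then be handed to `scihub_download(doi, paper_type="doi", out=...)` as in the snippet on this page, or to the CLI. `read_dois` is a hypothetical helper.

```python
import csv
from typing import IO, Iterator

_PREFIXES = ("https://doi.org/", "http://doi.org/", "doi.org/")


def read_dois(handle: IO[str], column: str = "doi") -> Iterator[str]:
    """Yield unique, non-empty DOIs from a CSV file that has a header row."""
    seen = set()
    for row in csv.DictReader(handle):
        doi = (row.get(column) or "").strip()
        # Strip the resolver prefix so "https://doi.org/10.x/y" and "10.x/y" dedupe together.
        for prefix in _PREFIXES:
            if doi.lower().startswith(prefix):
                doi = doi[len(prefix):]
                break
        # DOI names are case-insensitive, so compare in lower case.
        if doi and doi.lower() not in seen:
            seen.add(doi.lower())
            yield doi
```

Typical use: `with open("papers.csv", newline="", encoding="utf-8") as f:` then iterate over `read_dois(f)`.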
Hello,
I am a SciDownl user and I would like to add a new Sci-Hub domain to the tool. Specifically, I would like to add the domain https://sci-hub.mksa.top/ to the list of domains that SciDownl uses.
I have tried using the default domains within SciDownl, but I am not getting the desired results. Therefore, I would like to add this new domain to the list of available domains and delete the useless ones.
Could you please provide guidance on how to add a new domain to the current version of SciDownl? I would greatly appreciate any assistance you could provide.
Thank you for creating this helpful tool.
Best regards,