tishacy / scidownl

An unofficial API for downloading papers from SciHub via DOI, PMID, or title

License: MIT License

Python 100.00%
doi downloader paper pdf pmid scihub

scidownl's Introduction

Hi there 👋

✨ About me

  • 🔭 Turn coffee into code and bugs.
  • ✍️ Write small tools to make life easier.
  • 💬 Languages: Java, Python, Go, JavaScript.

scidownl's People

Contributors

tishacy

scidownl's Issues

How to add or remove Sci-Hub domains in SciDownl?

Hello,

I am a SciDownl user and I would like to add a new Sci-Hub domain to the tool. Specifically, I would like to add the domain https://sci-hub.mksa.top/ to the list of domains that SciDownl uses.

I have tried using the default domains within SciDownl, but I am not getting the desired results. Therefore, I would like to add this new domain to the list of available domains and delete the useless ones.

Could you please provide guidance on how to add a new domain to the current version of SciDownl? I would greatly appreciate any assistance you could provide.

Thank you for creating this helpful tool.

Best regards,

Can't download papers with DOI with "(" or ")"

Hi there,

I hope this message finds you well. I wanted to express my gratitude for the fantastic software you've developed. Overall, it runs smoothly, but I've encountered a minor issue that I wanted to bring to your attention.

Specifically, I've noticed that the software doesn't allow me to download papers with DOIs that contain parentheses within the DOI numbers. I was wondering if there is a way to modify the code to enable downloading papers with DOIs that have "(" or ")" characters.

Thank you for your time and attention to this matter.

Nicolay

Failed to access the article

$ scidownl -D 10.1021/ol9910114
[INFO] Reading available links of Scihub...
[INFO] Successfully read available links of Scihub.
[INFO] Choose the available link 0: https://sci-hub.ren
[ERROR] Failed to access the article.

Error occurs: CERTIFICATE_VERIFY_FAILED. How to deal with that?

Hi, the error messages are listed below, and I don't know what I should do.
It works on a Windows 11 system.
Error occurs, task status: downloading_failed, error: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1108)>
Error occurs, task status: crawling_failed, error: Error occurs when crawling source: DoiSource[type=doi, id=doi.org/10.1145/3375633]

Getting AttributeError when downloading pdf

When I try to download an article using

scidownl -D <doi>

I get the following error:

File "/home/jgarcia/.local/lib/python3.9/site-packages/scidownl/scihub.py", line 112, in find_pdf_in_html
pdf_url = soup.find('iframe', {'id': 'pdf'}).attrs['src'].split('#')[0]

AttributeError: 'NoneType' object has no attribute 'attrs'

This didn't happen before. I am using Arch Linux, but also tried in a virtual machine with Linux Mint.
Accessing SciHub manually and downloading the article works.
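
The traceback suggests that no `<iframe id="pdf">` element was found in the returned page, so the lookup returned `None` before `.attrs` was read. Below is a minimal sketch of a defensive lookup. It uses only the Python standard library rather than the package's own BeautifulSoup call, and the names `_PdfIframeFinder` and `find_pdf_url` are hypothetical, not part of scidownl.

```python
from html.parser import HTMLParser
from typing import Optional


class _PdfIframeFinder(HTMLParser):
    """Remembers the src of the first <iframe id="pdf"> element, if any."""

    def __init__(self) -> None:
        super().__init__()
        self.src: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "iframe" and self.src is None:
            attr_map = dict(attrs)
            if attr_map.get("id") == "pdf":
                self.src = attr_map.get("src")


def find_pdf_url(html: str) -> Optional[str]:
    """Return the PDF URL without its '#...' fragment, or None if absent."""
    finder = _PdfIframeFinder()
    finder.feed(html)
    if not finder.src:
        # No matching iframe: the page layout may differ from what is expected.
        return None
    return finder.src.split("#")[0]
```

A caller can then report a readable message when `find_pdf_url` returns `None` instead of failing with an `AttributeError`.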

Enable custom filename when downloading

Thank you so much for this tool! It is very powerful when you need to download a bunch of articles or papers.

I usually collect article names and DOIs in a CSV file. The problem is that SciDownl generates a new folder with the desired filename for each article. After taking a look at the download method of the SciHub class, I was able to confirm that this is the expected behavior. Being able to set the folder name but not the downloaded filename is a bit annoying.

Users could pass outputs like my_paper/filename.pdf or just filename.pdf, and a new directory would be created only if required. I could implement this feature if you are interested 🚀
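
The proposal above could be sketched roughly as follows. This is only an illustration of the suggested rule (an output ending in `.pdf` is a file, anything else is a directory); the function name `resolve_output` is hypothetical and does not exist in scidownl.

```python
import os
from typing import Tuple


def resolve_output(out: str, title: str) -> Tuple[str, str]:
    """Split an output argument into (directory, filename).

    - "my_paper/filename.pdf" -> ("my_paper", "filename.pdf")
    - "filename.pdf"          -> (".", "filename.pdf")
    - "papers"                -> ("papers", "<title>.pdf")
    """
    if out.lower().endswith(".pdf"):
        # The caller named the file explicitly.
        directory, filename = os.path.split(out)
    else:
        # The caller named only a directory; fall back to the paper title.
        directory, filename = out, title + ".pdf"
    return directory or ".", filename
```

The directory part can then be created with `os.makedirs(directory, exist_ok=True)` only when it does not already exist.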

[BUG] Downloads twice

How can I stop it from downloading twice?

➜  temp scidownl download --doi https://doi.org/10.1145/3375633
[INFO] | 2023/12/11 16:17:57 | Run scihub tasks. Tasks information:
[INFO] | 2023/12/11 16:17:57 |          DOI(s): ['https://doi.org/10.1145/3375633']
[INFO] | 2023/12/11 16:17:57 |          Output: /Users/ma/temp
[INFO] | 2023/12/11 16:17:57 |      SciHub Url: <auto.availability_first>
[INFO] | 2023/12/11 16:17:58 | Found 8 valid SciHub domains in total: ['http://sci-hub.ru', 'https://sci-hub.st', 'https://sci-hub.ru', 'https://sci-hub.mobi', 'http://sci-hub.mobi', 'https://sci-hub.se', 'http://sci-hub.se', 'http://sci-hub.st']
[INFO] | 2023/12/11 16:17:58 | Saved 8 SciHub domains to local db.
[INFO] | 2023/12/11 16:17:58 | Choose scihub url [0]: http://sci-hub.ru
[INFO] | 2023/12/11 16:17:58 | <- Request: scihub_url=http://sci-hub.ru, source=DoiSource[type=doi, id=doi.org/10.1145/3375633], proxies={}
[INFO] | 2023/12/11 16:18:00 | -> Response: status_code=200, content_length=7554
[INFO] | 2023/12/11 16:18:00 | * Extracted information: {'url': 'http://sci-hub.ru/downloads/2021-06-09/4a/beschastnikh2020.pdf', 'title': 'Visualizing Distributed System Executions. ACM Transactions on Software Engineering and Methodology, 29(2), 1–38'}
100% [==================================================] 2522115/2522115
[INFO] | 2023/12/11 16:18:03 | ↓ Successfully download the url to: Visualizing Distributed System Executions. ACM Transactions on Software Engineering and Methodology, 29(2), 1–38.pdf
[INFO] | 2023/12/11 16:18:03 | Choose scihub url [0]: http://sci-hub.ru
[INFO] | 2023/12/11 16:18:03 | <- Request: scihub_url=http://sci-hub.ru, source=DoiSource[type=doi, id=doi.org/10.1145/3375633], proxies={}
[INFO] | 2023/12/11 16:18:05 | -> Response: status_code=200, content_length=7554
[INFO] | 2023/12/11 16:18:05 | * Extracted information: {'url': 'http://sci-hub.ru/downloads/2021-06-09/4a/beschastnikh2020.pdf', 'title': 'Visualizing Distributed System Executions. ACM Transactions on Software Engineering and Methodology, 29(2), 1–38'}
100% [==================================================] 2522115/2522115
[INFO] | 2023/12/11 16:18:06 | ↓ Successfully download the url to: Visualizing Distributed System Executions. ACM Transactions on Software Engineering and Methodology, 29(2), 1–38.pdf
➜  temp n
➜  temp scidownl domain.list
+----------------------+----------------+---------------+
| Url                  |   SuccessTimes |   FailedTimes |
|----------------------+----------------+---------------|
| http://sci-hub.ru    |              2 |             0 |
| https://sci-hub.st   |              0 |             0 |
| https://sci-hub.ru   |              0 |             0 |
| https://sci-hub.mobi |              0 |             0 |
| http://sci-hub.mobi  |              0 |             0 |
| https://sci-hub.se   |              0 |             0 |
| http://sci-hub.se    |              0 |             0 |
| http://sci-hub.st    |              0 |             0 |
+----------------------+----------------+---------------+
➜  temp scidownl download --doi https://doi.org/10.1145/3375633 --out 10.1145_3375633.pdf
[INFO] | 2023/12/11 16:20:50 | Run scihub tasks. Tasks information:
[INFO] | 2023/12/11 16:20:50 |          DOI(s): ['https://doi.org/10.1145/3375633']
[INFO] | 2023/12/11 16:20:50 |          Output: 10.1145_3375633.pdf
[INFO] | 2023/12/11 16:20:50 |      SciHub Url: <auto.availability_first>
[INFO] | 2023/12/11 16:20:50 | Choose scihub url [0]: http://sci-hub.ru
[INFO] | 2023/12/11 16:20:50 | <- Request: scihub_url=http://sci-hub.ru, source=DoiSource[type=doi, id=doi.org/10.1145/3375633], proxies={}
[INFO] | 2023/12/11 16:20:52 | -> Response: status_code=200, content_length=7554
[INFO] | 2023/12/11 16:20:52 | * Extracted information: {'url': 'http://sci-hub.ru/downloads/2021-06-09/4a/beschastnikh2020.pdf', 'title': 'Visualizing Distributed System Executions. ACM Transactions on Software Engineering and Methodology, 29(2), 1–38'}
100% [==================================================] 2522115/2522115
[INFO] | 2023/12/11 16:20:54 | ↓ Successfully download the url to: 10.1145_3375633.pdf
[INFO] | 2023/12/11 16:20:54 | Choose scihub url [0]: http://sci-hub.ru
[INFO] | 2023/12/11 16:20:54 | <- Request: scihub_url=http://sci-hub.ru, source=DoiSource[type=doi, id=doi.org/10.1145/3375633], proxies={}
[INFO] | 2023/12/11 16:20:55 | -> Response: status_code=200, content_length=7554
[INFO] | 2023/12/11 16:20:55 | * Extracted information: {'url': 'http://sci-hub.ru/downloads/2021-06-09/4a/beschastnikh2020.pdf', 'title': 'Visualizing Distributed System Executions. ACM Transactions on Software Engineering and Methodology, 29(2), 1–38'}
100% [==================================================] 2522115/2522115
[INFO] | 2023/12/11 16:20:57 | ↓ Successfully download the url to: 10.1145_3375633.pdf
➜  temp n
➜  temp scidownl domain.list
+----------------------+----------------+---------------+
| Url                  |   SuccessTimes |   FailedTimes |
|----------------------+----------------+---------------|
| http://sci-hub.ru    |              4 |             0 |
| https://sci-hub.st   |              0 |             0 |
| https://sci-hub.ru   |              0 |             0 |
| https://sci-hub.mobi |              0 |             0 |
| http://sci-hub.mobi  |              0 |             0 |
| https://sci-hub.se   |              0 |             0 |
| http://sci-hub.se    |              0 |             0 |
| http://sci-hub.st    |              0 |             0 |
+----------------------+----------------+---------------+
➜  temp scidownl download --doi 10.1016/j.vascn.2018.01.499 --out test.pdf
[INFO] | 2023/12/11 16:22:19 | Run scihub tasks. Tasks information:
[INFO] | 2023/12/11 16:22:19 |          DOI(s): ['10.1016/j.vascn.2018.01.499']
[INFO] | 2023/12/11 16:22:19 |          Output: test.pdf
[INFO] | 2023/12/11 16:22:19 |      SciHub Url: <auto.availability_first>
[INFO] | 2023/12/11 16:22:19 | Choose scihub url [0]: http://sci-hub.ru
[INFO] | 2023/12/11 16:22:19 | <- Request: scihub_url=http://sci-hub.ru, source=DoiSource[type=doi, id=10.1016/j.vascn.2018.01.499], proxies={}
[INFO] | 2023/12/11 16:22:21 | -> Response: status_code=200, content_length=7786
[INFO] | 2023/12/11 16:22:21 | * Extracted information: {'url': 'http://sci-hub.ru/tree/39/25/3925104e34c91e178a498a5ed59f4dba.pdf', 'title': 'Echocardiography and contractility indices simultaneously evaluated in telemetered beagle dogs  A HESI sponsored cross company evaluation. Journal of Pharmacological and Toxicological Methods, 93, 15'}
100% [==================================================] 70282/70282
[INFO] | 2023/12/11 16:22:22 | ↓ Successfully download the url to: test.pdf
[INFO] | 2023/12/11 16:22:22 | Choose scihub url [0]: http://sci-hub.ru
[INFO] | 2023/12/11 16:22:22 | <- Request: scihub_url=http://sci-hub.ru, source=DoiSource[type=doi, id=10.1016/j.vascn.2018.01.499], proxies={}
[INFO] | 2023/12/11 16:22:23 | -> Response: status_code=200, content_length=7786
[INFO] | 2023/12/11 16:22:23 | * Extracted information: {'url': 'http://sci-hub.ru/tree/39/25/3925104e34c91e178a498a5ed59f4dba.pdf', 'title': 'Echocardiography and contractility indices simultaneously evaluated in telemetered beagle dogs  A HESI sponsored cross company evaluation. Journal of Pharmacological and Toxicological Methods, 93, 15'}
100% [==================================================] 70282/70282
[INFO] | 2023/12/11 16:22:24 | ↓ Successfully download the url to: test.pdf
➜  temp scidownl domain.list
+----------------------+----------------+---------------+
| Url                  |   SuccessTimes |   FailedTimes |
|----------------------+----------------+---------------|
| http://sci-hub.ru    |              6 |             0 |
| https://sci-hub.st   |              0 |             0 |
| https://sci-hub.ru   |              0 |             0 |
| https://sci-hub.mobi |              0 |             0 |
| http://sci-hub.mobi  |              0 |             0 |
| https://sci-hub.se   |              0 |             0 |
| http://sci-hub.se    |              0 |             0 |
| http://sci-hub.st    |              0 |             0 |
+----------------------+----------------+---------------+

Is there any character limit for file name downloaded by this package?

I am trying to download a paper with the DOI 10.1021/nn100856y, whose title is "Preparation and Characterization of Flexible Asymmetric Supercapacitors Based on Transition-Metal-Oxide Nanowire Single-Walled Carbon Nanotube Hybrid Thin-Film Electrodes".

Then I get this:

[INFO] Choose the available link 3: https://sci-hub.se
[INFO] PDF url ->
http://dacemirror.sci-hub.se/journal-article/ed7b231d4f4214bbd4c58ccf9ee781f5/chen2010.pdf
[INFO] Article title ->
Preparation and Characterization of Flexible Asymmetric Supercapacitors Based on Transition-Metal-Oxide Nanowire Single-Walled Carbon Nanotube Hybrid Thin-Film Electrodes
[INFO] Verifying...
[INFO] Verification success.

FileNotFoundError Traceback (most recent call last)
in
10 for doi in DOIs:
11 print(doi)
---> 12 SciHub(doi, out).download(choose_scihub_url_index=3)

C:\ProgramData\Anaconda3\lib\site-packages\scidownl\scihub.py in download(self, choose_scihub_url_index)
88 pdf = self.find_pdf_in_html(res.text)
89
---> 90 self.download_pdf(pdf)
91 # try:
92 # pdf = self.find_pdf_in_html(res.text)

C:\ProgramData\Anaconda3\lib\site-packages\scidownl\scihub.py in download_pdf(self, pdf)
168 out_file_path = os.path.join(self.out, pdf['title']+'.pdf')
169 downl_size = 0
--> 170 with open(out_file_path, 'wb') as f:
171 for data in res.iter_content(chunk_size=1024, decode_unicode=False):
172 f.write(data)

FileNotFoundError: [Errno 2] No such file or directory: 'paper\Preparation and Characterization of Flexible Asymmetric Supercapacitors Based on Transition-Metal-Oxide Nanowire Single-Walled Carbon Nanotube Hybrid Thin-Film Electrodes.pdf'
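
The report does not make clear whether the error comes from the long file name or from the `paper` directory not existing; `open()` raises `FileNotFoundError` in both cases, and Windows has historically limited full paths to 260 characters unless long paths are enabled. The sketch below guards against both possibilities. The helper name `safe_pdf_path` and the 120-character cap are assumptions chosen for illustration, not values used by scidownl.

```python
import os
import re


def safe_pdf_path(out_dir: str, title: str, max_name_len: int = 120) -> str:
    """Build an output path with a sanitized, length-capped file name."""
    # Replace characters that Windows does not allow in file names.
    cleaned = re.sub(r'[<>:"/\\|?*]', "_", title).strip()
    # Truncate the stem so that stem + ".pdf" stays within max_name_len.
    stem = cleaned[: max_name_len - len(".pdf")].rstrip(" .")
    # Create the target directory if it is missing.
    os.makedirs(out_dir, exist_ok=True)
    return os.path.join(out_dir, stem + ".pdf")
```

Opening the returned path with `open(path, "wb")` then no longer depends on the directory already existing or on the full title fitting into the file name.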

Download Through Title or other Input Source?

Hi, I've been trying the tool in multiple ways and have found it quite interesting. I was curious to know whether it is possible to crawl papers based on their titles, because some older papers (published before 2000) do not have DOIs associated with them on Scopus. They are highly cited papers in their journals, and they can be found on Sci-Hub through a title search. I was wondering whether something similar can be achieved through the CLI tool.

Everything was OK until this error message: Index out of range

Hi there. I used the script a few times until it stopped working and gave me this message:

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.7/bin/scidownl", line 11, in <module>
    sys.exit(main())
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/scidownl/scidownl.py", line 27, in main
    sci.download(choose_scihub_url_index=SCIHUB_URL_INDEX)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/scidownl/scihub.py", line 76, in download
    self.use_scihub_url(choose_scihub_url_index)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/scidownl/scihub.py", line 38, in use_scihub_url
    self.scihub_url = self.scihub_url_list[index]
IndexError: list index out of range

I reinstalled it and nothing changed. Any ideas?

The other thing: is it possible to give the script a list of DOIs and process them in one step? I have several and I have been doing them one by one.

Thank you so much! I'm a complete beginner at this, and this is the only tool that worked out of the few that I tried.
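
The traceback shows `self.scihub_url_list[index]` failing, which happens when the chosen index is not valid for the list of links (for example, when the list is shorter than expected or empty). A minimal sketch of a guarded lookup follows; `pick_url` is a hypothetical helper, not the package's code.

```python
from typing import Sequence


def pick_url(urls: Sequence[str], index: int) -> str:
    """Return urls[index], falling back to the first URL for a bad index."""
    if not urls:
        # Nothing to choose from: fail with a message that names the cause.
        raise RuntimeError("No Sci-Hub URLs are available; the link list is empty.")
    if not 0 <= index < len(urls):
        # The requested index is out of range, so use the first entry instead.
        index = 0
    return urls[index]
```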

How to get the download link?

Thank you for this awesome tool. I am trying to get the download URL rather than saving the PDF to a folder.

from scidownl import scihub_download
paper = '32467232'
paper_type = "pmid"
out = "./paper/one_paper.pdf"
scihub_download(paper, paper_type=paper_type, out=out)

How to get the download link here?
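
Whether scidownl exposes the URL through its Python API is not stated here. One possible workaround, assuming the `* Extracted information: {...}` log format shown in the "[BUG] Downloads twice" issue, is to parse that log line. The helper `url_from_log_line` below is hypothetical and works only on captured log text.

```python
import ast
import re
from typing import Optional

# Matches the dict printed after "Extracted information:" at the end of a log line.
_INFO_RE = re.compile(r"Extracted information: (\{.*\})\s*$")


def url_from_log_line(line: str) -> Optional[str]:
    """Return the 'url' value from an 'Extracted information' log line."""
    match = _INFO_RE.search(line)
    if match is None:
        return None
    try:
        # The logged dict is a Python literal, so literal_eval can read it safely.
        info = ast.literal_eval(match.group(1))
    except (ValueError, SyntaxError):
        return None
    return info.get("url") if isinstance(info, dict) else None
```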

-bash: syntax error near unexpected token `('

Hi there,

I can run everything smoothly, but unfortunately I get the error message below for DOIs containing "(":

scidownl download --doi 10.1016/0003-3472(95)80204-5

-bash: syntax error near unexpected token `('

Is there any way to overcome this issue? I can of course filter out the DOIs containing "(" and download them manually, but a way to download this type of DOI would be great!

Many thanks!
Nicolay
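
The error comes from bash itself: an unquoted `(` is shell syntax, so the command is rejected before scidownl runs. Quoting the DOI avoids this. The sketch below uses Python's standard `shlex.quote` to build a shell-safe command line; the helper name `build_command` is made up for this example.

```python
import shlex


def build_command(doi: str) -> str:
    """Return a shell-safe scidownl command line for the given DOI."""
    # shlex.quote wraps the argument in single quotes when it contains
    # characters that the shell would otherwise interpret, such as "(" or ")".
    return "scidownl download --doi " + shlex.quote(doi)


print(build_command("10.1016/0003-3472(95)80204-5"))
# prints: scidownl download --doi '10.1016/0003-3472(95)80204-5'
```

Typing the single quotes by hand on the command line has the same effect.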

Cannot download PDF

Hello, thank you for your project. My PDF can be downloaded, but its size is only 1 KB.

What could be the cause? Thank you.
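
A file this small may not be a PDF at all (for instance, a saved error page), although the report does not confirm this. PDF files begin with the bytes `%PDF-`, so a quick check is possible. `looks_like_pdf` is a hypothetical helper written for this illustration.

```python
def looks_like_pdf(path: str) -> bool:
    """Return True if the file at `path` starts with the PDF header bytes."""
    with open(path, "rb") as f:
        # Read just enough bytes to compare against the "%PDF-" signature.
        return f.read(5) == b"%PDF-"
```

If the check fails, opening the file in a text editor usually shows what the server actually returned.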

Multiple download issue

I am confused: if I try to download with a list, which input types are supported?
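
Independently of what the tool accepts natively, a list can always be processed with a plain loop. The sketch below takes the download function as a parameter, so it makes no assumption about scidownl's API; `download_all` and `download_one` are hypothetical names.

```python
from typing import Callable, Dict, Iterable


def download_all(dois: Iterable[str], download_one: Callable[[str], None]) -> Dict[str, str]:
    """Call download_one for every DOI and record the outcome of each."""
    results: Dict[str, str] = {}
    for raw in dois:
        doi = raw.strip()
        if not doi:
            continue  # skip blank lines, e.g. when reading DOIs from a file
        try:
            download_one(doi)
            results[doi] = "ok"
        except Exception as exc:
            # Keep going so that one failed paper does not stop the batch.
            results[doi] = f"failed: {exc}"
    return results
```

Here `download_one` could wrap the `scihub_download(paper, paper_type=paper_type, out=out)` call shown in the "How to get the download link?" issue.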

crawling failed

Hello,
It appears that the crawling failed when trying to download the DOI. However, the SciHub.st was able to download the DOI without any problem. It is likely that the issue is related to the crawling process, as opposed to the DOI itself.
Here is the specific DOI: 10.1097/WAD.0000000000000000
