ferru97 / pypaperbot

PyPaperBot is a Python tool for downloading scientific papers using Google Scholar, Crossref, and SciHub.

License: MIT License

Python 100.00%
download-papers google-scholar scihub scholar crossref papers

pypaperbot's People

Contributors

afsmaira, ferru97, jrabensc, suhan-paradkar

pypaperbot's Issues

Same bibtex keys

I ran into a situation where different articles with the same keys appear in the bibtex.bib file. For example:

@inproceedings{Hosseini_2016,
	doi = {10.1109/ism.2016.0028},
	url = {https://doi.org/10.1109%2Fism.2016.0028},
	year = 2016,
	month = {dec},
	publisher = {{IEEE}},
	author = {Mohammad Hosseini and Viswanathan Swaminathan},
	title = {Adaptive 360 {VR} Video Streaming: Divide and Conquer},
	booktitle = {2016 {IEEE} International Symposium on Multimedia ({ISM})}
}
@inproceedings{Hosseini_2016,
	doi = {10.1109/ism.2016.0093},
	url = {https://doi.org/10.1109%2Fism.2016.0093},
	year = 2016,
	month = {dec},
	publisher = {{IEEE}},
	author = {Mohammad Hosseini and Viswanathan Swaminathan},
	title = {Adaptive 360 {VR} Video Streaming Based on {MPEG}-{DASH} {SRD}},
	booktitle = {2016 {IEEE} International Symposium on Multimedia ({ISM})}
}

Because of this, I cannot correctly process the records with a bibtex parsing library: it treats entries that share a key as the same article, although they are different. Is there a way to avoid assigning the same key to different articles? For example, an option that appends a sequence number or random characters to the key.
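A minimal sketch of the suggested suffixing, assuming entries are written one at a time (the helper and its use are hypothetical, not PyPaperBot's actual internals):

from collections import defaultdict

_seen = defaultdict(int)

def unique_key(key):
    # Append a sequence suffix (_2, _3, ...) once a key has already been used
    _seen[key] += 1
    return key if _seen[key] == 1 else "%s_%d" % (key, _seen[key])

# unique_key("Hosseini_2016") -> "Hosseini_2016"
# unique_key("Hosseini_2016") -> "Hosseini_2016_2"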

Typo

If you call it without arguments, it will tell you:

Error, provide at least one of the following arguments: --query or --file
The correct argument appears to be --doi-file (not --file)

Skip the DOI

Hello,

Thank you for your tool. It is magnificent and very useful. I just want to highlight a minor thing: when it is searching through the list of DOIs, if one can't be found, this causes an error at download time and stops the program.

Bibtex file encoding

Hi!

Is there any reason the .bib file is saved in latin-1 encoding?

f = open(path, "w", encoding="latin-1", errors="ignore")

Why not utf-8? Because of this, I have to convert the file's encoding before I can open it.
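For reference, the suggested change would be a one-line swap to UTF-8 (same call, different encoding argument):

f = open(path, "w", encoding="utf-8", errors="ignore")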

Regex or Re

I encountered an error using a .txt file with 12 DOIs that traces back to the regular expressions in the Paper module. The re package won't download because it has been deprecated. Could you update the Paper module to import regex instead of re?

Question: Is proxy also for crossref?

I can't understand whether the proxy is used only for downloading papers, or also for Crossref?

I'd like it to be used for both, so that frequent use doesn't get blocked.

Download error. TypeError.

Hello!

I got this error while downloading

Download 202 of 8701 -> None
Traceback (most recent call last):
  File "C:\Users\kir-m\AppData\Local\Programs\Python\Python37\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Users\kir-m\AppData\Local\Programs\Python\Python37\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\kir-m\AppData\Local\Programs\Python\Python37\lib\site-packages\PyPaperBot\__main__.py", line 122, in <module>
    main()
  File "C:\Users\kir-m\AppData\Local\Programs\Python\Python37\lib\site-packages\PyPaperBot\__main__.py", line 118, in main
    start(args.query, args.scholar_pages, dwn_dir, args.min_year , max_dwn, max_dwn_type , args.journal_filter, args.restrict, DOIs)
  File "C:\Users\kir-m\AppData\Local\Programs\Python\Python37\lib\site-packages\PyPaperBot\__main__.py", line 45, in start
    downloadPapers(to_download, dwn_dir, num_limit)
  File "C:\Users\kir-m\AppData\Local\Programs\Python\Python37\lib\site-packages\PyPaperBot\Downloader.py", line 62, in downloadPapers
    pdf_dir = getSaveDir(dwnl_dir, p.getFileName())
  File "C:\Users\kir-m\AppData\Local\Programs\Python\Python37\lib\site-packages\PyPaperBot\Paper.py", line 31, in getFileName
    return re.sub('[^\w\-_\. ]', '_', self.title)+".pdf"
  File "C:\Users\kir-m\AppData\Local\Programs\Python\Python37\lib\re.py", line 192, in sub
    return _compile(pattern, flags).sub(repl, string, count)
TypeError: expected string or bytes-like object

I understand that all download problems are difficult to fix, but I need to download quite a few articles. I would like an option, or default behavior, where such errors do not lead to an abnormal end but are written to a log instead. I think it would be easy to do by adding a try-except.
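A minimal sketch of that behavior, assuming a per-paper download loop (the function and variable names are illustrative, not PyPaperBot's actual code):

import logging

def download_all(papers, download_one):
    # Try each paper; log failures and keep going instead of aborting the run
    for i, paper in enumerate(papers, start=1):
        try:
            download_one(paper)
        except Exception as exc:
            logging.error("Download %d failed (%s): %s", i, getattr(paper, "title", "?"), exc)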

Wanted to download in HTML format

Hi,

Thank you so much for this nice tool. I wanted to download papers in HTML format; how can I use it for that purpose?

Thanks.

Error 'Paper' object has no attribute 'sc_year'

Hello, and thanks for making this tool. I encountered an error while trying to download a paper; here is the output:

$ python -m PyPaperBot --query="Machine Learning" --scholar-pages=1 --min-year=2020 --dwn-dir="~/current"             
PyPaperBot is a Python tool for downloading scientific papers using Google Scholar, Crossref and SciHub.

Query: Machine Learning

Google Scholar page 1 : 5 papers found
Searching paper 1 of 5 on Crossref...
Searching paper 2 of 5 on Crossref...
Searching paper 3 of 5 on Crossref...
Searching paper 4 of 5 on Crossref...
Searching paper 5 of 5 on Crossref...
Papers found on Crossref: 4/5

Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/hskalin/.local/lib/python3.8/site-packages/PyPaperBot/__main__.py", line 122, in <module>
    main()
  File "/home/hskalin/.local/lib/python3.8/site-packages/PyPaperBot/__main__.py", line 118, in main
    start(args.query, args.scholar_pages, dwn_dir, args.min_year , max_dwn, max_dwn_type , args.journal_filter, args.restrict, DOIs)
  File "/home/hskalin/.local/lib/python3.8/site-packages/PyPaperBot/__main__.py", line 37, in start
    to_download = filter_min_date(to_download,min_date)  
  File "/home/hskalin/.local/lib/python3.8/site-packages/PyPaperBot/PapersFilters.py", line 50, in filter_min_date
    if paper.sc_year!=None and int(paper.sc_year)>=min_year:
AttributeError: 'Paper' object has no attribute 'sc_year' 

So what might be causing this?
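Judging by the traceback, some Paper objects never get a sc_year attribute assigned (for instance, the one paper not matched on Crossref). A defensive sketch of the filter using getattr with a default (names taken from the traceback; not a verified fix):

def filter_min_date(papers, min_year):
    # getattr tolerates Paper objects that never had sc_year set
    return [p for p in papers
            if getattr(p, "sc_year", None) is not None and int(p.sc_year) >= min_year]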

Add Heroku Support

Please add Heroku support so that we can deploy it on Heroku and use it from Telegram.

Skip the download files

Hi,

Thanks for your excellent tools for paper download.

Could you add a function that skips papers already downloaded to the folder? (See the sketch after this message.)

Best wishes
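A minimal sketch of the requested skip, assuming PDFs are named from the paper title as in Paper.getFileName (the helper is hypothetical):

import os

def should_download(dwn_dir, file_name):
    # Skip papers whose PDF already exists in the download folder
    return not os.path.isfile(os.path.join(dwn_dir, file_name))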

how to provide google scholar advanced search string?

Hi, thanks for this nice package.

I was wondering how I can provide a Google Scholar advanced search string?
I would like something like: --query="string 1 "string 2 that is an exact phrase" "

Also, could a --max-year option be added, like --min-year, so that I can limit my search to a time window [min-year, max-year]? (See the note after this message.)

TIA.
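For the year window, Google Scholar itself accepts as_ylo/as_yhi URL parameters, so (unverified against PyPaperBot's query handling) the same trick reported in another issue below may work for the upper bound too:

python -m PyPaperBot --query="string 1&as_ylo=2010&as_yhi=2015" --scholar-pages=1 --dwn-dir="./papers"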

Can Script save the PDF URL address in the results spreadsheet

Hi, this script is great. However, I need to generate a list of active URLs where people can access the PDFs, rather than just downloading the PDFs locally to my computer. Could you add a parameter that simply copies the PDF URL the script already knows and uses into the spreadsheet output? Thanks.
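A sketch of the requested column, assuming the report writer iterates over Paper objects that carry the resolved link (the pdf_link attribute name is an assumption, not PyPaperBot's verified field):

import csv

def write_report(papers, path):
    with open(path, "w", encoding="utf-8", newline="") as f:
        w = csv.writer(f)
        w.writerow(["title", "doi", "pdf_url"])
        for p in papers:
            w.writerow([p.title, p.doi, getattr(p, "pdf_link", "")])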

Please help Google Scholar: --query does not work

Good morning,

I am trying to download a pdf of a science paper with this code line:

!python -m PyPaperBot --query="10.1038/s41598-023-43091-0" --scholar-pages=2 --dwn-dir="path/to/download/dir"

The query string is a DOI and if you search for it in Google Scholar it does find only one paper (which is the one I am searching for).
Unfortunately, the code line gives me this error:

Query: 10.1038/s41598-023-43091-0

Google Scholar page 1 : 10 papers found
Paper not found...

Google Scholar page 2 : 10 papers found
Paper not found...

Work completed!

I tried using the title of the paper, but it does not work. I tried the URL, but it also returns an error.
How can I fix it? Can you help me, please?

I am using Colab right now, Python 3.10 and I would like to use Google Scholar option and not Scihub.

Thank you so much in advance!

Matteo

The papers still aren't downloading

I have successfully installed all dependencies, ensured correct configuration settings, and the application runs without any immediate errors. However, the papers still aren't downloading.
result.csv

Downloading papers from DOIs

Searching paper 1 of 13 with DOI 10.1108/IJIS-01-2021-0022
Python 3
Searching paper 2 of 13 with DOI 10.3390/app11219816
Python 3
Searching paper 3 of 13 with DOI 10.1007/s10457-017-0145-y
Python 3
Searching paper 4 of 13 with DOI 10.1016/j.deveng.2018.07.001
Python 3
Searching paper 5 of 13 with DOI 10.12775/EQ.2017.006
Python 3
Searching paper 6 of 13 with DOI 10.1016/j.aquaculture.2016.05.012
Python 3
Searching paper 7 of 13 with DOI 10.1080/14754835.2013.754293
Python 3
Searching paper 8 of 13 with DOI 10.4113/jom.2010.1086
Python 3
Searching paper 9 of 13 with DOI 10.1016/j.foodpol.2006.05.005
Python 3
Searching paper 10 of 13 with DOI 10.1142/9789812703040_0140
Python 3
Searching paper 11 of 13 with DOI 10.1038/s41598-023-33042-0
Python 3
Searching paper 12 of 13 with DOI 10.1016/j.still.2023.105744
Python 3
Searching paper 13 of 13 with DOI 10.1016/j.ecoinf.2023.102075
Python 3

Using https://sci-hub.shop as Sci-Hub instance
Download 1 of 13 -> The intertwined relationship of shadow banking and commercial banks’ deposit growth: evidence from India
Download 2 of 13 -> A Novel Approach in Prediction of Crop Production Using Recurrent Cuckoo Search Optimization Neural Networks
Download 3 of 13 -> FAO guidelines and geospatial application for agroforestry suitability mapping: case study of Ranchi, Jharkhand state of India
Download 4 of 13 -> Sustainable development as successful technology transfer: Empowerment through teaching, learning, and using digital participatory mapping techniques in Mazvihwa, Zimbabwe
Download 5 of 13 -> Land Evaluation in terms of Agroforestry Suitability, an Approach to Improve Livelihood and Reduce Poverty: A FAO based Methodology by Geospatial Solution: A case study of Palamu district, Jharkhand, India
Download 6 of 13 -> Hierarchical clustering and partitioning to characterize shrimp grow-out farms in northeast Brazil
Download 7 of 13 -> Fictions of Humanitarian Responsibility: Narrating Microfinance
Download 8 of 13 -> Roads to Participatory Planning: Integrating Cognitive Mapping and GIS for Transport Prioritization in Rural Lesotho
Download 9 of 13 -> Growth options and poverty reduction in Ethiopia – An economy-wide model analysis
Download 10 of 13 -> An Integrated Approach of Remote Sensing and GIS to Poverty Alleviation and Coastal Development in Cox’s Bazar, Bangladesh
Download 11 of 13 -> Towards reducing chemical usage for weed control in agriculture using UAS imagery analysis and computer vision techniques
Download 12 of 13 -> Delineation and optimization of cotton farmland management zone based on time series of soil-crop properties at landscape scale in south Xinjiang, China
Download 13 of 13 -> Machine learning-based spatial-temporal assessment and change transition analysis of wetlands: An application of Google Earth Engine in Sylhet, Bangladesh (1985–2022)

Work completed!
If you like this project, you can offer me a cup of coffee at --> https://www.paypal.com/paypalme/ferru97 <-- :)

Is anyone else facing this issue? Am I missing a step? Do we need to explicitly add API keys somewhere?

Problem to fetch 100 PDFS using this package

Hello.

I am trying to download 100 PDFs from DOIs using PyPaperBot, but only 41 get downloaded and I get the error below.

It finished download number 40 and then printed: TypeError: expected string or bytes-like object. Detailed error below.
Thanks a lot in advance.


Download 40 of 100 -> Biochar decreased rhizodeposits stabilization via opposite effects on bacteria and fungi: diminished fungi-promoted aggregation and enhanced bacterial mineralization
Download 41 of 100 -> None
Traceback (most recent call last):
File "/mnt/home/bandopad/miniconda3/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"main", mod_spec)
File "/mnt/home/bandopad/miniconda3/lib/python3.7/runpy.py", line 85, in run_code
exec(code, run_globals)
File "/mnt/ufs18/rs-033/ShadeLab/WorkingSpace/Bandopadhyay_WorkingSpace/metaanalysis_doi/environment/lib/python3.7/site-packages/PyPaperBot/main.py", line 122, in
main()
File "/mnt/ufs18/rs-033/ShadeLab/WorkingSpace/Bandopadhyay_WorkingSpace/metaanalysis_doi/environment/lib/python3.7/site-packages/PyPaperBot/main.py", line 118, in main
start(args.query, args.scholar_pages, dwn_dir, args.min_year , max_dwn, max_dwn_type , args.journal_filter, args.restrict, DOIs)
File "/mnt/ufs18/rs-033/ShadeLab/WorkingSpace/Bandopadhyay_WorkingSpace/metaanalysis_doi/environment/lib/python3.7/site-packages/PyPaperBot/main.py", line 45, in start
downloadPapers(to_download, dwn_dir, num_limit)
File "/mnt/ufs18/rs-033/ShadeLab/WorkingSpace/Bandopadhyay_WorkingSpace/metaanalysis_doi/environment/lib/python3.7/site-packages/PyPaperBot/Downloader.py", line 62, in downloadPapers
pdf_dir = getSaveDir(dwnl_dir, p.getFileName())
File "/mnt/ufs18/rs-033/ShadeLab/WorkingSpace/Bandopadhyay_WorkingSpace/metaanalysis_doi/environment/lib/python3.7/site-packages/PyPaperBot/Paper.py", line 31, in getFileName
return re.sub('[^\w\-_\. ]', '
', self.title)+".pdf"
File "/mnt/home/bandopad/miniconda3/lib/python3.7/re.py", line 192, in sub
return _compile(pattern, flags).sub(repl, string, count)
TypeError: expected string or bytes-like object
(environment) (base) -bash-4.2$
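Both this report and the earlier TypeError trace to Paper.getFileName being called with a None title (note the "Download 41 of 100 -> None" line). A defensive sketch, not the project's actual fix (the DOI fallback attribute is an assumption):

import re

def getFileName(self):
    # Fall back to the DOI (or a placeholder) when Scholar returned no title
    title = self.title if self.title is not None else (getattr(self, "DOI", None) or "unknown")
    return re.sub(r'[^\w\-_\. ]', '_', title) + ".pdf"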

Why not add year filtering to the query itself

The script currently searches for all articles regardless of year and then filters them if the --min-year option is specified. Because of this, far fewer articles are downloaded per page than are actually available. To get around this, I use a trick like this:

python -m PyPaperBot --query="stereoscopic&as_ylo=2010" --scholar-pages=10 --dwn-dir="./"

It would be cool if the script set the as_ylo option itself.
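A sketch of what that could look like, assuming the query string is interpolated into the Scholar search URL (as_ylo/as_yhi are Google Scholar's own URL parameters; the function itself is illustrative):

from urllib.parse import urlencode

def scholar_url(query, page, min_year=None, max_year=None):
    params = {"q": query, "start": 10 * (page - 1)}
    if min_year:
        params["as_ylo"] = min_year  # lower-bound year, applied server-side
    if max_year:
        params["as_yhi"] = max_year  # upper-bound year
    return "https://scholar.google.com/scholar?" + urlencode(params)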

Is this abandoned?

It sounds like users are having similar problems with downloading, and there aren't many updates.

Cannot download an article

Good day! When I try to download an article, it just creates the bibtex and csv files.

python -m PyPaperBot --doi=":10.4304/jetwi.2.3.258-268" --dwn-dir="/home/___/Desktop/Thesis/Experiment"

Downloading papers from DOIs

Searching paper 1 of 1 with DOI :10.4304/jetwi.2.3.258-268
Python 3

Using https://sci-hub.ee as Sci-Hub instance
Download 1 of 1 -> A Survey of Text Summarization Extractive Techniques

Work completed!

Download Issue

Apologies for not getting back to you sooner.

I switched to a different computer and managed to get two separate downloads into my specified directory. The only thing is, I now have two instances of a 'bibtex.bib' and an Excel file named 'result', which is simply the book's information in separate fields.

I tried changing the download directory and used a different DOI from a different article: same result, same two files. Any help would be appreciated.

Download Error

Hello, I've been attempting to download books via the --doi command, but after inputting the relevant DOI number and the correct download dir, I get:

FileNotFoundError: [Errno 2] No such file or directory: 'C:/Users/XYZ/Downloads`/result.csv'

Any help would be appreciated, thanks

Earlier fix has broken current release \(º □ º l|l)/

As a result of the fix in #45, executing the command below (with or without --max-dwn-cites=10):

!python -m PyPaperBot --query="Machine learning" --scholar-pages=1  --min-year=2018 --max-dwn-cites=10 --dwn-dir="\content\papers" --scihub-mirror="https://sci-hub.do"

Now results in

PyPaperBot is a Python tool for downloading scientific papers using Google Scholar, Crossref and SciHub.
If you like this project, you can give me a cup of coffee at --> https://www.paypal.com/paypalme/ferru97 <-- :)

Query: Machine learning

Google Scholar page 1 : 10 papers found
Paper not found...
Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.7/dist-packages/PyPaperBot/__main__.py", line 148, in <module>
    main()
  File "/usr/local/lib/python3.7/dist-packages/PyPaperBot/__main__.py", line 145, in main
    start(args.query, args.scholar_results, scholar_pages, dwn_dir, proxy, args.min_year , max_dwn, max_dwn_type , args.journal_filter, args.restrict, DOIs, args.scihub_mirror)
  File "/usr/local/lib/python3.7/dist-packages/PyPaperBot/__main__.py", line 48, in start
    Paper.generateReport(to_download,dwn_dir+"result.csv")
  File "/usr/local/lib/python3.7/dist-packages/PyPaperBot/Paper.py", line 65, in generateReport
    with open(path, mode="w", encoding='utf-8', newline='', buffering=1) as w_file:
FileNotFoundError: [Errno 2] No such file or directory: '/content/papers/result.csv'
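The FileNotFoundError suggests the report is written before the download directory exists (and dwn_dir+"result.csv" also breaks when the path lacks a trailing separator). A sketch of a guard, not the project's actual fix:

import os

def report_path(dwn_dir):
    os.makedirs(dwn_dir, exist_ok=True)          # create the folder if missing
    return os.path.join(dwn_dir, "result.csv")   # avoids relying on a trailing slash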

Multithreading/parallelize?

Hi,
Your software works great, but it is a little slow when searching for queries on Google Scholar. Would it be possible to parallelize, for example, the search across the individual result pages?
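A minimal sketch of concurrent page fetches (fetch_page stands in for whatever performs a single Scholar page request; note that parallel requests make Scholar's rate-limiting and CAPTCHAs more likely):

from concurrent.futures import ThreadPoolExecutor

def search_pages(query, pages, fetch_page):
    # Fetch several Scholar result pages concurrently and flatten the results
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = pool.map(lambda p: fetch_page(query, p), pages)
    return [paper for page in results for paper in page]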

UnboundLocalError: local variable 'scholar_pages' referenced before assignment

When trying to download papers using a DOI, I got the following error:

C:\Users\sparadis>python -m PyPaperBot --doi="10.0086/s41037-711-0132-1" --dwn-dir="C:\User\example\papers"

PyPaperBot is a Python tool for downloading scientific papers using Google Scholar, Crossref and SciHub.
If you like this project, you can give me a cup of coffee at --> https://www.paypal.com/paypalme/ferru97 <-- :)

Traceback (most recent call last):
  File "C:\Users\sparadis\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\sparadis\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\sparadis\AppData\Local\Programs\Python\Python39\lib\site-packages\PyPaperBot\__main__.py", line 139, in <module>
    main()
  File "C:\Users\sparadis\AppData\Local\Programs\Python\Python39\lib\site-packages\PyPaperBot\__main__.py", line 136, in main
    start(args.query, scholar_pages, dwn_dir, args.min_year , max_dwn, max_dwn_type , args.journal_filter, args.restrict, DOIs, args.scihub_mirror)
UnboundLocalError: local variable 'scholar_pages' referenced before assignment
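The traceback suggests scholar_pages is only assigned on the --query code path, so the DOI-only path reads it before assignment. A sketch of the obvious guard (variable names taken from the traceback, not the project's actual fix):

scholar_pages = 0  # default so the DOI-only path has a value
if args.query is not None:
    scholar_pages = args.scholar_pages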

Enhancement: Feedback when CAPTCHA

The search is sometimes rate-limited, but PyPaperBot's response is simply "Paper not found...". In this case, PyPaperBot should display the message returned in the HTML response (see below). Optionally, for those working on their own local network, an option could be offered to open the URL in a browser and solve the CAPTCHA there. (A sketch of the idea follows the response dump.)

Response:
HTML status code: 429
HTML response:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head><meta http-equiv="content-type" content="text/html; charset=utf-8"><meta name="viewport" content="initial-scale=1"><title>https://scholar.google.com/scholar?hl=en&amp;q=abc%22&amp;as_vis=1&amp;as_sdt=1,5&amp;start=380</title></head>
<body style="font-family: arial, sans-serif; background-color: #fff; color: #000; padding:20px; font-size:18px;" onload="e=document.getElementById('captcha');if(e){e.focus();} if(solveSimpleChallenge) {solveSimpleChallenge(,);}">
<div style="max-width:400px;">
<hr noshade size="1" style="color:#ccc; background-color:#ccc;"><br>
<form id="captcha-form" action="index" method="post">
<noscript>
<div style="font-size:13px;">
  In order to continue, please enable javascript on your web browser.
</div>
</noscript>
<script src="https://www.google.com/recaptcha/api.js" async defer></script>
<script>var submitCallback = function(response) {document.getElementById('captcha-form').submit();};</script>
<div id="recaptcha" class="g-recaptcha" data-sitekey="6LfwuyUTAAAAAOAmoS0fdqijC2PbbdH4kjq62Y1b" data-callback="submitCallback" data-s="KskQ5aUxKskQnKskQaho-qeu-uodlwNquodlPzUtHOt0SgxuodlK-LDBK8m5HPeJXBMS9x8m5HPeJI0J2v8m5HPeJltxo_1M0kQRb8m5HPeJbfd8pHy0kNPRa2Z_RFJpvQHAs6zrLM1aI5Lca58_waI5Lca51aI5Lca5x3IDmu1ffftae0mAEAvsm4Un_7xFpkcSr7xFpkcSFkD7xFpkcSVwXjYIOOdb_jc"></div>

<input type='hidden' name='q' value='NWU4I8GNWU4I8GIhAxOzGIhAxOZT-uGIhAxOMgFy'><input type="hidden" name="continue" value="https://scholar.google.com/scholar?hl=en&amp;q=abc%22&amp;as_vis=1&amp;as_sdt=1,5&amp;start=380">
</form>
<hr noshade size="1" style="color:#ccc; background-color:#ccc;">

<div style="font-size:13px;">
<b>About this page</b><br><br>

Our systems have detected unusual traffic from your computer network.  This page checks to see if it&#39;s really you sending the requests, and not a robot.  <a href="#" onclick="document.getElementById('infoDiv').style.display='block';">Why did this happen?</a><br><br>

<div id="infoDiv" style="display:none; background-color:#eee; padding:10px; margin:0 0 15px 0; line-height:1.4em;">
This page appears when Google automatically detects requests coming from your computer network which appear to be in violation of the <a href="//www.google.com/policies/terms/">Terms of Service</a>. The block will expire shortly after those requests stop.  In the meantime, solving the above CAPTCHA will let you continue to use our services.<br><br>This traffic may have been sent by malicious software, a browser plug-in, or a script that sends automated requests.  If you share your network connection, ask your administrator for help &mdash; a different computer using the same IP address may be responsible.  <a href="//support.google.com/websearch/answer/86640">Learn more</a><br><br>Sometimes you may be asked to solve the CAPTCHA if you are using advanced terms that robots are known to use, or sending requests very quickly.
</div>

IP address: xxx.xxx.xxx.xxx<br>Time: 2022-01-30T12:01:36Z<br>URL: https://scholar.google.com/scholar?hl=en&amp;q=abc&amp;as_vis=1&amp;as_sdt=1,5&amp;start=380<br>
</div>
</div>
</body>
</html>
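A sketch of the requested feedback, assuming the Scholar request is made with requests (the detection heuristics are illustrative):

import requests

def fetch_scholar(url):
    r = requests.get(url)
    if r.status_code == 429 or "solving the above CAPTCHA" in r.text:
        print("Google Scholar is rate-limiting this IP (HTTP %d)." % r.status_code)
        print("Open this URL in a browser to solve the CAPTCHA:", url)
        return None
    return r.text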

Abstract

Hi, is there any opportunity to have this grab abstracts? That would be extremely convenient and helpful. Let me know what you think.

AttributeError: 'Paper' object has no attribute 'sc_cites' and AttributeError: 'Paper' object has no attribute 'sc_year'

Hi Vito, this is a neat tool you've got here. I came across this error when I tried to use --max-dwn-cites.

Environment: Google Colab

Input:

!python -m PyPaperBot --query="Machine learning" --scholar-pages=1  --min-year=2018 --max-dwn-cites=10 --dwn-dir="\content\papers" --scihub-mirror="https://sci-hub.do"

Output:

PyPaperBot is a Python tool for downloading scientific papers using Google Scholar, Crossref and SciHub.
If you like this project, you can give me a cup of coffee at --> https://www.paypal.com/paypalme/ferru97 <-- :)

Query: Machine learning

Google Scholar page 1 : 10 papers found
Searching paper 1 of 9 on Crossref...
Searching paper 2 of 9 on Crossref...
Python 3
Searching paper 3 of 9 on Crossref...
Python 3
Searching paper 4 of 9 on Crossref...
Python 3
Searching paper 5 of 9 on Crossref...
Python 3
Searching paper 6 of 9 on Crossref...
Python 3
Searching paper 7 of 9 on Crossref...
Python 3
Python 3
Searching paper 8 of 9 on Crossref...
Python 3
Searching paper 9 of 9 on Crossref...
Python 3
Papers found on Crossref: 8/9

Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.7/dist-packages/PyPaperBot/__main__.py", line 148, in <module>
    main()
  File "/usr/local/lib/python3.7/dist-packages/PyPaperBot/__main__.py", line 145, in main
    start(args.query, args.scholar_results, scholar_pages, dwn_dir, proxy, args.min_year , max_dwn, max_dwn_type , args.journal_filter, args.restrict, DOIs, args.scihub_mirror)
  File "/usr/local/lib/python3.7/dist-packages/PyPaperBot/__main__.py", line 43, in start
    to_download.sort(key=lambda x: int(x.sc_cites) if x.sc_cites!=None else 0, reverse=True)
  File "/usr/local/lib/python3.7/dist-packages/PyPaperBot/__main__.py", line 43, in <lambda>
    to_download.sort(key=lambda x: int(x.sc_cites) if x.sc_cites!=None else 0, reverse=True)
AttributeError: 'Paper' object has no attribute 'sc_cites'

Also when I tried to use --max-dwn-year

Input:

!python -m PyPaperBot --query="Machine learning" --scholar-pages=1  --min-year=2018 --max-dwn-year=10 --dwn-dir="\content\papers" --scihub-mirror="https://sci-hub.do"

Output:

PyPaperBot is a Python tool for downloading scientific papers using Google Scholar, Crossref and SciHub.
If you like this project, you can give me a cup of coffee at --> https://www.paypal.com/paypalme/ferru97 <-- :)

Query: Machine learning

Google Scholar page 1 : 10 papers found
Searching paper 1 of 9 on Crossref...
Searching paper 2 of 9 on Crossref...
Python 3
Searching paper 3 of 9 on Crossref...
Python 3
Searching paper 4 of 9 on Crossref...
Python 3
Searching paper 5 of 9 on Crossref...
Python 3
Searching paper 6 of 9 on Crossref...
Python 3
Searching paper 7 of 9 on Crossref...
Python 3
Python 3
Searching paper 8 of 9 on Crossref...
Python 3
Searching paper 9 of 9 on Crossref...
Python 3
Papers found on Crossref: 8/9

Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.7/dist-packages/PyPaperBot/__main__.py", line 148, in <module>
    main()
  File "/usr/local/lib/python3.7/dist-packages/PyPaperBot/__main__.py", line 145, in main
    start(args.query, args.scholar_results, scholar_pages, dwn_dir, proxy, args.min_year , max_dwn, max_dwn_type , args.journal_filter, args.restrict, DOIs, args.scihub_mirror)
  File "/usr/local/lib/python3.7/dist-packages/PyPaperBot/__main__.py", line 40, in start
    to_download.sort(key=lambda x: int(x.sc_year) if x.sc_year!=None else 0, reverse=True)
  File "/usr/local/lib/python3.7/dist-packages/PyPaperBot/__main__.py", line 40, in <lambda>
    to_download.sort(key=lambda x: int(x.sc_year) if x.sc_year!=None else 0, reverse=True)
AttributeError: 'Paper' object has no attribute 'sc_year'

What's the reason for this? 🐺
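As with the sc_year filter error reported above, the sort keys assume every Paper object has the attribute set. A defensive sketch using getattr (the sort line is taken from the traceback; not a verified fix):

# tolerate Paper objects that never had sc_cites assigned
to_download.sort(key=lambda x: int(getattr(x, "sc_cites", None) or 0), reverse=True)

and analogously for sc_year in the --max-dwn-year path.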
