bibcure / scihub2pdf

Downloads PDFs via a DOI number, article title, or a bibtex file, using the databases of LibGen (Sci-Hub) and arXiv.

License: GNU Affero General Public License v3.0

Python 100.00%
sci-hub doi bibtex bibtexparser science scientific-journals latex arxiv

scihub2pdf's Introduction

bibcure (Beta Version)

Bibcure helps with boring tasks by keeping your bibtex file up to date and normalized.


Requirements/Install

Bibcure uses the wonderful python-bibtexparser. At the moment we are waiting for a new release of python-bibtexparser to fix some bugs.

Install it using pip:

$ sudo pip install bibcure
# or, for Python 3
$ sudo pip3 install bibcure

You can also install from source: clone the repository with git, then install it with the setup.py script.

scihub2pdf (beta)

If you want to download articles via a DOI number, article title, or a bibtex file, using the databases of arXiv, LibGen, or Sci-Hub, see bibcure/scihub2pdf.


Features and how to use

bibcure

Given a bib file...

$ bibcure -i input.bib -o output.bib
  • check whether the arXiv items have been published, then update them (requires internet connection),

  • complete all fields (url, journal, etc.) of all bib items using the DOI number (requires internet connection),

  • find and add the DOI associated with each bib item that has no DOI field (requires internet connection),

  • abbreviate journal names.
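As an illustration of the DOI-completion step above, here is a minimal sketch. It is not bibcure's code: bib entries are modeled as plain dicts, and `lookup_doi` is a hypothetical offline stand-in for the online title search the real tool performs.

```python
# Sketch of the "find items without a DOI" step that bibcure automates.
# 'lookup_doi' is a hypothetical stub; the real tool queries a service
# such as Crossref over the network here.

def lookup_doi(title):
    # Hypothetical offline stub mapping known titles to DOIs.
    known = {"Some Published Paper": "10.1000/example.doi"}
    return known.get(title)

def complete_dois(entries):
    """Return entries with a 'doi' field filled in where a lookup succeeds."""
    completed = []
    for entry in entries:
        entry = dict(entry)  # don't mutate the caller's data
        if "doi" not in entry:
            doi = lookup_doi(entry.get("title", ""))
            if doi is not None:
                entry["doi"] = doi
        completed.append(entry)
    return completed

bib = [
    {"ID": "a2016", "title": "Some Published Paper"},
    {"ID": "b2017", "title": "Untraceable Preprint", "doi": "10.2000/kept"},
]
print(complete_dois(bib))
```

The real tool parses the bib file with python-bibtexparser instead of hand-built dicts, but the control flow is the same: only entries missing a DOI trigger a lookup.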

arxivcheck

Given an arXiv id...

$ arxivcheck 1601.02785
  • check whether it has been published, and then return the updated bib entry (requires internet connection).

Given a title...

$ arxivcheck --title "A useful paper with hopefully unique title published on arxiv"
  • search for related papers and return a bibtex entry for the first result.

You can easily append a bib entry to a bib file:

$ arxivcheck --title "A useful paper with hopefully unique title published on arxiv" >> file.bib

You can also interact with the results; just pass the --ask parameter:

$ arxivcheck --ask --title "A useful paper with hopefully unique title published on arxiv"

scihub2pdf

Given a bibtex file...

$ scihub2pdf -i input.bib

Given a DOI number...

$ scihub2pdf 10.1038/s41524-017-0032-0

Given an arXiv id...

$ scihub2pdf arxiv:1708.06891

Given a title...

$ scihub2pdf --title "A useful paper with hopefully unique title"

or, searching on arXiv...

$ scihub2pdf --title arxiv:"A useful paper with hopefully unique title"

Pass the destination folder as an argument:

$ scihub2pdf -i input.bib -l somefolder/

Use LibGen instead of Sci-Hub:

$ scihub2pdf --uselibgen -i input.bib

doi2bib

Given a DOI number...

$ doi2bib 10.1038/s41524-017-0032-0
  • get the bib entry for a given DOI (requires internet connection).

You can easily append a bib entry to a bib file:

$ doi2bib 10.1038/s41524-017-0032-0 >> file.bib

You can also generate a bibtex file from a txt file containing a list of DOIs:

$ doi2bib --input file_with_dois.txt --output refs.bib

title2bib

Given a title...

$ title2bib "A useful paper with hopefully unique title"
  • search for related papers and return a bib entry for the selected paper (requires internet connection).

You can easily append a bib entry to a bib file:

$ title2bib "A useful paper with hopefully unique title" --first >> file.bib

You can also generate a bibtex file from a txt file containing a list of titles:

$ title2bib --input file_with_titles.txt --output refs.bib --first

Comparison: Sci-Hub vs LibGen

Sci-Hub

  • Stable
  • Annoying CAPTCHA
  • Fast

Libgen

  • Unstable
  • No CAPTCHA
  • Slow

License

GNU Affero General Public License v3.0. For more details, see the LICENSE file.

scihub2pdf's People

Contributors: devmessias

scihub2pdf's Issues

Problems with the domain

It seems that my school has banned the "sci-hub.cc" domain, but "sci-hub.bz" is still working. Is there any way to change the domain, perhaps by passing it as an argument or using some kind of config file? Searching through a list of possible domains could also be useful.
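For what it's worth, the fallback this issue asks for can be sketched in a few lines. This is not scihub2pdf code: the mirror list is an example, and the fetch callable is injected so the logic can be shown without network access.

```python
# Hypothetical sketch: try a list of candidate Sci-Hub mirrors in order
# and return the first one that answers. 'fetch' stands in for a real
# HTTP probe such as requests.get(base, timeout=10).

MIRRORS = ["https://sci-hub.cc", "https://sci-hub.bz"]  # example domains

def first_working_mirror(mirrors, fetch):
    """Return the first mirror for which fetch(url) succeeds, else None."""
    for base in mirrors:
        try:
            fetch(base)
            return base
        except OSError:  # banned / unreachable mirror
            continue
    return None

def fake_fetch(url):
    # Simulate the issue: the first domain is banned, the second works.
    if "sci-hub.cc" in url:
        raise OSError("domain banned")
    return "ok"

print(first_working_mirror(MIRRORS, fake_fetch))
```

A config file or `--domain` flag could then feed the mirror list instead of hard-coding it.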

Exceptions with example sci2pdf 10.1038/s41524-017-0032-0

Debian 8

jaap@jaap:/$ sci2pdf 10.1038/s41524-017-0032-0
10.1038/s41524-017-0032-0
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 595, in urlopen
    chunked=chunked)
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 393, in _make_request
    six.raise_from(e, None)
  File "<string>", line 2, in raise_from
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 389, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/lib/python3.4/http/client.py", line 1172, in getresponse
    response.begin()
  File "/usr/lib/python3.4/http/client.py", line 351, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python3.4/http/client.py", line 321, in _read_status
    raise BadStatusLine(line)
http.client.BadStatusLine: ''

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/requests/adapters.py", line 423, in send
    timeout=timeout
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 640, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/usr/lib/python3/dist-packages/urllib3/util/retry.py", line 261, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/usr/lib/python3/dist-packages/six.py", line 685, in reraise
    raise value.with_traceback(tb)
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 595, in urlopen
    chunked=chunked)
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 393, in _make_request
    six.raise_from(e, None)
  File "<string>", line 2, in raise_from
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 389, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/lib/python3.4/http/client.py", line 1172, in getresponse
    response.begin()
  File "/usr/lib/python3.4/http/client.py", line 351, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python3.4/http/client.py", line 321, in _read_status
    raise BadStatusLine(line)
requests.packages.urllib3.exceptions.ProtocolError: ('Connection aborted.', BadStatusLine("''",))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/sci2pdf", line 80, in <module>
    main()
  File "/usr/local/bin/sci2pdf", line 68, in main
    download_from_doi(value, location)
  File "/usr/local/lib/python3.4/dist-packages/sci2pdf/libgen.py", line 91, in download_from_doi
    bib_libgen = get_libgen_url(bib)
  File "/usr/local/lib/python3.4/dist-packages/sci2pdf/libgen.py", line 22, in get_libgen_url
    r = requests.get(url, params=params, headers=headers)
  File "/usr/lib/python3/dist-packages/requests/api.py", line 70, in get
    return request('get', url, params=params, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/api.py", line 56, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 475, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 596, in send
    r = adapter.send(request, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/adapters.py", line 473, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', BadStatusLine("''",))
jaap@jaap:/$

Scihub2pdf does not work now! I guess the base Sci-Hub root URL has changed.


root@ecs-6e13:~# scihub2pdf  doi:10.1016/j.patcog.2016.10.023  --uselibgen

	 Using Libgen.

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 144, in _new_conn
    (self.host, self.port), self.timeout, **extra_kw)
  File "/usr/lib/python3/dist-packages/urllib3/util/connection.py", line 60, in create_connection
    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
  File "/usr/lib/python3.6/socket.py", line 745, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -5] No address associated with hostname

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 601, in urlopen
    chunked=chunked)
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 357, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/lib/python3.6/http/client.py", line 1281, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib/python3.6/http/client.py", line 1327, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.6/http/client.py", line 1276, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.6/http/client.py", line 1042, in _send_output
    self.send(msg)
  File "/usr/lib/python3.6/http/client.py", line 980, in send
    self.connect()
  File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 169, in connect
    conn = self._new_conn()
  File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 153, in _new_conn
    self, "Failed to establish a new connection: %s" % e)
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7fbf0d0f29e8>: Failed to establish a new connection: [Errno -5] No address associated with hostname

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/requests/adapters.py", line 440, in send
    timeout=timeout
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 639, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/usr/lib/python3/dist-packages/urllib3/util/retry.py", line 398, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='libgen.io', port=80): Max retries exceeded with url: /scimag/ads.php?doi=doi%3A10.1016%2Fj.patcog.2016.10.023&downloadname= (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fbf0d0f29e8>: Failed to establish a new connection: [Errno -5] No address associated with hostname',))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/scihub2pdf", line 191, in <module>
    main()
  File "/usr/local/bin/scihub2pdf", line 148, in main
    download_from_doi(value, location, use_libgen)
  File "/usr/local/lib/python3.6/dist-packages/scihub2pdf/download.py", line 161, in download_from_doi
    download_from_libgen(doi, pdf_file)
  File "/usr/local/lib/python3.6/dist-packages/scihub2pdf/download.py", line 68, in download_from_libgen
    found, r = ScrapLib.navigate_to(doi, pdf_file)
  File "/usr/local/lib/python3.6/dist-packages/scihub2pdf/libgen.py", line 44, in navigate_to
    headers=self.headers
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 533, in get
    return self.request('GET', url, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 520, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 630, in send
    r = adapter.send(request, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/adapters.py", line 508, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='libgen.io', port=80): Max retries exceeded with url: /scimag/ads.php?doi=doi%3A10.1016%2Fj.patcog.2016.10.023&downloadname= (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fbf0d0f29e8>: Failed to establish a new connection: [Errno -5] No address associated with hostname',))

Download has stopped because of the captcha?

I have tried to download PDFs using the list of DOIs that I have stored in a .txt file. Then I got an issue after 3-4 PDFs were successfully downloaded:

DOI:  10.1016/j.telpol.2009.08.001
	Sci-Hub Link:  http://sci-hub.tw/10.1016/j.telpol.2009.08.001
	checking if has captcha...
	Download: ok

	DOI:  10.1080/0268396032000150816
	Sci-Hub Link:  http://sci-hub.tw/10.1080/0268396032000150816
	checking if has captcha...
Traceback (most recent call last):
  File "/Users/mustikarizkifitriyanti/anaconda/envs/thesis/bin/scihub2pdf", line 191, in <module>
    main()
  File "/Users/mustikarizkifitriyanti/anaconda/envs/thesis/bin/scihub2pdf", line 163, in main
    download_from_doi(value, location, use_libgen)
  File "/Users/mustikarizkifitriyanti/anaconda/envs/thesis/lib/python2.7/site-packages/scihub2pdf/download.py", line 163, in download_from_doi
    download_from_scihub(doi, pdf_file)
  File "/Users/mustikarizkifitriyanti/anaconda/envs/thesis/lib/python2.7/site-packages/scihub2pdf/download.py", line 105, in download_from_scihub
    captcha_img = ScrapSci.get_captcha_img()
  File "/Users/mustikarizkifitriyanti/anaconda/envs/thesis/lib/python2.7/site-packages/scihub2pdf/scihub.py", line 98, in get_captcha_img
    self.driver.execute_script("document.getElementById('content').style.zIndex = 9999;")
  File "/Users/mustikarizkifitriyanti/anaconda/envs/thesis/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 635, in execute_script
    'args': converted_args})['value']
  File "/Users/mustikarizkifitriyanti/anaconda/envs/thesis/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 320, in execute
    self.error_handler.check_response(response)
  File "/Users/mustikarizkifitriyanti/anaconda/envs/thesis/lib/python2.7/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: {"errorMessage":"null is not an object (evaluating 'document.getElementById('content').style')","request":{"headers":{"Accept":"application/json","Accept-Encoding":"identity","Connection":"close","Content-Length":"134","Content-Type":"application/json;charset=UTF-8","Host":"127.0.0.1:49931","User-Agent":"selenium/3.13.0 (python mac)"},"httpVersion":"1.1","method":"POST","post":"{\"sessionId\": \"927e3730-9652-11e8-ae2e-f99d263e318f\", \"args\": [], \"script\": \"document.getElementById('content').style.zIndex = 9999;\"}","url":"/execute","urlParsed":{"anchor":"","query":"","file":"execute","directory":"/","path":"/execute","relative":"/execute","port":"","host":"","password":"","user":"","userInfo":"","authority":"","protocol":"","source":"/execute","queryKey":{},"chunks":["execute"]},"urlOriginal":"/session/927e3730-9652-11e8-ae2e-f99d263e318f/execute"}}
Screenshot: available via screen

I wonder whether this happens because that specific DOI has a captcha. Can anyone help me solve this issue?

Direct output info to files

Nice job. Most articles are downloaded automatically, but a few cannot be found, like the one below:
DOI: 10.1080/16742834.2014.11447220
Sci-Hub Link: https://sci-hub.se/10.1080/16742834.2014.11447220
checking if has captcha...
No pdf found. Maybe, the sci-hub dosen't have the file
Try to open the link in your browser.

Then I'd like to grep only the Sci-Hub Link and save it to a.txt.
scihub2pdf --title ${article_title} > a.txt
doesn't work; it treats '${article_title} > a.txt' as a whole title.
I would appreciate it if you could solve this, for example by adding an option -d to direct the scihub2pdf info to files. Thanks.
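As a side note, the redirection failure described in this issue is shell quoting rather than scihub2pdf itself: quote the variable and leave the redirection outside the quotes, and the shell performs it normally. A minimal sketch (with `echo` standing in for `scihub2pdf --title`, and a made-up title):

```shell
# Hypothetical title; echo stands in for scihub2pdf --title here.
article_title="A useful paper with hopefully unique title"

# Quote the expansion so spaces in the title stay one argument, and keep
# '> a.txt' outside the quotes so the shell parses it as a redirection.
# 2>&1 also captures messages printed to stderr.
echo "Searching for: $article_title" > a.txt 2>&1

# Keep only the line of interest:
grep "Searching for" a.txt
```

With the real tool the same pattern would be `scihub2pdf --title "$article_title" > a.txt 2>&1`, then grep the saved file for the Sci-Hub link.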

Multiple requests at the same time

The LibGen site says the limit is 40 connections per user, but for some reason I can only make ~3 requests at the same time. I think this issue is related to my code (I have not yet studied the documentation of the requests library)... LibGen may also limit requests per user within a given interval of time; I didn't find any information about that.
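A bounded worker pool is the usual way to hold parallel requests under a server's limit. Here is a minimal standard-library sketch, not scihub2pdf code: `download` is a hypothetical stand-in for one LibGen request, and `max_workers` caps how many run at once.

```python
# Sketch: cap in-flight requests with a thread pool. 'download' is a
# hypothetical stand-in for a single LibGen HTTP request; max_workers
# bounds concurrency well under the stated 40-connection limit.
from concurrent.futures import ThreadPoolExecutor

def download(doi):
    # Stand-in for requests.get(...); returns a fake result.
    return "pdf-bytes-for-" + doi

def download_all(dois, max_workers=3):
    # pool.map preserves input order and never runs more than
    # max_workers downloads at the same time.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(download, dois))

dois = ["10.1/a", "10.1/b", "10.1/c", "10.1/d"]
print(download_all(dois))
```

Raising `max_workers` toward the server's limit is then a one-line change, and any per-interval rate limit can be handled by sleeping inside `download`.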
