This project forked from tishacy/scidownl

Download PDFs from Scihub via DOI. Easy to use. Easy to deal with captchas. Easy to update to the newest Scihub domains.

License: MIT License

Python 100.00%

SciDownl

Download PDFs from Scihub via DOI.

  • Easy to use.
  • Easy to deal with captchas.
  • Easy to update to the newest Scihub domains.

Install

$ pip3 install -U scidownl

Usage

Command line

$ scidownl -h
usage: Command line tool to download pdf via DOI from Scihub.
       [-h] [-c CHOOSE] [-D DOI] [-o OUTPUT] [-u] [-l]

optional arguments:
  -h, --help            show this help message and exit
  -c CHOOSE, --choose CHOOSE
                        choose scihub url by index
  -D DOI, --DOI DOI     the DOI number of the paper
  -o OUTPUT, --output OUTPUT
                        directory to download the pdf
  -u, --update          update available Scihub links
  -l, --list            list current saved sichub urls
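
When the command line tool is driven from a script, the flags in the help text above can be assembled into an argument list. The following is a minimal sketch and not part of scidownl: `build_command` and `run_download` are hypothetical helpers, and the sketch assumes the `scidownl` executable is on your PATH.

```python
import subprocess
from typing import List, Optional


def build_command(doi: str, output: Optional[str] = None) -> List[str]:
    """Assemble an argument list from the -D and -o flags shown in `scidownl -h`.

    Hypothetical helper; it is not provided by scidownl.
    """
    cmd = ["scidownl", "-D", doi]
    if output is not None:
        cmd += ["-o", output]  # directory to download the pdf into
    return cmd


def run_download(doi: str, output: Optional[str] = None) -> None:
    """Run the tool once; assumes `scidownl` is installed and on PATH."""
    # check=True raises CalledProcessError on a non-zero exit status.
    subprocess.run(build_command(doi, output), check=True)


print(build_command("10.1021/ol9910114", "paper"))
```

Passing the arguments as a list rather than a single string avoids shell quoting problems with DOIs that contain characters such as parentheses.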

Examples

# Update available links of Scihub
$ scidownl -u
[INFO] Updating links ...
[INFO] https://sci-hub.ren
[INFO] http://sci-hub.ren
[INFO] http://sci-hub.red
[INFO] http://sci-hub.se
[INFO] https://sci-hub.se
[INFO] http://sci-hub.tw

# Choose scihub url by the index.
$ scidownl -c 5
Current scihub url: http://sci-hub.tw

# List available links of Scihub. The asterisk marks the current scihub url, here the one at index 5.
$ scidownl -l
  [0] https://sci-hub.ren
  [1] http://sci-hub.ren
  [2] http://sci-hub.red
  [3] http://sci-hub.se
  [4] https://sci-hub.se
* [5] http://sci-hub.tw

# Download to the current directory
$ scidownl -D 10.1021/ol9910114
$ scidownl -D 10.1021/ol9910114 -o .

# Download to a specified directory, e.g. '-o paper' downloads to the 'paper' directory.
$ scidownl -D 10.1021/ol9910114 -o paper

# If a 'PermissionError' is raised, run the command with sudo, e.g.:
$ sudo scidownl -u
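
If a script needs to know which url is currently selected, the `scidownl -l` listing shown above can be parsed. This is a rough sketch that assumes the output keeps exactly the format in the example (an optional `*`, an index in brackets, then the url); `parse_link_list` is a hypothetical helper, not a scidownl function.

```python
import re
from typing import List, Optional, Tuple

# Matches lines such as "  [0] https://sci-hub.ren" or "* [5] http://sci-hub.tw".
_LINK_LINE = re.compile(r"^\s*(\*?)\s*\[(\d+)\]\s+(\S+)\s*$")


def parse_link_list(text: str) -> Tuple[List[str], Optional[int]]:
    """Return (urls, current_index) parsed from `scidownl -l`-style output."""
    urls: List[str] = []
    current: Optional[int] = None
    for line in text.splitlines():
        match = _LINK_LINE.match(line)
        if match is None:
            continue  # skip anything that is not a link line
        star, index, url = match.groups()
        urls.append(url)
        if star:
            current = int(index)  # the starred entry is the current url
    return urls, current


# Sample copied from the listing in this README.
sample = """\
  [0] https://sci-hub.ren
  [1] http://sci-hub.ren
  [2] http://sci-hub.red
  [3] http://sci-hub.se
  [4] https://sci-hub.se
* [5] http://sci-hub.tw
"""
urls, current = parse_link_list(sample)
print(current, urls[current])
```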

Module

If you have a list of DOIs, we recommend using scidownl from a Python script to download all of the papers.

Download a single paper via DOI.

from scidownl.scihub import SciHub

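DOI = "10.1021/ol9910114"
out = 'paper'  # output directory
# choose_scihub_url_index picks a saved scihub url by index (compare `scidownl -l`).
sci = SciHub(DOI, out).download(choose_scihub_url_index=3)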

Download a list of DOIs with a simple for loop.

from scidownl.scihub import SciHub

DOIs = [...]
out = 'paper'
for doi in DOIs:
    SciHub(doi, out).download(choose_scihub_url_index=3)
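
For longer lists it can help to keep going when one paper fails and to report the failures at the end. The sketch below wraps the loop above under two assumptions that this README does not confirm: that a failed download surfaces as a Python exception, and that index 3 is the url you want. `scihub_download` and `download_all` are hypothetical names, not part of scidownl.

```python
from typing import Callable, Iterable, List, Tuple


def scihub_download(doi: str, out: str) -> None:
    """Download one paper using the call shown in this README."""
    # Imported lazily so download_all can be exercised without scidownl installed.
    from scidownl.scihub import SciHub
    SciHub(doi, out).download(choose_scihub_url_index=3)


def download_all(
    dois: Iterable[str],
    out: str,
    download_one: Callable[[str, str], None] = scihub_download,
) -> List[Tuple[str, str]]:
    """Try every DOI once; return (doi, error message) pairs for the failures."""
    failures: List[Tuple[str, str]] = []
    seen = set()
    for raw in dois:
        doi = raw.strip()
        if not doi or doi in seen:
            continue  # skip blank entries and duplicates
        seen.add(doi)
        try:
            download_one(doi, out)
        except Exception as exc:  # assumption: failures are raised as exceptions
            failures.append((doi, str(exc)))
    return failures
```

Because the downloader is passed in as a parameter, the loop can be tried out with a stand-in function before any real download is attempted.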

Update available Scihub links.

from scidownl.update_link import update_link

# Use crawling method to update available Scihub links.
update_link(mod='c')
# Use brute force search method to update available Scihub links.
update_link(mod='b')

RELEASE

  • v0.1.0: First release.
  • v0.2.0:
    • Optimized the download speed.
    • Optimized captcha processing.
  • v0.2.1:
    • Applied stream download.
    • Added a display of download progress.
    • Fixed bugs caused by invalid scihub links.
  • v0.2.2:
    • Add new source website.
    • Add -l/--list argument to the command line tool.
  • v0.2.3:
    • Fix bugs with empty filenames and wrong scihub urls.
    • Fix bugs in the brute-force method of updating scihub urls.
  • v0.2.4:
    • Fix #2.
    • Fix the 'file name too long' error.
  • v0.2.5:
    • Refactor code.
    • Fix 'no content-length' error.
    • Add -c/--choose argument for manually choosing the scihub url to use.
  • v0.2.6:
    • Fix a bug where the retry time was too long.
  • v0.2.7:
    • Add -b/--brute-update argument for updating scihub urls by the brute-force search method.
  • v0.2.8:
    • Replace mspider with qspider in the brute-force search.

LICENSE

Copyright (c) 2019 tishacy.

Licensed under the MIT License.
