skyme5 / magzdb



magzdb.org Downloader

License: MIT License

Makefile 9.61% Python 89.17% Dockerfile 1.22%

magzdb magazine magazines downloader

magzdb's Introduction

magzdb - magzdb.org Downloader

Installation

Install using pip

$ pip install -U magzdb

Usage

usage: magzdb [-h] [-V] -i MAGAZINE_ID [-e [EDITION [EDITION ...]]]
              [-f FILTER] [-l] [-P DIRECTORY_PREFIX] [--downloader DOWNLOADER]
              [--debug] [--skip-download]

Magzdb.org Downloader

required arguments:
  -i MAGAZINE_ID, --id MAGAZINE_ID
                        ID of the Magazine to Download. eg. http://magzdb.org/j/<ID>.

optional arguments:
  -h, --help            show this help message and exit
  -V, --version         Print program version and exit
  -e [EDITION [EDITION ...]], --editions [EDITION [EDITION ...]]
                        Select Edition
  -f FILTER, --filter FILTER
                        Use filter. See README#Filters
  -l, --latest          Download only latest edition.
  -P DIRECTORY_PREFIX, --directory-prefix DIRECTORY_PREFIX
                        Download directory.
  --downloader DOWNLOADER
                        Use External downloader (RECOMMENDED). Currently supported: aria2, wget, curl
  --debug               Print debug information.
  --skip-download       Don't download files.

Usage Examples

Docker

docker build . -t magzdb
docker run -v "$(pwd)":/tmp magzdb -h

# Add alias to shell
alias magzdb='docker run -v "$(pwd)":/tmp magzdb'
magzdb -h

Download all editions

$ magzdb -i 1826

Filters

You can supply a filter expression with -f. For example, to download issues with edition IDs strictly between 4063895 and 4063901, write:

$ magzdb -i 1826 -f "eid > 4063895 and eid < 4063901"

The filter expression can use the variables eid (the edition ID) and year.

More examples of filter expressions

eid > 4063895 and eid < 4063901 or eid >= 4063895 and eid <= 4063901
eid >= 4063895 or eid != 4063895
year >= 2018
year <= 2018
year == 2018
year != 2018
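The filter grammar above (comparisons on eid and year joined by and/or) can be evaluated without handing the string to a bare eval. The following is a minimal sketch using Python's ast module (Python 3.8+); it is not magzdb's actual implementation, and the function name matches_filter is hypothetical.

```python
import ast
import operator

# Comparison operators a filter expression may use.
_CMP = {
    ast.Gt: operator.gt,
    ast.GtE: operator.ge,
    ast.Lt: operator.lt,
    ast.LtE: operator.le,
    ast.Eq: operator.eq,
    ast.NotEq: operator.ne,
}


def matches_filter(expression: str, eid: int, year: int) -> bool:
    """Hypothetical helper: evaluate e.g. "eid > 4063895 and year >= 2018" for one edition."""
    names = {"eid": eid, "year": year}

    def evaluate(node):
        if isinstance(node, ast.BoolOp):
            values = [evaluate(v) for v in node.values]
            return all(values) if isinstance(node.op, ast.And) else any(values)
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
            return not evaluate(node.operand)
        if isinstance(node, ast.Compare):
            left = evaluate(node.left)
            # Also handles chained comparisons such as 2000 <= year <= 2020.
            for op, right_node in zip(node.ops, node.comparators):
                if type(op) not in _CMP:
                    raise ValueError(f"unsupported operator: {type(op).__name__}")
                right = evaluate(right_node)
                if not _CMP[type(op)](left, right):
                    return False
                left = right
            return True
        if isinstance(node, ast.Name) and node.id in names:
            return names[node.id]
        if isinstance(node, ast.Constant) and isinstance(node.value, int):
            return node.value
        # Anything else (calls, attributes, unknown names) is rejected.
        raise ValueError(f"unsupported syntax: {ast.dump(node)}")

    return bool(evaluate(ast.parse(expression, mode="eval").body))
```

Whitelisting node types means an expression like __import__('os') raises ValueError instead of executing.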

Download only the latest edition

$ magzdb -i 1826 -l

Download only the latest edition into the custom directory `magazine`

$ magzdb -i 1826 -l -P magazine

Use an external downloader

$ magzdb -i 1826 -l -P magazine --downloader wget

This is recommended because the internal downloader does not support resuming interrupted downloads.
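The --debug output in the wget issue further down shows magzdb invoking wget as wget -c -O <path> <url>. The sketch below builds comparable resumable commands for the three supported downloaders as argument lists. The aria2c and curl flags are standard options of those tools, but whether magzdb uses exactly these flags is an assumption, and build_download_command and download are hypothetical names.

```python
import os
import subprocess
from typing import List


def build_download_command(downloader: str, url: str, destination: str) -> List[str]:
    """Hypothetical helper: return an argv list that resumes partial downloads."""
    if downloader == "wget":
        # -c continues a partial file, -O sets the output path.
        return ["wget", "-c", "-O", destination, url]
    if downloader == "curl":
        # -L follows redirects, -C - resumes from the existing size, -o sets the output path.
        return ["curl", "-L", "-C", "-", "-o", destination, url]
    if downloader == "aria2":
        # aria2c takes the directory (-d) and file name (-o) separately; -c continues.
        directory, filename = os.path.split(destination)
        return ["aria2c", "-c", "-d", directory or ".", "-o", filename, url]
    raise ValueError(f"unsupported downloader: {downloader}")


def download(downloader: str, url: str, destination: str) -> None:
    # Create the target folder first, then run the argv list directly (no shell involved).
    os.makedirs(os.path.dirname(destination) or ".", exist_ok=True)
    subprocess.run(build_download_command(downloader, url, destination), check=True)
```

Passing a list rather than one command string keeps paths and URLs containing spaces intact without any shell quoting.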

Python Installation Recommendation

If you don't want to install the official Python system-wide, you can install Python under your own user account with pyenv (for example, via pyenv-installer). This is the preferred method for macOS users, because macOS High Sierra and later ship with the old Python 2.7.10.

Contributing

Found a bug or missing a feature? You are more than welcome to contribute.

License

MIT

magzdb's People

Contributors

Stargazers

Watchers

Forkers

elmergonzalezb av-98-ingram coool mayurnix akash-gajjar larseberhart model-map jjrh bigrusboss thehaven

magzdb's Issues

Is a range of editions possible?

magzdb version: 1.0.4
Python version: 3.7.3
Operating System: Debian 10.5

Description

Is it possible to download a range of editions, for example all between the years 2000 and 2020?

What I Did

For example, the following did not work:

magzdb -i 1234 -e 100-200
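The README's -f option already expresses ranges: -f "year >= 2000 and year <= 2020" selects editions by year, and the same form works for edition IDs with eid. Since -e did not accept a range in this report, a small hypothetical helper could turn a START-END string like the one tried above into such a filter and an argument list for magzdb; range_to_filter and magzdb_args are illustrative names, not part of magzdb.

```python
from typing import List


def range_to_filter(spec: str, field: str = "year") -> str:
    """Hypothetical helper: turn "2000-2020" into "year >= 2000 and year <= 2020"."""
    if field not in ("eid", "year"):
        raise ValueError("field must be 'eid' or 'year'")
    start_text, sep, end_text = spec.partition("-")
    if not sep:
        raise ValueError(f"expected START-END, got {spec!r}")
    start, end = int(start_text), int(end_text)
    if start > end:
        # Accept reversed ranges such as "200-100".
        start, end = end, start
    return f"{field} >= {start} and {field} <= {end}"


def magzdb_args(magazine_id: int, spec: str, field: str = "year") -> List[str]:
    # Build an argv list using only the documented -i and -f options.
    return ["magzdb", "-i", str(magazine_id), "-f", range_to_filter(spec, field)]
```

The resulting list can be passed to subprocess.run, or the equivalent command typed directly: magzdb -i 1234 -f "year >= 2000 and year <= 2020".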

Possible to include other sources? https://magazinelib.com?

Just curious if it is possible to adapt the script to pull from other magazine resources, like https://magazinelib.com for example? It looks slightly more complicated (I'm not a programmer though) as magazines and releases are not organized by an ID number. That said, there are many magazines there that are not available on Magzdb and it would be a great addition!

Just a thought....

Cheers!

Downloads aren't working; instantly skips to the next file

magzdb version: 1.1.8
Python version: 3.10
Operating System: Win 11

Description

Trying to run a download:

magzdb -i 1798

It returns the correct list of files to download, but it instantly reports each file as downloaded and runs through the whole list. The folder stays empty; no actual download occurs.

Retrieve correct filenames

magzdb version: 1.1.6
Python version: 3.9.2
Operating System: Debian Current Latest

Description

The Download link for Компьютерра 2009 № 47-48 (811-812) saves the file as [Компьютерра _811-812 2009-47-48] - (2009) - libgen.li.pdf, but magzdb saves it as get.phpdirecttruemd5fd5635dd40e1a4feb490db8bac3e384b.
Can you fix it?

Thanks.
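A browser usually takes the saved name from the server's Content-Disposition response header, whereas the name magzdb produced appears to be the request URL (something like get.php?direct=true&md5=...) with punctuation stripped. Assuming the server does send that header, a client could prefer it and fall back to the last URL path segment. Minimal standard-library sketch; pick_filename is a hypothetical name, not part of magzdb.

```python
from email.message import Message
from typing import Optional
from urllib.parse import unquote, urlsplit


def pick_filename(url: str, content_disposition: Optional[str]) -> str:
    """Hypothetical helper: prefer the server-supplied name, else the URL's last path segment."""
    name = None
    if content_disposition:
        msg = Message()
        msg["Content-Disposition"] = content_disposition
        # get_filename() unquotes the value and also decodes RFC 2231 filename*= parameters.
        name = msg.get_filename()
    if not name:
        name = unquote(urlsplit(url).path.rsplit("/", 1)[-1])
    # Drop any directory components so the name cannot escape the target folder.
    return name.replace("\\", "/").rsplit("/", 1)[-1] or "download"
```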

Add the Docker container to Unraid Community Applications

First of all, thanks for the development. This is great!!!

It would be great to see native support for Unraid by adding the Docker container to Unraid's Community Applications.

Currently, it can be installed via Docker Hub, but I think it would see wider use if added as a community application.

Download URL not found when multiple download links exist

magzdb version: 1.1.5
Python version: 3.9.7
Operating System: Ubuntu 21.10

Description

I'm trying to download an 1866 edition of Scientific American. This issue is available and has two download links. The first link is dead and leads to a page saying the resource is not available. The second link works, but magzdb doesn't see it.

http://magzdb.org/num/3694138

Format: pdf (12.62 megabytes / 13230323 b.) Download from [freelibrary.lib] Scan
author: no
Note: no
http://magzdb.org/file/262482/dl (The file is not available on this resource.)

Format: pdf (12.62 megabytes / 13230323 b.) Download from [file.magzdb.org] Scan
author: no
Note: no
http://magzdb.org/file/463968/dl
http://file.magzdb.org/ul/2490/1866/Scientific%20American%201866%20v015%20n16.pdf

magzdb -i 2490 -e 3694138 --downloader wget
2021-12-06 13:31:07.510 | INFO     | magzdb.magzdb:download:206 - Found 1 editions of Scientific_American
2021-12-06 13:31:07.510 | INFO     | magzdb.magzdb:download:211 - Downloading year 1866 id 3694138
2021-12-06 13:31:08.259 | ERROR    | magzdb.magzdb:download:226 - Download Url not found for http://magzdb.org/num/3694138/dl
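The traceback in a later report (magzdb.py, line 214) shows the download link being located with a regular expression for <a href=../file/<id>/dl>. Collecting every match, instead of only the first, would let a scraper fall back to the second link when the first is dead, and report a clear error when there is none. Minimal sketch; the pattern is adapted from that traceback, the exact page markup (including optional quotes) is an assumption, and find_file_ids and first_working are hypothetical names.

```python
import re
from typing import Callable, List

# Adapted from the pattern in the magzdb traceback; quotes around href are assumed optional.
FILE_LINK = re.compile(r"""<a\s+href=["']?\.\./file/(?P<id>\d+)/dl["']?>""")


def find_file_ids(html: str) -> List[str]:
    """Hypothetical helper: every file ID linked from an edition page, in page order."""
    return [match.group("id") for match in FILE_LINK.finditer(html)]


def first_working(ids: List[str], is_available: Callable[[str], bool]) -> str:
    # Try each candidate in order; is_available is a caller-supplied check (e.g. an HTTP request).
    for file_id in ids:
        if is_available(file_id):
            return file_id
    raise LookupError("no working download link found")
```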

Automating the script for magazine downloads

Really love the script. A few ideas on how this could be automated:

Have a folder where I can place magazine IDs (as files without extensions), so the script runs automatically for every magazine ID placed in that folder
Rename and tag files after download
Check whether a magazine issue has already been downloaded, and don't download it again if it exists
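The first idea, plus part of the third, can be approximated with a small wrapper around the documented -i, -P and --downloader options. This hypothetical sketch treats every file in a watch folder whose name is a number as a magazine ID, and lists 0-byte files in the output folder (compare the 0-byte PDFs reported in a later issue) so they can be deleted or re-checked. The folder layout and the function names are assumptions.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def magazine_ids(watch_dir: Path) -> List[str]:
    """Hypothetical helper: file names made only of digits are treated as magazine IDs."""
    names = (p.name for p in watch_dir.iterdir() if p.is_file() and p.name.isdecimal())
    return sorted(names, key=int)


def empty_files(prefix: Path) -> List[Path]:
    # 0-byte files are likely failed downloads rather than finished ones.
    return sorted(p for p in prefix.rglob("*") if p.is_file() and p.stat().st_size == 0)


def run_all(watch_dir: Path, prefix: Path, downloader: Optional[str] = "wget") -> None:
    # Run magzdb once per magazine ID found in the watch folder.
    for magazine_id in magazine_ids(watch_dir):
        command = ["magzdb", "-i", magazine_id, "-P", str(prefix)]
        if downloader:
            command += ["--downloader", downloader]
        subprocess.run(command, check=False)
```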

Issue not downloading even when file is there

magzdb version: latest
Python version: latest
Operating System: Win 10

Description

Similar issue to #67, just with Rock & Mineral (http://magzdb.org/j/1680). There are 248 issues and it only grabs 111 of them; the others are 0-byte PDFs. I can download these fine via the site.

External downloaders not displaying useful download information

magzdb version: v1.1.6
Python version: 3.10
Operating System: Windows 10

Description

The output no longer shows useful download information, as described here in #67 by Adam-0-Moore.

Download exits (seemingly before completing)

magzdb version: 1.1.1
Python version: 3.7.3
Operating System: Debian 10.5

Description

magzdb stops downloading and exits with the following error:

Traceback (most recent call last):
  File "/home/user01/.local/bin/magzdb", line 10, in <module>
    sys.exit(main())
  File "/home/user01/.local/lib/python3.7/site-packages/magzdb/cli.py", line 87, in main
    filter=args.filter,
  File "/home/user01/.local/lib/python3.7/site-packages/magzdb/magzdb.py", line 214, in download
    r'''<a\s*href\=\.\.\/file\/(?P<id>\d+)/dl>''',
AttributeError: 'NoneType' object has no attribute 'group'

What I Did

I've had this error several times; in this instance I ran the following:
/home/user01/.local/bin/magzdb -i 1654 -f "year >= 2009"


Custom output filename

Description

Currently, filenames are used as provided by magzdb.org. It would be nice to support custom filenames built from attributes such as eid, year, and issue.
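One way such a feature could work is a Python format template over the edition's attributes. This is a hypothetical sketch: the template syntax, the title field, the set of allowed fields and the function name render_filename are assumptions, not existing magzdb options.

```python
import re
import string


def render_filename(template: str, **fields) -> str:
    """Hypothetical helper: fill e.g. "{title}_{year}_{eid}.pdf" and sanitize the result."""
    allowed = {"title", "eid", "year", "issue"}
    # Formatter().parse yields (literal, field_name, format_spec, conversion) tuples.
    used = {name for _, name, _, _ in string.Formatter().parse(template) if name}
    unknown = used - allowed
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    name = template.format(**fields)
    # Replace characters that are invalid in Windows or POSIX file names.
    return re.sub(r'[<>:"/\\|?*\x00-\x1f]', "_", name).strip()
```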

wget exiting with error - not creating folder?

magzdb version: 1.1.2
Python version: 3.7.3
Operating System: Debian 10.5

Description

When running with --downloader wget, magzdb exits with an error. It seems that wget isn't creating the correct folders.

What I Did

I ran the following:

user01@s107487:~/books/temp_magazines$ /home/user01/.local/bin/magzdb -i 1593 -e 4056530 --downloader wget --debug
2020-09-25 06:01:11.903 | INFO     | magzdb.magzdb:download:199 - Found 1 editions of Everyday_practical_electronics
2020-09-25 06:01:11.903 | INFO     | magzdb.magzdb:download:204 - Downloading year 2020 issue 8
2020-09-25 06:01:11.904 | DEBUG    | magzdb.magzdb:_print:52 - Issue ID: 4056530
2020-09-25 06:01:11.988 | DEBUG    | magzdb.magzdb:_print:52 - Download Link ID: 675386
2020-09-25 06:01:12.084 | DEBUG    | magzdb.magzdb:_print:52 - Download URL: http://file.magzdb.org/ul/1593/Practical Electronics - 2020-08.pdf
2020-09-25 06:01:12.085 | DEBUG    | magzdb.magzdb:_print:52 - wget -c -O "/home/user01/books/temp_magazines/Everyday_practical_electronics/Practical_Electronics_-_2020-08.pdf" "http://file.magzdb.org/ul/1593/Practical Electronics - 2020-08.pdf"
Traceback (most recent call last):
  File "/home/user01/.local/bin/magzdb", line 10, in <module>
    sys.exit(main())
  File "/home/user01/.local/lib/python3.7/site-packages/magzdb/cli.py", line 96, in main
    filter=args.filter,
  File "/home/user01/.local/lib/python3.7/site-packages/magzdb/magzdb.py", line 241, in download
    subprocess.run(command)
  File "/usr/lib/python3.7/subprocess.py", line 472, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/usr/lib/python3.7/subprocess.py", line 775, in __init__
    restore_signals, start_new_session)
  File "/usr/lib/python3.7/subprocess.py", line 1522, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'wget -c -O "/home/user01/books/temp_magazines/Everyday_practical_electronics/Practical_Electronics_-_2020-08.pdf" "http://file.magzdb.org/ul/1593/Practical Electronics - 2020-08.pdf"': 'wget -c -O "/home/user01/books/temp_magazines/Everyday_practical_electronics/Practical_Electronics_-_2020-08.pdf" "http://file.magzdb.org/ul/1593/Practical Electronics - 2020-08.pdf"'

If I first run the command without --downloader, as expected the file downloads and I get...

user01@s107487:~/books/temp_magazines$ /home/user01/.local/bin/magzdb -i 1593 -e 4056530 --debug
2020-09-25 06:50:37.830 | WARNING  | magzdb.cli:main:81 - Use of external downloader like wget or aria2 is recommended
2020-09-25 06:50:38.245 | INFO     | magzdb.magzdb:download:199 - Found 1 editions of Everyday_practical_electronics
2020-09-25 06:50:38.245 | INFO     | magzdb.magzdb:download:204 - Downloading year 2020 issue 8
2020-09-25 06:50:38.245 | DEBUG    | magzdb.magzdb:_print:52 - Issue ID: 4056530
2020-09-25 06:50:38.328 | DEBUG    | magzdb.magzdb:_print:52 - Download Link ID: 675386
2020-09-25 06:50:38.448 | DEBUG    | magzdb.magzdb:_print:52 - Download URL: http://file.magzdb.org/ul/1593/Practical Electronics - 2020-08.pdf
2020-09-25 06:50:38.926 | DEBUG    | magzdb.magzdb:_print:52 - Downloading to /home/user01/books/temp_magazines/Everyday_practical_electronics/Practical_Electronics_-_2020-08.pdf

Without removing the file/folder created in the last step, if I then run the wget command that the debug output printed in the first attempt:

user01@s107487:~/books/temp_magazines$ wget -c -O "/home/user01/books/temp_magazines/Everyday_practical_electronics/Practical_Electronics_-_2020-08.pdf" "http://file.magzdb.org/ul/1593/Practical Electronics - 2020-08.pdf"
--2020-09-25 06:51:41--  http://file.magzdb.org/ul/1593/Practical%20Electronics%20-%202020-08.pdf
Resolving file.magzdb.org (file.magzdb.org)... 84.39.241.145
Connecting to file.magzdb.org (file.magzdb.org)|84.39.241.145|:80... connected.
HTTP request sent, awaiting response... 416 Requested Range Not Satisfiable

    The file is already fully retrieved; nothing to do.

Again, without deleting the created file/folder, running the first command again exits with the same error:

user01@s107487:~/books/temp_magazines$ /home/user01/.local/bin/magzdb -i 1593 -e 4056530 --downloader wget --debug
2020-09-25 06:55:39.899 | INFO     | magzdb.magzdb:download:199 - Found 1 editions of Everyday_practical_electronics
2020-09-25 06:55:39.900 | INFO     | magzdb.magzdb:download:204 - Downloading year 2020 issue 8
2020-09-25 06:55:39.900 | DEBUG    | magzdb.magzdb:_print:52 - Issue ID: 4056530
2020-09-25 06:55:39.976 | DEBUG    | magzdb.magzdb:_print:52 - Download Link ID: 675386
2020-09-25 06:55:40.090 | DEBUG    | magzdb.magzdb:_print:52 - Download URL: http://file.magzdb.org/ul/1593/Practical Electronics - 2020-08.pdf
2020-09-25 06:55:40.090 | DEBUG    | magzdb.magzdb:_print:52 - wget -c -O "/home/user01/books/temp_magazines/Everyday_practical_electronics/Practical_Electronics_-_2020-08.pdf" "http://file.magzdb.org/ul/1593/Practical Electronics - 2020-08.pdf"
Traceback (most recent call last):
  File "/home/user01/.local/bin/magzdb", line 10, in <module>
    sys.exit(main())
  File "/home/user01/.local/lib/python3.7/site-packages/magzdb/cli.py", line 96, in main
    filter=args.filter,
  File "/home/user01/.local/lib/python3.7/site-packages/magzdb/magzdb.py", line 241, in download
    subprocess.run(command)
  File "/usr/lib/python3.7/subprocess.py", line 472, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/usr/lib/python3.7/subprocess.py", line 775, in __init__
    restore_signals, start_new_session)
  File "/usr/lib/python3.7/subprocess.py", line 1522, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'wget -c -O "/home/user01/books/temp_magazines/Everyday_practical_electronics/Practical_Electronics_-_2020-08.pdf" "http://file.magzdb.org/ul/1593/Practical Electronics - 2020-08.pdf"': 'wget -c -O "/home/user01/books/temp_magazines/Everyday_practical_electronics/Practical_Electronics_-_2020-08.pdf" "http://file.magzdb.org/ul/1593/Practical Electronics - 2020-08.pdf"'
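The traceback is consistent with the command being passed to subprocess.run as one string without shell=True: on POSIX, Python then treats the entire string as the program name, hence a FileNotFoundError naming the whole wget command, while the same text works when typed into a shell. The third run, which fails even though the folder already exists, points the same way. A minimal illustration follows; the command string is copied from the debug output above, and splitting it with shlex.split is one possible fix, not necessarily the one magzdb adopted.

```python
import shlex
import subprocess

# The command exactly as magzdb logged it (one string).
logged = (
    'wget -c -O "/home/user01/books/temp_magazines/Everyday_practical_electronics/'
    'Practical_Electronics_-_2020-08.pdf" '
    '"http://file.magzdb.org/ul/1593/Practical Electronics - 2020-08.pdf"'
)

# Fails on POSIX: with shell=False the whole string is looked up as a program name.
# subprocess.run(logged)  -> FileNotFoundError: [Errno 2] No such file or directory: 'wget -c -O ...'

# Fix: split into an argv list, honouring the quotes, and pass the list instead.
argv = shlex.split(logged)


def run_download(argv):
    # check=True raises CalledProcessError if wget exits with a non-zero status.
    return subprocess.run(argv, check=True)
```

Building the list directly, without ever formatting a shell string, avoids quoting problems altogether.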

Found 0 editions of ____

magzdb version: 1.1.4
Python version: 3.7.3
Operating System: Debian 10.5

Description

This issue seems to be back. I get 'Found 0 editions' when running the command below, even though I don't have any editions in the download folder. It doesn't happen with all magazines, though; 5423 is the only example I have at the moment.

What I Did

magzdb -i 5423 -f "year >= 2018" -P /home/books/magazines --downloader wget
2020-11-07 09:24:32.888 | INFO     | magzdb.magzdb:download:202 - Found 0 editions of TIME

Download stuck

magzdb version: 1.0.3
Python version: 3.8
Operating System: Win10

Description

The download gets stuck after the first PDF is downloaded.

magzdb version: 1.0.2
Python version: 3.7.3
Operating System: Debian 10.5

Description

All attempts to download return 'Found 0 editions of ____ '

What I Did

magzdb -i 4274
magzdb -i 4274 -e 4058710
magzdb -I 2703
magzdb -i 2869

--downloader aria2 not working

The "--downloader aria2" option is just ignored.
