
mikf / gallery-dl


Command-line program to download image galleries and collections from several image hosting sites

License: GNU General Public License v2.0

Topics: gallery downloader pixiv danbooru deviantart tumblr flickr twitter mangadex kemono

gallery-dl's Introduction

gallery-dl

gallery-dl is a command-line program to download image galleries and collections from several image hosting sites (see Supported Sites). It is a cross-platform tool with many configuration options and powerful filenaming capabilities.


The stable releases of gallery-dl are distributed on PyPI and can be easily installed or upgraded using pip:

python3 -m pip install -U gallery-dl

Installing the latest dev version directly from GitHub can be done with pip as well:

python3 -m pip install -U -I --no-deps --no-cache-dir https://github.com/mikf/gallery-dl/archive/master.tar.gz

Note: Windows users should use py -3 instead of python3.
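
For example, the install command above then becomes:

py -3 -m pip install -U gallery-dl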

It is advisable to use the latest version of pip, as well as up-to-date versions of the essential packages setuptools and wheel. To ensure these packages are up to date, run

python3 -m pip install --upgrade pip setuptools wheel

Prebuilt executable files with a Python interpreter and required Python packages included are available for

Executables built from the latest commit can be found at

Linux users on a distro supported by Snapd can install gallery-dl from the Snap Store:

snap install gallery-dl

Windows users that have Chocolatey installed can install gallery-dl from the Chocolatey Community Packages repository:

choco install gallery-dl

gallery-dl is also available in the Scoop "main" bucket for Windows users:

scoop install gallery-dl

For macOS or Linux users using Homebrew:

brew install gallery-dl

For macOS users with MacPorts:

sudo port install gallery-dl

Using the Dockerfile in the repository:

git clone https://github.com/mikf/gallery-dl.git
cd gallery-dl/
docker build -t gallery-dl:latest .

Pulling image from Docker Hub:

docker pull mikf123/gallery-dl
docker tag mikf123/gallery-dl gallery-dl

Pulling image from GitHub Container Registry:

docker pull ghcr.io/mikf/gallery-dl
docker tag ghcr.io/mikf/gallery-dl gallery-dl

To run the container you will probably want to attach some directories on the host so that the config file and downloads can persist across runs.

Make sure to either download the example config file referenced in the repo and place it in the mounted volume location, or touch an empty file there.

If you gave the container a different tag or are using podman, make sure to adjust the commands accordingly. Run docker image ls to check the image name if you are not sure.

The --rm flag removes the container after every use, so you will always have a fresh environment for each run. If you set up a CI/CD pipeline to auto-build the container, you can also add a --pull=newer flag so that Docker checks whether a newer image is available and downloads it before running.

docker run --rm  -v $HOME/Downloads/:/gallery-dl/ -v $HOME/.config/gallery-dl/gallery-dl.conf:/etc/gallery-dl.conf -it gallery-dl:latest

You can also add an alias to your shell for "gallery-dl" or create a simple bash script and drop it somewhere in your $PATH to act as a shim for this command.
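
A minimal sketch of such a shim, assuming the image tag and volume paths from the docker run example above and that the image's entrypoint is the gallery-dl executable (adjust to your setup):

#!/bin/sh
# Hypothetical ~/bin/gallery-dl shim: forwards all arguments to the containerized gallery-dl
exec docker run --rm \
    -v "$HOME/Downloads/:/gallery-dl/" \
    -v "$HOME/.config/gallery-dl/gallery-dl.conf:/etc/gallery-dl.conf" \
    -it gallery-dl:latest "$@"

Make it executable with chmod +x, and gallery-dl <URL> on the host will then run inside the container.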

To use gallery-dl simply call it with the URLs you wish to download images from:

gallery-dl [OPTIONS]... URLS...

Use gallery-dl --help or see docs/options.md for a full list of all command-line options.

Download images; in this case from danbooru via tag search for 'bonocho':

gallery-dl "https://danbooru.donmai.us/posts?tags=bonocho"

Get the direct URL of an image from a site supporting authentication with username & password:

gallery-dl -g -u "<username>" -p "<password>" "https://twitter.com/i/web/status/604341487988576256"

Filter manga chapters by chapter number and language:

gallery-dl --chapter-filter "10 <= chapter < 20" -o "lang=fr" "https://mangadex.org/title/59793dd0-a2d8-41a2-9758-8197287a8539"

Search a remote resource for URLs and download images from them (URLs for which no extractor can be found will be silently ignored):

gallery-dl "r:https://pastebin.com/raw/FLwrCYsT"

If a site's address is nonstandard for its extractor, you can prefix the URL with the extractor's name to force the use of a specific extractor:

gallery-dl "tumblr:https://sometumblrblog.example"

Configuration files for gallery-dl use a JSON-based file format.

A list of all available configuration options and their descriptions can be found at https://gdl-org.github.io/docs/configuration.html.

For a default configuration file with available options set to their default values, see docs/gallery-dl.conf.
For a commented example with more involved settings and option usage, see docs/gallery-dl-example.conf.

gallery-dl searches for configuration files in the following places:

Windows:
  • %APPDATA%\gallery-dl\config.json
  • %USERPROFILE%\gallery-dl\config.json
  • %USERPROFILE%\gallery-dl.conf

(%USERPROFILE% usually refers to a user's home directory, i.e. C:\Users\<username>\)

Linux, macOS, etc.:
  • /etc/gallery-dl.conf
  • ${XDG_CONFIG_HOME}/gallery-dl/config.json
  • ${HOME}/.config/gallery-dl/config.json
  • ${HOME}/.gallery-dl.conf

When run as executable, gallery-dl will also look for a gallery-dl.conf file in the same directory as said executable.

It is possible to use more than one configuration file at a time. In this case, any values from files after the first will get merged into the already loaded settings and potentially override previous ones.
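
As an illustration of the merging behaviour (hypothetical values; option names taken from the examples in this document and the configuration docs), a system-wide /etc/gallery-dl.conf might contain

{
    "extractor": {
        "base-directory": "/srv/downloads"
    }
}

while ${HOME}/.config/gallery-dl/config.json, which is loaded afterwards, contains

{
    "extractor": {
        "base-directory": "~/gallery-dl",
        "twitter": {
            "username": "<username>"
        }
    }
}

The user file's base-directory then overrides the system-wide one, and the twitter block is merged in on top of the already loaded settings.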

Some extractors require you to provide valid login credentials in the form of a username & password pair. This is necessary for nijie and optional for aryion, danbooru, e621, exhentai, idolcomplex, imgbb, inkbunny, mangadex, mangoxo, pillowfort, sankaku, subscribestar, tapas, tsumino, twitter, and zerochan.

You can set the necessary information in your configuration file

{
    "extractor": {
        "twitter": {
            "username": "<username>",
            "password": "<password>"
        }
    }
}

or you can provide them directly via the -u/--username and -p/--password or via the -o/--option command-line options

gallery-dl -u "<username>" -p "<password>" "URL"
gallery-dl -o "username=<username>" -o "password=<password>" "URL"

For sites where login with username & password is not possible due to CAPTCHA or similar, or has not been implemented yet, you can use the cookies from a browser login session and input them into gallery-dl.

This can be done via the cookies option in your configuration file by specifying

  • the path to a Mozilla/Netscape format cookies.txt file exported by a browser addon
    (e.g. Get cookies.txt LOCALLY for Chrome, Export Cookies for Firefox)
  • a list of name-value pairs gathered from your browser's web developer tools
    (in Chrome, in Firefox)
  • the name of a browser to extract cookies from
    (supported browsers are Chromium-based ones, Firefox, and Safari)

For example:

{
    "extractor": {
        "instagram": {
            "cookies": "$HOME/path/to/cookies.txt"
        },
        "patreon": {
            "cookies": {
                "session_id": "K1T57EKu19TR49C51CDjOJoXNQLF7VbdVOiBrC9ye0a"
            }
        },
        "twitter": {
            "cookies": ["firefox"]
        }
    }
}

You can also specify a cookies.txt file with the --cookies command-line option, or a browser to extract cookies from with --cookies-from-browser:

gallery-dl --cookies "$HOME/path/to/cookies.txt" "URL"
gallery-dl --cookies-from-browser firefox "URL"

gallery-dl supports user authentication via OAuth for some extractors. This is necessary for pixiv and optional for deviantart, flickr, reddit, smugmug, tumblr, and mastodon instances.

Linking your account to gallery-dl grants it the ability to issue requests on your account's behalf and enables it to access resources which would otherwise be unavailable to a public user.

To do so, start by invoking it with oauth:<sitename> as an argument. For example:

gallery-dl oauth:flickr

You will be sent to the site's authorization page and asked to grant read access to gallery-dl. Authorize it and you will be shown one or more "tokens", which should be added to your configuration file.
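
The exact option names depend on the extractor; as a sketch, assuming a refresh-token option for the DeviantArt extractor (check the configuration documentation for the exact option names of your site), the token would go into the configuration file like this:

{
    "extractor": {
        "deviantart": {
            "refresh-token": "<token shown after authorization>"
        }
    }
}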

To authenticate with a mastodon instance, run gallery-dl with oauth:mastodon:<instance> as argument. For example:

gallery-dl oauth:mastodon:pawoo.net
gallery-dl oauth:mastodon:https://mastodon.social/

gallery-dl's People

Contributors

actuallykit, ailothaen, alice945, balgden, brlin-tw, bug-assassin, chio0hai, closedport22, delphox, enduser420, folliehiyuki, giovanh, gmanley, hrxn, iamleot, jsouthgb, mikf, nifnat, nitrousoxide, prinz23, saidbysolo, shinji257, shruggzoltan, stephan972, thatfuckingbird, the-blank-x, tobi823, vrihub, wiiplay123, wlritchi


gallery-dl's Issues

[deviantart] Download fails after 1200 items

OS: Windows 10 x64 [Version 10.0.14393]
gallery-dl: 0.6.4 standalone executable

Traceback (most recent call last):
  File "__main__.py", line 20, in <module>
  File "E:\gallery-dl\gallery_dl\__init__.py", line 163, in main
  File "E:\gallery-dl\gallery_dl\job.py", line 82, in run
  File "E:\gallery-dl\gallery_dl\job.py", line 25, in run
  File "E:\gallery-dl\gallery_dl\extractor\deviantart.py", line 36, in items
  File "E:\gallery-dl\gallery_dl\extractor\deviantart.py", line 142, in gallery_all
KeyError: 'results'

Got this error while trying to do a download test.
https://github.com/mikf/gallery-dl/blob/master/gallery_dl/extractor/deviantart.py#L142
params = {"username": username, "offset": offset}
The function gallery_all fails here, apparently relying on user credentials for accessing the DeviantArt API.

I did not provide any credentials for this test run and my gallery-dl.conf only uses "base-directory" as a custom value, everything else is default.

Is this known behaviour? Do I have to provide a username and password for the DeviantArt API, or it will fail otherwise?

If yes, I'm not sure I understand how it could download 1200 items then. That might not be accidental, because it doesn't look like a random number to me.

[batoto] Attribute Error - Regex mismatch

iMac:~ aaa$ gallery-dl --chapter-filter "lang == 'en'" https://bato.to/comic/_/comics/cosplay-deka-r2563 --verbose
[gallery-dl][debug] Version 1.1.2-dev
[gallery-dl][debug] Python 3.6.1 - Darwin-17.3.0-x86_64-i386-64bit
[gallery-dl][debug] requests 2.18.4
[gallery-dl][debug] urllib3 1.22
[gallery-dl][debug] Starting DownloadJob for 'https://bato.to/comic/_/comics/cosplay-deka-r2563'
[batoto][debug] Using BatotoMangaExtractor for 'https://bato.to/comic/_/comics/cosplay-deka-r2563'
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): bato.to
[urllib3.connectionpool][debug] https://bato.to:443 "GET /comic/_/comics/cosplay-deka-r2563 HTTP/1.1" 200 None
[batoto][error] An unexpected error occurred: AttributeError - 'NoneType' object has no attribute 'groups'. Please run gallery-dl again with the --verbose flag, copy its output and report this issue on https://github.com/mikf/gallery-dl/issues .
[batoto][debug] Traceback
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/gallery_dl/job.py", line 59, in run
    for msg in self.extractor:
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/gallery_dl/extractor/common.py", line 182, in items
    chapters = self.chapters(page)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/gallery_dl/extractor/batoto.py", line 105, in chapters
    self.parse_chapter_string(data)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/gallery_dl/extractor/batoto.py", line 69, in parse_chapter_string
    volume, chapter, data["chapter_minor"], title = match.groups()
AttributeError: 'NoneType' object has no attribute 'groups'

[tracker] imageboards & engines

There are numerous imageboards, and I don't think there should be a separate issue for each one, so this is a single place to track changes and requests for support:

  • yotsuba (4chan)
  • 8chan
  • fuuka/foolfuuka (list of relevant boards)
    • archive.4plebs.org
    • arch.b4k.co
    • archive.loveisover.me
    • archive.nyafuu.org
    • archive.rebeccablacktech.com
    • archive.whatisthisimnotgoodwithcomputers.com
    • archive.yeet.net
    • archived.moe
    • archiveofsins.com
    • boards.fireden.net
    • desuarchive.org
    • thebarchive.com
    • warosu.org
  • futaba (futaba channel)

Visual indicator for multiple URLs progress

This would make it easier to keep track of how many links have already been processed. It could be especially useful with the -i option.

Maybe something like that:

[N/XXX] input url

...

[N+1/XXX] input url

...

Or like that, always in the last line:

...

...

[N/XXX] input url

If this gets implemented, it should play along with the -g option; ideally, it would use stderr instead of stdout.

how it works, please help

Hi, dear developer. I can't figure out how to use it on Windows (8, 64-bit). When I try to start the *.exe, a cmd window shows up for a second and disappears. I can't do anything. Please help.

Image corruption / distortion / artifacting

edit: This could be considered an issue that the user is responsible for managing.

There's a possibility this issue is irrelevant to this module/program. After downloading a comic, I run a script that uses the zipfile module to zip the images of each chapter into their own zip file, separated by chapters.

The possible issue with gallery-dl is:

  • I may send a KeyboardInterrupt or kill the process running the script to stop the downloading of a comic, then resume the same comic later.
  • I think I may have turned my VPN on or off mid-download, which would be like disabling my internet connection mid-download.

I've had similar issues with other image-downloading scripts.

Image corruption examples:

edit: to confirm it's not the source that is corrupt (http://www.mangareader.net/feng-shen-ji/26/13)

feng shen ji_c026_013
feng shen ji_c022_027

[imagefap] Handle broken galleries

Just randomly tried some galleries, and got this:

* .\gallery-dl\imagefap\5486966 Selfies 6\imagefap_5486966_1667987992.jpg
* .\gallery-dl\imagefap\5486966 Selfies 6\imagefap_5486966_1669902986.jpg
* .\gallery-dl\imagefap\5486966 Selfies 6\imagefap_5486966_1688219690.jpg
* .\gallery-dl\imagefap\5486966 Selfies 6\imagefap_5486966_1690322866.jpg
* .\gallery-dl\imagefap\5486966 Selfies 6\imagefap_5486966_1695954932.jpg
* .\gallery-dl\imagefap\5486966 Selfies 6\imagefap_5486966_1697168893.jpg
* .\gallery-dl\imagefap\5486966 Selfies 6\imagefap_5486966_1709152554.jpg
* .\gallery-dl\imagefap\5486966 Selfies 6\imagefap_5486966_1714237566.jpg
* .\gallery-dl\imagefap\5486966 Selfies 6\imagefap_5486966_1721171723.jpg
* .\gallery-dl\imagefap\5486966 Selfies 6\imagefap_5486966_1737165840_1.jpg
Traceback (most recent call last):
  File "__main__.py", line 20, in <module>
  File "E:\asd\gallery_dl\__init__.py", line 181, in main
  File "E:\asd\gallery_dl\job.py", line 113, in run
  File "E:\asd\gallery_dl\job.py", line 40, in run
  File "E:\asd\gallery_dl\extractor\imagefap.py", line 41, in items
  File "E:\asd\gallery_dl\extractor\imagefap.py", line 68, in get_images
AttributeError: 'NoneType' object has no attribute 'rsplit'

E:\

I looked into it a bit, and I think the problem that's causing this error is galleries which are no longer complete, i.e. report a wrong number of items.

The gallery used here, as an example: http://www.imagefap.com/gallery/5486966

The gallery overview page states that this gallery is supposed to have 60 images, but when browsing through the gallery I counted only 59.

Last File standing download option

Hello there) Could you add an additional console parameter to the program that would do the following:

  1. The program accesses the directory and checks whether pictures by this author have already been downloaded
  2. Reads the name of the last downloaded file
  3. Gets the list of links to the entire gallery from the site
  4. Cuts off all links to old pictures before the given name
  5. Downloads everything new
    This would really make life easier, because with the available functionality it is difficult to go through each gallery link, manually inspect the files, and specify the numbers from where to download (it's really hard when you have 50+ downloaded galleries and want them up to date every week in automated mode). So: give it the link plus this parameter and it will do everything on its own.
    This could also be implemented as a record of the last session's data in a log.log file in each gallery folder, for example:
  • Hentai-Foundry/com/user/gallery/123pony321
  • File_Count = 213
  • Last_Down_File_MD5 = df6sd6fs6ac5xc5zx67zxv (for example :D)
  • File_Name = Big_Balls_Playing.jpg

Because gallery-dl currently has only one option related to this, --images, and it still does not work the way I want.
P.S. Sorry for my English, it's really bad.

Feature: Image Selection / Filtering

Selecting only a certain type/volume/etc of manga chapters has been asked about a couple of times (#25, #11 (comment)) and using metadata, which is usually only used to build filenames, to select/filter images or image-groups seems like a generally useful feature.

The foundation for this has already been put into place with 9b21d3f and 0dedbe7, which add the --filter and --chapter-filter cmdline options. These allow users to specify their own Python expressions to filter out images for which the expression evaluates to False. The available variable/keyword names for a specific URL can be looked up using -K or -j.

For example, to only download chapter 3 to 5 in English:
gallery-dl --chapter-filter "lang == 'en' and 3 <= chapter <= 5" https://bato.to/comic/_/comics/fury-r12548

These user-specified Python expressions are currently evaluated using the built-in eval function, which is generally considered unsafe for input from untrusted sources (Eval really is dangerous), but that shouldn't be an issue for this particular use case (unless someone has a good argument against using it). On the plus side, eval is a lot faster and more powerful than, for example, youtube-dl's --match-filter, but there are probably good reasons why youtube-dl doesn't use eval for their filter implementation.
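
As a minimal sketch (not gallery-dl's actual code) of the eval-based approach described above, each image's or chapter's metadata dict can simply serve as the namespace in which the user expression is evaluated:

# Sketch: filter metadata dicts with a user-supplied Python expression.
def make_filter(expression):
    code = compile(expression, "<filter>", "eval")
    def accept(metadata):
        return bool(eval(code, {"__builtins__": {}}, metadata))
    return accept

chapter_filter = make_filter("lang == 'en' and 3 <= chapter <= 5")
chapters = [
    {"lang": "en", "chapter": 2},
    {"lang": "en", "chapter": 4},
    {"lang": "fr", "chapter": 4},
]
print([c for c in chapters if chapter_filter(c)])  # -> [{'lang': 'en', 'chapter': 4}]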

Todo:

  • Implement --filter and --chapter-filter
  • Provide manga-chapter metadata
  • Decide on sensible globals for filter expressions

[deviantart] Add support for Groups galleries

gallery-dl --version

0.9.1-dev

OS: Windows 10 CU x64
Python 3.6.1. x64

Found something on DeviantArt (again) 😄
Groups..

1: Home URL of a group
http://cgpinups.deviantart.com/

PS D:\Stuff> gallery-dl -v http://cgpinups.deviantart.com/
[gallery-dl][debug] Starting DownloadJob for 'http://cgpinups.deviantart.com/'
[deviantart][debug] Using DeviantartGalleryExtractor for http://cgpinups.deviantart.com/
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): www.deviantart.com
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /api/v1/oauth2/gallery/all?username=cgpinups&offset=0&limit=10&mature_content=true HTTP/1.1" 200 70
PS D:\Stuff>

2: Gallery URL of group
http://cgpinups.deviantart.com/gallery/

Same result as above.

3: Gallery Folders in the Group (The actual galleries, so to speak) (One example of them)
http://cgpinups.deviantart.com/gallery/25871850/Fantasy-and-Sci-Fi

[gallery-dl][error] No suitable extractor found for 'http://cgpinups.deviantart.com/gallery/25871850/Fantasy-and-Sci-Fi'

I think the question is how the API handles this stuff.
It doesn't make sense to me right now, but I guess this is related to the difference between Gallery and Favourites. If this is the same distinction by the API...

For example, a user's gallery has Gallery Folders (just as a group), while the user's favourites has collections.

Gallery folders also don't work.
Example:

PS D:\Stuff> gallery-dl "http://arsenixc.deviantart.com/gallery/11314091/Backgrounds"
[gallery-dl][error] No suitable extractor found for 'http://arsenixc.deviantart.com/gallery/11314091/Backgrounds'
PS D:\Stuff>

No point in having two separate issues here, I think. Depends on the API results, I guess.

Sankaku exception

Sankaku post ID 6443742 causes an exception.

gallery-dl --verbose "https://chan.sankakucomplex.com/?tags=date:2017-10-01..2017-10-31T23:55:00:00+-rating:s+-animated+-webm+-mp4&commit=Search"

[gallery-dl][debug] Version 1.0.2
[gallery-dl][debug] Python 3.4.2 - Linux-4.9.47-v7+-armv7l-with-debian-8.0
[gallery-dl][debug] Starting DownloadJob for 'https://chan.sankakucomplex.com/?tags=date:2017-10-01..2017-10-31T23:55:00:00+-rating:s+-animated+-webm+-mp4&commit=Search'
[sankaku][debug] Using SankakuTagExtractor for https://chan.sankakucomplex.com/?tags=date:2017-10-01..2017-10-31T23:55:00:00+-rating:s+-animated+-webm+-mp4&commit=Search
[urllib3.connectionpool][info] Starting new HTTPS connection (1): chan.sankakucomplex.com
[urllib3.connectionpool][debug] "GET /?page=1&tags=date%3A2017-10-01..2017-10-31T23%3A55%3A00%3A00+-rating%3As+-animated+-webm+-mp4 HTTP/1.1" 200 None
[urllib3.connectionpool][debug] "GET /post/show/6443843 HTTP/1.1" 200 None

/home/scott/PIC/sankaku/6443843.png

[urllib3.connectionpool][debug] "GET /post/show/6443842 HTTP/1.1" 200 None
[sankaku][error] An unexpected error occurred: TypeError - argument of type 'NoneType' is not iterable. Please run gallery-dl again with the --verbose flag, copy its output and report this issue on https://github.com/mikf/gallery-dl/issues .
[sankaku][debug] Traceback
Traceback (most recent call last):
  File "/usr/local/lib/python3.4/dist-packages/gallery_dl/job.py", line 58, in run
    for msg in self.extractor:
  File "/usr/local/lib/python3.4/dist-packages/gallery_dl/extractor/sankaku.py", line 55, in items
    for image in self.get_images():
  File "/usr/local/lib/python3.4/dist-packages/gallery_dl/extractor/sankaku.py", line 76, in get_images
    image = self.get_image_metadata(image_id)
  File "/usr/local/lib/python3.4/dist-packages/gallery_dl/extractor/sankaku.py", line 96, in get_image_metadata
    "file_url": "https:" + text.unescape(image_url),
  File "/usr/lib/python3.4/html/__init__.py", line 130, in unescape
    if '&' not in s:
TypeError: argument of type 'NoneType' is not iterable

Timeouts and retries

As far as I can see, gallery-dl in its current state cannot detect a stalled download. Timeouts and retries are vital for grabbing big galleries, so I request adding arguments for the number of retries and the timeout interval in a future release.

Problem with http://kissmanga.com/Manga/Onepunch-Man-ONE

$ gallery-dl --verbose -d . http://kissmanga.com/Manga/Onepunch-Man-ONE
[gallery-dl][debug] Version 1.1.1
[gallery-dl][debug] Python 3.6.3 - Linux-4.14.8-1-ARCH-x86_64-with-arch
[gallery-dl][debug] Starting DownloadJob for 'http://kissmanga.com/Manga/Onepunch-Man-ONE'
[kissmanga][debug] Using KissmangaMangaExtractor for 'http://kissmanga.com/Manga/Onepunch-Man-ONE'
[urllib3.connectionpool][debug] Starting new HTTP connection (1): kissmanga.com
[urllib3.connectionpool][debug] http://kissmanga.com:80 "GET /Manga/Onepunch-Man-ONE HTTP/1.1" 200 None
[kissmanga][error] An unexpected error occurred: AttributeError - 'NoneType' object has no attribute 'groups'. Please run gallery-dl again with the --verbose flag, copy its output and report this issue on https://github.com/mikf/gallery-dl/issues .
[kissmanga][debug] Traceback
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/gallery_dl/job.py", line 59, in run
    for msg in self.extractor:
  File "/usr/lib/python3.6/site-packages/gallery_dl/extractor/common.py", line 182, in items
    chapters = self.chapters(page)
  File "/usr/lib/python3.6/site-packages/gallery_dl/extractor/kissmanga.py", line 101, in chapters
    self.parse_chapter_string(data)
  File "/usr/lib/python3.6/site-packages/gallery_dl/extractor/kissmanga.py", line 65, in parse_chapter_string
    volume, chapter, minor, title = match.groups()
AttributeError: 'NoneType' object has no attribute 'groups'

[imagefap] Subextractor/subcategory for Profiles...

... is not used properly, or maybe it is, but just in a semi-useful way?

PS D:\> gallery-dl --list-extractors | sls "imagefap"

ImagefapGalleryExtractor
Extractor for image galleries from imagefap.com
Category: imagefap - Subcategory: gallery
Example : http://www.imagefap.com/gallery/6318447
ImagefapImageExtractor
Extractor for single images from imagefap.com
Category: imagefap - Subcategory: image
Example : http://www.imagefap.com/photo/1616331218/
ImagefapUserExtractor
Extractor for all galleries from a user at imagefap.com
Category: imagefap - Subcategory: user
Example : http://www.imagefap.com/profile/Mr Bad Example/galleries


PS D:\> gallery-dl -K "http://www.imagefap.com/gallery/6318447"
Keywords for directory names:
-----------------------------
category
  imagefap
count
  5
gallery_id
  6318447
section
  Fetish
subcategory
  gallery
title
  Wife cork sabot
uploader
  feman8008

Keywords for filenames:
-----------------------
category
  imagefap
count
  5
extension
  jpg
filename
  1.jpg
gallery_id
  6318447
image_id
  2106342264
name
  1
num
  1
section
  Fetish
subcategory
  gallery
title
  Wife cork sabot
uploader
  feman8008
PS D:\> gallery-dl -K "http://www.imagefap.com/photo/1616331218/"
Keywords for directory names:
-----------------------------
category
  imagefap
date
  21/01/2013
extension
  jpg
filename
  4.jpg
gallery_id
  6318447
height
  644
image_id
  1616331218
name
  4
section
  Fetish
subcategory
  image
title
  Wife cork sabot
uploader
  feman8008
width
  1000

Keywords for filenames:
-----------------------
category
  imagefap
date
  21/01/2013
extension
  jpg
filename
  4.jpg
gallery_id
  6318447
height
  644
image_id
  1616331218
name
  4
section
  Fetish
subcategory
  image
title
  Wife cork sabot
uploader
  feman8008
width
  1000
PS D:\> gallery-dl -K "http://www.imagefap.com/profile/Mr Bad Example/galleries"
Keywords for chapter filters:
-----------------------------
gallery_id
  3591227
name
  Juli Ashton gallery
PS D:\>

It seems that profile URLs get handled properly, so matching the pattern seems to work.
What happens then, I assume, is that the profile sub-extractor gathers gallery IDs/URLs and delegates (I love this) them to the gallery sub-extractor.

The problem is that options for a profile are now pretty useless, although they are set in imagefap.py, or do I get this wrong?

class ImagefapUserExtractor(Extractor):
    """Extractor for all galleries from a user at imagefap.com"""
    category = "imagefap"
    subcategory = "user"
    directory_fmt = ["{category}", "{gallery_id} {title}"]
    filename_fmt = "{category}_{gallery_id}_{name}.{extension}"

Kissmanga - no suitable extractor

http://kissmanga.com/manga/feng-shen-ji

PS C:\Users\James\Downloads> gallery-dl http://kissmanga.com/manga/feng-shen-ji
[gallery-dl][error] No suitable extractor found for 'http://kissmanga.com/manga/feng-shen-ji'

using gallery-dl-0.9.1 installed via pip

I had another issue with this site. Was that commit that fixed it included in one of the latest releases? Yea I'm pretty sure it was.

AttributeError: "NoneType" object has no attribute 'capitalize'

Hello

I have started meddling with gallery-dl, but have come across some errors, to which I am now seeking answers.
The error in question is this:

C:\Users\Musashi>gallery-dl "https://luscious.net/albums/delicious-thigh-pics_289107/"
[luscious][error] An unexpected error occurred:
Traceback (most recent call last):
  File "E:\gallery-dl\gallery_dl\job.py", line 44, in run
  File "E:\gallery-dl\gallery_dl\extractor\luscious.py", line 43, in items
  File "E:\gallery-dl\gallery_dl\extractor\luscious.py", line 64, in get_job_metadata
  File "E:\gallery-dl\gallery_dl\util.py", line 90, in language_to_code
AttributeError: 'NoneType' object has no attribute 'capitalize'

C:\Users\Musashi>

So, could you please tell me what is wrong and how I can remedy this in the current case and similar future cases?
Thanks for your time

Batoto English chapters only

I only want to download English chapters, how can I set it to do so?

I don't actually know Python and I'm trying to modify it by something like this:

    def chapters(self, page):
        # TODO: filter by language / translator
        needle = ('<tr class="row lang_English chapter_row">\n          <td style="border-top:0;">\n           '
                  '<a href="http://bato.to/reader#')
        return [self.root + "/reader#" + mangahash
                for mangahash in text.extract_iter(page, needle, '"')]

Doesn't seem to work very well.

limit on Gelbooru and Sankaku tag downloading

Try these two lines

https://chan.sankakucomplex.com/?tags=bodysuit&commit=Search
https://gelbooru.com/index.php?page=post&s=list&tags=bodysuit

The former only allows 500 images, and the latter only allows 20K

mangastream.com changed their website

gallery-dl https://mangastream.com/r/onepunch_man/083/4685/1
./gallery-dl/mangastream/One-Punch Man/…d Road Uphill/One-Punch Man_c083_001.png
[mangastream][error] An unexpected error occurred:
Traceback (most recent call last):
  File "/home/user/.local/lib/python3.5/site-packages/gallery_dl/job.py", line 44, in run
    for msg in self.extractor:
  File "/home/user/.local/lib/python3.5/site-packages/gallery_dl/extractor/common.py", line 89, in __iter__
    raise task
  File "/home/user/.local/lib/python3.5/site-packages/gallery_dl/extractor/common.py", line 96, in async_items
    for task in self.items():
  File "/home/user/.local/lib/python3.5/site-packages/gallery_dl/extractor/mangastream.py", line 38, in items
    page = self.request(next_url).text
  File "/home/user/.local/lib/python3.5/site-packages/gallery_dl/extractor/common.py", line 65, in request
    response = safe_request(self.session, url, *args, **kwargs)
  File "/home/user/.local/lib/python3.5/site-packages/gallery_dl/extractor/common.py", line 142, in safe_request
    r = session.request(method, url, *args, **kwargs)
  File "/home/user/.local/lib/python3.5/site-packages/requests/sessions.py", line 494, in request
    prep = self.prepare_request(req)
  File "/home/user/.local/lib/python3.5/site-packages/requests/sessions.py", line 437, in prepare_request
    hooks=merge_hooks(request.hooks, self.hooks),
  File "/home/user/.local/lib/python3.5/site-packages/requests/models.py", line 305, in prepare
    self.prepare_url(url, params)
  File "/home/user/.local/lib/python3.5/site-packages/requests/models.py", line 379, in prepare_url
    raise MissingSchema(error)
requests.exceptions.MissingSchema: Invalid URL '/r/onepunch_man/083/4685/2': No schema supplied. Perhaps you meant http:///r/onepunch_man/083/4685/2?

ValueError: Not enough values to unpack

win10, powershell
python 3.6
gallery-dl 1.0.3-dev

PS C:\Users\James\Downloads> gallery-dl http://kissmanga.com/Manga/Uzumaki --verbose
[gallery-dl][debug] Version 1.0.3-dev
[gallery-dl][debug] Python 3.6.0 - Windows-10-10.0.15063-SP0
[gallery-dl][debug] Starting DownloadJob for 'http://kissmanga.com/Manga/Uzumaki'
[kissmanga][debug] Using KissmangaMangaExtractor for http://kissmanga.com/Manga/Uzumaki
[urllib3.connectionpool][debug] Starting new HTTP connection (1): kissmanga.com
[urllib3.connectionpool][debug] http://kissmanga.com:80 "GET /Manga/Uzumaki HTTP/1.1" 200 None
[kissmanga][debug] Using KissmangaChapterExtractor for http://kissmanga.com/Manga/Uzumaki/Uzumaki-Chapter-1?id=56778
[urllib3.connectionpool][debug] Starting new HTTP connection (1): kissmanga.com
[urllib3.connectionpool][debug] http://kissmanga.com:80 "GET /Manga/Uzumaki/Uzumaki-Chapter-1?id=56778 HTTP/1.1" 302 None
[urllib3.connectionpool][debug] http://kissmanga.com:80 "GET /Message/AreYouHuman?reUrl=%2fManga%2fUzumaki%2fUzumaki-Chapter-1%3fid%3d56778 HTTP/1.1" 200 None
[kissmanga][error] An unexpected error occurred: ValueError - not enough values to unpack (expected 2, got 0). Please run gallery-dl again with the --verbose flag, copy its output and report this issue on https://github.com/mikf/gallery-dl/issues .
[kissmanga][debug] Traceback
Traceback (most recent call last):
  File "c:\python36\lib\site-packages\gallery_dl\job.py", line 58, in run
    for msg in self.extractor:
  File "c:\python36\lib\site-packages\gallery_dl\extractor\kissmanga.py", line 122, in items
    data = self.get_job_metadata(page)
  File "c:\python36\lib\site-packages\gallery_dl\extractor\kissmanga.py", line 133, in get_job_metadata
    manga, cinfo = title.split("\n")[1:3]
ValueError: not enough values to unpack (expected 2, got 0)
[kissmanga][debug] Using KissmangaChapterExtractor for http://kissmanga.com/Manga/Uzumaki/Uzumaki-Chapter-2?id=56786

Feature: Custom string format options

Python strings come with a builtin str.format method, which interprets the actual string as format string and uses data supplied through the args and kwargs arguments to fill in replacement fields.
gallery-dl utilizes this to build its directory- and filenames by applying extracted metadata to either default or user-defined format strings.

To allow developers to customize and extend the format string syntax, Python also provides a Formatter class, which offers the same functionality as str.format and also lets developers override and extend its methods and functionality. Using this class comes at the cost of performance: str.format is at least 30 times faster than string.Formatter.format.

With c1f0afe I've taken the code of string.Formatter and stripped it down to a bare minimum, while keeping all functionality necessary for gallery-dl intact and optimizing it for performance. This made it only 5 times slower than str.format and also enabled some extra formatting options:

  • Conversions

    • l: converts all characters to lowercase (abc)
    • u: converts all characters to uppercase (ABC)
    • c: capitalizes the first letter and lowercases the rest (Abc)
    • C: calls string.capwords, which performs the same operation as c on each word (Abc Def)
  • Format Specifiers

    • ?<before>/<after>/: Adds <before> and <after> to the actual value if it evaluates to True, otherwise the whole replacement field becomes an empty string. This allows to add some extra characters before/after optional keyword values.
      Example: {f:?-+/+-/} -> "-+Example+-" if f contains "Example", or "" if f is empty
      See also:
      filename_fmt = "{category}_{hash}{title:?_//}.{extension}"

Todo:

  • Improve Python's custom string formatter
  • Implement basic custom formatting features

If anyone has any other ideas for useful formatting options, let me know.
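
A minimal sketch of these ideas on top of plain string.Formatter (not the optimized gallery-dl code; the class name is made up for illustration):

import string

class MetadataFormatter(string.Formatter):
    """Illustrative only: the l/u/c/C conversions and the '?<before>/<after>/' specifier."""

    def convert_field(self, value, conversion):
        if conversion == "l":
            return str(value).lower()
        if conversion == "u":
            return str(value).upper()
        if conversion == "c":
            return str(value).capitalize()
        if conversion == "C":
            return string.capwords(str(value))
        return super().convert_field(value, conversion)

    def format_field(self, value, spec):
        if spec.startswith("?"):
            before, after, spec = spec[1:].split("/", 2)
            if not value:
                return ""  # empty value: drop the whole replacement field
            return before + super().format_field(value, spec) + after
        return super().format_field(value, spec)

fmt = MetadataFormatter()
print(fmt.format("{category}_{hash}{title:?_//}.{extension}",
                 category="example", hash="abc123", title="", extension="jpg"))
# -> example_abc123.jpg
print(fmt.format("{f:?-+/+-/}", f="Example"))  # -> -+Example+-
print(fmt.format("{name!C}", name="abc def"))  # -> Abc Def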

Use .netrc

So, there's a file whose format was designed for FTP connections: $HOME/.netrc (syntax info).

This file can be used for keeping credentials for specific locations. This way, config.json could be safely shared without user/pass pairs.

youtube-dl actually uses .netrc in this way for its extractors, as an alternative to plain user/pass arguments.
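
For reference, a .netrc entry looks like this (the machine name here is only illustrative; how gallery-dl would map sites to machine names is exactly what this request is about):

machine danbooru.donmai.us
login <username>
password <password>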

XVideos Support (xVideos.com)

How difficult would it be to modify existing script/plugins (such as xHamster) to allow for downloading of xVideos image galleries/albums?

[pixiv] support shortened URLs

Short versions of pixiv links are currently not supported, i.e.:

pixiv.net/i/xxxxx for images, pixiv.net/u/xxxxx for users. Maybe there are more, I have no idea.

Deviantart, not all images are downloaded.

First, big thanks for your great program. Now... When I download galleries from DeviantArt, many of them are downloaded only partially. If I restart the download, the program will download the remaining files. Sometimes I have to restart 5 times. Do you need any more information from me to solve this? Links to galleries, for example?

[deviantart] Favorites extraction issue, Literature and Journals

I just noticed something while downloading some DeviantArt profiles as well as their favorites.

First the good news, given these URLs

[1] http://roperookie.deviantart.com/
[2] http://michaelpe.deviantart.com/
[3] http://rick35mm.deviantart.com/
[4] http://ultradevious.deviantart.com/
[5] http://sanfrancysco.deviantart.com/
[6] http://hart-worx.deviantart.com/

gallery-dl downloaded each and every deviation/submission from the profiles' galleries. 100/100 Points.

But Favorites are entirely another story:

Profile 1: Profile web page states: 1,316 Favourites.
In 'favorite\roperookie - Featured': 348

Profile 2: Profile web page states: 92 Favourites.
In 'favorite\michaelpe - Featured': 3

Profile 3: Profile web page states: 7 Favourites.
In 'favorite\rick35mm - Featured': 4

Profile 4: Profile web page states: 382 Favourites.
In 'favorite\ultradevious - Featured': 284

Profile 5: Profile web page states: 48 Favourites.
In 'favorite\sanfrancysco - Featured': 34

Profile 6: Profile web page states: 547 Favourites.
In 'favorite\hart-worx - Featured': 333

That's some pretty wild variation there.
I think that two things need to be taken into account:

  • Favorites can also include Journal entries from any profile. This means that there eventually is no picture, for example, so nothing can be downloaded.
  • Favorites count might not reflect references to deviations that don't exist anymore or have been moved somewhere else.

Some of these profiles are pretty old, so the numbers of favorites mentioned on the profile pages have to be taken with a grain of salt, maybe.

My first assumption was that this might be caused by the (at least) two different favorite listings each profile has: Featured and All.

The name of the target directory indicates that: "{profile-name} - Featured"

Also, the corresponding URLs:

Favorites tab in the horizontal category menu bar: http://sanfrancysco.deviantart.com/favourites/
Featured view at the left, under the profile badge: http://sanfrancysco.deviantart.com/favourites/
All favorites view link below: http://sanfrancysco.deviantart.com/favourites/?catpath=/

This does not affect the result, gallery-dl always returns the same files for these.

Logic indicates that Featured is a subset of All, which is sometimes true (http://sanfrancysco.deviantart.com/), sometimes not (http://rick35mm.deviantart.com/).

The last one is also an example for the Favorites chaos, the number listed in the profile being totally off.

Not sure what to make of it. Would be interesting to know what the API returns.

Use youtube-dl if available

How's this idea to you?

Pretty straightforward explanation:

Let's imagine we are parsing some thread/gallery. By default gallery-dl only looks for links it can handle on its own. If the corresponding option is passed on start, it also looks for links supported by youtube-dl and saves them in memory. After finishing the thread/gallery, youtube-dl is opened to download these supported links. After it finishes, youtube-dl is closed and gallery-dl proceeds.

python 3 requried error

When I tried to run it on Ubuntu 16.04 it did not work, because the default Python on Ubuntu is 2.7 (Python 3 can be installed alongside it and both will work).

Is there any way to change it to Python 3?

Never mind, I used the dev version and that fixed it.

[deviantart] Changed keywords, or API query?

OS: Windows 10 x64 [Version 10.0.15063]
Python: 3.6.1 amd64
gallery-dl: git master

PS F:\> gallery-dl --verbose "http://bentanart.deviantart.com/favourites/"
[gallery-dl][debug] Starting DownloadJob for 'http://bentanart.deviantart.com/favourites/'
[deviantart][debug] Using DeviantartFavoriteExtractor for http://bentanart.deviantart.com/favourites/
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): www.deviantart.com
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /api/v1/oauth2/collections/folders?username=bentanart&offset=0&limit=50&mature_content=true HTTP/1.1" 200 274
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /api/v1/oauth2/collections/3A2F3B70-6714-52AF-8D16-2BAD70BB6809?username=bentanart&offset=0&limit=24&mature_content=true HTTP/1.1" 200 6971
[deviantart][error] An unexpected error occurred: KeyError - 'collection'. Please run gallery-dl again with the --verbose flag, copy its output and report this issue on https://github.com/mikf/gallery-dl/issues .
[deviantart][debug] Traceback
Traceback (most recent call last):
  File "c:\users\hrxn\appdata\local\programs\python\python36\lib\site-packages\gallery_dl\job.py", line 45, in run
    self.dispatch(msg)
  File "c:\users\hrxn\appdata\local\programs\python\python36\lib\site-packages\gallery_dl\job.py", line 78, in dispatch
    self.handle_directory(msg[1])
  File "c:\users\hrxn\appdata\local\programs\python\python36\lib\site-packages\gallery_dl\job.py", line 130, in handle_directory
    self.pathfmt.set_directory(keywords)
  File "c:\users\hrxn\appdata\local\programs\python\python36\lib\site-packages\gallery_dl\util.py", line 188, in set_directory
    for segment in self.directory_fmt
  File "c:\users\hrxn\appdata\local\programs\python\python36\lib\site-packages\gallery_dl\util.py", line 188, in <listcomp>
    for segment in self.directory_fmt
KeyError: 'collection'
PS F:\>

(Same error for different .deviantart.com/favourites URLs)

Error appears when setting directory, i.e. in extractor.deviantart.favorite.directory

I tested it with --ignore-config, which seemed to work.

So it has to be something in my config, here the part for DeviantArt:

"deviantart":
        {
            "gallery":
            {
                "directory": ["DeviantArt", "Galleries", "{author[username]}"],
                "filename": "{index}_{title}.{extension}"
            },
            "favorite":
            {
                "directory": ["DeviantArt", "Favorites", "{collection[owner]}", "{collection[title]}"],
                "filename": "{index}_{title}_by_{author[username]}.{extension}"
            },
            "deviation":
            {
                "directory": ["DeviantArt", "Deviations"],
                "filename": "{index}_{title}_by_{author[username]}-({author[urlname]}).{extension}"
            },
            "folder":
            {
                "directory": ["DeviantArt", "Folders", "{folder[owner]}", "{folder[title]}"],
                "filename": "{index}_{title}_by_{author[username]}.{extension}"
            },
            "collection":
            {
                "directory": ["DeviantArt", "Collections", "{collection[owner]}", "{collection[title]}"],
                "filename": "{index}_{title}_by_{author[username]}.{extension}"
            },
            "mature": true
        },

(Without the part for Journals...)

Not sure if I get this right..

Well, the Favorites-subextractor should be used, and that's the same folder structure setting I've used in the past.

Not sure. Did something change with their API? Or did some recent commit change the endpoint used, and I missed it somehow?

[kissmanga] AttributeError: 'NoneType' object has no attribute 'group'

At a glance, it looks like some regex failed to match and the program continued to use the None value, expecting it to be a match.

cmd: gallery-dl http://kissmanga.com/Manga/Liar-Game
python 3.6
gallery-dl v0.9.1

PS C:\Users\James\Downloads> gallery-dl http://kissmanga.com/Manga/Liar-Game
[kissmanga][error] An unexpected error occurred:
Traceback (most recent call last):
  File "c:\python36\lib\site-packages\gallery_dl\job.py", line 44, in run
    for msg in self.extractor:
  File "c:\python36\lib\site-packages\gallery_dl\extractor\kissmanga.py", line 73, in items
    data = self.get_job_metadata(page)
  File "c:\python36\lib\site-packages\gallery_dl\extractor\kissmanga.py", line 88, in get_job_metadata
    chminor = match.group(3) or match.group(6)
AttributeError: 'NoneType' object has no attribute 'group'
[kissmanga][error] An unexpected error occurred:
Traceback (most recent call last):
  File "c:\python36\lib\site-packages\gallery_dl\job.py", line 44, in run
    for msg in self.extractor:
  File "c:\python36\lib\site-packages\gallery_dl\extractor\kissmanga.py", line 73, in items
    data = self.get_job_metadata(page)
  File "c:\python36\lib\site-packages\gallery_dl\extractor\kissmanga.py", line 88, in get_job_metadata
    chminor = match.group(3) or match.group(6)
AttributeError: 'NoneType' object has no attribute 'group'
[kissmanga][error] An unexpected error occurred:
Traceback (most recent call last):
  File "c:\python36\lib\site-packages\gallery_dl\job.py", line 44, in run
    for msg in self.extractor:
  File "c:\python36\lib\site-packages\gallery_dl\extractor\kissmanga.py", line 73, in items
    data = self.get_job_metadata(page)
  File "c:\python36\lib\site-packages\gallery_dl\extractor\kissmanga.py", line 88, in get_job_metadata
    chminor = match.group(3) or match.group(6)
AttributeError: 'NoneType' object has no attribute 'group'
[kissmanga][error] An unexpected error occurred:
Traceback (most recent call last):
  File "c:\python36\lib\site-packages\gallery_dl\job.py", line 44, in run
    for msg in self.extractor:
  File "c:\python36\lib\site-packages\gallery_dl\extractor\kissmanga.py", line 73, in items
    data = self.get_job_metadata(page)
  File "c:\python36\lib\site-packages\gallery_dl\extractor\kissmanga.py", line 88, in get_job_metadata
    chminor = match.group(3) or match.group(6)
AttributeError: 'NoneType' object has no attribute 'group'

[flickr] Complete extractor support

Good news from the commit log: the flickr album extractor (93e5d8c) and the image extractor (659c65d) are already there, apparently.

So I thought that adding complete support for Flickr now might be a good idea. 😄

Support for the Flickr API is already part of the implementation, at least that's what I got from reading the source code a bit, so it should be relatively straightforward now, I would assume.

All Flickr variants, I think, are:

  • Supported:
    • Individual Image
    • An Album (a set of 1 or more images)
  • To do:
    • Profile favorites
    • The user profile itself, what is called "Photostream", I believe

So what do you think?
Unless I'm wrong about the Flickr API, you already made the foundation, so it should not be too much work, I'd assume.

gallery-dl doesn't work with hentai2read

It seems gallery-dl no longer works with hentai2read. I'm getting AttributeError: 'NoneType' object has no attribute 'group' when trying to download from the site.


Download limit

Hi mikf,

Is there a quick way to limit the number of downloaded items? I found no such setting in the options.
e.g. there are 1114k images here https://danbooru.donmai.us/posts?tags=long_hair, I want to download the first 10k.

Thanks,
Jie

Sankaku Channel - [Error] HTTP status "429 Too many requests - please slow down..."

When downloading many images at once (100+), gallery-dl fails to download some of them. It shows errors like this:

[Error] HTTP status "429 Too many requests - please slow down..." (1/5)
[sankaku][error] An unexpected error occurred:
Traceback (most recent call last):
File "E:\gallery-dl\gallery_dl\job.py", line 44, in run
File "E:\gallery-dl\gallery_dl\extractor\common.py", line 89, in iter
File "E:\gallery-dl\gallery_dl\extractor\common.py", line 96, in async_items
File "E:\gallery-dl\gallery_dl\extractor\sankaku.py", line 40, in items
File "E:\gallery-dl\gallery_dl\extractor\sankaku.py", line 63, in get_images
File "E:\gallery-dl\gallery_dl\extractor\sankaku.py", line 72, in get_image_me
tadata
File "E:\gallery-dl\gallery_dl\extractor\common.py", line 65, in request
File "E:\gallery-dl\gallery_dl\extractor\common.py", line 155, in safe_request
File "C:\Python34\lib\site-packages\requests\models.py", line 937, in raise_fo
r_status
requests.exceptions.HTTPError: 429 Client Error: Too many requests - please slow
down... for url: x

Questions, Feedback and Suggestions

A central place for these things might be a good idea.

This thread could serve as a starting point, results will eventually be collected in the project wiki, if appropriate and useful.

Edited 2017-04-15
For conciseness

Edited 2017-05-04
Removed nonsensical checklist thing


[imgbox] Extraction error, site change (late 2017)

First, congratulations on 1.0.0. Really good job! 😉

Imgbox is not working atm, caused by a site change, as confirmed by Google Web Search turning up issues for this site with other programs etc.

gallery-dl 1.0.0
Windows 10 x64 Fall Creators Update
Python 3.6.1 x64

Imgbox, Test URL 1:
http://imgbox.com/g/cUGEkRbdZZ

PS D:\> gallery-dl --verbose "http://imgbox.com/g/cUGEkRbdZZ"
[gallery-dl][debug] Version 1.0.0
[gallery-dl][debug] Python 3.6.1 - Windows-10-10.0.16299-SP0
[gallery-dl][debug] Starting DownloadJob for 'http://imgbox.com/g/cUGEkRbdZZ'
[imgbox][debug] Using ImgboxGalleryExtractor for http://imgbox.com/g/cUGEkRbdZZ
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): imgbox.com
[urllib3.connectionpool][debug] https://imgbox.com:443 "GET /g/cUGEkRbdZZ HTTP/1.1" 200 None
[urllib3.connectionpool][debug] https://imgbox.com:443 "GET /fFnvPLeZ HTTP/1.1" 200 None
[imgbox][error] An unexpected error occurred: TypeError - must be str, not NoneType. Please run gallery-dl again with the --verbose flag, copy its output and report this issue on https://github.com/mikf/gallery-dl/issues .
[imgbox][debug] Traceback
Traceback (most recent call last):
  File "c:\users\hrxn\appdata\local\programs\python\python36\lib\site-packages\gallery_dl\job.py", line 58, in run
    for msg in self.extractor:
  File "c:\users\hrxn\appdata\local\programs\python\python36\lib\site-packages\gallery_dl\extractor\common.py", line 139, in __iter__
    raise task
  File "c:\users\hrxn\appdata\local\programs\python\python36\lib\site-packages\gallery_dl\extractor\common.py", line 146, in async_items
    for task in self.items():
  File "c:\users\hrxn\appdata\local\programs\python\python36\lib\site-packages\gallery_dl\extractor\imgbox.py", line 52, in items
    yield Message.Url, self.get_file_url(imgpage), data
  File "c:\users\hrxn\appdata\local\programs\python\python36\lib\site-packages\gallery_dl\extractor\imgbox.py", line 78, in get_file_url
    return base + path
TypeError: must be str, not NoneType
PS D:\>

Imgbox, Test URL 2:
https://imgbox.com/g/m3hiqHrE0e

PS D:\> gallery-dl --verbose "https://imgbox.com/g/m3hiqHrE0e"
[gallery-dl][debug] Version 1.0.0
[gallery-dl][debug] Python 3.6.1 - Windows-10-10.0.16299-SP0
[gallery-dl][debug] Starting DownloadJob for 'https://imgbox.com/g/m3hiqHrE0e'
[imgbox][debug] Using ImgboxGalleryExtractor for https://imgbox.com/g/m3hiqHrE0e
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): imgbox.com
[urllib3.connectionpool][debug] https://imgbox.com:443 "GET /g/m3hiqHrE0e HTTP/1.1" 200 None
[urllib3.connectionpool][debug] https://imgbox.com:443 "GET /pvWHIHlp HTTP/1.1" 200 None
[imgbox][error] An unexpected error occurred: TypeError - must be str, not NoneType. Please run gallery-dl again with the --verbose flag, copy its output and report this issue on https://github.com/mikf/gallery-dl/issues .
[imgbox][debug] Traceback
Traceback (most recent call last):
  File "c:\users\hrxn\appdata\local\programs\python\python36\lib\site-packages\gallery_dl\job.py", line 58, in run
    for msg in self.extractor:
  File "c:\users\hrxn\appdata\local\programs\python\python36\lib\site-packages\gallery_dl\extractor\common.py", line 139, in __iter__
    raise task
  File "c:\users\hrxn\appdata\local\programs\python\python36\lib\site-packages\gallery_dl\extractor\common.py", line 146, in async_items
    for task in self.items():
  File "c:\users\hrxn\appdata\local\programs\python\python36\lib\site-packages\gallery_dl\extractor\imgbox.py", line 52, in items
    yield Message.Url, self.get_file_url(imgpage), data
  File "c:\users\hrxn\appdata\local\programs\python\python36\lib\site-packages\gallery_dl\extractor\imgbox.py", line 78, in get_file_url
    return base + path
TypeError: must be str, not NoneType
PS D:\>

Well, fuck me sideways, was just about to click on 'Submit', but then decided to try another URL..
http://imgbox.com/g/wg9TY0XSg7

And this seems to work. Oh my, even if something is broken, it has to be broken in an inconsistent manner. Obviously.

Can't tell why No. 3 is working. I think this is a rather old gallery, while the other examples are pretty new. Maybe the site change only applies to new galleries..

The other difference I can think of is the gallery title. The first two have a specific title, while no. 3 does not, only a generic 'Unnamed Gallery etc.pp.'..

How to use it?

I ran the command in cmd like this:
"C:\Python27\Scripts>gallery-dl URL [https://luscious.net/..etc.../view/]"
And got:

Traceback (most recent call last):
  File "c:\python27\lib\runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "c:\python27\lib\runpy.py", line 72, in _run_code
    exec code in run_globals
  File "C:\Python27\Scripts\gallery-dl.exe\__main__.py", line 5, in <module>
  File "c:\python27\lib\site-packages\gallery_dl\__init__.py", line 116
    print("No suitable extractor found for URL '", url, "'", sep="")
        ^
SyntaxError: invalid syntax

Really waiting for your kind answer^^

Batoto don't work and large folder names not found

Batoto doesn't seem to work; I get an error related to the URL (error 405).
Also, if the manga name contains some weird characters or is too long, the script fails to get the folder.


Edit: same error on v0.5.2

imageFap 404

Sometimes imagefap returns a 404 and the download stops. Is it possible to add an exception to the imagefap extractor: if 404, go to the next image?

[luscious] wrong parser result

[kissmanga][error] An unexpected error occurred: AttributeError - 'NoneType' object has no attribute 'group'

Is the kissmanga downloader working? Did I miss something?

PS C:\Users\James\Downloads> gallery-dl http://kissmanga.com/Manga/Monster
[kissmanga][info] Solving Cloudflare challenge
[kissmanga][error] An unexpected error occurred: AttributeError - 'NoneType' object has no attribute 'group'
[kissmanga][error] An unexpected error occurred: AttributeError - 'NoneType' object has no attribute 'group'
[... the same error line repeats many more times ...]

KeyboardInterrupt
PS C:\Users\James\Downloads>

Also, if it is the Cloudflare bypasser that's failing, why not use the cfscrape module? It was easy for me to use and integrate into one of my projects.
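
The repeated AttributeError is the classic symptom of calling .group() on the result of a failed re.search(): if Cloudflare serves its challenge page instead of the chapter page, the expected pattern simply isn't in the HTML. A guard along these lines would at least turn it into a readable error (the regex is only illustrative, not the actual kissmanga pattern):

import re

def first_image_url(page):
    # a failed search returns None; calling .group() on it raises the
    # AttributeError seen in the log above
    match = re.search(r'lstImages\.push\("([^"]+)"\)', page)
    if match is None:
        raise RuntimeError("image list not found - Cloudflare challenge not solved?")
    return match.group(1)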

Multithreaded downloading

I don't know any Python, so I have no idea how tricky this might be.

There are some limits that would be required (a rough sketch of how the per-server limit could look follows this list):

  • Some servers only allow a limited number of simultaneous downloads, so the extractor should be able to flag this somehow. For those servers, only one download at a time should run, for safety reasons.

  • When looking for another download for an available server, only the next X lines of input should be read ahead, not the whole file. E.g.: if the maximum number of threads is 2 and the input is "server1 server1 server2", the third link should not start downloading before the first or second has finished.

  • The visual output in the terminal should be updated accordingly, without mixing lines of different colours.
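
A very rough sketch of the per-host limit idea with the Python standard library; the host names and limits are made up, and this is not how gallery-dl is actually structured:

from concurrent.futures import ThreadPoolExecutor
from threading import Lock, Semaphore
from urllib.parse import urlsplit

# how many parallel downloads each host tolerates; hosts that need the
# "1 at a time" safety limit would be flagged here (names are examples)
HOST_LIMITS = {"sensitive-server.example": 1}
DEFAULT_LIMIT = 2

_semaphores = {}
_lock = Lock()

def _slot(url):
    host = urlsplit(url).hostname
    with _lock:
        if host not in _semaphores:
            _semaphores[host] = Semaphore(HOST_LIMITS.get(host, DEFAULT_LIMIT))
        return _semaphores[host]

def download(url):
    with _slot(url):
        print("downloading", url)   # the real download would go here

def run(urls, max_workers=2):
    # note: this submits every URL up front, so it does not yet implement
    # the "read only X lines ahead" requirement; that would need a bounded
    # work queue on top of this
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for url in urls:
            pool.submit(download, url)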

readcomics.tv is down...

gur@gur____:/media/gur/_/tmp/comix$ gallery-dl "http://www.readcomics.tv/comic/back-to-the-future-2015"
[requests.packages.urllib3.connectionpool][info] Starting new HTTPS connection (1): readcomics.tv
[requests.packages.urllib3.connectionpool][info] Starting new HTTPS connection (2): readcomics.tv
[requests.packages.urllib3.connectionpool][info] Starting new HTTPS connection (3): readcomics.tv
[requests.packages.urllib3.connectionpool][info] Starting new HTTPS connection (4): readcomics.tv
[requests.packages.urllib3.connectionpool][info] Starting new HTTPS connection (5): readcomics.tv
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 137, in _new_conn
    (self.host, self.port), self.timeout, **extra_kw)
  File "/usr/lib/python3/dist-packages/urllib3/util/connection.py", line 67, in create_connection
    for res in socket.getaddrinfo(host, port, 0, socket.SOCK_STREAM):
  File "/usr/lib/python3.5/socket.py", line 732, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -5] No address associated with hostname

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 560, in urlopen
    body=body, headers=headers)
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 346, in _make_request
    self._validate_conn(conn)
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 787, in _validate_conn
    conn.connect()
  File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 217, in connect
    conn = self._new_conn()
  File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 146, in _new_conn
    self, "Failed to establish a new connection: %s" % e)
requests.packages.urllib3.exceptions.NewConnectionError: <requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x7f6d518fcb70>: Failed to establish a new connection: [Errno -5] No address associated with hostname

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/requests/adapters.py", line 376, in send
    timeout=timeout
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 610, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/usr/lib/python3/dist-packages/urllib3/util/retry.py", line 273, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
requests.packages.urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='readcomics.tv', port=443): Max retries exceeded with url: /comic/back-to-the-future-2015 (Caused by NewConnectionError('<requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x7f6d518fcb70>: Failed to establish a new connection: [Errno -5] No address associated with hostname',))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/gallery-dl", line 9, in <module>
    load_entry_point('gallery-dl==0.8.3.dev0', 'console_scripts', 'gallery-dl')()
  File "/usr/local/lib/python3.5/dist-packages/gallery_dl-0.8.3.dev0-py3.5.egg/gallery_dl/__init__.py", line 105, in main
    jobtype(url).run()
  File "/usr/local/lib/python3.5/dist-packages/gallery_dl-0.8.3.dev0-py3.5.egg/gallery_dl/job.py", line 115, in run
    Job.run(self)
  File "/usr/local/lib/python3.5/dist-packages/gallery_dl-0.8.3.dev0-py3.5.egg/gallery_dl/job.py", line 41, in run
    for msg in self.extractor:
  File "/usr/local/lib/python3.5/dist-packages/gallery_dl-0.8.3.dev0-py3.5.egg/gallery_dl/extractor/readcomics.py", line 31, in items
    for issue in self.get_issues():
  File "/usr/local/lib/python3.5/dist-packages/gallery_dl-0.8.3.dev0-py3.5.egg/gallery_dl/extractor/readcomics.py", line 36, in get_issues
    page = self.request(self.url).text
  File "/usr/local/lib/python3.5/dist-packages/gallery_dl-0.8.3.dev0-py3.5.egg/gallery_dl/extractor/common.py", line 42, in request
    response = safe_request(self.session, url, *args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/gallery_dl-0.8.3.dev0-py3.5.egg/gallery_dl/extractor/common.py", line 85, in safe_request
    r = session.request(method, url, *args, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 468, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 576, in send
    r = adapter.send(request, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/adapters.py", line 437, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='readcomics.tv', port=443): Max retries exceeded with url: /comic/back-to-the-future-2015 (Caused by NewConnectionError('<requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x7f6d518fcb70>: Failed to establish a new connection: [Errno -5] No address associated with hostname',))

gur@gur-Satellite-L305:/media/gur/M_kitchen/tmp/comix$ gallery-dl --version
0.8.3-dev
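
For what it's worth, the root cause here is DNS rather than gallery-dl: socket.gaierror [Errno -5] means readcomics.tv no longer resolves to any address, so no downloader could reach it. A quick standard-library check would show the same failure:

import socket

try:
    socket.getaddrinfo("readcomics.tv", 443)
except socket.gaierror as exc:
    # same "[Errno -5] No address associated with hostname" as in the traceback
    print("DNS lookup failed:", exc)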

For the record, the only reason I installed from source is that I couldn't get it installed via pip3.

Reddit images+links (recursive) dump

What is your opinion on this? Could you do it? I'm not requesting it outright, just asking, because it could be tricky.

Plenty of subreddits are used as galleries. The comment sections are no less important, so they deserve grabbing too.

The key point is that gallery-dl's other modules could be used to process the extracted links.
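
As a rough sketch of the idea from outside gallery-dl: collect the outbound links from a subreddit's public JSON listing and feed them back to gallery-dl through its -i/--input-file option. The subreddit and limit below are arbitrary examples, and comments are not covered here:

import requests

headers = {"User-Agent": "link-collector/0.1"}            # Reddit wants a custom User-Agent
listing = "https://www.reddit.com/r/Art/hot.json?limit=100"
data = requests.get(listing, headers=headers).json()

with open("links.txt", "w") as fp:
    for child in data["data"]["children"]:
        post = child["data"]
        if not post["is_self"]:                           # skip text-only posts
            fp.write(post["url"] + "\n")

# afterwards: gallery-dl --input-file links.txt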

no extractor found

hi,
using the command "gallery-dl gelbooru" gives me the error
gelbooru: no extractor found

I'm using Windows 7 and installed it with Python 3.5.1.
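
This message usually just means the argument was not recognised as a supported URL: gallery-dl expects a full gallery or search URL rather than a site name. For gelbooru that would be something along these lines (the tag is chosen arbitrarily):

gallery-dl "https://gelbooru.com/index.php?page=post&s=list&tags=cat"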
