packtpub-crawler

Download FREE eBook every day from www.packtpub.com

This crawler automates the following steps (sketched in the example after the list):

  • grab the hidden form parameters
  • log in to your private account
  • claim the daily free eBook
  • parse the title, description, and other useful information
  • download your favorite format: .pdf, .epub, or .mobi
  • download the source code and book cover
  • upload the files to Google Drive
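
As a rough illustration of those steps, here is a minimal sketch of the crawl flow using requests and BeautifulSoup. This is a hypothetical outline, not the actual spider.py implementation: the URL, form field names, and CSS class below are assumptions about how the Packt site looked at the time.

# hypothetical outline of the crawl flow -- not the actual spider.py code
import requests
from bs4 import BeautifulSoup

BASE_URL = 'https://www.packtpub.com'  # assumed site layout
session = requests.Session()

# grab the hidden form parameters from the free-learning page
page = session.get(BASE_URL + '/packt/offers/free-learning')
soup = BeautifulSoup(page.text, 'html.parser')
form_build_id = soup.find('input', {'name': 'form_build_id'})['value']

# log in to the private account, passing the hidden parameters along
session.post(BASE_URL, data={
    'email': 'user@example.com',         # your Packt credentials
    'password': 'secret',
    'form_build_id': form_build_id,
    'form_id': 'packt_user_login_form',  # assumed form id
    'op': 'Login',
})

# claim the daily free eBook via the claim link on the page
claim_path = soup.find('a', {'class': 'twelve-days-claim'})['href']  # assumed class
session.get(BASE_URL + claim_path)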

Default command

# upload pdf to drive
python script/spider.py -c config/prod.cfg -u drive

Other options

# download all formats
python script/spider.py --config config/prod.cfg --all

# download only one format: pdf|epub|mobi
python script/spider.py --config config/prod.cfg --type pdf

# also download additional material: the source code (if available) and the book cover
python script/spider.py --config config/prod.cfg -t pdf --extras
# equivalent (default is pdf)
python script/spider.py -c config/prod.cfg -e

# download and then upload to Drive (anyone with the download URL can download the file)
python script/spider.py -c config/prod.cfg -t epub --upload drive
python script/spider.py --config config/prod.cfg --all --extras --upload drive

Configuration

You need to create a config/prod.cfg file with your Packt Publishing credentials; see config/prod_example.cfg for a sample.
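
As an illustration only, an INI-style prod.cfg might look like the lines below. The section and key names here are assumptions, so copy the exact ones from config/prod_example.cfg:

[credential]
credential.email=user@example.com
credential.password=your-packt-password

[path]
path.ebooks=ebooks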

As per the documentation, the Drive API requires OAuth 2.0 for authentication, so to upload files you should follow these steps (see the sketch after the list):

  • Go to the APIs Console and create a new project named PacktpubDrive
  • On the Services menu, turn the Drive API on
  • On the API Access menu, create an OAuth client ID
    • Application type: Installed application
    • Installed application type: Other
  • Click Download JSON and save the file as config/client_secrets.json
  • Documentation: OAuth, Quickstart, example and permissions
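
For reference, a minimal sketch of the OAuth 2.0 flow that client_secrets.json enables, using the oauth2client and google-api-python-client libraries of that era. The token storage path is an assumption, not necessarily what spider.py uses:

# sketch of authenticating against the Drive API with client_secrets.json
import httplib2
from googleapiclient.discovery import build
from oauth2client import tools
from oauth2client.client import flow_from_clientsecrets
from oauth2client.file import Storage

flow = flow_from_clientsecrets('config/client_secrets.json',
                               scope='https://www.googleapis.com/auth/drive')
storage = Storage('config/auth_token.json')  # assumed token cache file
credentials = storage.get()
if credentials is None or credentials.invalid:
    credentials = tools.run_flow(flow, storage)  # opens a browser on first run

# build an authorized Drive service to upload files with
http = credentials.authorize(httplib2.Http())
service = build('drive', 'v2', http=http)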

Development (only for spidering)

Run a simple static server with

node dev/server.js

and test the crawler with

python script/spider.py --dev --config config/dev.cfg --all
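
If Node is not available, Python's built-in static file server can stand in, assuming the fixtures that dev/server.js serves live under dev/ and that config/dev.cfg points at the chosen host and port:

# Python 2 (matching the spider commands above); on Python 3 use: python -m http.server 8080
cd dev && python -m SimpleHTTPServer 8080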

Possible improvements

  • compress files before uploading
  • add an upload service for Dropbox
  • notify via email
  • log to file and console: example
  • schedule daily runs with cron (see the sample crontab entry below)
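
For the cron idea, a sample crontab entry that claims and uploads the book every morning; the install path and time are placeholders:

# run every day at 09:00; /opt/packtpub-crawler is an assumed install path
0 9 * * * cd /opt/packtpub-crawler && python script/spider.py --config config/prod.cfg --upload drive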
