chusiang / crawler-book-info

A crawler for quickly parsing book information.

License: MIT License

Crawler Book Info

A sample crawler for quickly parsing book information.

Initialization

  1. Install the virtualenv.

    [ jonny@xenial ~/vcs/crawler-book-outline ]
    $ sudo pip3 install virtualenv
    
  2. Create the virtualenv.

    [ jonny@xenial ~/vcs/crawler-book-outline ]
    $ virtualenv -p python3 .venv
    
  3. Enter the virtualenv.

    [ jonny@xenial ~/vcs/crawler-book-outline ]
    $ . .venv/bin/activate
    
  4. Install packages with pip.

    (.venv) [ jonny@xenial ~/vcs/crawler-book-outline ]
    $ pip3 install -r requirements.txt
    

Usage

tenlong.com.tw

  1. Run the crawler with an ISBN-13.

    (.venv) [ jonny@xenial ~/vcs/crawler-book-outline ]
    $ python3 tenlong.py 9781491915325
    
  2. (Optional) Run the crawler via make.

    (.venv) [ jonny@xenial ~/vcs/crawler-book-outline ]
    $ make telong 9781491915325
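
Because the tenlong.com.tw crawler is keyed on an ISBN-13, it can help to check the number before running it. The sketch below is not part of this project; `is_valid_isbn13` is a hypothetical helper that applies the standard ISBN-13 check-digit rule (weights alternate 1 and 3, and the weighted sum must be divisible by 10).

```python
def is_valid_isbn13(isbn: str) -> bool:
    """Return True if `isbn` has 13 digits and a correct check digit.

    Hypothetical helper for illustration; hyphens and spaces are ignored.
    """
    digits = isbn.replace("-", "").replace(" ", "")
    if len(digits) != 13 or not (digits.isascii() and digits.isdigit()):
        return False
    # Weights alternate 1, 3, 1, 3, ... across all 13 digits,
    # and a valid ISBN-13 has a weighted sum divisible by 10.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(digits))
    return total % 10 == 0
```

With this rule, the ISBN used above, `9781491915325`, passes the check, while a number with a changed last digit does not.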
    

books.com.tw

  1. Run the crawler with a URL.

    (.venv) [ jonny@xenial ~/vcs/crawler-book-outline ]
    $ python3 books.py https://www.books.com.tw/products/0010810939
    
  2. Run the crawler with a product number.

    (.venv) [ jonny@xenial ~/vcs/crawler-book-outline ]
    $ python3 books.py 0010810939
    

The books.com.tw crawler does not support ISBN-13 arguments yet.
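
The two invocations above suggest that a product number and a full URL point at the same page. One way to treat them uniformly is sketched here. This is an assumption about how such input could be normalized, not the code in `books.py`; `to_product_url` and `BOOKS_PRODUCT_BASE` are hypothetical names, and the URL pattern is taken from the example above.

```python
from urllib.parse import urlparse

# Assumed from the example URL in this README.
BOOKS_PRODUCT_BASE = "https://www.books.com.tw/products/"


def to_product_url(arg: str) -> str:
    """Turn a product number or a full URL into a product URL."""
    arg = arg.strip()
    parsed = urlparse(arg)
    # A full http(s) URL is passed through unchanged.
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return arg
    # Otherwise accept a plain alphanumeric product number.
    if arg.isascii() and arg.isalnum():
        return BOOKS_PRODUCT_BASE + arg
    raise ValueError(f"not a URL or product number: {arg!r}")
```

Both `to_product_url("0010810939")` and `to_product_url("https://www.books.com.tw/products/0010810939")` yield the same URL, so the rest of a crawler would only need to handle one input form.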

View Result

  1. Open the HTML file with Firefox on GNU/Linux.

    (.venv) [ jonny@xenial ~/vcs/crawler-book-outline ]
    $ firefox index.html
    

  2. The result is a clean version of https://www.tenlong.com.tw/products/9781491915325.

Run local Nginx for Evernote Web Clipper

The Evernote Web Clipper does not support local files, so we serve the page with Nginx and clip it from there.

  1. Run Nginx container.

    $ docker run --name nginx -v "$(pwd)":/usr/share/nginx/html/ -p 80:80 -d nginx
    
  2. Open the page with Firefox on GNU/Linux.

    (.venv) [ jonny@xenial ~/vcs/crawler-book-outline ]
    $ firefox http://localhost
    
  3. (Optional) Run the Nginx container via make.

    $ make run_containers
    
  4. (Optional) Open the page via make.

    $ make review_serve
    
  5. Finally, we can clip the information to Evernote with Evernote Web Clipper.
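
If Docker is not available, the same idea (serving the current directory over HTTP so the clipper sees a web page rather than a local file) can be sketched with Python's standard library. This swaps Nginx for `http.server` and is not how the project's Makefile works; `serve_directory` is a hypothetical helper, and port 8000 is an arbitrary choice that avoids needing root for port 80.

```python
import functools
import threading
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def serve_directory(directory: str = ".", port: int = 8000) -> ThreadingHTTPServer:
    """Serve `directory` on 127.0.0.1:`port` from a background thread.

    Hypothetical helper; returns the server so the caller can shut it down.
    """
    # Bind the handler to the chosen directory instead of the process cwd.
    handler = functools.partial(SimpleHTTPRequestHandler, directory=directory)
    server = ThreadingHTTPServer(("127.0.0.1", port), handler)
    # Daemon thread: the server stops when the main program exits.
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

A script would call `serve_directory(".")` from the directory that contains `index.html`, keep the process alive while clipping from `http://localhost:8000`, and call `shutdown()` on the returned server when done.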

License

Copyright (c) 2017-2022 chusiang, released under the MIT license.
