

cutout

A toolbox for grabbing and processing data in Python 3

Introduction

cutout is a Python toolbox for grabbing and processing data. Features::

  • Grab data from HTML pages
  • Download files from the Internet
  • Show a progress bar while downloading
  • Cache data in memory or on the file system
  • Generate SQL statements
  • More powerful features

This software is still under development and is being actively improved.

Pre-requisites

  • Python 3.4+
  • PyMySQL

Installation

You can download cutout from its GitHub repository and use it in your code like this::

from cutout import download, cutout
from cutout.common import get_html, get_argv_dict
from cutout.util import sec2time
...

Documentation

API manual (in Chinese)

To get the download URL of the Baidu Music PC client::

>>> from cutout import cutout
>>> para = {}  # parameters
>>> para['url'] = 'http://music.baidu.com/'
>>> para['start'] = '<a class="downloadlink-pc"'
>>> para['end'] = '>下载PC版</a>'  # link text on the page: "Download PC version"
>>> para['dealwith'] = { 'start':'href="', 'rid':'"', 'end':'"' }  # extract the href URL
>>> cutout(**para)  # run the grab
'http://qianqian.baidu.com/download/BaiduMusic-12345630.exe'
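The idea of cutting text between a start marker and an end marker can be sketched in plain Python. This is only an illustration of the technique, not cutout's own implementation: the helper ``cut_between`` and the sample HTML fragment are hypothetical, and cutout's handling of options such as ``dealwith`` may differ.

```python
def cut_between(text, start, end):
    """Return the text between the first `start` marker and the next `end` marker.

    Hypothetical helper for illustration; returns None if either marker is missing.
    """
    begin = text.find(start)
    if begin == -1:
        return None
    begin += len(start)
    stop = text.find(end, begin)
    if stop == -1:
        return None
    return text[begin:stop]


# Hypothetical page fragment standing in for the downloaded HTML.
html = '<p><a class="downloadlink-pc" href="http://example.com/setup.exe">Download</a></p>'

# First cut: everything between the outer markers (the remaining tag attributes).
outer = cut_between(html, '<a class="downloadlink-pc"', '>Download</a>')
# Second cut: the href value inside that fragment.
url = cut_between(outer, 'href="', '"')
print(url)
```

The two-step cut mirrors the example above: the outer markers narrow the page down to one tag, and the inner markers pull the URL out of it.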

To create a file-system cache::

>>> from cutout.cache import FileCache
>>> c = FileCache('./cache') # set cache dir './cache'
>>> c.set("foo", "value")
>>> c.get("foo")
'value'
>>> c.get("missing") is None
True
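For readers curious how a file-system cache with this ``set``/``get`` behaviour might work, here is a minimal sketch using only the standard library. The class ``TinyFileCache`` is hypothetical and is not cutout's ``FileCache``; the choice of one pickle file per key is an assumption made for the example.

```python
import hashlib
import os
import pickle
import tempfile


class TinyFileCache:
    """Hypothetical stand-in for a file-system cache (not cutout's FileCache)."""

    def __init__(self, cache_dir):
        self.cache_dir = cache_dir
        os.makedirs(cache_dir, exist_ok=True)

    def _path(self, key):
        # Hash the key so that any string maps to a safe file name.
        name = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return os.path.join(self.cache_dir, name)

    def set(self, key, value):
        # Store each value in its own file, serialized with pickle.
        with open(self._path(key), "wb") as f:
            pickle.dump(value, f)

    def get(self, key):
        # A missing file means a cache miss, reported as None.
        try:
            with open(self._path(key), "rb") as f:
                return pickle.load(f)
        except FileNotFoundError:
            return None


cache = TinyFileCache(tempfile.mkdtemp())  # temporary directory for the demo
cache.set("foo", "value")
print(cache.get("foo"))
print(cache.get("missing") is None)
```

Only unpickle files you wrote yourself: loading pickle data from an untrusted source can execute arbitrary code.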

To show a ProgressBar while downloading::

>>> from cutout import download
>>> from cutout.common import ProgressBar
>>> bar = ProgressBar(piece_total=1)
>>> face = { 'sh_piece_division':1024, 'sh_piece_unit':'KB' }
>>> bar.face(**face)
>>> download('http://qianqian.baidu.com/download/BaiduMusic-12345630.exe', showBar=bar)
'[=============================>                    ]  59.23%  14.81%/s  1280.00KB/s  5120.00KB/8644.81KB  00:00:04'
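The bar and size fields in a line like the one above can be produced with a few lines of string formatting. The sketch below is an assumption about how such a line could be built, not cutout's ``ProgressBar`` code; the functions ``render_bar`` and ``format_units`` are hypothetical names.

```python
def render_bar(done, total, width=50):
    """Return a text bar such as '[=====>    ]  50.00%' for done out of total."""
    fraction = done / total if total else 0.0
    fraction = min(max(fraction, 0.0), 1.0)  # clamp to the range 0..1
    filled = int(width * fraction)
    if filled < width:
        # Completed part, an arrow head, then padding up to the full width.
        body = "=" * filled + ">" + " " * (width - filled - 1)
    else:
        body = "=" * width
    return "[{}]  {:.2f}%".format(body, fraction * 100)


def format_units(n_bytes, division=1024, unit="KB"):
    """Scale a byte count by `division` and append `unit`, e.g. 5120.00KB."""
    return "{:.2f}{}".format(n_bytes / division, unit)


print(render_bar(5, 10, width=10))
print(format_units(5 * 1024 * 1024))
```

The ``division`` and ``unit`` arguments play a role similar to ``sh_piece_division`` and ``sh_piece_unit`` in the example above: the raw count is divided by one and labelled with the other.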

Read or run example.py for more examples.

For a detailed description of the full API, download document.zh.htm and open it in a browser.

To run the bundled test script::

$ python3 cutout/test.py

Author

cutout is developed and maintained by Yang Jie ([email protected]). It can be found here: http://github.com/yangjiePro/cutout

