hail-cali / db_modeling

A package providing a DB processor and a coroutine-based asyncio web scraper

Python 100.00%
database query-builder stream web-scraper crawling coroutine python

db_modeling's Introduction

Coroutine Web Scraper & DB Processor

What are coroutines?

  • Functions that can suspend and resume, which is the basis of asynchronous programming in Python
  • Stable, and convenient for exception handling
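
As a minimal sketch of the two points above (not code from this package), the following runs three coroutines concurrently with the standard asyncio library and collects a failure as a value instead of letting it abort the run. The fetch function and its arguments are hypothetical stand-ins for real I/O.

```python
import asyncio


async def fetch(name: str, delay: float, fail: bool = False) -> str:
    """Hypothetical stand-in for an I/O-bound task; sleeps instead of using the network."""
    await asyncio.sleep(delay)
    if fail:
        raise ValueError(f"{name} failed")
    return f"{name} done"


async def main() -> list:
    # The three coroutines run concurrently. With return_exceptions=True a
    # failure is returned in the result list instead of being raised by gather.
    return await asyncio.gather(
        fetch("a", 0.01),
        fetch("b", 0.01, fail=True),
        fetch("c", 0.01),
        return_exceptions=True,
    )


results = asyncio.run(main())
for item in results:
    print("error:" if isinstance(item, Exception) else "ok:", item)
```

Results come back in the order the coroutines were passed to gather, so each failure can be matched to its task.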

Based on asyncio streams
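
For readers new to asyncio streams, here is a small self-contained sketch of the standard-library StreamReader/StreamWriter pair. It is not this package's Stream class: it starts a throwaway server on the loopback interface and sends it one line. The handle and round_trip names are illustrative only.

```python
import asyncio


async def handle(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    # Read one line from the client, send it back in upper case, then close.
    line = await reader.readline()
    writer.write(line.upper())
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def round_trip(message: bytes) -> bytes:
    # Port 0 asks the OS for any free port on the loopback interface.
    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    async with server:
        reader, writer = await asyncio.open_connection("127.0.0.1", port)
        writer.write(message)
        await writer.drain()
        reply = await reader.readline()
        writer.close()
        await writer.wait_closed()
    return reply


reply = asyncio.run(round_trip(b"hello\n"))
print(reply)
```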

Features

  • Coroutine web scraper: run_web_scrapper.py in test
  • Coroutine Selenium scraper: run_selenium.py in test
  • DB processor: DBConnector in db_connector
  • Query builder (in development): SQL query builder and HTTP query builder

How to use

Web Scraper (crawling)

  • Run the commands from inside the test directory
  • With a tasks.csv file:
python run_web_scrapper.py --tasks path/to/tasks.csv --save_file crawling  \
                            --result_path result --result_type text
  • If you edit the URL list directly in the code, omit the --tasks option:
python run_web_scrapper.py  --save_file crawling  \
                            --result_path result --result_type text
  • Sample run:
python run_web_scrapper.py --tasks ../tasks.csv --save_file crawling  \
                            --result_path result --result_type text
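
The sketch below shows one way a script could accept the flags used in the commands above. Only the four flag names come from this README. The default values, the read_urls helper, and the tasks.csv layout (one URL per row, in the first column) are assumptions, and the real run_web_scrapper.py may work differently.

```python
import argparse
import csv
import io


def build_parser() -> argparse.ArgumentParser:
    # Flag names mirror the commands above; every default here is an assumption.
    parser = argparse.ArgumentParser(description="Hypothetical scraper flag parser")
    parser.add_argument("--tasks", default=None, help="optional CSV file of tasks")
    parser.add_argument("--save_file", default="crawling")
    parser.add_argument("--result_path", default="result")
    parser.add_argument("--result_type", default="text")
    return parser


def read_urls(handle) -> list:
    # Assumed layout: one URL per row in the first column; blank rows are skipped.
    return [row[0].strip() for row in csv.reader(handle) if row and row[0].strip()]


args = build_parser().parse_args(
    ["--tasks", "tasks.csv", "--save_file", "crawling",
     "--result_path", "result", "--result_type", "text"]
)
# An in-memory file stands in for tasks.csv so the sketch runs without any files.
urls = read_urls(io.StringIO("https://example.com/a\n\nhttps://example.com/b\n"))
print(args.tasks, urls)
```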

Selenium Scraper

python run_selenium.py --save_file selenium  \
                            --result_path result --result_type text

DB processor

  • Shell (dev):
python run.py

Modules

  • Stream with request module: Reader, Writer, Stream, and Session in stream.map

How to Customize

  • Subclass stream.map.BaseSession to create a CustomSession

  • Edit the code inside async def __aenter__

  • Pass your session class as the base_session parameter of the asyncio_scraper function in run_web_scrapper.py

  • The same steps apply to the Selenium scraper
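
The steps above can be sketched as follows, assuming "aenter" refers to the async context-manager method `__aenter__`. The BaseSession class here is a hypothetical stand-in, since the real stream.map.BaseSession is not shown in this README. The asyncio_scraper_sketch function is also hypothetical: only the base_session parameter name is taken from the text.

```python
import asyncio


class BaseSession:
    """Hypothetical stand-in for stream.map.BaseSession; the real class may differ."""

    def __init__(self, url: str) -> None:
        self.url = url
        self.headers = {}

    async def __aenter__(self):
        return self

    async def __aexit__(self, exc_type, exc, tb) -> bool:
        return False  # do not suppress exceptions raised inside the block


class CustomSession(BaseSession):
    async def __aenter__(self):
        # Custom setup goes here; this header is purely illustrative.
        self.headers["User-Agent"] = "custom-scraper/0.1"
        return await super().__aenter__()


async def asyncio_scraper_sketch(urls, base_session=BaseSession):
    # Hypothetical driver: opens one session per URL and records its settings.
    seen = []
    for url in urls:
        async with base_session(url) as session:
            seen.append((session.url, dict(session.headers)))
    return seen


result = asyncio.run(
    asyncio_scraper_sketch(["https://example.com"], base_session=CustomSession)
)
print(result)
```

Passing the class rather than an instance lets the driver create a fresh session for each URL.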

Task list

  • Selenium scraper
  • DB processor code
  • Web scraper code
  • Develop a crawler that uses an API
