Coroutines Web Scraper & DB Processor
- asynchronous programming
- stable, with useful exception handling
Based on asyncio Streams
- coroutines web scraper: run_web_scrapper.py in test
- coroutines selenium scraper: run_selenium.py in test
- DB processor: DBConnector in db_connector (see the hypothetical sketch after this list)
- query builder: SQL query builder & HTTP query builder, in development
- run the scripts from inside the test dir
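
The DBConnector interface is not documented above; purely as a hypothetical sketch of how such a connector might be driven from a coroutine (the module and class names come from this README, but every method name below is an assumption, not the actual API):

import asyncio
from db_connector import DBConnector  # module/class names from this README

async def main():
    db = DBConnector()                    # constructor args unknown; assumed empty
    await db.connect()                    # assumed method name
    rows = await db.execute("SELECT 1")   # assumed method name
    print(rows)
    await db.close()                      # assumed method name

asyncio.run(main())
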
- when using tasks.csv (an assumed sample layout is shown after these commands):
python run_web_scrapper.py --tasks path/to/tasks.csv --save_file crawling \
--result_path result --result_type text
  for example, with tasks.csv one directory up:
python run_web_scrapper.py --tasks ../tasks.csv --save_file crawling \
--result_path result --result_type text
- when editing the URL list inside the code, skip the --tasks option:
python run_web_scrapper.py --save_file crawling \
--result_path result --result_type text
- selenium scraper:
python run_selenium.py --save_file selenium \
--result_path result --result_type text
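
The exact column layout of tasks.csv is not described in this README; as a loose assumption, a one-URL-per-row file like the following shows the general idea (the header name and URLs are made up):

url
https://example.com/page1
https://example.com/page2
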
- stream with the request module: Reader, Writer, Stream, Session in stream.map
- to customize, inherit stream.map.BaseSession and make a CustomSession (see the sketch after this list)
- edit the code inside async def __aenter__
- pass it via the base_session param of the asyncio_scraper function in run_web_scrapper.py
- the selenium scraper is customized the same way
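
A minimal sketch of those steps, assuming BaseSession is an async context manager and that asyncio_scraper accepts the custom session via its base_session parameter; the super() call, the headers dict, and whether base_session takes a class or an instance are all illustrative assumptions, not confirmed by this repo:

import asyncio
from stream.map import BaseSession             # class named in this README
from run_web_scrapper import asyncio_scraper   # function named in this README

class CustomSession(BaseSession):
    # the README says to edit the code inside async def __aenter__;
    # assumption: BaseSession is an async context manager
    async def __aenter__(self):
        await super().__aenter__()  # assumed: parent sets up the underlying session
        # hypothetical custom setup, e.g. default request headers:
        self.headers = {"User-Agent": "my-custom-scraper/0.1"}
        return self

# assumed call shape: adjust to the actual asyncio_scraper signature,
# which may expect an instance rather than the class
asyncio.run(asyncio_scraper(base_session=CustomSession))
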