davidycliao / legiscrawler

An automated web crawler, based on the Selenium library, for retrieving parliamentary questions from the website of Taiwan's Legislative Yuan (https://lis.ly.gov.tw/). 🕸️ 🕸️ A crawler helper for scraping the legislators' interpellation collections (立法委員問政專輯). 🛠️🧰

License: MIT License

legiscrawler's Introduction

legisCrawler: An Automation Webcrawling Toolkit for Retrieving Taiwan Parliamentary Questions 🛠️🧰

An automated web-crawling framework, built on the Selenium library for Python and the Chrome browser, for retrieving parliamentary questions from the website of Taiwan's Legislative Yuan 立法院 (https://lis.ly.gov.tw/).

Requirements

  • python>=3.7.3 🐍
  • pip>=19.2
  • numpy=1.16.2
  • pandas=0.24.2
  • matplotlib=3.0.3
  • selenium
  • webdriver-manager
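
As a quick sanity check after installation, a minimal sketch like the one below can report which of the packages listed above are not importable. The import names are assumptions (for instance, the `webdriver-manager` distribution is imported as `webdriver_manager`), the helper names are hypothetical and not part of legisCrawler, and the check does not verify the pinned version numbers.

```python
import sys
from importlib.util import find_spec

# Assumed import names for the packages listed above; the
# `webdriver-manager` distribution is imported as `webdriver_manager`.
REQUIRED_MODULES = ["numpy", "pandas", "matplotlib", "selenium", "webdriver_manager"]
MIN_PYTHON = (3, 7, 3)


def missing_modules(names):
    """Return the names from `names` that cannot be found on the import path."""
    return [name for name in names if find_spec(name) is None]


def python_is_recent_enough(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Compare the (major, minor, micro) part of a version against the minimum."""
    return tuple(version_info[:3]) >= minimum


if __name__ == "__main__":
    if not python_is_recent_enough():
        print("Python is older than", ".".join(map(str, MIN_PYTHON)))
    missing = missing_modules(REQUIRED_MODULES)
    print("Missing modules:", ", ".join(missing) if missing else "none")
```

Run it inside the activated environment; an empty "Missing modules" list only means the packages can be located, not that their versions match the list above.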

Instructions

  • Clone the repository:

git clone git@github.com:davidycliao/legisCrawler.git
  • Copy the commands below and paste them into the terminal:

# Change into the `legisCrawler` directory once the repository has been downloaded.
cd legisCrawler

# Create a conda environment named `legisCrawler`.
conda create -n legisCrawler python=3.7

# Activate the newly created environment.
conda activate legisCrawler

# Install the dependencies listed in `requirements.txt` with `pip`.
pip install -r requirements.txt
  • Run legisCrawler:
# Note: run this in the same terminal where you activated the environment.
python legisCrawler.py
  • When legisCrawler is running, you will be asked which term (2nd to 10th) you would like to scrape; type a whole number from 2 to 10. legisCrawler will then automatically create a folder to store the retrieved parliamentary questions, organized by individual member.
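
The prompt-and-folder step described above can be sketched roughly as follows. This is an illustration only, not the code in `legisCrawler.py`: the function names and the `term_NN` folder naming are hypothetical, and only the 2 to 10 range comes from the instructions.

```python
from pathlib import Path

FIRST_TERM, LAST_TERM = 2, 10  # range stated in the instructions above


def parse_term(text):
    """Convert user input to a term number, or raise ValueError if it is invalid."""
    term = int(text.strip())  # int() raises ValueError for non-numeric input
    if not FIRST_TERM <= term <= LAST_TERM:
        raise ValueError(
            f"term must be between {FIRST_TERM} and {LAST_TERM}, got {term}"
        )
    return term


def make_term_folder(base_dir, term):
    """Create (if needed) and return the output folder for one term."""
    folder = Path(base_dir) / f"term_{term:02d}"  # hypothetical naming scheme
    folder.mkdir(parents=True, exist_ok=True)
    return folder
```

In an interactive script, the two pieces could be combined as `make_term_folder("output", parse_term(input("Term (2-10): ")))`, where `"output"` is a placeholder base directory.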

Workflow

What legisCrawler Scrapes

The crawler automatically scrapes the parliamentary questions (專案質詢) from the website of the Legislative Yuan, together with each question's topic, keywords, and type. An additional module for collecting a corpus of grand parliamentary debates (總質詢) is in progress and will be available soon.
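
To make the shape of the scraped data concrete, here is a minimal sketch of how one question record might be represented and saved per member. The field names, the CSV format, and the one-file-per-legislator layout are assumptions for illustration; the project's actual output format may differ.

```python
import csv
from collections import defaultdict
from dataclasses import asdict, dataclass, fields
from pathlib import Path


@dataclass
class Question:
    # Hypothetical field names; the text above only says that the topic,
    # keywords, and type of each question are collected.
    legislator: str
    topic: str
    keywords: str
    question_type: str


def write_by_member(questions, out_dir):
    """Write one CSV per legislator and return the paths that were written."""
    grouped = defaultdict(list)
    for question in questions:
        grouped[question.legislator].append(question)

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    header = [f.name for f in fields(Question)]
    paths = []
    for legislator, items in grouped.items():
        path = out_dir / f"{legislator}.csv"
        # utf-8-sig helps spreadsheet software detect UTF-8 for Chinese text
        with path.open("w", newline="", encoding="utf-8-sig") as handle:
            writer = csv.DictWriter(handle, fieldnames=header)
            writer.writeheader()
            writer.writerows(asdict(item) for item in items)
        paths.append(path)
    return paths
```

Grouping before writing keeps each member's questions in a single file, which mirrors the "by individual member" organization described in the instructions.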

Note

If you need help running legisCrawler, please don't hesitate to post a message in Discussion 📣 or email me. I will make time to help resolve the problem!

Cite

To cite this work, please refer to this GitHub project. For example, with BibTeX:

@misc{legisCrawler,
    howpublished = {\url{https://github.com/davidycliao/legisCrawler}},
    title = {An Automation Webcrawling Toolkit for Retrieving Taiwan Parliamentary Questions},
    author = {David Yen-Chieh Liao and Calvin Yu-Ceng Liao},
    publisher = {GitHub},
    year = {2021}
}
