
An Automation Webcrawler for Extracting Central Bankers' Speeches

License: MIT License


bisCrawler: An Automation Webcrawler for Extracting Central Bankers' Speeches 🛠️🧰


An automated web-crawling framework for extracting central bankers' speeches from the website of the Bank for International Settlements (https://www.bis.org), built on Selenium and the Chrome browser.

Environment Setup

  1. Install Anaconda Navigator and Python >= 3.9 beforehand. Then open a terminal and download the bisCrawler repository with git clone. For help with git and GitHub, see this Tutorial for Beginners.
git clone git@github.com:davidycliao/bisCrawler.git
  2. Copy the commands below and paste them into the terminal:
# Change into the `bisCrawler` directory once the repository is downloaded.
cd bisCrawler

# Create the environment with conda and name it `bisCrawler`.
conda create -n bisCrawler python=3.9

Instruction

  1. Activate the environment created above. Alternatively, the bisCrawler environment can be opened via Anaconda Navigator.
conda activate bisCrawler
  2. Install the dependencies from requirements.txt using pip.
pip install -r requirements.txt
  3. Call the bisCrawler module
  • In the terminal:
# Note: run this in the terminal where you activated the environment.
python bisCrawler.py
  • In Jupyter Notebook:
from bisCrawler import scraper
scraper()
  4. When running bisCrawler

When bisCrawler runs, it asks which page you would like to scrape; type a page number from 1 to the last page. bisCrawler then builds a pandas DataFrame that stores the bankers' speeches and the URLs of the corresponding text documents.
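
The page prompt accepts any whole number in the available range, not only a single digit. As a minimal sketch of how such input could be checked, the helper below is hypothetical: `parse_page_number` and its `last_page` argument are illustrative names and are not taken from the bisCrawler source.

```python
def parse_page_number(raw: str, last_page: int) -> int:
    """Convert text typed at a prompt into a page number between 1 and last_page.

    Hypothetical helper for illustration; bisCrawler may validate input differently.
    """
    text = raw.strip()
    if not text.isdigit():
        raise ValueError(f"expected a whole number, got {raw!r}")
    page = int(text)
    if not 1 <= page <= last_page:
        raise ValueError(f"page must be between 1 and {last_page}, got {page}")
    return page


# 120 is a made-up page count used only for this example.
print(parse_page_number(" 12 ", last_page=120))
```

Rejecting bad input before the browser starts avoids launching Chrome for a request that cannot succeed.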

What bisCrawler Scrapes

The crawler automatically scrapes central bankers' speeches from the official website, collecting each central banker's name, the date and title of the speech, and the URL of the corresponding text document.
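
To make the shape of one scraped row concrete, here is a small sketch using only the standard library. The field names (`speaker`, `date`, `title`, `url`) and the sample values are assumptions chosen for illustration; the actual column names produced by bisCrawler may differ.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class SpeechRecord:
    # Assumed field names; the real output columns may be named differently.
    speaker: str
    date: str
    title: str
    url: str


def records_to_csv(records):
    """Serialize records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=[f.name for f in fields(SpeechRecord)],
        lineterminator="\n",
    )
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()


# Placeholder values only; this is not real scraped data.
sample = SpeechRecord(
    speaker="Jane Doe",
    date="2021-01-01",
    title="Example speech",
    url="https://example.org/speech.pdf",
)
print(records_to_csv([sample]))
```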

Webscraped Data

The scraped dataframe will be stored as central_bank_speeches.csv in the bisCrawler folder.
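
Once the file exists, it can be read back for analysis. The sketch below uses the standard-library `csv` module and assumes the file is UTF-8 encoded with a header row; `load_speeches` is a hypothetical helper name, not part of bisCrawler.

```python
import csv
from pathlib import Path


def load_speeches(path="central_bank_speeches.csv"):
    """Read the scraped CSV into a list of dicts keyed by the header row.

    Assumes UTF-8 encoding and a header line; adjust if the file differs.
    """
    with Path(path).open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


# Only runs when the file is present, e.g. after a scrape in the bisCrawler folder.
if Path("central_bank_speeches.csv").exists():
    rows = load_speeches()
    print(f"Loaded {len(rows)} rows")
    if rows:
        print("Columns:", list(rows[0].keys()))
```

Since the output is described as a pandas DataFrame, `pandas.read_csv("central_bank_speeches.csv")` is an equally reasonable way to load it in an environment where pandas is installed.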

Cite

Please cite this page if you use this toolkit for your research.

For example, with BibTeX:

@misc{bisCrawler,
    howpublished = {\url{https://github.com/davidycliao/bisCrawler}},
    title = {bisCrawler: An Automation Webcrawler for Extracting Central Bankers' Speeches},
    author = {David Yen-Chieh Liao and Li Tang},
    publisher = {GitHub},
    year = {2021}
}
