
ilhamqasse / dapps-scraping


This project scrapes DApp websites and repositories such as State of the DApps and DappRadar.

Python 98.96% Shell 1.04%


DApps-Scraping

The main objective of this project is to study and analyze the quality of the decentralised applications (DApps) available in some public repositories. The project scrapes DApp websites and repositories such as State of the DApps and DappRadar.

The extracted datasets are available on Zenodo: https://zenodo.org/record/3382127

Installation

Use the package manager pip to install the required packages.

pip install -r requirements.txt

The Selenium Python package requires ChromeDriver. Download it from the following URL:

https://chromedriver.chromium.org/downloads

Linux and macOS users can instead run the following script to download ChromeDriver:

./download.sh 

Please make sure that your Chrome version is 76; otherwise, update Chrome or install the ChromeDriver release that matches your version.
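The version requirement above amounts to comparing the major version of Chrome with that of ChromeDriver. The following is a minimal sketch of that comparison, not part of the project's scripts. The helper names `major_version` and `versions_match` are hypothetical, and the sample strings only illustrate the kind of text a version banner contains.

```python
import re


def major_version(version_output: str) -> int:
    """Extract the major version from text such as 'Google Chrome 76.0.3809.100'."""
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_output)
    if match is None:
        raise ValueError(f"no version number found in: {version_output!r}")
    return int(match.group(1))


def versions_match(chrome_output: str, driver_output: str) -> bool:
    """Return True when Chrome and ChromeDriver share the same major version."""
    return major_version(chrome_output) == major_version(driver_output)


if __name__ == "__main__":
    # Illustrative strings; on a real system they would come from the
    # version banners of the installed browser and driver.
    chrome = "Google Chrome 76.0.3809.100"
    driver = "ChromeDriver 76.0.3809.68"
    print(versions_match(chrome, driver))
```

If the two major versions differ, either update Chrome or download the driver release that matches the installed browser.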

Mac users need wget installed. To install it, run the following command:

brew install wget 

Once the driver is downloaded, check the ChromeDriver path in both scripts (DappRadar.py, stateDapps.py). If you downloaded the driver manually, change the path specified in the code to your own path. You don't have to change the path if you used the script to download the driver.
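Before editing the path in the scripts, it can help to confirm that the driver file exists and is executable. The sketch below is one possible check, under stated assumptions: the helper `find_chromedriver` is hypothetical, and the fallback locations (a `chromedriver` file in the current directory, then the system PATH) are guesses, since the location used by download.sh is not documented here.

```python
import os
import shutil


def find_chromedriver(explicit_path=None):
    """Return an absolute path to a usable chromedriver binary, or None.

    Lookup order (an assumption, not the project's documented behaviour):
    1. the explicitly given path,
    2. a file named 'chromedriver' in the current working directory,
    3. any 'chromedriver' found on the system PATH.
    """
    candidates = []
    if explicit_path:
        candidates.append(explicit_path)
    candidates.append(os.path.join(os.getcwd(), "chromedriver"))

    for candidate in candidates:
        # The file must exist and be executable by the current user.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return os.path.abspath(candidate)

    # shutil.which returns None when nothing is found on PATH.
    return shutil.which("chromedriver")


if __name__ == "__main__":
    path = find_chromedriver()
    print(path if path else "chromedriver not found; check the path in the scripts")
```

The returned path is the value to use for the driver path in the scripts. How that path is handed to Selenium depends on the installed Selenium version.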

Usage

The scrapers are built with the Selenium package. There are three scripts:

  1. The first script crawls the DappRadar web page. To run it, use the following command:
python DappRadar.py

For testing purposes, you can specify the number of pages to scrape. The command below crawls only three pages:

python DappRadar.py 3
  2. The second script scrapes the State of the DApps website. To run it, use the following command:
python stateDapps.py

For testing purposes, you can specify the number of pages to scrape. The command below crawls only three pages:

python stateDapps.py 3
  3. The third script scrapes the dapp.com website. To run it, use the following command:
python dappcom.py

For testing purposes, you can specify the number of pages to scrape. The command below crawls only three pages:

python dappcom.py 3
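All three scripts accept an optional page count as their first argument. The sketch below shows one way such an argument could be read. It is an illustration under assumptions, not the scripts' actual code: the function `parse_page_limit` is hypothetical, and treating a missing argument as "scrape every page" is inferred from the commands above.

```python
import sys


def parse_page_limit(argv):
    """Return the page limit from an argv-style list, or None to scrape all pages."""
    if len(argv) < 2:
        # No argument given, e.g. `python DappRadar.py`.
        return None
    try:
        limit = int(argv[1])
    except ValueError:
        raise SystemExit(f"page count must be an integer, got {argv[1]!r}")
    if limit < 1:
        raise SystemExit("page count must be at least 1")
    return limit


if __name__ == "__main__":
    limit = parse_page_limit(sys.argv)
    print("scraping all pages" if limit is None else f"scraping {limit} page(s)")
```

Called as `python DappRadar.py 3`, a script using this helper would receive a limit of 3 and could stop its crawl loop after the third page.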

The scraping time depends on the number of pages; a full run may take 1 to 2 hours. Once the extraction is done, each script generates plots from the extracted data and automatically saves them in a folder named after the website and the date of the run.
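As an illustration of the output layout described above, the following sketch builds a folder name from a website name and a run date. The exact naming pattern used by the scripts is not specified here, so the `<site>_<YYYY-MM-DD>` format and the helper names `run_output_dir` and `ensure_run_dir` are assumptions.

```python
from datetime import date
from pathlib import Path


def run_output_dir(site_name, run_date=None, base="."):
    """Build a folder path such as ./DappRadar_2019-08-30 (assumed format)."""
    run_date = run_date or date.today()
    return Path(base) / f"{site_name}_{run_date.isoformat()}"


def ensure_run_dir(site_name, run_date=None, base="."):
    """Create the run folder if needed and return its path."""
    folder = run_output_dir(site_name, run_date, base)
    folder.mkdir(parents=True, exist_ok=True)
    return folder


if __name__ == "__main__":
    # Prints the folder that would hold today's plots for DappRadar.
    print(run_output_dir("DappRadar"))
```

Using one folder per site and date keeps plots from repeated runs on different days from overwriting each other.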

Disclaimer

Be aware that web scraping is considered bad practice. This project was created for research and educational purposes only.
