sangarshanan / webscraping-and-analysis-of-medium-articles

Scraping Medium articles tagged under ML, DL, and AI, and analyzing the results

Web Scraping and Analysis of Medium articles

Web scraping automatically extracts data from web pages and presents it in a format you can easily make sense of. We use Python as our scraping language, together with a simple and powerful library, BeautifulSoup.

To load dynamically rendered pages, we use Selenium along with chromedriver.

Selenium WebDriver is a collection of open source APIs for automating web browsers, most commonly used to test that a web application works as expected. It supports many browsers, such as Firefox, Chrome, IE, and Safari.
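As a rough illustration of how Selenium and BeautifulSoup can fit together, here is a minimal sketch and not the notebook's actual code. The `tag_url` helper, the `https://medium.com/tag/<tag>` URL pattern, the number of scrolls, and the choice of `h2` elements as article headings are all assumptions made for this example; Medium's real markup may differ and should be inspected before relying on any selector.

```python
import time
from typing import List


def tag_url(tag: str) -> str:
    """Build a Medium tag-page URL (assumed pattern, not taken from the notebook)."""
    return f"https://medium.com/tag/{tag}"


def fetch_rendered_html(url: str, scrolls: int = 3, pause: float = 2.0) -> str:
    """Open the page in headless Chrome, scroll to trigger lazy loading, return the HTML."""
    # Imported lazily so the other helpers work without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        for _ in range(scrolls):
            # Scrolling to the bottom asks the page to load more articles.
            driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
            time.sleep(pause)  # give the new content time to render
        return driver.page_source
    finally:
        driver.quit()  # always release the browser, even on errors


def extract_headings(html: str) -> List[str]:
    """Return the text of every <h2> element (hypothetical stand-in for article titles)."""
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, "html.parser")
    return [h.get_text(strip=True) for h in soup.find_all("h2")]


if __name__ == "__main__":
    html = fetch_rendered_html(tag_url("machine-learning"))
    for title in extract_headings(html):
        print(title)
```

Selenium is only needed for the rendering step; once `page_source` has been captured, the parsing is plain BeautifulSoup and can be rerun on saved HTML without opening a browser again.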

Scraping Rules

  • Check a website’s Terms and Conditions before you scrape it, and read the statements about legal use of data carefully. Usually, the data you scrape should not be used for commercial purposes.
  • Do not request data from the website too aggressively with your program (also known as spamming), as this may break the website. Make sure your program behaves in a reasonable manner (i.e. acts like a human). One request for one webpage per second is good practice.
  • The layout of a website may change from time to time, so revisit the site and rewrite your code as needed.
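The one-request-per-second rule above can be sketched as a small throttled loop. This is an illustrative helper, not code from the notebook: `fetch_politely`, `fetch_with_urllib`, and their parameters are names invented here, and the `sleep` and `clock` arguments exist only so the pacing can be checked without real waiting.

```python
import time
from typing import Callable, Iterable, List
from urllib.request import urlopen


def fetch_with_urllib(url: str) -> str:
    """Default fetcher: download one page with the standard library."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def fetch_politely(
    urls: Iterable[str],
    fetch: Callable[[str], str] = fetch_with_urllib,
    min_interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[str]:
    """Fetch each URL in order, starting requests at least min_interval seconds apart."""
    pages = []
    last_start = None
    for url in urls:
        if last_start is not None:
            # Wait out whatever is left of the interval since the previous request began.
            wait = min_interval - (clock() - last_start)
            if wait > 0:
                sleep(wait)
        last_start = clock()
        pages.append(fetch(url))
    return pages
```

Passing a different `fetch` callable (for example, one that drives a browser) keeps the pacing logic unchanged.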

Exploratory analysis of the scraped data has also been done. The analysis is broken down:

  • Author-wise
  • Month-wise
  • Tag-wise, and so on

The resulting visualizations help us better understand data science articles on Medium.
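To make the author-wise, month-wise, and tag-wise breakdowns concrete, here is a small sketch using only the standard library. The record layout (`author`, `published`, `tags`) and the sample rows are made up for illustration; the notebook's real columns and data may look different.

```python
from collections import Counter
from datetime import date

# Placeholder records, invented for this example (not scraped data).
articles = [
    {"author": "Author A", "published": date(2018, 1, 5), "tags": ["Machine Learning", "AI"]},
    {"author": "Author B", "published": date(2018, 1, 20), "tags": ["Deep Learning"]},
    {"author": "Author A", "published": date(2018, 2, 3), "tags": ["AI", "Deep Learning"]},
]


def count_by_author(rows):
    """Number of articles per author."""
    return Counter(row["author"] for row in rows)


def count_by_month(rows):
    """Number of articles per publication month, keyed as YYYY-MM."""
    return Counter(row["published"].strftime("%Y-%m") for row in rows)


def count_by_tag(rows):
    """Number of articles carrying each tag (an article counts once per tag)."""
    return Counter(tag for row in rows for tag in row["tags"])


if __name__ == "__main__":
    print(count_by_author(articles).most_common())
    print(sorted(count_by_month(articles).items()))
    print(count_by_tag(articles).most_common())
```

Each `Counter` can be handed to a plotting library as label and value pairs to produce a bar chart for that grouping.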
