isabella232 / newsarticleclustering

This project forked from mangothecat/newsarticleclustering

Proof of concept work for clustering of news articles from RSS feeds

Python 25.59% R 74.41%

NewsArticleClustering

The scripts in this repository are a proof of concept for clustering news articles from RSS feeds.

Usage

The main script, tm_analysis.R, starts the analysis. It calls out to process_feeds.py to fetch the article feeds and then performs the clustering analysis. Additional utility functions, which manipulate the resulting JSON and attach the correct metadata to the VCorpus object in tm, are included in processing_utils.R.
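To make the fetch step concrete, here is a minimal sketch of turning an RSS feed into JSON records that a later analysis step could read. It is not the code in process_feeds.py: it uses the Python standard library's XML parser rather than feedparser so that it runs without a network connection, and the sample feed, the function name rss_to_records, and the record field names are all assumptions made for illustration.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical sample feed; a real run would download this XML from a feed URL.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example News</title>
    <item>
      <title>Storm brings flooding to Cumbria</title>
      <link>https://example.com/storm</link>
      <pubDate>Sun, 06 Dec 2015 09:00:00 GMT</pubDate>
      <description>Heavy rain caused widespread flooding.</description>
    </item>
    <item>
      <title>Budget announced</title>
      <link>https://example.com/budget</link>
      <pubDate>Mon, 07 Dec 2015 12:00:00 GMT</pubDate>
      <description>The chancellor set out spending plans.</description>
    </item>
  </channel>
</rss>"""


def rss_to_records(xml_text, source):
    """Turn RSS 2.0 XML into a list of dicts, one per <item> element."""
    root = ET.fromstring(xml_text)
    records = []
    for item in root.iter("item"):
        records.append({
            "source": source,  # kept so articles can later be coloured by source
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "published": item.findtext("pubDate", default=""),
            "summary": item.findtext("description", default=""),
        })
    return records


records = rss_to_records(SAMPLE_RSS, source="Example News")
articles_json = json.dumps(records, indent=2)
print(articles_json)
```

Keeping the source name on every record is what would later allow metadata such as the publisher to be attached to each document in the corpus.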

Dependencies:

Python:

  • requests
  • BeautifulSoup
  • feedparser

R:

  • jsonlite
  • tm
  • SnowballC
  • proxy
  • dendextend

Example Visualisations:

An example of the clusters formed from 475 articles published over a 4-day period is shown below. The leaf nodes are coloured according to their source: blue for BBC News, green for The Guardian, and indigo for The Independent. The utility function plot_dend in processing_utils.R was used to make the figures.

[Figure: dendrogram of the 475 articles, with leaf nodes coloured by source]

Zooming in shows a cluster of articles about Storm Desmond and the flooding in Cumbria in December 2015.

[Figure: zoomed-in view of the Storm Desmond cluster]
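The idea behind a dendrogram like this can be sketched in a few lines of Python. The example below is not the R analysis in tm_analysis.R: it assumes TF-IDF weighting, cosine distance, and average-linkage agglomerative clustering, any of which may differ from what the repository's scripts use, and the four toy headlines are invented for illustration.

```python
import math
from collections import Counter

# Hypothetical toy headlines standing in for article text.
docs = [
    "storm desmond flooding cumbria homes",
    "cumbria flooding storm desmond rescue",
    "chancellor budget spending tax",
    "budget tax spending cuts chancellor",
]


def tfidf_vectors(texts):
    """Return one {term: weight} dict per text, using tf * log(N / df)."""
    tokenised = [t.split() for t in texts]
    df = Counter(term for tokens in tokenised for term in set(tokens))
    n = len(texts)
    return [
        {term: count * math.log(n / df[term])
         for term, count in Counter(tokens).items()}
        for tokens in tokenised
    ]


def cosine_distance(a, b):
    """1 - cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def agglomerate(vectors, k):
    """Average-linkage agglomerative clustering, stopping at k clusters."""
    clusters = [[i] for i in range(len(vectors))]

    def linkage(c1, c2):
        # Mean pairwise distance between members of the two clusters.
        total = sum(cosine_distance(vectors[i], vectors[j])
                    for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        # Find the closest pair of clusters and merge them.
        _, x, y = min(
            (linkage(clusters[x], clusters[y]), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]


clusters = agglomerate(tfidf_vectors(docs), k=2)
print(clusters)
```

On these toy inputs the two flooding headlines end up in one cluster and the two budget headlines in the other. A dendrogram records the full sequence of merges rather than stopping at a fixed number of clusters, which is what makes it possible to zoom in on a single branch.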

newsarticleclustering's People

Contributors

mrkriss
