mstrlaw / thoro.news

A real-time news aggregator

Home Page: https://frontcover.ai

License: MIT License

JavaScript 20.99% CSS 17.15% Vue 61.86%
vue nuxtjs javascript news-aggregator news

thoro.news's Introduction

Automated news aggregator.

What is Thoro News

Thoro News - or simply Thoro - is a free news aggregator service that scrapes thousands of articles on a daily basis and groups them into common topics.

It helps readers understand the dominant conversations and themes across the media landscape, and discover articles that could otherwise be missed on traditional news-consumption platforms (e.g. Facebook, Apple News, Twitter, Reddit).

It is @mstrlaw's side project.

The case for creating a news aggregator

Thoro was born out of multiple needs that weren't met by other services, mainly:

  • The ability to follow a very large number of news sources and still consume the information in a time-conscious way;
  • Avoiding the prevalent filter bubble created by social media algorithms that show you "what we think you'll want to see";
  • A way to see trending topics at a glance and easily explore each trend cluster.

Features

  • 120+ unique news sources and counting;
  • Near real-time retrieval of articles;
  • Classification of articles into categories (Business, Politics [Global, U.S.A. & European] and Technology);
  • Clustering of common news articles globally and per category;
  • Display of real-time cryptocurrency market capitalization % change (provided by coincap.io);
  • Display of related top Tweets when exploring a trend cluster.

Stack

Frontend using Bulma + custom SCSS. Individual components can be checked here.

Thoro is written entirely in JavaScript.

Architecture

Thoro follows a microservice architecture pattern made up of a Nuxt.js web app, a Node/Express API, two Node.js scraping services (getter_1 & getter_2) and one data cruncher service.

The flow is roughly as follows:

  1. scraper_1 & scraper_2 each run multiple scheduled cron jobs to retrieve articles and insert them into the DB;
  2. The cruncher service runs multiple scheduled cron jobs that: a) clean and normalize articles' data for later processing; b) classify articles into categories; c) generate theme clusters globally and for each category by aggregating up to 5000 articles inserted since the start of the given day;
  3. The web app requests data through the API.
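To make step 2c more concrete, here is a minimal sketch of how the cruncher might pick its input: articles inserted since the start of the given day, capped at 5000. This is not Thoro's actual code. The function names, the `insertedAt` field and the newest-first ordering are assumptions made for illustration, and in the real service this selection would more likely be a database query than an in-memory filter.

```javascript
// Hypothetical helper; none of these names come from the Thoro codebase.
const MAX_ARTICLES_PER_RUN = 5000; // cap mentioned in step 2c

// Returns local midnight of the day containing `now`.
function startOfDay(now) {
  const d = new Date(now);
  d.setHours(0, 0, 0, 0);
  return d;
}

// Picks the articles inserted between the start of the day and `now`,
// newest first (an assumed ordering), capped at `limit`.
function selectArticlesForClustering(
  articles,
  now = new Date(),
  limit = MAX_ARTICLES_PER_RUN
) {
  const from = startOfDay(now).getTime();
  const to = new Date(now).getTime();
  return articles
    .filter((a) => {
      const t = new Date(a.insertedAt).getTime();
      return t >= from && t <= to;
    })
    .sort((a, b) => new Date(b.insertedAt) - new Date(a.insertedAt))
    .slice(0, limit);
}
```

The same selection could be run once globally and once per category by filtering the input on a category field before calling the function.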

TODO/Future

  • Merge tests branch
  • Move remaining ToDo list to here

thoro.news's Issues

Coincap.io assets endpoint returning 404

Issue

Requesting an asset's data from Coincap.io returns a 404.

Current State

For instance, when requesting BTC, we use the endpoint coincap.io/page/BTC, which no longer works.

Desired State

We should migrate to API v2.0, which serves asset data at endpoints such as api.coincap.io/v2/assets/bitcoin
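A possible shape for the migration is sketched below. Only the host and path (api.coincap.io/v2/assets/bitcoin) come from this issue. The `https` scheme, the `ASSET_IDS` map, the function names and the injectable `fetchImpl` parameter are assumptions for illustration. The sketch returns the parsed JSON body as-is and assumes nothing about its fields.

```javascript
// Host and path are taken from the issue text; everything else is illustrative.
const COINCAP_V2_ASSETS = "https://api.coincap.io/v2/assets";

// Hypothetical ticker-to-id map; only the BTC -> bitcoin pair appears in the issue.
const ASSET_IDS = { BTC: "bitcoin" };

// Builds the v2 asset URL for a ticker symbol such as "BTC".
function assetUrl(symbol) {
  const id = ASSET_IDS[symbol.toUpperCase()];
  if (!id) {
    throw new Error(`No CoinCap asset id known for symbol: ${symbol}`);
  }
  return `${COINCAP_V2_ASSETS}/${encodeURIComponent(id)}`;
}

// `fetchImpl` is injectable so the function can be exercised without network
// access; it defaults to the global fetch found in Node 18+ and browsers.
async function fetchAsset(symbol, fetchImpl = fetch) {
  const res = await fetchImpl(assetUrl(symbol));
  if (!res.ok) {
    throw new Error(`CoinCap request failed with status ${res.status}`);
  }
  return res.json();
}
```

Keeping the symbol-to-id lookup in one place means the rest of the app can keep passing tickers like BTC, while only this module knows that v2 addresses assets by id.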
