
rss-puppy's Introduction

RSS Puppy

RSS Puppy is a BuzzFeed Open Lab project from our first cohort. You're welcome to fork it, build on it, and learn from it, but unfortunately we're not able to provide support. 😢

A watchdog tool for monitoring RSS feeds

This tool is designed to monitor RSS feeds in bulk and to generate machine-friendly notifications when new entries appear. While there is no shortage of RSS readers and web-based notification services, nothing we found combines easy management of hundreds of RSS feeds with the flexibility to direct output to a variety of data stores or over disparate protocols.

This monitor can run on any cloud service provider and requires only Node.js and a PostgreSQL database. It is also straightforward to add output handlers that pipe feed entry data to any service you use.

Read more about the motivation and design.

How to run

Get the code

  • git clone https://github.com/buzzfeed-openlab/rss-puppy.git
  • cd rss-puppy; npm install
  • cp ./sample-config.json ./config.json

Set up a database

The monitor uses a PostgreSQL database to keep track of feeds and entries.

An easy way to get a reliable, automatically backed up database is to use AWS. Log into the management console, navigate to the RDS dashboard, click Launch a DB instance, select PostgreSQL, and follow the rest of the configuration steps.

Once you have your DB, you can tell the monitor about it using the config.json file:

{
    "dbconfig": {
        "user": "DB USER",
        "password": "DB USER PW",
        "url": "POSTGRES DB PATH",
        "port": 5432,
        "dbname": "DB NAME",
        "initScript": "./monitor/init-feed-db.sql"
    },
    ...
}
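
As a rough illustration of how these fields fit together, the sketch below assembles them into a standard `postgres://` connection URI. It assumes that "url" holds the database host name or IP address, and `buildConnectionString` is a hypothetical helper, not something the monitor is known to use internally.

```javascript
// Sketch only: assumes "url" in dbconfig is the database host name or IP.
// buildConnectionString is a hypothetical helper, not part of rss-puppy.
function buildConnectionString(dbconfig) {
    // Percent-encode the parts that may contain reserved characters.
    const user = encodeURIComponent(dbconfig.user);
    const password = encodeURIComponent(dbconfig.password);
    const dbname = encodeURIComponent(dbconfig.dbname);
    return `postgres://${user}:${password}@${dbconfig.url}:${dbconfig.port}/${dbname}`;
}

const example = buildConnectionString({
    user: 'rsspuppy',
    password: 'p@ss/word',
    url: '127.0.0.1',
    port: 5432,
    dbname: 'rsspuppy'
});
console.log(example);
// postgres://rsspuppy:p%40ss%2Fword@127.0.0.1:5432/rsspuppy
```

A URI like this can be handy for testing the credentials with a separate PostgreSQL client before starting the monitor.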

Configure your feeds

In the config file there will be a section called "feeds" and a section called "throttling".

{
    "feeds": [
        "https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0001440512&type=&dateb=&owner=exclude&start=0&count=40&output=atom",
        ...
    ],

    "throttling": {
        "monitorFrequency": 8000,
        "maxConcurrent": 10,
        "concurrentInterval": 1000
    },
    ...
}
  • "feeds" is an array of RSS feed URLs that will be monitored.
  • "throttling" is broken into several parts:
    • "monitorFrequency": How often the monitor checks whether it needs to query any "old" RSS feeds (i.e., ones that haven't been queried in a while).
    • "maxConcurrent": The maximum number of concurrent queries the monitor will make (excess queries are queued).
    • "concurrentInterval": The interval to wait between batches of "maxConcurrent" queries (i.e., X queries per 10 seconds, or X queries per 60 seconds).
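
To make the interaction between "maxConcurrent" and "concurrentInterval" concrete, here is a minimal sketch of how a list of feeds might be split into batches with a start delay for each. It is not the monitor's actual scheduler: `planBatches` is a hypothetical helper, and the sketch assumes "concurrentInterval" is expressed in milliseconds.

```javascript
// Hypothetical helper, not part of rss-puppy: groups feed URLs into batches of
// at most maxConcurrent and gives each batch a start delay.
// Assumption: concurrentInterval is in milliseconds.
function planBatches(feeds, throttling) {
    const batches = [];
    for (let i = 0; i < feeds.length; i += throttling.maxConcurrent) {
        batches.push({
            // Batch number (0, 1, 2, ...) times the interval between batches.
            delay: (i / throttling.maxConcurrent) * throttling.concurrentInterval,
            feeds: feeds.slice(i, i + throttling.maxConcurrent)
        });
    }
    return batches;
}

const plan = planBatches(
    ['feed-a', 'feed-b', 'feed-c', 'feed-d', 'feed-e'],
    { maxConcurrent: 2, concurrentInterval: 1000 }
);
console.log(plan.map((batch) => `${batch.delay}: ${batch.feeds.join(', ')}`));
```

With five feeds and a limit of two, the plan has three batches starting at 0, 1000, and 2000, and the last batch holds the single leftover feed.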

Configure your outputs

Outputs are modules of code that listen for events that the monitor emits and do something useful with the resulting data.

There are several different kinds of events:

  • "new-entry": Emitted when the monitor encounters an entry that it has not seen before. Handlers are invoked with the entry as a JSON object and the feed URL as a string.
  • "checking-old-feeds": Emitted whenever the monitor wakes up to look for feeds to query (approximately every "monitorFrequency" seconds).
  • "old-feed": Emitted whenever the monitor finds a feed that hasn't been queried in a while and needs to be checked. Handlers are invoked with the feed URL as a string.
  • "entry": Emitted whenever an entry is parsed from a feed. Note that feeds are queried and parsed over and over again, so this event is emitted for the same entry many times. Handlers are invoked with the entry as a JSON object and the feed URL as a string.
{
    "outputs": [
        {
            "file": "./outputs/debug-logger.js",
            "config": {
                "showOldFeedMessages": true
            }
        },
        ...
    ],
    ...
}
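
The sketch below shows one way an output handler could subscribe to the events listed above. The event names and handler arguments come from this README; the constructor shape `(monitor, config)`, the `CollectingOutput` name, and the `title` field on the entry are assumptions made for illustration, so check the bundled `./outputs/debug-logger.js` for the interface the monitor actually expects.

```javascript
const EventEmitter = require('events');

// Hypothetical output handler. The (monitor, config) constructor signature is an
// assumption for illustration, not the project's documented interface.
function CollectingOutput(monitor, config) {
    this.seen = [];

    // "new-entry" handlers receive the entry object and the feed URL.
    monitor.on('new-entry', (entry, feedUrl) => {
        // Assumption: entries carry a "title" field.
        this.seen.push({ title: entry.title, feedUrl: feedUrl });
    });

    // Only log "old-feed" events when the output's config asks for it.
    if (config.showOldFeedMessages) {
        monitor.on('old-feed', (feedUrl) => {
            console.log('checking old feed: ' + feedUrl);
        });
    }
}

// Stand-in for the monitor: a plain EventEmitter emitting the documented events.
const fakeMonitor = new EventEmitter();
const output = new CollectingOutput(fakeMonitor, { showOldFeedMessages: true });

fakeMonitor.emit('old-feed', 'https://example.com/feed.xml');
fakeMonitor.emit('new-entry', { title: 'Hello' }, 'https://example.com/feed.xml');
console.log(output.seen);
```

Subscribing to "new-entry" rather than "entry" avoids handling the same item on every poll, since "entry" fires each time a feed is re-parsed.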

Run the monitor!

node ./run.js [/path/to/config.json]
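
Before starting the monitor, it can help to sanity-check the config file. The following is a minimal sketch of such a check based only on the sections described in this README. `checkConfig` is a hypothetical helper, and treating every listed key as required is an assumption; the monitor itself may apply defaults or accept additional options.

```javascript
// Hypothetical pre-flight check, not part of rss-puppy.
// Returns a list of human-readable problems (empty when nothing looks wrong).
function checkConfig(config) {
    const problems = [];

    if (!Array.isArray(config.feeds) || config.feeds.length === 0) {
        problems.push('"feeds" should be a non-empty array of URLs');
    }

    const throttling = config.throttling || {};
    for (const key of ['monitorFrequency', 'maxConcurrent', 'concurrentInterval']) {
        if (typeof throttling[key] !== 'number' || throttling[key] <= 0) {
            problems.push(`"throttling.${key}" should be a positive number`);
        }
    }

    const dbconfig = config.dbconfig || {};
    for (const key of ['user', 'password', 'url', 'port', 'dbname']) {
        if (dbconfig[key] === undefined) {
            problems.push(`"dbconfig.${key}" is missing`);
        }
    }

    if (!Array.isArray(config.outputs) || config.outputs.some((o) => !o.file)) {
        problems.push('every entry in "outputs" needs a "file"');
    }

    return problems;
}

// Example: an empty feed list and a missing database name are both reported.
const problems = checkConfig({
    feeds: [],
    throttling: { monitorFrequency: 8000, maxConcurrent: 10, concurrentInterval: 1000 },
    dbconfig: { user: 'u', password: 'p', url: '127.0.0.1', port: 5432 },
    outputs: [{ file: './outputs/debug-logger.js', config: {} }]
});
console.log(problems);
```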

rss-puppy's People

Contributors

westleyargentum, amandabee, dekhtiarjonathan


rss-puppy's Issues

containerize

If this had a Dockerfile, it would be even easier to deploy to a cloud provider.

On error event

Hi. It's me again. Can we have some kind of on-error event added, please? It'd be nice if it also passed some clear text about the error.

SQL

From a look at the code, there aren't any Postgres-specific SQL commands in it. Is Postgres only listed as a requirement because it's your SQL server of choice, or am I missing something?

add cli options

specifically, it would be nice to support either wiping out the "entries" table or leaving it alone on startup

Error : ENOTFOUND

Hello. I've been trying to set RSS Puppy up. I'm getting { [Error: getaddrinfo ENOTFOUND] code: 'ENOTFOUND', errno: 'ENOTFOUND', syscall: 'getaddrinfo' } and I have no idea what it's referring to. Here's my config file:

{
    "feeds": [
        "https://globenewswire.com/Rss/search/XwXK98ZDWLc8N4m3FssORcAcOHv2hsB6FxDQQN3Yc3oAyuPrMfAte0Pc9aqo_dXL"
    ],

    "throttling": {
        "monitorFrequency": 8000,
        "oldFeedThreshold": 30,
        "maxConcurrent": 10,
        "concurrentInterval": 1000
    },

    "dbconfig": {
        "user": "rsspuppy",
        "password": "pwhere",
        "url": "127.0.0.1",
        "port": 5432,
        "dbname": "rsspuppy",
        "initScript": "./monitor/init-feed-db.sql"
    },

    "outputs": [
        {
            "file": "./outputs/debug-logger.js",
            "config": {
                "showOldFeedMessages": true
            }
        }
    ],

    "exitOnError": true
}

Also, it'd be nice if there was more documentation available
