siathalysedi / rss-puppy

This project forked from buzzfeed-openlab/rss-puppy

A watchdog tool for monitoring RSS feeds

License: MIT License

JavaScript 100.00%

RSS Puppy

A watchdog tool for monitoring RSS feeds

This tool is designed to monitor RSS feeds in bulk and to generate machine-friendly notifications when new entries appear. While there is no shortage of RSS readers and web-based notification services, nothing we found combines easy management of hundreds of RSS feeds with the flexibility to direct output to a variety of data stores or over disparate protocols.

This monitor can be run on any cloud service provider and requires only Node.js and a PostgreSQL database. It is also straightforward to add output handlers that pipe feed entry data to any service you use.

Read more about the motivation and design.

How to run

Get the code

  • git clone https://github.com/buzzfeed-openlab/rss-puppy.git
  • cd rss-puppy; npm install
  • cp ./sample-config.json ./config.json

Set up a database

The monitor uses a PostgreSQL database to keep track of feeds and entries.

An easy way to get a reliable, automatically backed up database is to use AWS. Log into the management console, navigate to the RDS dashboard, click Launch a DB instance, select PostgreSQL, and follow the rest of the configuration steps.

Once you have your DB, you can tell the monitor about it using the config.json file:

{
    "dbconfig": {
        "user": "DB USER",
        "password": "DB USER PW",
        "url": "POSTGRES DB PATH",
        "port": 5432,
        "dbname": "DB NAME",
        "initScript": "./monitor/init-feed-db.sql"
    },
    ...
}
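
As a rough illustration of how these fields fit together, the sketch below assembles a PostgreSQL connection URI from a "dbconfig" object. The helper name buildConnectionString and the example values are hypothetical, and it assumes that "url" holds the database host name; the monitor itself may build its connection differently.

```javascript
// Hypothetical helper (not part of rss-puppy): builds a PostgreSQL
// connection URI from a "dbconfig" object like the one shown above.
// Assumption: "url" is the database host name.
function buildConnectionString(dbconfig) {
    // Percent-encode credentials so characters such as "@" or spaces
    // cannot break the URI.
    const user = encodeURIComponent(dbconfig.user);
    const password = encodeURIComponent(dbconfig.password);
    return `postgres://${user}:${password}@${dbconfig.url}:${dbconfig.port}/${dbconfig.dbname}`;
}

// Example values are made up for illustration only.
const exampleConfig = {
    user: "puppy",
    password: "p@ss word",
    url: "db.example.com",
    port: 5432,
    dbname: "feeds"
};

const connectionString = buildConnectionString(exampleConfig);
console.log(connectionString);
```

A string like this can be handy for checking the credentials with another PostgreSQL client before starting the monitor.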

Configure your feeds

In the config file there will be a section called "feeds" and a section called "throttling".

{
    "feeds": [
        "https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0001440512&type=&dateb=&owner=exclude&start=0&count=40&output=atom",
        ...
    ],

    "throttling": {
        "monitorFrequency": 8000,
        "maxConcurrent": 10,
        "concurrentInterval": 1000
    },
    ...
}

  • "feeds" is an array of RSS feed URLs that will be monitored.
  • "throttling" is broken into several parts:
    • "monitorFrequency": How often the monitor will check to see if it needs to query any "old" RSS feeds (i.e., ones that haven't been queried in a while).
    • "maxConcurrent": The maximum number of concurrent queries the monitor will make (excess queries will be queued).
    • "concurrentInterval": The interval to wait between making "maxConcurrent" queries (i.e., X queries per 10 seconds, or X queries per 60 seconds).
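
To make the interaction between "maxConcurrent" and "concurrentInterval" concrete, here is a small sketch under one reading of the descriptions above: queries are released in batches of at most "maxConcurrent", one batch per "concurrentInterval". The function scheduleQueries is hypothetical and only computes start offsets; it is not the monitor's actual queueing code.

```javascript
// Hypothetical illustration (not rss-puppy's implementation): compute the
// start offset of each queued query when at most `maxConcurrent` queries
// are released per `concurrentInterval`.
function scheduleQueries(feedCount, throttling) {
    const { maxConcurrent, concurrentInterval } = throttling;
    const delays = [];
    for (let i = 0; i < feedCount; i++) {
        // Queries 0..maxConcurrent-1 go in batch 0, the next group in batch 1, etc.
        const batch = Math.floor(i / maxConcurrent);
        // Offsets are in the same unit as concurrentInterval.
        delays.push(batch * concurrentInterval);
    }
    return delays;
}

// 25 feeds with the sample throttling values from the config above.
const delays = scheduleQueries(25, { maxConcurrent: 10, concurrentInterval: 1000 });
console.log(delays[0], delays[10], delays[24]);
```

Under this reading, raising "maxConcurrent" or lowering "concurrentInterval" shortens the time needed to work through a large feed list, at the cost of hitting feed servers harder.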

Configure your outputs

Outputs are modules of code that listen for events that the monitor emits and do something useful with the resulting data.

There are several different kinds of events:

  • "new-entry": Emitted when the monitor encounters an entry that it has not seen before. Handlers will be invoked with the entry as a JSON object and the feed URL as a string.
  • "checking-old-feeds": Emitted whenever the monitor wakes up to look for feeds to query (approximately every "monitorFrequency" seconds).
  • "old-feed": Emitted whenever the monitor finds a feed that hasn't been queried in a while and needs to be checked. Handlers will be invoked with the feed URL as a string.
  • "entry": Emitted whenever an entry is parsed from a feed. Note that feeds will be queried and parsed over and over again, so this will be emitted for the same entry many times. Handlers will be invoked with the entry as a JSON object and the feed URL as a string.

Outputs are listed in the "outputs" section of the config file:

{
    "outputs": [
        {
            "file": "./outputs/debug-logger.js",
            "config": {
                "showOldFeedMessages": true
            }
        },
        ...
    ],
    ...
}
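
The sketch below shows what a handler for these events might look like, using Node's built-in EventEmitter as a stand-in for the monitor. The (emitter, config) signature, the function name attachCollector, and the entry.title field are all assumptions made for illustration; check the bundled ./outputs/debug-logger.js for the interface the monitor actually expects.

```javascript
const { EventEmitter } = require("events");

// Hypothetical output handler. The (emitter, config) shape is assumed,
// not taken from rss-puppy's source.
function attachCollector(emitter, config) {
    const seen = [];

    // "new-entry" handlers receive the entry object and the feed URL.
    emitter.on("new-entry", (entry, feedUrl) => {
        // entry.title is an assumed field, used here only as an example.
        seen.push({ feedUrl, title: entry.title });
    });

    // Mirror the "showOldFeedMessages" option from the sample config.
    if (config.showOldFeedMessages) {
        emitter.on("old-feed", (feedUrl) => {
            console.log(`re-checking ${feedUrl}`);
        });
    }

    return seen;
}

// Stand-in for the monitor: a plain EventEmitter with made-up events.
const monitor = new EventEmitter();
const collected = attachCollector(monitor, { showOldFeedMessages: true });
monitor.emit("old-feed", "https://example.com/feed.xml");
monitor.emit("new-entry", { title: "Hello" }, "https://example.com/feed.xml");
console.log(collected);
```

Listening on "new-entry" rather than "entry" avoids handling the same item repeatedly, since "entry" fires every time a feed is re-parsed.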

Run the monitor!

node ./run.js [/path/to/config.json]

rss-puppy's People

Contributors

westleyargentum
