A Haskell-powered Twitter bot that posts milestones and statistics of various Wikipedias.

Home Page: http://twitter.com/wikipediastats

License: MIT License

Haskell 100.00%
twitter-bot haskell wikipedia wikipedia-scraper mediawiki

wikipediastats

As it turns out, the otherwise-excellent shared hosting plan I'm running all of my Twitter bots on limits RAM use to 1.5 GB per user, which is insufficient for building some of the dependencies of this bot. As a result, I've decided to reimplement it in JavaScript, another language that I wanted to get up to speed on (in a server context, anyway, i.e. with Node.js). Long story short: This repository will remain unmaintained.


While the main purpose of building this Twitter bot was to get myself reacquainted with Haskell, it's actually doing semi-interesting stuff. Whenever you run this program, it

  1. downloads and parses a list of all the different-language Wikipedias,
  2. scrapes some of the more interesting statistics for each of them,
  3. compares these stats to previously scraped and cached values (unless the cache doesn't exist, in which case it skips to step 5),
  4. fires off a tweet if a milestone has been reached, i.e. the first digit of a stat has changed (e.g. 49894 → 50002), and
  5. refreshes the now-stale cache with the newly scraped values.
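
Steps 3 and 4 above can be sketched roughly as follows. This is a minimal illustration, not the bot's actual code: the `Stats` type, the function names and the sample numbers are all made up for the example, and the extra requirement that the value must have grown is an assumption on my part (the description only says that the first digit has changed).

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical type: stat name -> value, for a single Wikipedia.
type Stats = Map.Map String Integer

-- First decimal digit of a non-negative number, as a one-character string.
leadingDigit :: Integer -> String
leadingDigit = take 1 . show

-- A milestone has been reached when the first digit changed.
-- Assumption: the value must also have grown.
isMilestone :: Integer -> Integer -> Bool
isMilestone old new = new > old && leadingDigit old /= leadingDigit new

-- Step 3: pair up cached and fresh values by stat name.
-- Step 4: keep only the pairs that crossed a milestone.
milestones :: Stats -> Stats -> [(String, Integer, Integer)]
milestones cached fresh =
  [ (name, old, new)
  | (name, (old, new)) <- Map.toList (Map.intersectionWith (,) cached fresh)
  , isMilestone old new
  ]

main :: IO ()
main = do
  let cached = Map.fromList [("articles", 49894), ("edits", 1200000)]
      fresh  = Map.fromList [("articles", 50002), ("edits", 1250000)]
  -- Only "articles" crossed a milestone; "edits" still starts with a 1.
  print (milestones cached fresh)
```

Stats that are missing from either side are simply ignored here, which also covers the "no cache yet" case: an empty cache yields no milestones.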

Now witness the results of this fully armed and operational Twitter bot and check out @wikipediastats!

Setup

Fairly typical for a modern Haskell thing, I believe. First, install a reasonably recent release of Stack. Then:

$ git clone https://github.com/doersino/wikipediastats
$ cd wikipediastats
$ stack build

If that's been successful, make a copy of config.sample.ini, name it config.ini and fill in your Twitter API credentials as described in the helpful comment you'll find in there.

Run the bot at least once during setup to build the initial cache:

$ stack run

Optionally, you can play around a bit:

$ stack repl

Or verify that everything's working just swell:

$ stack test

If you're actually intending to use this as a Twitter bot, set up a cronjob to execute stack run every hour or so, roughly like this:

0 * * * * cd PATH_TO_WIKIPEDIASTATS && stack run

Notes

  • This three-afternoon project was my first foray into Haskell after not touching it for a couple of years (and back then, I was firmly lodged in the beginner-to-intermediate gap). Don't expect elegance, custom monads or adherence to best practices.
  • I haven't bothered listing version ranges for the dependencies of this tool in package.yaml because I don't know which past or future versions will invariably break things, but I think the fixed Stackage resolver version makes this less problematic than it used to be before Stack was around? Not sure. If you, a future software historian, can't seem to get the dependencies to play along nicely, I'm quite sorry.
  • An improvement I didn't care to implement: Store the largest tweeted value (for each stat, for each Wikipedia) in the cache in order to avoid duplicate tweets when the stat reaches a milestone, falls below it again due to article deletions or similar, then reaches the milestone again.
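
For the curious, the unimplemented improvement from the last note might look something like the sketch below. Everything in it is hypothetical (the `Tweeted` cache type, the key layout, the function names); the one design choice worth pointing out is that it compares milestones rather than raw values, since a stat could fall below a milestone and then overshoot the previously tweeted value when it recovers.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical cache: (wikipedia, stat) -> largest value tweeted so far.
type Tweeted = Map.Map (String, String) Integer

-- Round a non-negative number down to its first digit, e.g. 50002 -> 50000.
milestoneOf :: Integer -> Integer
milestoneOf n = read (take 1 s ++ map (const '0') (drop 1 s))
  where s = show n

-- Tweet only if a new milestone was crossed AND that milestone is larger
-- than the one belonging to the largest value tweeted before (if any).
shouldTweet :: Tweeted -> (String, String) -> Integer -> Integer -> Bool
shouldTweet tweeted key old new =
  milestoneOf new > milestoneOf old
    && maybe True (\t -> milestoneOf new > milestoneOf t) (Map.lookup key tweeted)

-- Remember the largest tweeted value for this Wikipedia and stat.
recordTweet :: (String, String) -> Integer -> Tweeted -> Tweeted
recordTweet = Map.insertWith max

main :: IO ()
main = do
  let key = ("en", "articles")
      t0  = Map.empty
  print (shouldTweet t0 key 49894 50002)  -- first crossing of 50000
  let t1 = recordTweet key 50002 t0
  print (shouldTweet t1 key 49990 50010)  -- re-crossing after deletions
  print (shouldTweet t1 key 59999 60001)  -- genuinely new milestone
```

In this example, the second check is the duplicate the note is worried about, and it's the only one of the three that gets suppressed.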
