clifg / beernotifier

Watch local taplists and send notifications

JavaScript 88.77% HTML 10.95% CSS 0.28%

beernotifier's Issues

Dependencies should be split to "prod" and "dev"

So we don't need to deploy a bunch of testing dependencies on the production environment.

Add bar location data for each data source

We'll want to be able to scope notifications by distance, so we'll need some lat/long type info for each data source. This will also enable us to start tracking other cities if we wanted to.
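Scoping by distance could start from a great-circle (haversine) calculation. This is only a sketch: the `location` property, its `lat`/`lng` field names, and the `sourcesWithin` helper are assumptions for illustration and are not part of the existing data source model.

```javascript
// Mean Earth radius in kilometers, used by the haversine formula.
var EARTH_RADIUS_KM = 6371;

function toRadians(degrees) {
    return degrees * Math.PI / 180;
}

// Great-circle distance between two { lat, lng } points, in kilometers.
function distanceKm(a, b) {
    var dLat = toRadians(b.lat - a.lat);
    var dLng = toRadians(b.lng - a.lng);
    var h = Math.sin(dLat / 2) * Math.sin(dLat / 2) +
        Math.cos(toRadians(a.lat)) * Math.cos(toRadians(b.lat)) *
        Math.sin(dLng / 2) * Math.sin(dLng / 2);
    return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(h));
}

// Hypothetical helper: keep only the data sources within maxKm of the user.
// Assumes each data source carries a `location` of the form { lat, lng }.
function sourcesWithin(user, dataSources, maxKm) {
    return dataSources.filter(function (source) {
        return distanceKm(user, source.location) <= maxKm;
    });
}
```

Storing plain latitude/longitude per data source would also be enough to group sources by city later.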

Make the home page beer list scrollable

That way you don't have to scroll all the way to the top in order to see the filter checkboxes.

Sessions collection is overloaded, occasionally resulting in db contention

I have no idea why. It doesn't repro on a local build. I turned on mongodb profiling the other day and after seeing the issue like 5 times I looked and only had one request go over 100ms, and it was only 235ms, while the delays were on the order of 2-8 seconds.

The response is 304 (Not Modified), so basically the server is grabbing data from mongo, checking whether it's changed since the last time that client requested it (through an ETag), and returning 304 to tell the client that nothing has changed and it should use the cached version, so we're not even sending much data.

My best guess now is that there's something busted with the node/express JS I wrote. Probably some call that needs to be made asynchronous. The Chrome developer tools network tab is useful for seeing what's slow when clicking around or reloading a page on the site.

I just went to get a screenshot, and right now literally everything is slow, so... yeah, I dunno. Node has a built-in profiling flag that might be of use?

Set up continuous integration on github

Travis CI looks like the most popular framework, so that's a good starting point.

Naked City scraper is failing to return any beers

"ERROR: Found no beers at Naked City "

Add a scraper for Brouwer's

http://brouwerscafe.com/draught.pdf

This one is a pdf, so it'll be a new kind of challenge. There are lots of javascript libraries out there for parsing PDFs, so we need to find one that works and find a way to get just the beers. The problem is that the headers are of the states/countries and don't say "beer" anywhere, but I think we may be able to get away with a heuristic like: Anything in the tables on the front page that are under a heading that doesn't have "Ciders" in it.

Clearly a scraper we'll want to keep a close eye on as we may need to tweak that heuristic... If we're lucky, maybe we can find some other data source that they're generating the PDF from, like the Yard/Dray or Toronado?
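The "skip any heading with Ciders in it" heuristic might look roughly like this once the PDF text has been extracted. The PDF parsing itself is left out, and the `sections` shape (a `heading` plus an array of `items`) is an assumption about what a parser could be made to produce, not the output of any particular library. Matching "ciders" case-insensitively is also an assumption.

```javascript
// Hypothetical input shape: [{ heading: 'Washington', items: ['Some Beer', ...] }, ...]
function beersFromSections(sections) {
    var beers = [];
    sections.forEach(function (section) {
        // Heuristic from the issue: ignore anything under a "Ciders" heading.
        if (/ciders/i.test(section.heading)) {
            return;
        }
        section.items.forEach(function (item) {
            var name = item.trim();
            if (name) {
                beers.push(name);
            }
        });
    });
    return beers;
}
```

Keeping the heuristic in one small function should make it easier to tweak when the PDF layout changes.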

Location update frequency charts should have a common scale

Which would make it easier to compare them. In fact, there should be a page where they're all charted side-by-side. But anyway, some hard-coded y-axis max of 15 or 20 is probably fine.

Add Facebook and/or Google Auth

Nobody likes having to set up yet another username/password. It should be an option, but FB/Google are pretty easy to configure with PassportJS so we should use them too.

Not sure if we need to support both. Maybe just local account and Facebook auth? I have an example of FB auth already working at https://github.com/clifg/daily-fantasy-football

Filter state is lost when navigating to a location page then back again

I think we probably need an AngularJS service backing the model for the home page to remember state we want to persist.

Set up an email account from the custom domain

Rather than using the burner gmail account I borrowed from jankkings.com

Set up SSL

We'll be sending usernames and passwords so we can't use plain http

Store listings that have been filtered out during scraping

When we have a heuristic for skipping some listings, like flights from Chuck's or any code trying to skip over root beer, kombucha, or CHARDONNAY, we should stuff those in a new collection in the database so we can skim through occasionally to make sure we're not filtering out anything important.

Should store the listing, date, and data source. Maybe also have a "notes" section where we can put some context on why it was filtered (because filters may change over time).

updateDataSources should only ping slack on success if it was pinged for failure

Bug in the fix for #36 -- We have a threshold for when we ping on failure, but nothing for when we should send a success message. I thought I would cut down on complexity and not store state about slack pings in the dataSource, but we probably actually need it.

Ideally, we should add some methods to the DataSource model for this because we'll now have multiple properties that should be updated together, so some public methods would be good. But... for now we should at least just patch it up.
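One way to keep the related properties updated together is a single function that records each scrape result and says whether a ping is due. This is a sketch under assumptions: the property names `consecutiveFailures` and `failureNotified`, the function name, and the threshold value of 3 are all hypothetical, and the real DataSource model may look different.

```javascript
// Assumed threshold; the actual value used by updateDataSources may differ.
var FAILURE_THRESHOLD = 3;

// Updates the (hypothetical) tracking properties on dataSource and returns
// 'failure', 'success', or null to indicate which slack ping, if any, to send.
function recordScrapeResult(dataSource, succeeded) {
    if (!succeeded) {
        dataSource.consecutiveFailures = (dataSource.consecutiveFailures || 0) + 1;
        if (dataSource.consecutiveFailures >= FAILURE_THRESHOLD && !dataSource.failureNotified) {
            dataSource.failureNotified = true;
            return 'failure';
        }
        return null;
    }

    // Only announce recovery if a failure was actually announced earlier.
    var announceRecovery = Boolean(dataSource.failureNotified);
    dataSource.consecutiveFailures = 0;
    dataSource.failureNotified = false;
    return announceRecovery ? 'success' : null;
}
```

With this shape, a one-off failure followed by a success produces no pings at all, and a long outage produces exactly one failure ping and one success ping.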

Barking Dog Alehouse listings should be camel-cased

RIGHT NOW THEY'RE ALL-CAPS AND THAT'S ANNOYING.
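Assuming "camel-cased" here means capitalizing the first letter of each word, a small normalizer could be applied in the scraper. The function name and the list of abbreviations to leave in upper case are assumptions for illustration.

```javascript
// Assumed list of style abbreviations that should stay upper case.
var KEEP_UPPERCASE = ['IPA', 'ESB', 'IPL'];

// Converts an ALL-CAPS listing to one with only the first letter of each word capitalized.
function toTitleCase(listing) {
    return listing.split(/\s+/).filter(Boolean).map(function (word) {
        if (KEEP_UPPERCASE.indexOf(word.toUpperCase()) !== -1) {
            return word.toUpperCase();
        }
        return word.charAt(0).toUpperCase() + word.slice(1).toLowerCase();
    }).join(' ');
}
```

Without the abbreviation list, "IPA" would come out as "Ipa", which is arguably just as annoying.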

The site is ugly

Plain UI is fine, but right now it's kind of broken. The main table doesn't really format well on mobile. This is probably an issue that should be a bunch of sub-issues, or if someone knows what they're doing or has a designer friend we could just make them do it all.

Sixgill listings are missing brewery names

Looks like the Sixgill layout is like Noble Fir now, and we're only recording the beer name. Code for this fix can be stolen (or refactored, if we want to be fancy) from the Noble Fir scraper.

Add a scraper for Toronado Seattle

http://toronadoseattle.com/

There are actually two taplist displays. The first is on the http://toronadoseattle.com/ page, but then they have a link to the "live beer board" here: http://toronadoseattle.com/beers/index.html

I'm assuming the live beer board is updated more frequently, but it is going to be much harder to parse. The layout of that page is the most ridiculous one yet.

Add scraper unit tests

We can use the sinon library to stub out request.get and supply some sample HTML for parsing validation (http://bulkan-evcimen.com/testing_with_mocha_sinon)

Coming up with the sample HTML can probably be simple at first -- just take whatever is there at the time, but modify some of the more plain entries to ensure we hit all of the special case code, like skipping flights at Chuck's.

Remove "(1/2 pint)" type strings from Chuck's listings

Some Chuck's GW/Central listings will have "1/2 pint", "(1/2 pint)" or "1/2 pint -->" type strings in them, so we should remove those. The "1/2 pint" part seems to be pretty consistent, so maybe strip the following from the listings, in order: "1/2 pint" "()" "-->"?
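The strip-in-order idea could be sketched as below. The function name is hypothetical, and the case-insensitive match and the final whitespace cleanup are additions that the issue does not ask for.

```javascript
// Removes "1/2 pint", then any leftover "()", then "-->", in that order.
function stripHalfPint(listing) {
    return listing
        .replace(/1\/2 pint/gi, '')   // the consistent part
        .replace(/\(\s*\)/g, '')      // empty parentheses left behind
        .replace(/-->/g, '')          // trailing arrow
        .replace(/\s+/g, ' ')         // collapse the gaps created above (assumption)
        .trim();
}
```

The order matters: the empty "()" only exists after "1/2 pint" has been removed from "(1/2 pint)".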

Update the Sixgill scraper

Looks like the Sixgill updated their website. New URL (http://thesixgill.com/tap-list/) and new format to parse.

Add a switch to temporarily disable scraping of a datasource.

When a scraper is acting up, we need a way to disable it without removing it from the database entirely.

Remove jade dependency

All it's doing is serving up the frame for Angular, which can be a static file.

Build a web admin interface for managing data sources

Need to CRUD the data sources in a way that's nicer than the database command shell. This looks useful: http://www.forms-angular.org/

Save rich beer listing data, when available

Some sites like Noble Fir or Chuck's provide better data (brewery and beer name are separated for us) and we should save that. The ABV and whatever else is there should be kept too. We need a schema for this rich beer listing data, but the record should still keep a "display name" property that the scraper generates, as that will be all we have from some places.

This allows our clients to show more data when it's available, at least until we get to a point where we start trying to create our database of "proper" listings.

Secure REST API endpoints

Right now it's totally open, which is great for development, but needs to be fixed.

In a previous project, I just used the client session to handle authentication and to know who was making the call, but for this project we should be requiring auth to be passed on every hit to protected endpoints, since we'll want mobile apps and those won't have browser cookies/sessions.

This will probably also help a lot with some of the perf issues we're (still) seeing every once in a while on the live site.

I don't really know how to do this yet. I believe the docs for PassportJS (which we're using for authentication) have some details on how to do this, at least at a high level.

Better custom domain?

I think this will be useful enough that at some point we should let the general public use it and see what people think. I grabbed seattlebeerfinder.com because it was available, but someone more creative than me can probably think of a better domain. Something shorter, perhaps.

Show full tap listing history on the location page

Right now it's only an update frequency chart. We should show the full history. Not sure if we should bother paginating or not. I suppose someday it might be enough data that we'd need to but probably not necessary yet, unless someone is very enthusiastic about it...

Don't ping slack unless a scraper has failed 2 or 3 times in a row

Lots of one-off failures. Some of these places have horrible web hosting. This should bring the noise in slack way down.

Add a scraper for Mammoth in Eastlake

They have a good taplist and the site looks pretty easy to scrape: http://mammothseattle.com/

Decide if we actually need email confirmation for account activation

I think this is nice to have, but it's an extra barrier to usage and it seems pretty much every site has done away with this these days.

We need to set up Google Analytics

Before going live, we need analytics for the client site. Should also see if GA is useful for mobile apps as well, or if there are better-suited frameworks for tracking app usage.

Add angular form validation to the sign up page

I had hoped that we could just use the nice-looking browser-based validation, but it looks like that doesn't work on Safari on iOS, which is a pretty common browser, so we'll need to use this stuff:
https://docs.angularjs.org/guide/forms

We need a testing framework

I don't know much about JavaScript testing frameworks, but Mocha seems popular. The scrapers should be easy to isolate and test, and would be a good place to start. The next obvious spot for straightforward tests would be the API endpoints.

Move code coverage command to package.json scripts

We should be able to run it with "npm" just like our regular tests. No need for an extra shell script...

Noble Fir listings are missing brewery data

Right now we're just grabbing the beer name, but we should concatenate the brewery info into the listing as well.

Try to correlate beers across data sources

This is a big one. Some data sources are awesome and split the listing into brewery and beer components, but others don't.

Ideally it would be cool if I could click a beer listing and see every time that beer has been on tap at any data source, but that can get tough. For example, these should all be the same beer:
Fremont IPA
Fremont Brewing IPA
Fremont Interurban IPA
Fremont Brewing Co Interurban
etc...

One way of doing this would be to try to generate a master list of all beers that we know of, and then have an algorithm for trying to match the listing we see with one of them. Part of it would be simple edit distance calculations, but we'd need a lot of custom logic. Then we would probably need some confidence thresholds so some matches are automatic and others are pending.
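The edit-distance part with confidence thresholds could be sketched as follows. Everything beyond the basic Levenshtein calculation is an assumption for illustration: the noise-word list, the normalization, the threshold values, and the function names are placeholders, not tuned or tested against real listings.

```javascript
// Classic Levenshtein distance, keeping only two rows of the table.
function editDistance(a, b) {
    var prev = [];
    var i, j;
    for (j = 0; j <= b.length; j++) {
        prev[j] = j;
    }
    for (i = 1; i <= a.length; i++) {
        var curr = [i];
        for (j = 1; j <= b.length; j++) {
            var cost = a.charAt(i - 1) === b.charAt(j - 1) ? 0 : 1;
            curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
        }
        prev = curr;
    }
    return prev[b.length];
}

// Assumed custom logic: drop words that add no identifying information.
var NOISE_WORDS = ['brewing', 'brewery', 'co', 'company'];

function normalize(listing) {
    return listing.toLowerCase().replace(/[^a-z0-9\s]/g, '').split(/\s+/)
        .filter(function (word) {
            return word && NOISE_WORDS.indexOf(word) === -1;
        }).join(' ');
}

// Similarity in [0, 1]; 1 means identical after normalization.
function similarity(a, b) {
    var x = normalize(a);
    var y = normalize(b);
    var longest = Math.max(x.length, y.length);
    return longest === 0 ? 1 : 1 - editDistance(x, y) / longest;
}

// Assumed thresholds: at or above AUTO_MATCH is accepted automatically,
// at or above PENDING_MATCH is queued for review, anything lower is unmatched.
var AUTO_MATCH = 0.9;
var PENDING_MATCH = 0.6;

function matchListing(listing, masterList) {
    var best = null;
    var bestScore = 0;
    masterList.forEach(function (candidate) {
        var score = similarity(listing, candidate);
        if (score > bestScore) {
            best = candidate;
            bestScore = score;
        }
    });
    var status = bestScore >= AUTO_MATCH ? 'auto'
        : bestScore >= PENDING_MATCH ? 'pending' : 'none';
    return { match: status === 'none' ? null : best, score: bestScore, status: status };
}
```

Against a master entry of "Fremont Interurban IPA", this sketch auto-matches "Fremont Brewing Interurban IPA", marks "Fremont Brewing Co Interurban" as pending, and fails to match "Fremont IPA" at all, which shows why plain edit distance needs the extra custom logic.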

BeerAdvocate.com are jerks about programmatically accessing their data. RateBeer apparently has an API but they don't give out keys anymore. Untappd seems to be the most open, so it might be worth seeing if they can become our beer database. I think there are other great features we could have by integrating with ratings from Untappd as well.

Intermittent issue with Naked City

Happened on every update for a few days. Something busted in the parsing. If nothing else, scrapers should not return any empty strings in the beers array.

Feb 28 10:36:19 seattlebeerfinder app/scheduler.6995: Updating data source: Naked City
Feb 28 10:36:24 seattlebeerfinder app/scheduler.6995: +
Feb 28 10:36:24 seattlebeerfinder app/scheduler.6995: - Naked City Magnificent Seven Anniversary IPA
Feb 28 10:36:24 seattlebeerfinder app/scheduler.6995: /app/bin/updateDataSources:182
Feb 28 10:36:24 seattlebeerfinder app/scheduler.6995: console.log('ERROR: Error saving tap listing: ' + listing.rawListing);
Feb 28 10:36:24 seattlebeerfinder app/scheduler.6995: ^
Feb 28 10:36:24 seattlebeerfinder app/scheduler.6995: TypeError: Cannot read property 'rawListing' of undefined
Feb 28 10:36:24 seattlebeerfinder app/scheduler.6995: at /app/bin/updateDataSources:182:86
Feb 28 10:36:24 seattlebeerfinder app/scheduler.6995: at /app/node_modules/mongoose/lib/document.js:1813:19
Feb 28 10:36:24 seattlebeerfinder app/scheduler.6995: at handleError (/app/node_modules/hooks-fixed/hooks.js:40:22)
Feb 28 10:36:24 seattlebeerfinder app/scheduler.6995: at _next (/app/node_modules/hooks-fixed/hooks.js:46:22)
Feb 28 10:36:24 seattlebeerfinder app/scheduler.6995: at fnWrapper (/app/node_modules/hooks-fixed/hooks.js:186:18)
Feb 28 10:36:24 seattlebeerfinder app/scheduler.6995: at /app/node_modules/mongoose/lib/schema.js:236:13
Feb 28 10:36:24 seattlebeerfinder app/scheduler.6995: at complete (/app/node_modules/mongoose/lib/document.js:1164:7)
Feb 28 10:36:24 seattlebeerfinder app/scheduler.6995: at /app/node_modules/mongoose/lib/document.js:1195:20
Feb 28 10:36:24 seattlebeerfinder app/scheduler.6995: at ObjectId.SchemaType.doValidate (/app/node_modules/mongoose/lib/schematype.js:682:12)
Feb 28 10:36:24 seattlebeerfinder app/scheduler.6995: at /app/node_modules/mongoose/lib/document.js:1191:9
Feb 28 10:36:24 seattlebeerfinder app/scheduler.6995: at nextTickCallbackWith0Args (node.js:453:9)
Feb 28 10:36:24 seattlebeerfinder app/scheduler.6995: at process._tickCallback (node.js:382:13)
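The "no empty strings in the beers array" rule could be enforced in one place before listings are saved, rather than in each scraper. A minimal sketch, with `cleanBeers` as a hypothetical helper name:

```javascript
// Drops non-strings, trims whitespace, and removes entries that end up empty.
function cleanBeers(beers) {
    return beers
        .filter(function (beer) { return typeof beer === 'string'; })
        .map(function (beer) { return beer.trim(); })
        .filter(function (beer) { return beer.length > 0; });
}
```

This would not explain why the parser produced an empty entry in the first place, but it would keep a blank listing like the "+" line in the log above from reaching the save step.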

Email or Slack notifications on scraper errors

It would be good to know when a scraper fails (like right now, the Sixgill is failing every time, and earlier this week it was Prost). However, we'll need a way to not spam when they fail every 10 mins. Maybe ping slack when we see an error, but then don't ping again until there's been a successful scrape. Probably put a new property (properties?) on the dataSource model to keep track of errors and notifications.
