beernotifier's People

Contributors

clifg

beernotifier's Issues

Add bar location data for each data source

We'll want to be able to scope notifications by distance, so we'll need some lat/long type info for each data source. This will also enable us to start tracking other cities if we wanted to.
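A rough sketch of what distance scoping could look like once each data source has coordinates. The location field ({ lat, lng }) and the function names are assumptions for illustration, not anything in the current models.

```javascript
// Great-circle distance between two lat/long points, in kilometers (haversine formula).
const EARTH_RADIUS_KM = 6371;

function toRadians(degrees) {
  return (degrees * Math.PI) / 180;
}

function distanceKm(a, b) {
  const dLat = toRadians(b.lat - a.lat);
  const dLng = toRadians(b.lng - a.lng);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRadians(a.lat)) * Math.cos(toRadians(b.lat)) * Math.sin(dLng / 2) ** 2;
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(h));
}

// Keep only the data sources within maxKm of the user.
// Assumes each data source carries a hypothetical `location: { lat, lng }` field.
function sourcesWithin(user, dataSources, maxKm) {
  return dataSources.filter((ds) => distanceKm(user, ds.location) <= maxKm);
}
```

If we would rather do this in the database, MongoDB's geospatial queries (a 2dsphere index with $near) are probably worth a look instead of filtering in application code.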

Sessions collection is overloaded, occasionally resulting in db contention

I have no idea why. It doesn't repro on a local build. I turned on MongoDB profiling the other day, and after seeing the issue about 5 times I checked the profile: only one request had gone over 100ms, and it was only 235ms, while the delays were on the order of 2-8 seconds.

The response is 304 (Not Modified), so basically the server is grabbing data from mongo, checking whether it has changed since the last time that client requested it (through an ETag), and returning 304 to tell the client that nothing has changed and it should use the cached version. So we're not even sending much data.
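For reference, the conditional-GET check described above boils down to something like the sketch below. Express handles this for us, so this is only to show the logic; the respond helper and the plain response object are made up for illustration.

```javascript
// Strip the weak-validator prefix so W/"abc" and "abc" compare equal.
function stripWeakPrefix(tag) {
  return tag.trim().replace(/^W\//, '');
}

// True when the client's If-None-Match header already names the current ETag.
function isNotModified(ifNoneMatch, currentEtag) {
  if (!ifNoneMatch) return false;
  if (ifNoneMatch.trim() === '*') return true;
  const current = stripWeakPrefix(currentEtag);
  return ifNoneMatch.split(',').map(stripWeakPrefix).includes(current);
}

// Hypothetical helper: returns a plain description of the response to send.
function respond(req, currentEtag, body) {
  if (isNotModified(req.headers['if-none-match'], currentEtag)) {
    return { status: 304, headers: { ETag: currentEtag }, body: null };
  }
  return { status: 200, headers: { ETag: currentEtag }, body };
}
```

Note that the server still has to load the data to know the current ETag, so a 304 saves bandwidth but not the database round trip.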

My best guess now is that there's something busted with the node/express JS I wrote. Probably some call that needs to be made asynchronous. The Chrome developer tools network tab is useful for seeing what's slow when clicking around or reloading a page on the site.

I just went to get a screenshot, and right now literally everything is slow, so... yeah, I dunno. Node has a built-in profiling flag (--prof) that might be of use?

Add a scraper for Brouwer's

http://brouwerscafe.com/draught.pdf

This one is a PDF, so it'll be a new kind of challenge. There are lots of JavaScript libraries out there for parsing PDFs, so we need to find one that works and find a way to get just the beers. The problem is that the headers are the states/countries and don't say "beer" anywhere, but I think we may be able to get away with a heuristic like: anything in the tables on the front page that is under a heading that doesn't have "Ciders" in it.

Clearly a scraper we'll want to keep a close eye on as we may need to tweak that heuristic... If we're lucky, maybe we can find some other data source that they're generating the PDF from, like the Yard/Dray or Toronado?
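A minimal sketch of the "not under a Ciders heading" heuristic, assuming some PDF library has already given us the front page as a list of sections. The { heading, rows } shape is hypothetical; the real shape depends on whichever parser we end up using.

```javascript
// Assumed input: [{ heading: 'Washington', rows: ['Some Beer', ...] }, ...]
// extracted from the first page of the PDF by a parser we have not picked yet.
function beersFromSections(sections) {
  return sections
    .filter((section) => !/ciders/i.test(section.heading)) // skip any cider tables
    .flatMap((section) => section.rows)
    .map((row) => row.trim())
    .filter((row) => row.length > 0);
}
```

Keeping the heuristic in one small function like this should make it easy to tweak when the menu layout changes.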

Set up SSL

We'll be sending usernames and passwords so we can't use plain http

Store listings that have been filtered out during scraping

When we have a heuristic for skipping some listings, like flights from Chuck's or any code trying to skip over root beer, kombucha, or CHARADANAY, we should stuff those in a new collection in the database so we can skim through it occasionally and make sure we're not filtering out anything important.

Should store the listing, date, and data source. Maybe also have a "notes" section where we can put some context on why it was filtered (because filters may change over time).
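Something like the following could work as a starting point. The filter objects ({ note, test }) and the function names are assumptions for illustration; the record fields follow the list above (listing, date, data source, notes).

```javascript
// Build the record we would save to the (hypothetical) filtered-listings collection.
function makeFilteredRecord(rawListing, dataSourceName, note, now = new Date()) {
  return { listing: rawListing, dataSource: dataSourceName, date: now, notes: note || '' };
}

// Split scraped listings into the ones we keep and the ones a filter rejected.
// Each filter is { note: 'why it was filtered', test: (listing) => boolean }.
function partitionListings(listings, dataSourceName, filters) {
  const kept = [];
  const filtered = [];
  for (const listing of listings) {
    const hit = filters.find((filter) => filter.test(listing));
    if (hit) {
      filtered.push(makeFilteredRecord(listing, dataSourceName, hit.note));
    } else {
      kept.push(listing);
    }
  }
  return { kept, filtered };
}
```

The note travels with the record, so if a filter changes later we still know why an old entry was dropped.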

updateDataSources should only ping slack on success if it was pinged for failure

Bug in the fix for #36 -- We have a threshold for when we ping on failure, but not on when we should send a success message. I thought I would cut down on complexity and not store state about slack pings in the dataSource, but we probably actually need it.

Ideally, we should add some methods to the DataSource model for this because we'll now have multiple properties that should be updated together, so some public methods would be good. But... for now we should at least just patch it up.
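Here is a sketch of the state we would need, written as a pure function so it is easy to test. The property names (consecutiveFailures, failureNotified) and the default threshold of 3 are assumptions, not what is in the DataSource model today.

```javascript
// Given the stored state and the latest scrape outcome, return the new state
// and which Slack message (if any) to send: 'failure', 'success', or null.
function recordScrapeResult(state, succeeded, failureThreshold = 3) {
  if (succeeded) {
    // Only announce recovery if we previously announced the failure.
    const notify = state.failureNotified ? 'success' : null;
    return { state: { consecutiveFailures: 0, failureNotified: false }, notify };
  }
  const consecutiveFailures = state.consecutiveFailures + 1;
  const shouldPing = !state.failureNotified && consecutiveFailures >= failureThreshold;
  return {
    state: {
      consecutiveFailures,
      failureNotified: state.failureNotified || shouldPing,
    },
    notify: shouldPing ? 'failure' : null,
  };
}
```

Both properties change together in one place, which is roughly what public methods on the model would give us later.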

The site is ugly

Plain UI is fine, but right now it's kind of broken. The main table doesn't really format well on mobile. This is probably an issue that should be a bunch of sub-issues, or if someone knows what they're doing or has a designer friend we could just make them do it all.

Sixgill listings are missing brewery names

Looks like the Sixgill layout is like Noble Fir now, and we're only recording the beer name. Code for this fix can be stolen (or refactored, if we want to be fancy) from the Noble Fir scraper.

Add scraper unit tests

We can use the sinon library to stub out request.get and supply some sample HTML for parsing validation (http://bulkan-evcimen.com/testing_with_mocha_sinon)

Coming up with the sample HTML can probably be simple at first -- just take whatever is there at the time, but modify some of the more plain entries to ensure we hit all of the special-case code, like skipping flights at Chuck's.
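To show the shape of such a test without pulling in sinon, here is a sketch that uses a hand-rolled stub instead: the scraper takes the getter as a parameter, and the test passes a fake one that returns canned HTML. The scraper, the markup, and the URL are all made up for illustration; with sinon we would stub request.get rather than inject it.

```javascript
// Hypothetical scraper: the HTTP getter is a parameter so tests can swap it out.
// The getter follows the (url, callback(err, response, body)) convention.
function scrapeBeers(get, url, callback) {
  get(url, (err, response, html) => {
    if (err) return callback(err);
    const beers = [];
    const pattern = /<li class="beer">([^<]*)<\/li>/g;
    let match;
    while ((match = pattern.exec(html)) !== null) {
      const name = match[1].trim();
      // Special cases: skip blank entries and flights.
      if (name && !/flight/i.test(name)) beers.push(name);
    }
    callback(null, beers);
  });
}

// Canned page with one plain entry, one flight, and one blank entry.
const sampleHtml = [
  '<ul>',
  '  <li class="beer">Fremont Interurban IPA</li>',
  '  <li class="beer">Flight of four</li>',
  '  <li class="beer">  </li>',
  '</ul>',
].join('\n');

// Fake getter standing in for the real HTTP call.
function fakeGet(url, callback) {
  callback(null, { statusCode: 200 }, sampleHtml);
}
```

In a Mocha test the assertion would sit inside the callback passed to scrapeBeers.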

Remove "(1/2 pint)" type strings from Chuck's listings

Some Chuck's GW/Central listings will have "1/2 pint", "(1/2 pint)" or "1/2 pint -->" type strings in them, so we should remove those. The "1/2 pint" part seems to be pretty consistent, so maybe strip the following from the listings, in order: "1/2 pint" "()" "-->"?
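The strip-in-order idea could look like this (a sketch; the function name is made up, and the extra whitespace cleanup at the end is an addition of mine):

```javascript
// Remove "1/2 pint" first, then any parentheses left empty, then stray arrows.
function stripHalfPint(listing) {
  return listing
    .replace(/1\/2 pint/gi, '')
    .replace(/\(\s*\)/g, '')
    .replace(/-->/g, '')
    .replace(/\s+/g, ' ') // collapse the gaps the removals leave behind
    .trim();
}
```

Doing "1/2 pint" first matters: the parentheses only become empty, and therefore safe to remove, after their contents are gone.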

Save rich beer listing data, when available

Some sites like Noble Fir or Chuck's provide better data (brewery and beer name are separated for us) and we should save that. The ABV and whatever else is there should be kept too. We need a schema for this rich beer listing data, but the record should still keep a "display name" property that the scraper generates, as that will be all we have from some places.

This allows our clients to show more data when it's available, at least until we get to a point where we start trying to create our database of "proper" listings.
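One possible shape for the record, as a sketch. The field names other than the display name are assumptions, and the ABV in the usage note is a made-up sample value.

```javascript
// Build a listing record. Rich fields are null when the site does not provide them;
// displayName is always present so clients have something to show.
function makeListing({ brewery, beer, abv, raw }) {
  const displayName = [brewery, beer].filter(Boolean).join(' ') || raw;
  return {
    displayName,
    brewery: brewery || null,
    beer: beer || null,
    abv: typeof abv === 'number' ? abv : null,
  };
}
```

A rich source would call makeListing({ brewery: 'Fremont', beer: 'Interurban IPA', abv: 6.2 }), while a plain one would call makeListing({ raw: 'Fremont IPA' }) and get nulls for everything but the display name.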

Secure REST API endpoints

Right now it's totally open, which is great for development, but needs to be fixed.

In a previous project, I just used the client session to handle authentication and to know who was making the call, but for this project we should be requiring auth to be passed on every hit to protected endpoints, since we'll want mobile apps and those won't have browser cookies/sessions.

This will probably also help a lot with some of the perf issues we're (still) seeing every once in a while on the live site.

I don't really know how to do this yet. I believe the docs for PassportJS (which we're using for authentication) have some details on how to do this, at least at a high level.
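At a high level, per-request auth is just middleware that checks a credential on every hit instead of looking at the session. The sketch below is not Passport; it assumes a bearer token in the Authorization header and a hypothetical synchronous lookupUser function, only to show where the check sits.

```javascript
// Express-style middleware factory. lookupUser(token) is a hypothetical function
// that returns a user object for a valid token, or null otherwise.
function requireAuth(lookupUser) {
  return function (req, res, next) {
    const header = req.headers.authorization || '';
    const match = /^Bearer (.+)$/.exec(header);
    const user = match ? lookupUser(match[1]) : null;
    if (!user) {
      res.statusCode = 401;
      res.end('Unauthorized');
      return;
    }
    req.user = user; // downstream handlers know who is calling
    next();
  };
}
```

With Passport, passing { session: false } to passport.authenticate should get the same no-session behavior; which strategy to pair it with is still an open question.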

Better custom domain?

I think this will be useful enough that at some point we should let the general public use it and see what people think. I grabbed seattlebeerfinder.com because it was available, but someone more creative than me can probably think of a better domain. Something shorter, perhaps.

Show full tap listing history on the location page

Right now it's only an update frequency chart. We should show the full history. Not sure if we should bother paginating or not. I suppose someday it might be enough data that we'd need to but probably not necessary yet, unless someone is very enthusiastic about it...

We need to set up Google Analytics

Before going live, we need analytics for the client site. Should also see if GA is useful for mobile apps as well, or if there are better-suited frameworks for tracking app usage.

We need a testing framework

I don't know much about JavaScript testing frameworks, but Mocha seems popular. The scrapers should be easy to isolate and test, and would be a good place to start. The next obvious spot for straightforward tests would be the API endpoints.

Try to correlate beers across data sources

This is a big one. Some data sources are awesome and split the listing into brewery and beer components, but others don't.

Ideally it would be cool if I could click a beer listing and see every time that beer has been on tap at any data source, but that can get tough. For example, these should all be the same beer:
Fremont IPA
Fremont Brewing IPA
Fremont Interurban IPA
Fremont Brewing Co Interurban
etc...

One way of doing this would be to try to generate a master list of all beers that we know of, and then have an algorithm for trying to match the listing we see with one of them. Part of it would be simple edit distance calculations, but we'd need a lot of custom logic. Then we would probably need some confidence thresholds so some matches are automatic and others are pending.
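To make the edit-distance-plus-thresholds idea concrete, here is a rough sketch. The normalization rules, the threshold values (0.9 and 0.5), and all function names are assumptions picked for illustration, not tuned numbers.

```javascript
// Levenshtein edit distance using a single rolling row.
function editDistance(a, b) {
  const row = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let diagonal = row[0];
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j];
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      row[j] = Math.min(above + 1, row[j - 1] + 1, diagonal + cost);
      diagonal = above;
    }
  }
  return row[b.length];
}

// Example of the "custom logic": drop filler words before comparing.
function normalize(name) {
  return name
    .toLowerCase()
    .replace(/\b(brewing|brewery|co)\b\.?/g, ' ')
    .replace(/\s+/g, ' ')
    .trim();
}

// 1 means identical after normalization, 0 means nothing in common.
function similarity(a, b) {
  const x = normalize(a);
  const y = normalize(b);
  const longest = Math.max(x.length, y.length);
  return longest === 0 ? 1 : 1 - editDistance(x, y) / longest;
}

// Find the best master-list entry and classify the match by confidence.
function matchListing(listing, masterList, autoThreshold = 0.9, pendingThreshold = 0.5) {
  let best = null;
  for (const candidate of masterList) {
    const score = similarity(listing, candidate);
    if (!best || score > best.score) best = { candidate, score };
  }
  if (!best || best.score < pendingThreshold) {
    return { status: 'none', match: null, score: best ? best.score : 0 };
  }
  const status = best.score >= autoThreshold ? 'auto' : 'pending';
  return { status, match: best.candidate, score: best.score };
}
```

Against a master list containing only "Fremont Interurban IPA", the listing "Fremont Brewing Co Interurban IPA" matches automatically once the filler words are dropped, while "Fremont IPA" only scores 0.5 and lands in pending. That gap is a decent illustration of why plain edit distance will not be enough on its own.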

BeerAdvocate.com are jerks about programmatically accessing their data. RateBeer apparently has an API but they don't give out keys anymore. Untappd seems to be the most open, so it might be worth seeing if they can become our beer database. I think there are other great features we could have by integrating with ratings from Untappd as well.

Intermittent issue with Naked City

Happened on every update for a few days. Something busted in the parsing. If nothing else, scrapers should not return any empty strings in the beers array.

Feb 28 10:36:19 seattlebeerfinder app/scheduler.6995: Updating data source: Naked City
Feb 28 10:36:24 seattlebeerfinder app/scheduler.6995: +
Feb 28 10:36:24 seattlebeerfinder app/scheduler.6995: - Naked City Magnificent Seven Anniversary IPA
Feb 28 10:36:24 seattlebeerfinder app/scheduler.6995: /app/bin/updateDataSources:182
Feb 28 10:36:24 seattlebeerfinder app/scheduler.6995: console.log('ERROR: Error saving tap listing: ' + listing.rawListing);
Feb 28 10:36:24 seattlebeerfinder app/scheduler.6995: ^
Feb 28 10:36:24 seattlebeerfinder app/scheduler.6995: TypeError: Cannot read property 'rawListing' of undefined
Feb 28 10:36:24 seattlebeerfinder app/scheduler.6995: at /app/bin/updateDataSources:182:86
Feb 28 10:36:24 seattlebeerfinder app/scheduler.6995: at /app/node_modules/mongoose/lib/document.js:1813:19
Feb 28 10:36:24 seattlebeerfinder app/scheduler.6995: at handleError (/app/node_modules/hooks-fixed/hooks.js:40:22)
Feb 28 10:36:24 seattlebeerfinder app/scheduler.6995: at _next (/app/node_modules/hooks-fixed/hooks.js:46:22)
Feb 28 10:36:24 seattlebeerfinder app/scheduler.6995: at fnWrapper (/app/node_modules/hooks-fixed/hooks.js:186:18)
Feb 28 10:36:24 seattlebeerfinder app/scheduler.6995: at /app/node_modules/mongoose/lib/schema.js:236:13
Feb 28 10:36:24 seattlebeerfinder app/scheduler.6995: at complete (/app/node_modules/mongoose/lib/document.js:1164:7)
Feb 28 10:36:24 seattlebeerfinder app/scheduler.6995: at /app/node_modules/mongoose/lib/document.js:1195:20
Feb 28 10:36:24 seattlebeerfinder app/scheduler.6995: at ObjectId.SchemaType.doValidate (/app/node_modules/mongoose/lib/schematype.js:682:12)
Feb 28 10:36:24 seattlebeerfinder app/scheduler.6995: at /app/node_modules/mongoose/lib/document.js:1191:9
Feb 28 10:36:24 seattlebeerfinder app/scheduler.6995: at nextTickCallbackWith0Args (node.js:453:9)
Feb 28 10:36:24 seattlebeerfinder app/scheduler.6995: at process._tickCallback (node.js:382:13)
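Two small guards that would probably have avoided this, as a sketch. The helper names are made up, and the second one is only a guess at what line 182 is doing: it looks like the error path logs listing.rawListing even when listing was never passed to the callback.

```javascript
// Drop anything that is not a non-blank string before the beers array is used.
function cleanBeers(beers) {
  return (beers || [])
    .filter((beer) => typeof beer === 'string')
    .map((beer) => beer.trim())
    .filter((beer) => beer.length > 0);
}

// Build the error message without assuming the listing exists.
function describeSaveError(err, listing) {
  const raw = listing && listing.rawListing ? listing.rawListing : '(listing unavailable)';
  const reason = err && err.message ? err.message : String(err);
  return 'ERROR: Error saving tap listing: ' + raw + ' (' + reason + ')';
}
```

The first guard keeps bad data out; the second keeps a failed save from crashing the whole scheduler run.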

Email or Slack notifications on scraper errors

It would be good to know when a scraper fails (like right now, the Sixgill is failing every time, and earlier this week it was Prost). However, we'll need a way to not spam when they fail every 10 mins. Maybe ping slack when we see an error, but then don't ping again until there's been a successful scrape. Probably put a new property (properties?) on the dataSource model to keep track of errors and notifications.
