GithubHelp home page GithubHelp logo

lwadya / boozehound Goto Github PK

View Code? Open in Web Editor NEW
1.0 0.0 1.0 4.76 MB

A web app that uses Natural Language Processing to recommend cocktail recipes based on a search string.

Home Page: https://lw-boozehound.herokuapp.com/

License: MIT License

Python 5.70% CSS 1.43% HTML 2.20% Jupyter Notebook 90.66%

boozehound's Introduction

Boozehound Cocktail Recommendation App

Try out Boozehound on Heroku!

This project was a labor of love for me since it combines two of my favorite things: data and cocktails. Ever since my wife signed us up for a drink-mixing class a few years ago I've been stirring up various concoctions, both established recipes and original creations. My goal was to create a simple app that would let anyone discover new drink recipes based upon their current favorites or just some descriptive words. Recipe books can be fantastic resources, but I often just want something that tastes like some other drink but different or some combination of flavors and a specific spirit and that's where a table of contents fails. While I never thought of Boozehound as a replacement for my favorite recipe books I was hoping it could serve as an effective alternative when I just don't have the patience to thumb through dozens of recipes to find what I want to make.

Photo of a book and index cards containing cocktail recipes

Data Collection

Because I wanted Boozehound to work with descriptions and not just cocktail and spirit names I knew I would be relying upon Natural Language Processing and would need a fair amount of descriptive text with which to work. I also wanted the app to look good so I needed to get my recipes from a resource that also has images for each drink. I started by scraping the well-designed Liquor.com, which has a ton of great recipes and excellent photos. Unfortunately the site has extremely inconsistent write-ups on each cocktail: some are paragraphs long and others only a sentence or two. I wanted consistent, longer drink descriptions and I found them at The Spruce Eats, which I scraped using BeautifulSoup in the 'scrape_spruce_eats' notebook. Spruce Eats doesn't have the greatest list of recipes, but I was still able to collect roughly 980 separate drink entries from the site, each with an ingredient list, description, and an image URL.

Text Pre-Processing

After getting all of my cocktail recipe data into a Pandas DataFrame, I still needed to format my corpus to prepare it for modeling. I used the SpaCy library to lemmatize words and keep only the nouns and adjectives. SpaCy is both fast and easy to use, which made it ideal for my relatively simple pre-processing. I then used scikit-learn's TF-IDF implementation to create a matrix of each recipe's word frequency vector. I chose TF-IDF over other vectorizers because it accounts for word count disparities and some of my drink descriptions are twice as long as others. The app also runs user search strings through the same process and those should certainly be shorter than any cocktail description. My pre-processing and modeling work is stored in the 'model_spruce_eats' notebook.

Models

Building my initial model was a fairly simple process of dimension reduction and distance calculations. Since I had a TF-IDF matrix with significantly more columns than rows, I needed some way of condensing those features. I tried a few solutions and Non-Negative Matrix Factorization gave me the most sensible groupings. From there I just calculated pairwise Euclidean distances between the NMF description vectors and a similarly vectorized search string. There was just one problem: my model was such a good recommender that it was way too boring. It relied far too heavily on the frequency of cocktail and spirit names in determining similarity, so a search for margarita would just return ten different variations of margaritas. To make my model more interesting I created a second model that is only able to use words from the description that are not the names of drinks or spirits. Both models are then blended together, which the user can control through the Safe-Weird slider in the Boozehound app.

The image below gives an example of how both models work. Drink and spirit names dominate the descriptions, so the Safe model in the top half has a good chance of connecting words like tequila. The Weird model on the bottom has to make connections using other words, in this case refreshing. Because it has less data with which to work, the Weird model tends to make less relevant recommendations, but they’re often more interesting.

Chart showing an example of Safe and Fun model word similarities

The App

Boozehound is up and running on Heroku: feel free to give it a try. I built the app in Flask and wanted a simple, clean aesthetic reminiscent of a classy cocktail bar. As a result I spent just as much time using CSS to stylize content as I did just getting the app to work properly. Luckily I enjoy design and layout so it was a real pleasure seeing everything slowly come together. My Metis instructors would often bring up the idea of completing the story and to me this project would have been incomplete without a concise and visually-pleasing presentation of recipe recommendations.

Picture of the Boozehound app

Conclusions

While I am pleased with the final product I presented in class at Metis, there's a lot I still want to address with the Boozehound app. Some of the searches I tested returned odd results that could be improved upon. I'd also like to add some UX improvements like helpful feature descriptions and some initial search suggestions for users who don't know what they want. Another planned feature is a one-click search button to allow the user to find recipes similar to a drink that shows up as a recommendation without having to type it into the search bar. Boozehound is all about exploration and I want to make rifling through a bunch of new recipes as easy as possible.

boozehound's People

Contributors

lwadya avatar

Stargazers

 avatar

Forkers

sshuster

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.