Green Restaurant Analysis

Data Sources

  • Yelp: https://www.yelp.com/dataset
    • Business and review files downloaded from the link above
  • Seafood Watch: https://www.seafoodwatch.org/
    • CSV file of the organization's restaurant partners, provided by a representative of the organization (email me for access)
  • Green Restaurant Association: http://www.dinegreen.com/
    • CSV file of the restaurants the organization has rated, provided by a representative of the organization (email me for access)

Results

  • Correlations: how the different green rating columns correlate with one another and with the Yelp star ratings
  • Graph: the combined green rating of restaurants at each Yelp star rating

To run any of the Python scripts, you must place the data above in data/original_sources under these file names:

  • YelpBusiness.csv - generated by json_to_csv.py from the JSON file downloaded from Yelp
  • YelpReview.csv - generated by json_to_csv.py from the JSON file downloaded from Yelp
  • GRA.csv
  • SeafoodWatch.csv
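json_to_csv.py itself is not reproduced here. As a rough illustration of the conversion, the sketch below uses only the Python standard library rather than pandas and assumes the Yelp download stores one JSON object per line (newline-delimited JSON). The function name json_lines_to_csv is hypothetical, and re-serializing nested fields (for example attributes or hours) as JSON strings is an assumed flattening strategy, not necessarily what the original script does.

```python
import csv
import io
import json


def json_lines_to_csv(json_lines, csv_out):
    """Write newline-delimited JSON records to csv_out as CSV; return the record count.

    Hypothetical helper. Nested values (dicts or lists) are re-serialized as
    JSON strings so that each one fits in a single CSV cell.
    """
    records = [json.loads(line) for line in json_lines if line.strip()]

    # Collect the union of keys across all records, in first-seen order.
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)

    writer = csv.DictWriter(csv_out, fieldnames=fieldnames)
    writer.writeheader()
    for record in records:
        writer.writerow({
            key: json.dumps(value) if isinstance(value, (dict, list)) else value
            for key, value in record.items()
        })  # keys missing from a record are written as empty cells
    return len(records)


if __name__ == "__main__":
    # Two made-up sample records in the one-object-per-line layout.
    sample = [
        '{"business_id": "b1", "name": "Cafe A", "categories": "Cafes, Coffee & Tea"}',
        '{"business_id": "b2", "name": "Shop B", "attributes": {"WiFi": "free"}}',
    ]
    buffer = io.StringIO()
    json_lines_to_csv(sample, buffer)
    print(buffer.getvalue())
```

To convert a real file, open the downloaded JSON file for reading and data/original_sources/YelpBusiness.csv for writing (with newline="") and pass both file objects. Because this sketch keeps every record in memory, the much larger review file may call for a streaming approach instead, such as pandas.read_json with lines=True and a chunksize.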

Scripts

  1. json_to_csv.py
    • Imports: pandas
    • Converts the JSON files downloaded from the Yelp Dataset Challenge into CSV files.
  2. clean_yelp_businesses.py
    • Imports: pandas
    • Removes unwanted columns and rows from the Yelp businesses CSV file
      • The Yelp data contains businesses other than restaurants, so we filter the categories column for these keywords: ['RESTAURANTS','BARS','FOOD','BREAKFAST & BRUNCH','DESSERTS','BAKERIES, DELIS, SANDWICHES', 'COFFEE & TEA', 'DINERS', 'CAFES']
    • files required: data/original_sources/YelpBusiness.csv
    • files generated: data/clean_yelp_restaurants.csv
  3. clean_GRA_data.py
    • Imports: pandas
    • Removes unnecessary columns and standardizes text
    • files required: data/original_sources/GRA.csv
    • files generated: data/clean_green.csv
  4. merge_restaurants_and_reviews.py
    • Imports: pandas
    • Merges the Yelp business and review data into one pandas DataFrame, concatenating the text of every review of a restaurant into a single block of text
    • Merges entries for separate locations of the same franchise into one row in our dataset.
    • files required: data/original_sources/YelpReview.csv, data/clean_yelp_restaurants.csv
    • files generated: data/big_restaurants_and_reviews.csv, data/small_restaurants_and_reviews.csv
  5. environmental_term_analysis.py
    • Can be run with --Small True to use the small dataset, which takes less time to process
    • Imports: pandas
    • Creates a green rating for each restaurant based on whether its reviews contain “environmental” terms.
      • Examples of environmental terms: compost, recycle, green, local, vegan, vegetarian
      • If environmental terms made up 1% or more of the total words in a restaurant's reviews, the restaurant received a score of 3; the rest received scores of 0, 1, or 2. In our final dataset, only restaurants with a score of 3 are counted as “green”.
    • files required: data/helper_files/environmentalTerms.txt, data/big_restaurants_and_reviews.csv or data/small_restaurants_and_reviews.csv
    • file generated: data/big_term_based_green_rating_results.csv or data/small_term_based_green_rating_results.csv
  6. merge_all_ratings.py
    • Can be run with --Small True to use the small dataset, which takes less time to process
    • Imports: pandas
    • Creates the final dataset, which contains one row per restaurant
      • columns: name, review text, yelp stars, GRA rating, seafood watch rating, term based rating, overall green rating
    • files required: data/small_restaurants_and_reviews.csv or data/big_restaurants_and_reviews.csv, data/small_term_based_green_rating_results.csv or data/big_term_based_green_rating_results.csv
    • file generated: data/small_restaurants_reviews_ratings.csv or data/big_restaurants_reviews_ratings.csv
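To make the filtering in clean_yelp_businesses.py (script 2) and the review concatenation in merge_restaurants_and_reviews.py (script 4) more concrete, the sketch below reproduces both ideas with plain Python dictionaries instead of pandas DataFrames. It assumes the categories field is a single comma-separated string that is matched case-insensitively, by substring, against the keyword list above; the function names are hypothetical, and the franchise-merging step is not shown.

```python
from collections import defaultdict

# Keyword list copied verbatim from the description of clean_yelp_businesses.py.
RESTAURANT_KEYWORDS = ['RESTAURANTS', 'BARS', 'FOOD', 'BREAKFAST & BRUNCH', 'DESSERTS',
                       'BAKERIES, DELIS, SANDWICHES', 'COFFEE & TEA', 'DINERS', 'CAFES']


def is_restaurant(categories):
    """Hypothetical filter: keep a business if any keyword occurs in its upper-cased categories."""
    if not categories:
        return False
    upper = categories.upper()
    return any(keyword in upper for keyword in RESTAURANT_KEYWORDS)


def concatenate_reviews(businesses, reviews):
    """Map each restaurant's business_id to all of its review text joined into one block.

    Non-restaurant businesses are dropped; restaurants without reviews do not appear.
    """
    restaurant_ids = {b["business_id"] for b in businesses if is_restaurant(b.get("categories"))}
    texts = defaultdict(list)
    for review in reviews:
        if review["business_id"] in restaurant_ids:
            texts[review["business_id"]].append(review["text"])
    return {business_id: " ".join(parts) for business_id, parts in texts.items()}
```

Substring matching is deliberately loose: "FOOD" also matches categories such as "Seafood" or "Food Trucks", which is usually what a restaurant filter wants but can let in some non-restaurant businesses.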
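The 1% cutoff used by environmental_term_analysis.py (script 5) amounts to a ratio test on word counts. In the minimal sketch below, a lower-case regex tokenizer stands in for whatever tokenization the real script uses, environmentalTerms.txt is assumed to hold one single-word term per line, and all function names are hypothetical. Because this README does not say how scores 0, 1, and 2 are assigned, the sketch only decides whether a restaurant reaches the threshold for a score of 3.

```python
import re

GREEN_THRESHOLD = 0.01  # 1% of all review words, as described for script 5


def load_terms(path):
    """Read environmental terms; assumes one term per line (file format not documented)."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def tokenize(text):
    """Simple lower-case word tokenizer standing in for the script's real one."""
    return re.findall(r"[a-z]+", text.lower())


def environmental_share(review_text, environmental_terms):
    """Fraction of words in review_text that exactly match an environmental term."""
    words = tokenize(review_text)
    if not words:
        return 0.0
    terms = {term.lower() for term in environmental_terms}
    hits = sum(1 for word in words if word in terms)
    return hits / len(words)


def is_term_based_green(review_text, environmental_terms):
    """True when the environmental share reaches the 1% cutoff (a score of 3)."""
    return environmental_share(review_text, environmental_terms) >= GREEN_THRESHOLD
```

Exact matching means inflected forms such as "recycled" or "composting" are not counted, which is one reason stemming or lemmatization could change which restaurants clear the threshold.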

Analysis

  • In the Jupyter notebook notebooks/restaurants_reviews_ratings_analysis.ipynb, I regenerate the dataset produced by pythonScripts/merge_all_ratings.py (the notebook cells are copied from that script) and carry out some basic data exploration and analysis. The visualization and graph listed under Results come from this notebook. The analysis does not yield any striking conclusions.
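The two results above boil down to a correlation between numeric columns and an average green rating per star value. The notebook's exact code is not shown in this README (in pandas, DataFrame.corr and groupby(...).mean() are the usual tools), so the sketch below spells out both computations in plain Python. The row keys passed in are hypothetical placeholders for the dataset's columns.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def mean_by_star(rows, star_key, rating_key):
    """Average of rows[rating_key] for each distinct value of rows[star_key], sorted by star."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[star_key]].append(row[rating_key])
    return {star: sum(vals) / len(vals) for star, vals in sorted(groups.items())}
```

With binary green flags (1 for green, 0 otherwise), the per-star mean is simply the fraction of restaurants at that star rating that count as green.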

Future Work

  • Run better stemming and lemmatization algorithms on the reviews and identify topics in green and non-green restaurant reviews.
  • Expand the list of green restaurants using additional data sources (such as blogs or web scraping)
  • Use NYC Open Data in addition to Yelp stars to measure whether green restaurants are more successful

This project stems in part from the NYU Big Data Science course project by Nellie Spektor, Valerie Angulo, and Andrea Waxman.
Nellie Spektor has since continued working on this project under Professor Anasse Bari.

Contributors

andreawaxman, nspektor, vangul01
