Yelp Restaurant Photo Classification

My solution to the Yelp Restaurant Photo Classification competition. It scored 0.82246 and finished in 22nd place out of 355 teams (top 10%).

Implements the following pipeline:

  • Extract image features using four Caffe models. This results in eight image feature files of about 9 GB each (a train set and a test set for every model)
  • Combine image features into business features in two different ways. This results in 16 feature sets, much smaller this time
  • Train Logistic Regression classifiers on every business feature set via 10-fold CV. Stack the classifier outputs into a single set of meta features. Train Logistic Regression on the meta features and generate an intermediate submission
  • Train a Neural Net and XGBoost on the meta features, predict label probabilities, average the predictions, and generate a submission without F1 maximization
  • Adjust the probabilities via the Maximum Expected Utility Framework for F-measure maximization and generate the final submission.
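The second step of the pipeline (turning per-photo features into per-business features) can be illustrated with a minimal sketch. The README does not say which two aggregation methods are used, so the mean and max pooling below are assumptions, and the function name `pool_business_features` and the toy data are hypothetical, not taken from the repository.

```python
import numpy as np


def pool_business_features(photo_features, photo_to_business, how="mean"):
    """Aggregate per-photo feature vectors into one vector per business.

    photo_features:    array of shape (n_photos, n_features)
    photo_to_business: one business id per photo
    how:               "mean" or "max" (assumed pooling choices for illustration)
    """
    photo_features = np.asarray(photo_features, dtype=np.float64)
    business_ids = np.asarray(photo_to_business)
    if len(business_ids) != len(photo_features):
        raise ValueError("need exactly one business id per photo")

    reducers = {"mean": np.mean, "max": np.max}
    if how not in reducers:
        raise ValueError("how must be 'mean' or 'max'")
    reduce = reducers[how]

    # np.unique returns the business ids in sorted order
    unique_ids = np.unique(business_ids)
    pooled = np.vstack(
        [reduce(photo_features[business_ids == b], axis=0) for b in unique_ids]
    )
    return unique_ids, pooled


# Toy data: three photos with two features each, belonging to two businesses
features = np.array([[1.0, 0.0], [3.0, 2.0], [5.0, 4.0]])
owners = ["b1", "b1", "b2"]

ids, mean_feats = pool_business_features(features, owners, how="mean")
_, max_feats = pool_business_features(features, owners, how="max")
```

Pooling like this is what makes the business feature sets much smaller than the image feature files: the number of rows drops from the number of photos to the number of businesses.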

Requires:

  • Caffe, deep learning framework
  • Scientific Python Stack (including NumPy, SciPy, Pandas; all of these can be obtained with the Anaconda distribution)
  • XGBoost
  • Theano
  • Keras
  • About 100 GB of free disk space for train/test images, extracted image features, and model dumps.

An NVIDIA GPU is not required but is recommended. Extracting image features on a CPU may take several days.

Download:

  • The training and test datasets and other data can be downloaded from here
  • Get pretrained Caffe models BVLC Reference CaffeNet and BVLC AlexNet as described here. Download the other two models from Places CNN project: Places205-AlexNet, Hybrid-AlexNet

Note: customized prototxt files and mean files are already available in the models folder.

How to generate the solution(s):

  1. After you have downloaded and extracted the datasets and models, adjust the paths in paths.py and set caffe_mode (currently set to CPU)
  2. Run the following scripts in order (make sure you have enough disk space, see above):
    python Stage1_ExtractImageFeatures.py
    python Stage2_CreateBuisnessFeatures.py
    python Stage3_BlendLRModelsCV.py
    python Stage4_KerasXGBoostMEUFsubmission.py
  3. You will get three submissions:
    all_models_blendLR_CV.csv
    keras_xgboost_blend_noMEUF.csv
    keras_xgboost_blend_MEUF.csv

Enjoy!

Read my article "What restaurant would your computer like to go to?" and like it if you like it.
