iFood_2019

This is a competition project from the fine-grained visual-categorization workshop (FGVC6 workshop) at CVPR 2019.

Description:

What did you eat today? Wondering if you are eating a healthy diet? Automatic food identification can assist with food-intake monitoring to maintain a healthy diet. Food classification is a challenging problem due to the large number of food categories, the high visual similarity between different food categories, and the lack of datasets that are large enough for training deep models. In this competition, we extend last year's dataset to 251 fine-grained (prepared) food categories with 118,475 training images collected from the web. We provide human-verified labels for both the validation set of 11,994 images and the test set of 28,377 images. The goal is to build a model to predict the fine-grained food-category label given an image.

The main challenges are:

  1. Fine-grained Classes: The classes are fine-grained and visually similar. For example, the dataset has 15 different types of cake and 10 different types of pasta.
  2. Noisy Data: Since the training images are crawled from the web, they often include images of raw ingredients or processed and packaged food items. This is referred to as cross-domain noise. Further, due to the fine-grained nature of food categories, a training image may either be incorrectly labeled into a visually similar class or be annotated with a single label despite having multiple food items.

Evaluation:

For each image $i$, an algorithm will produce 3 labels $l_{i,1}, l_{i,2}, l_{i,3}$. For this competition each image has one ground-truth label $y_i$, and the error for that image is:

$$e_i = \min_j d(l_{i,j}, y_i)$$

where

$$d(x, y) = \begin{cases} 0 & \text{if } x = y \\ 1 & \text{otherwise} \end{cases}$$

The overall error score for an algorithm is the average error over all $N$ test images:

$$\mathrm{score} = \frac{1}{N} \sum_{i=1}^{N} e_i$$
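
A minimal sketch of this metric in Python (the function name and input format are illustrative, not part of the competition tooling):

```python
def top3_error(predictions, ground_truth):
    """Average top-3 error.

    predictions: dict mapping image_name -> list of 3 predicted labels
    ground_truth: dict mapping image_name -> true label y_i
    """
    errors = []
    for name, y in ground_truth.items():
        # d(l, y) is 0 if any of the 3 labels equals y, so e_i = 0 on a hit, 1 otherwise
        errors.append(0 if y in predictions[name][:3] else 1)
    return sum(errors) / len(errors)
```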

Submission file format:

image_name,label1 label2 label3 
test_0001.jpg,0 1 10 
test_0002.jpg,1 3 5 
test_0003.jpg,0 5 1 

Please include the header as shown above for correct parsing. Each line corresponds to one test image, identified by its name (e.g. test_0001.jpg refers to image test_0001.jpg), which is used for computing accuracy.
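
A minimal sketch for writing a submission file in this format (the function and variable names are illustrative):

```python
import csv

def write_submission(predictions, path="submission.csv"):
    """predictions: iterable of (image_name, [label1, label2, label3]) pairs."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        # Header as required: image_name,label1 label2 label3
        writer.writerow(["image_name", "label1 label2 label3"])
        for image_name, labels in predictions:
            writer.writerow([image_name, " ".join(str(l) for l in labels)])
```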

Data:

There is a total of 251 food categories in the dataset. A complete list of classes is available here.

Training data:

The training data consists of 118,475 images from 251 classes. The training images were collected from the web, and the labels are noisy.

Validation data:

The validation data consists of 11,994 images from 251 classes. The validation images were collected from the web and the labels are human-verified; the validation set does not contain noisy labels.

Test data:

The test data consists of 28,377 images from 251 classes. The test images were collected from the web and the labels are human-verified; the test set does not contain noisy labels.

Data download and format:

Data can be downloaded from the links below or from Kaggle.

Annotations (3.0 MB)

  • Running md5sum annot.tar on the tar file should produce 0c632c543ceed0e70f0eb2db58eda3ab
  • The tar contains 4 files:
    • class_list.txt: Contains the names of the 251 class labels. This can be used to map class_ids to class names (see the sketch after this list).
    • train_info.csv: Each line of this csv contains an "image_name,label" pair for the training data. For example, "train_00000.jpg,94" refers to image train_00000.jpg having class_id 94. The class_id can be mapped to a class name using class_list.txt.
    • val_info.csv: Each line of this csv contains an "image_name,label" pair for the validation data.
    • test_info.csv: This csv only provides the list of test images (labels are withheld).
  • We provide separate tars for train, val and test images as mentioned below.
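
A short sketch for verifying a checksum and loading the annotation files (assumptions: class_list.txt holds one class name per line indexed by class_id, and the csvs have no header row):

```python
import hashlib
import pandas as pd

def md5sum(path, chunk_size=1 << 20):
    """Equivalent of the md5sum command: MD5 hex digest of a file."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

assert md5sum("annot.tar") == "0c632c543ceed0e70f0eb2db58eda3ab"

# Map class_ids to class names and attach them to the training annotations.
with open("class_list.txt") as f:
    class_names = [line.strip() for line in f]
train_info = pd.read_csv("train_info.csv", header=None, names=["image_name", "label"])
train_info["class_name"] = train_info["label"].map(lambda i: class_names[i])
```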

Train Images (2.3 GB)

  • Running md5sum train.tar on the tar file should produce 8e56440e365ee852dcb0953a9307e27f
  • Contains training images.
  • For label information see annotation file train_info.csv.

Validation Images (231 MB)

  • Running md5sum val.tar on the tar file should produce fa9a4c1eb929835a0fe68734f4868d3b
  • Contains validation images.
  • For label information see annotation file val_info.csv.

Test Images (548 MB)

  • Running md5sum test.tar on the tar file should produce 32479146dd081d38895e46bb93fed58f
  • Contains testing images.
  • Labels are not provided; predictions are evaluated on the evaluation server.

Annotations:

This folder contains some important files which we'll be using while training our models.

  1. class_balance.csv : Used to analyse class imbalance in the training data.

  2. outliers.txt : Contains list of all noisy/misclassified images in the training data.

  3. train_info_v2.csv, val_info_v2.csv : Used to make data folders so that we can load the data using PyTorch's DataLoader before training starts (a Dataset sketch follows this list).
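
A minimal Dataset sketch built from such a csv (the column layout, directory name, and transform are assumptions, not code from the notebooks):

```python
import pandas as pd
from PIL import Image
from torch.utils.data import DataLoader, Dataset
from torchvision import transforms

class FoodDataset(Dataset):
    """Yields (image_tensor, label) pairs from an info csv such as train_info_v2.csv."""

    def __init__(self, csv_path, image_dir, transform=None):
        self.info = pd.read_csv(csv_path, header=None, names=["image_name", "label"])
        self.image_dir = image_dir
        self.transform = transform or transforms.Compose(
            [transforms.Resize((224, 224)), transforms.ToTensor()]
        )

    def __len__(self):
        return len(self.info)

    def __getitem__(self, idx):
        row = self.info.iloc[idx]
        image = Image.open(f"{self.image_dir}/{row.image_name}").convert("RGB")
        return self.transform(image), int(row.label)

train_loader = DataLoader(FoodDataset("train_info_v2.csv", "train_set"),
                          batch_size=64, shuffle=True)
```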

Notebooks:

This folder contains all the notebook files we've used during this competition.

We trained 4 networks separately and ensembled them at the end. The typical training flow was to fine-tune a network pretrained on ImageNet for 15 epochs and then train the full network for 3-5 epochs. We used BCEWithLogitsLoss for this problem; the optimizer was Adam with an initial learning rate of 1e-4, reduced by a factor of 10 after certain steps using a MultiStepLR scheduler, with beta values of 0.9 and 0.999.
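
A condensed sketch of that setup (the backbone shown is torchvision's densenet201 rather than the exact notebook code, and the milestone epochs are an assumption; the README only says the learning rate drops "after certain steps"):

```python
import torch.nn as nn
from torch.nn.functional import one_hot
from torch.optim import Adam
from torch.optim.lr_scheduler import MultiStepLR
from torchvision.models import densenet201

num_classes = 251
model = densenet201(pretrained=True)  # ImageNet-pretrained backbone
model.classifier = nn.Linear(model.classifier.in_features, num_classes)

criterion = nn.BCEWithLogitsLoss()  # expects float one-hot targets
optimizer = Adam(model.parameters(), lr=1e-4, betas=(0.9, 0.999))
scheduler = MultiStepLR(optimizer, milestones=[8, 12], gamma=0.1)  # LR /10 at these epochs

for epoch in range(15):
    for images, labels in train_loader:  # e.g. the DataLoader sketched above
        targets = one_hot(labels, num_classes).float()  # one-hot targets for the BCE loss
        optimizer.zero_grad()
        loss = criterion(model(images), targets)
        loss.backward()
        optimizer.step()
    scheduler.step()
```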

The networks are: pnasnet, senet154, polynet and densenet201. The highest-scoring model was polynet, followed by senet154, pnasnet and, last, densenet201.

We also tried cleaning dirty labels and then augmenting the data externally, but unfortunately that didn't give promising results.

The trained model files can be found here.

Results:

1. polynet : 86.26% (top-3 accuracy)
2. senet154 : 85.76% (top-3 accuracy)
3. pnasnet : 84.77% (top-3 accuracy)
4. densenet201 : 81.20% (top-3 accuracy)

After ensembling these 4 networks we got 90.54% top-3 accuracy on the test data.
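
One common way to ensemble such models, sketched here under the assumption that per-model softmax probabilities are simply averaged (the notebooks may weight or combine them differently):

```python
import torch

def ensemble_top3(models, images):
    """Average softmax probabilities over models and return top-3 labels per image."""
    with torch.no_grad():
        probs = torch.stack([m(images).softmax(dim=1) for m in models]).mean(dim=0)
    return probs.topk(3, dim=1).indices  # shape (batch, 3), best label first

# models should already be loaded and switched to eval mode, e.g.:
# models = [polynet, senet154, pnasnet, densenet]; [m.eval() for m in models]
```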

Scope of improvement:

Due to time constraints we could not try the following techniques, which would likely have improved our accuracy by at least 2-3%.

  1. Training and testing on different scales
  2. mixup
  3. label smoothing (a sketch follows this list)
  4. DropBlock
  5. Creating food-specific pretraining data by selecting only food-related classes from Open Images and ImageNet Fall 2011 (inspired by the paper Domain Adaptive Transfer Learning with Specialist Models)
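
As an illustration of one of these, a minimal label-smoothing loss in the standard formulation (not code from this repo):

```python
import torch
import torch.nn as nn

class LabelSmoothingLoss(nn.Module):
    """Cross-entropy against smoothed targets: 1 - eps on the true class,
    eps / (num_classes - 1) spread over all other classes."""

    def __init__(self, num_classes, eps=0.1):
        super().__init__()
        self.num_classes = num_classes
        self.eps = eps

    def forward(self, logits, target):
        log_probs = logits.log_softmax(dim=1)
        smooth = torch.full_like(log_probs, self.eps / (self.num_classes - 1))
        smooth.scatter_(1, target.unsqueeze(1), 1.0 - self.eps)
        return -(smooth * log_probs).sum(dim=1).mean()
```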

References:

  1. Competition link: Kaggle
  2. Competition link: GitHub
  3. Models: pretrained-models
