lannguyen0910 / food-recognition

๐Ÿ”๐ŸŸ๐Ÿ— Food analysis baseline with Theseus. Integrate object detection, image classification and multi-class semantic segmentation ๐Ÿž๐Ÿ–๐Ÿ•

License: MIT License

Python 65.00% JavaScript 5.64% CSS 11.00% HTML 18.18% Shell 0.14% Batchfile 0.03%
object-detection yolov5 pytorch food-detection efficientnet computer-vision deep-learning semantic-segmentation image-classification unetplusplus

food-recognition's Introduction

๐Ÿ”๐ŸŸ๐Ÿ— Meal analysis with Theseus ๐Ÿž๐Ÿ–๐Ÿ•


Dev logs

[01/05/2024] Fix ngrok bug on Colab #32 (migrate to pyngrok).
[24/10/2023] Clean and refactor repo. Integrate YOLOv8 to food detection.
[07/03/2022] Big refactor. Integrate object detection, image classification, semantic segmentation into one Ship of Theseus.
[31/01/2022] Update to new YOLOv5 latest versions P5-P6. Can load checkpoints from original repo.
[26/12/2021] Update app on Android.
[12/09/2021] Update all features to the web app.
[16/07/2021] All trained checkpoints on custom data have been lost. Now use pretrained models on COCO for inference.

📔 Notebook

  • For inference, use this notebook to run the web app: Notebook
  • For training on your own data, refer to these notebooks:
    • Detection: Notebook
    • Classification: Notebook
    • Semantic segmentation: Notebook

🥇 Pretrained-weights

  • Detection:

| Models | Image Size | Epochs | mAP@0.5 | mAP@0.5:0.95 |
|---|---|---|---|---|
| YOLOv5s | 640x640 | 172 | 0.907 | 0.671 |
| YOLOv5m | 640x640 | 112 | 0.897 | 0.666 |
| YOLOv5l | 640x640 | 118 | 0.94 | 0.73 |
| YOLOv5x | 640x640 | 62 | 0.779 | 0.533 |
| YOLOv8s | 640x640 | 70 | 0.963 | 0.82 |

  • Segmentation:

| Models | Image Size | Epochs | Pixel AP | Pixel AR | Dice score |
|---|---|---|---|---|---|
| UNet++ | 640x640 | 5 | 0.931 | 0.935 | 99.95 |

  • Classification:

| Models | Image Size | Epochs | Acc | Balanced Acc | F1-score |
|---|---|---|---|---|---|
| EfficientNet-B4 | 640x640 | 7 | 84.069 | 86.033 | 84.116 |

🌟 Logs detail

In total, there are 3 implementation versions:

  1. Training with our own object detection template. The model source code is inherited from the Ultralytics repository, the dataset is in COCO format, and the training and data-processing steps were reimplemented by us in PyTorch. Ensemble technique: merges the results of 4 models (images only). Label enhancement technique: if the output label after detection is either "Food" or "Food-drinks", we use a pretrained EfficientNet-B4 classifier (on 255 classes) to re-classify it to a more specific label.
  2. Big refactor: updated the training steps, which now also come from the Ultralytics source code. The models yield better accuracy. Test-time augmentation was added to the web app.
  3. Updated to the Theseus template, which currently supports food detection, food classification and multi-class food semantic segmentation, on images only. For this version we introduce Theseus, which is only a part of the full Theseus template. We also dropped some weak or unnecessary features to make the project more robust. Theseus is adapted from large project templates such as mmocr, fairseq, timm and paddleocr.

If you want to play around with the first version, which retains some features that differ from the new version, check out the v1 branch.

🌟 Inference

  • Install requirements.
pip install -e .
  • Start the app (Windows). Running over plain HTTP is fine on localhost; to serve the app over HTTPS, generate an SSL certificate.
run.bat

or

python3 app.py

🌟 Dataset

  • Detection: link (merged OID and Vietnamese Lunch dataset)
  • Classification: link (MAFood121)
  • Semantic segmentation: link (UECFood)

🌟 Dataset details

To train the food detection model, we surveyed the following datasets:
  • Open Images V6-Food: Open Images V6 is a huge dataset from Google for computer vision tasks. For our problem, we extracted the images with food-related labels from it. The extracted set includes 18 labels with more than 20,000 images.
  • School Lunch Dataset: includes 3,940 photos of Japanese high school students' lunches, taken from the same frontal angle with the goal of assessing student nutrition. Each label consists of the coordinates and type of a dish; there are 21 classes in total, namely 20 specific dishes plus an "Other Foods" label for dishes that do not belong to any of them.
  • Vietnamese Food: a self-collected dataset of Vietnamese dishes, including 10 simple dishes of our country such as Pho, Com Tam, Hu Tieu and Banh Mi. Each category has about 20-30 images, split 80-20 for training and evaluation.

We aggregate all the above datasets for training. Dishes that appear in more than one set are grouped into one class to avoid duplication. After aggregation, we obtain a large dataset of 60,305 images with 44 different foods from all regions of the world.
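
The grouping step could look roughly like the sketch below. The alias table, the dataset keys and the sample file names are all hypothetical and only illustrate the idea; the actual label mapping used for the 44 classes is not given in this README.

```python
from collections import Counter

# Hypothetical alias table: maps dataset-specific label spellings to one
# canonical dish name. The real mapping is not documented here.
ALIASES = {
    "fries": "french fries",
    "french fries": "french fries",
}


def canonical_label(raw_label: str) -> str:
    """Normalize a raw label and map it to its canonical name."""
    key = raw_label.strip().lower()
    return ALIASES.get(key, key)


def merge_datasets(datasets):
    """Merge (image_path, label) pairs from several datasets into one list.

    datasets: dict mapping a dataset name to a list of (image_path, label).
    Image paths are prefixed with the dataset name to keep them unique.
    """
    merged = []
    for name, samples in datasets.items():
        for image_path, label in samples:
            merged.append((f"{name}/{image_path}", canonical_label(label)))
    return merged


# Made-up samples, for illustration only.
datasets = {
    "oid": [("0001.jpg", "French fries"), ("0002.jpg", "Salad")],
    "vn_food": [("a.jpg", "Pho"), ("b.jpg", "Fries")],
}
merged = merge_datasets(datasets)
label_counts = Counter(label for _, label in merged)
print(label_counts)
```

Labels that differ only in spelling or case end up in the same class, so the merged set has fewer classes than the sum of the individual label lists.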

In addition, we find that if we expand the problem to include classification, the amount of usable data increases significantly. Therefore, to further enhance the diversity of dishes, we collect additional datasets to train a classification model:

  • MAFood-121: consists of 21,175 training image samples. The dishes are selected from the 11 most popular cuisines in the world according to Google Trends statistics; these cuisines come from many countries, including Vietnam. For each cuisine, 11 typical traditional dishes are selected. The dataset has a total of 121 dish types, each belonging to at least 1 of 10 food categories: Bread, Eggs, Fried, Meat, Noodles, Rice, Seafood, Soup, Dumplings, and Vegetables. 85% of the images are used for training and the remaining 15% for evaluation.
  • Food-101: includes 101 different types of dishes with 101,000 photos in total. For each dish, 250 images are used as test images and the remaining 750 for training. The training images still contain a lot of noise: sometimes the colors are too intense or some samples are mislabeled. This noise was left in intentionally by the authors (as mentioned in their study).

We also aggregate the two datasets above into one. The new set includes 93,748 training images and 26,825 evaluation images with a total of 180 different dishes. The number of dishes thus increases significantly: if the detection model outputs a dish labeled "Other Foods", the classification model is applied to that dish to classify it again.
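
A minimal sketch of this two-stage fallback is shown below. The detection format (dicts with "box" and "label") and the `classify` callable are assumptions made for illustration; `dummy_classifier` merely stands in for a forward pass of the real classifier on the cropped region.

```python
# Labels that trigger re-classification. "Other Foods" comes from the text
# above; extend the set if the detector uses other generic labels.
GENERIC_LABELS = {"Other Foods"}


def refine_detections(detections, classify):
    """Re-label generic detections with a second-stage classifier.

    detections: list of dicts with "box" (x1, y1, x2, y2) and "label".
    classify:   hypothetical callable that takes a box (standing in for the
                cropped image region) and returns a dish name.
    Returns a new list; the input detections are not modified.
    """
    refined = []
    for det in detections:
        label = det["label"]
        if label in GENERIC_LABELS:
            label = classify(det["box"])
        refined.append({**det, "label": label})
    return refined


def dummy_classifier(box):
    # Stand-in for the classification model; always returns the same dish.
    return "Sushi"


detections = [
    {"box": (10, 10, 100, 100), "label": "Pho"},
    {"box": (120, 40, 220, 160), "label": "Other Foods"},
]
refined = refine_detections(detections, dummy_classifier)
print([d["label"] for d in refined])
```

Specific labels from the detector are kept as they are, so the classifier only runs on the boxes the detector could not name.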

🌟 Server

Implementation details

The function get_prediction, implemented in modules.py, is the inference function for the detection, classification and semantic segmentation tasks, depending on which inputs you choose. During image detection it calls the Edamam API to get nutritional information for the detected food. The nutritional information is also saved as CSV files in the folder /static/csv.

We let the user customize the confidence and IoU thresholds so that they can find suitable values for the input image. To avoid rerunning the whole model every time these parameters change, the server computes a perceptual hash of each image sent by the client and uses the resulting string as the file name when saving the image. When the client sends an image whose hash already exists in the database, the server only post-processes the previously predicted result instead of running the prediction again.
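
The caching idea can be sketched as follows. This is not the app's code: the average hash used here is just one simple perceptual hash, chosen as an assumption, and `PredictionCache`, `fake_model` and the in-memory dict (standing in for saved files) are hypothetical names for illustration.

```python
def average_hash(pixels, hash_size=8):
    """Average hash of a grayscale image given as a list of rows of ints.

    Assumes the image is at least hash_size pixels in each dimension.
    """
    height, width = len(pixels), len(pixels[0])
    cells = []
    # Downsample to hash_size x hash_size by averaging rectangular blocks.
    for i in range(hash_size):
        r0, r1 = i * height // hash_size, (i + 1) * height // hash_size
        for j in range(hash_size):
            c0, c1 = j * width // hash_size, (j + 1) * width // hash_size
            block = [pixels[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    # One bit per cell: brighter than the mean or not.
    bits = "".join("1" if value > mean else "0" for value in cells)
    return f"{int(bits, 2):0{hash_size * hash_size // 4}x}"


class PredictionCache:
    """Runs the model once per distinct image hash, then only post-processes."""

    def __init__(self, model):
        self.model = model      # callable: pixels -> list of (label, score)
        self.raw = {}           # image hash -> raw, unfiltered predictions
        self.model_calls = 0

    def predict(self, pixels, min_conf):
        key = average_hash(pixels)
        if key not in self.raw:
            self.raw[key] = self.model(pixels)
            self.model_calls += 1
        # Post-processing only: apply the user-chosen confidence threshold.
        return [(label, s) for label, s in self.raw[key] if s >= min_conf]


def fake_model(pixels):
    # Stand-in for the detector; returns fixed, made-up predictions.
    return [("pho", 0.9), ("banh mi", 0.4)]


image = [[r * 16 + c for c in range(16)] for r in range(16)]
cache = PredictionCache(fake_model)
first = cache.predict(image, min_conf=0.5)
second = cache.predict(image, min_conf=0.3)
print(cache.model_calls, first, second)
```

Changing the threshold on the second call reuses the stored raw predictions, so the model runs only once for this image.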

🌟 Additional Methods

To increase the variety of dishes, we apply a classification model:

After testing and observation, we chose a simple and effective model: EfficientNet. EfficientNet was proposed by Google and is one of the state-of-the-art models for image classification, while also being efficient. We use the EfficientNet source code from rwightman and retrain the EfficientNet-B4 version on the aggregated dataset. This model serves as an additional improvement to the YOLOv5 model: only when YOLOv5 detects a dish labeled "Other Foods" is EfficientNet applied to predict the label of that dish again.

To increase the accuracy of the algorithm, we use the ensemble models technique:

For each image, several model versions each produce predictions, and the results are then aggregated with the "weighted box fusion" method to give the final result.
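
To illustrate the idea, here is a simplified, single-class sketch of weighted box fusion in pure Python. It is not the code used by the app: class labels are ignored, scores are assumed to be positive, and the function names and the `iou_thr` default are choices made for this example.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def fuse(cluster):
    """Confidence-weighted average box and mean score of one cluster."""
    total = sum(score for _, score in cluster)
    box = tuple(
        sum(b[i] * score for b, score in cluster) / total for i in range(4)
    )
    return box, total / len(cluster)


def weighted_box_fusion(model_outputs, iou_thr=0.55):
    """model_outputs: one list of (box, score) detections per model."""
    n_models = len(model_outputs)
    detections = sorted(
        (det for output in model_outputs for det in output),
        key=lambda det: det[1],
        reverse=True,
    )
    clusters, fused = [], []
    for det in detections:
        # Find the fused box that overlaps this detection the most.
        best, best_iou = None, iou_thr
        for idx, (fused_box, _) in enumerate(fused):
            overlap = iou(det[0], fused_box)
            if overlap > best_iou:
                best, best_iou = idx, overlap
        if best is None:
            clusters.append([det])
            fused.append(fuse([det]))
        else:
            clusters[best].append(det)
            fused[best] = fuse(clusters[best])
    # Down-weight fused boxes that only a few of the models agreed on.
    return [
        (box, score * min(len(cluster), n_models) / n_models)
        for (box, score), cluster in zip(fused, clusters)
    ]


# Made-up detections from two models, for illustration only.
model_a = [((10.0, 10.0, 50.0, 50.0), 0.9)]
model_b = [((12.0, 12.0, 52.0, 52.0), 0.6), ((100.0, 100.0, 140.0, 140.0), 0.8)]
fused_boxes = weighted_box_fusion([model_a, model_b])
print(fused_boxes)
```

Unlike non-maximum suppression, which keeps one box and discards the rest, the fused box uses the coordinates of every overlapping prediction, and a box found by only one of the two models has its score halved.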

To increase users' interactivity with the application:

When a dish is predicted, we show the user more information about its nutritional content. This information is queried from the application's database, which is periodically updated from the Edamam API, an API that returns the nutrition of a dish given its name. At prediction time, the nutrition information is saved along with the dish name in CSV format. The client then fetches the CSV file and draws nutrition statistics charts with the Chart.js library. There are 2 chart types in total; each appears when the user clicks on that chart type.
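
The CSV step might look like the sketch below. The column names and the numbers are placeholders invented for this example, not real nutrition data or the app's actual schema, and an in-memory buffer stands in for a file under /static/csv. The Edamam request itself is left out.

```python
import csv
import io

# Hypothetical column names; the fields stored by the app may differ.
FIELDS = ["dish", "calories_kcal", "protein_g", "fat_g", "carbs_g"]


def write_nutrition_csv(rows, stream):
    """Write one row per predicted dish, with a header line."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


def nutrition_sums(stream):
    """Sum each numeric column, e.g. as input for a chart of meal totals."""
    sums = {field: 0.0 for field in FIELDS[1:]}
    for row in csv.DictReader(stream):
        for field in sums:
            sums[field] += float(row[field])
    return sums


# Placeholder values only, not real nutrition facts.
rows = [
    {"dish": "dish_a", "calories_kcal": 100, "protein_g": 10, "fat_g": 1, "carbs_g": 20},
    {"dish": "dish_b", "calories_kcal": 200, "protein_g": 5, "fat_g": 2, "carbs_g": 30},
]
buffer = io.StringIO()
write_nutrition_csv(rows, buffer)
buffer.seek(0)
nutrition_totals = nutrition_sums(buffer)
print(nutrition_totals)
```

On the client side, the same per-column values would be read from the fetched CSV and passed to Chart.js as chart data.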

๐Ÿฑ Sample Results

📙 Credits

food-recognition's People

Contributors

code-factor, deepsource-io[bot], kaylode, lannguyen0910, snyk-bot

food-recognition's Issues

Error with Inference Code

Hi! Amazing repo, and I have downloaded the code and tried to make it run from my local computer. However, the program runs into this issue that I can't seem to resolve (tried running it in Windows OS as well). I've tried online solutions to similar problems, but none of them were successful. Here is the error message I get:

Traceback (most recent call last):
  File "/mnt/c/FoodImageRecognition/app.py", line 128, in
    model_best = load_model('best_model_101class.hdf5', compile=False) #die
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/keras/saving/save.py", line 205, in load_model
    return saved_model_load.load(filepath, compile, options)
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/keras/saving/saved_model/load.py", line 122, in load
    meta_graph_def = loader_impl.parse_saved_model(path).meta_graphs[0]
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/saved_model/loader_impl.py", line 116, in parse_saved_model
    raise IOError(
OSError: SavedModel file does not exist at: best_model_101class.hdf5/{saved_model.pbtxt|saved_model.pb}

If you could help me resolve this issue, that would be greatly appreciated! Thank you!

Inference Colab notebook ngrok agent version too old

Hi,
Thank you for putting together this app !
I tried to run the inference notebook on Colab, but I got an error with the ngrok version:

Your ngrok-agent version "2.3.41" is too old. The minimum supported agent version for your account is "3.2.0". Please update to a newer version with `ngrok update`, by downloading from https://ngrok.com/download, or by updating your SDK version. Paid accounts are currently excluded from minimum agent version requirements. To begin handling traffic immediately without updating your agent, upgrade to a paid plan: https://dashboard.ngrok.com/billing/subscription.

ERR_NGROK_121

However, when I run ngrok update in the notebook, it says it's up to date:
!/ngrok update
No update available, this is the latest version.
Output of !/ngrok --version:
ngrok version 2.3.41

I'm not familiar with ngrok, so I don't quite know how to solve the issue right now.

Yolov5

Hi, this food detection project was well put together, thank you.
However, I faced some issues when running inference on Colab.

The issue I faced is that running either YOLOv5 model keeps raising an error, such as a module error (no module named 'models'). YOLOv8 works fine, but for both models, enabling ensemble models gives the same error: no module named 'models'. Also, is inference supposed to run only through the ngrok app?

I also tried running on local machine where the port 5000 was only available to run, it worked on local machine. But the same issues arose with Yolov5 not able to run and ensemble models error.

Is there a way to solve this?
I am new to this and I am trying to learn for my upcoming project.

Thank you for your time if you could help!

Question about training datasets

Hi, this is very impressive work! May I know the names of the public food datasets that you used for training the model? In the Google Drive folder I only recognize "school lunch"; where does the other training data, such as old_v1 and vn_food, come from? Thanks!

Missing yaml files in configs

When I run the project with python app.py, the model .yaml files are missing from 'utilities/configs', e.g. FileNotFoundError: [Errno 2] No such file or directory: 'utilities/configs/yolov5m.yaml'. They are also missing from './models/configs/detection'; there is not even a detection folder under the configs path.

Issues with Custom model

Hi, the repo is great!

The weights associated with this repo are in .pth format, usually the weights are .pt format and the pretrained weights from yolov5 repo are not working here.

Is there a way to convert the .pt weights to .pth and use with your model?

Segmentation function not working

Thanks @kaylode and @lannguyen0910! You did a great job on this project. I tried to run the app notebook and found that the segmentation function is not working, while detection works fine. Could you please check it out?
I see you are using UNet++ for segmentation from segmentation_models.pytorch. Could you please share a training notebook with me? I find it hard to get the setup and data preparation right, and I would like to understand how to implement it correctly. You can share it with me via my email at [email protected]. Thank you so much!

ERROR: No matching distribution found for opencv-python-headless==4.2.0.32

I'm getting the following error after cloning the project and running `pip install -e .`. Any idea what might be causing this? Thanks!

ERROR: Could not find a version that satisfies the requirement opencv-python-headless==4.2.0.32 (from food-recognition) (from versions: 3.4.10.37, 3.4.11.39, 3.4.11.41, 3.4.11.43, 3.4.11.45, 3.4.13.47, 3.4.15.55, 3.4.16.57, 3.4.16.59, 3.4.17.61, 3.4.17.63, 3.4.18.65, 4.3.0.38, 4.4.0.40, 4.4.0.42, 4.4.0.44, 4.4.0.46, 4.5.1.48, 4.5.3.56, 4.5.4.58, 4.5.4.60, 4.5.5.62, 4.5.5.64, 4.6.0.66, 4.7.0.68, 4.7.0.72)
ERROR: No matching distribution found for opencv-python-headless==4.2.0.32

System Information:
Apple M1
macOS 13.1
python 3.10.6
pip 23.0.1

Upload Food

Hi, I upload a food photo, press Detect, and see an Internal Server Error message. Please advise how I can use it. Thanks.


Incredibly Low Accuracy

Hello! This repo looks great. I have tested this out on my local machine, with and without ngrok. I've noticed an abnormally low accuracy, with most pictures I try resulting in an incorrect result. This is very different from the accuracy in the README, which states an accuracy of 84.069 and a balanced accuracy of 86.033. Why would this be?
