Hand Detection - YOLOv7

Introduction

Object detection is a computer vision technique that allows us to identify and locate objects in an image or video. With this kind of identification and localization, object detection can be used to count objects in a scene, determine and track their precise locations, and accurately label them. Object detection is commonly confused with image recognition, so before we proceed, it is important to clarify the distinction between them. Image recognition assigns a label to an image: a picture of a dog receives the label "dog", and a picture of two dogs still receives the single label "dog". Object detection, on the other hand, draws a box around each dog and labels each box "dog"; the model predicts where each object is and what label should be applied.

The purpose of this project is to train a hand detection model using YOLOv7 on the COCO-Hand dataset.

[Sample hand detection output]

Architecture

This project uses YOLOv7 as the detection architecture. YOLOv7 surpasses all known object detectors in both speed and accuracy in the range from 5 FPS to 160 FPS, and it has the highest accuracy (56.8% AP) among all known real-time object detectors running at 30 FPS or higher on a V100 GPU. The YOLO architecture is FCNN (Fully Convolutional Neural Network) based, although Transformer-based versions have recently been added to the YOLO family.

The YOLO architecture has three main components:

  • Backbone
  • Neck
  • Head

The Backbone extracts essential features from an image and feeds them to the Head through the Neck. The Neck collects the feature maps produced by the Backbone and builds feature pyramids from them. Finally, the Head consists of the output layers that produce the final detections.
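To make these roles concrete, here is a minimal, illustrative PyTorch sketch of a detector organized as Backbone → Neck → Head. It is not the actual YOLOv7 code; the layers and channel sizes are placeholders chosen only to show the data flow.

import torch
import torch.nn as nn

class TinyDetector(nn.Module):
    """Toy detector illustrating the Backbone/Neck/Head split (not YOLOv7)."""
    def __init__(self, num_classes: int = 1, num_anchors: int = 3):
        super().__init__()
        # Backbone: extracts feature maps from the input image.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.SiLU(),
        )
        # Neck: aggregates/refines backbone features (feature pyramids in real YOLO).
        self.neck = nn.Sequential(nn.Conv2d(64, 64, 3, padding=1), nn.SiLU())
        # Head: predicts (x, y, w, h, objectness, class scores) per anchor per cell.
        self.head = nn.Conv2d(64, num_anchors * (5 + num_classes), 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.neck(self.backbone(x)))

preds = TinyDetector()(torch.randn(1, 3, 640, 640))
print(preds.shape)  # torch.Size([1, 18, 160, 160])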

YOLOv7 improves speed and accuracy by introducing several architectural reforms. The following major changes have been introduced in the YOLOv7 paper.

  • Architectural Reforms

    • E-ELAN (Extended Efficient Layer Aggregation Network)
    • Model Scaling for Concatenation-based Models
  • Trainable BoF (Bag of Freebies)

    • Planned re-parameterized convolution
    • Coarse for auxiliary and Fine for lead loss

Detailed information about these additions can be found in the YOLOv7 paper.
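As a rough illustration of the idea behind planned re-parameterized convolution (parallel training-time branches are folded into a single convolution for inference), here is a hedged sketch of merging a 1x1 branch into a 3x3 convolution. The actual module in YOLOv7 is more involved; this only demonstrates the principle.

import torch
import torch.nn as nn
import torch.nn.functional as F

def fuse_3x3_and_1x1(conv3: nn.Conv2d, conv1: nn.Conv2d) -> nn.Conv2d:
    """Fold a parallel 1x1 branch into an equivalent single 3x3 convolution.
    Assumes both convs share in/out channels and stride, with padding 1 and 0
    respectively, so their outputs have identical shapes."""
    fused = nn.Conv2d(conv3.in_channels, conv3.out_channels, 3,
                      stride=conv3.stride, padding=1, bias=True)
    # Pad the 1x1 kernel to 3x3 (it lands on the centre tap) and sum the weights.
    weight = conv3.weight.data + F.pad(conv1.weight.data, [1, 1, 1, 1])
    bias = torch.zeros(conv3.out_channels)
    for c in (conv3, conv1):
        if c.bias is not None:
            bias += c.bias.data
    fused.weight.data.copy_(weight)
    fused.bias.data.copy_(bias)
    return fused

# Sanity check: the fused conv matches the sum of the two branches.
x = torch.randn(1, 16, 32, 32)
a, b = nn.Conv2d(16, 16, 3, padding=1), nn.Conv2d(16, 16, 1)
print(torch.allclose(a(x) + b(x), fuse_3x3_and_1x1(a, b)(x), atol=1e-5))  # True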

(Back to Top)

Dataset

The COCO-Hand dataset contains hand annotations for 25K images from Microsoft's COCO dataset. To see the details of the dataset, please visit this page.

(Back to Top)

Getting Started

These instructions describe how to set up the project locally. To get a local copy up and running, follow these simple steps.

Download dataset

To download the dataset, run getCoco.sh.

sudo apt-get install unzip
sh data/getCoco.sh

The dataset will be downloaded into the data/coco folder.

Download base models

You can download the base model by visiting the link below.

Install dependencies

To install the required packages, run the following in a terminal:

pip install -r src/requirements.txt

Convert the annotations

Now that we have the dataset, we need to convert the annotations into the format expected by YOLOv7. YOLOv7 expects data to be organized in a specific directory layout; otherwise it cannot parse the dataset.

python src/convert_annotations.py --images 'path to coco images folder' --annotations 'path to coco annotations txt'

To check that the conversion is correct, run:

python src/convert_annotations.py --images 'path to coco images folder' --annotations 'path to coco annotations txt' --plot
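For reference, the label format YOLOv7 expects is one .txt file per image, with one line per object: the class index followed by the box centre and size, normalized by the image dimensions. Below is a hedged sketch of the core conversion, assuming a box given in pixels as xmin, xmax, ymin, ymax; the full logic lives in src/convert_annotations.py.

def box_to_yolo(xmin, xmax, ymin, ymax, img_w, img_h, class_id=0):
    """Convert a pixel-coordinate box to a YOLO label line:
    "<class> <x_center> <y_center> <width> <height>", all normalized to [0, 1]."""
    x_center = (xmin + xmax) / 2.0 / img_w
    y_center = (ymin + ymax) / 2.0 / img_h
    width = (xmax - xmin) / img_w
    height = (ymax - ymin) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"

# Example: a hand box in a 640x480 image; class 0 is "hand".
print(box_to_yolo(100, 220, 150, 300, 640, 480))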

Partition the Dataset

Next, we need to partition the dataset into train, validation, and test sets. These will contain 80%, 10%, and 10% of the data, respectively.

python src/prepare_data.py --path 'path to coco images folder'
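Conceptually, the split is just a shuffled 80/10/10 partition of the image list. A minimal sketch (hypothetical paths; the real src/prepare_data.py also handles the matching label files):

import random
from pathlib import Path

def split_dataset(image_dir, seed=0):
    """Shuffle the image paths and split them 80/10/10 into train/val/test."""
    images = sorted(Path(image_dir).glob("*.jpg"))
    random.Random(seed).shuffle(images)
    n_train, n_val = int(0.8 * len(images)), int(0.1 * len(images))
    return {
        "train": images[:n_train],
        "val": images[n_train:n_train + n_val],
        "test": images[n_train + n_val:],
    }

splits = split_dataset("data/coco/images")  # hypothetical path
print({name: len(files) for name, files in splits.items()})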

Training

The training specifications are:

  • Epoch: 300
  • Dataset: Hand COCO
  • Batch size: 4
  • Image size: 640
  • GPU: NVIDIA GeForce RTX 3060 Laptop GPU

If you are having trouble fitting the model into memory:

  • Use a smaller batch size.
  • Use a smaller network: the yolov7-tiny.pt checkpoint runs at a lower cost than the base yolov7_training.pt.
  • Use a smaller image size: the image size directly affects training cost. Reducing the images from 640 to 320 significantly cuts cost at the expense of prediction accuracy.

To start the training:

python src/yolov7/train.py --img-size 640 --cfg src/cfg/training/yolov7.yaml --hyp data/hyp.scratch.yaml --batch 4 --epoch 300 --data data/hand_data.yaml --weights src/models/yolov7_training.pt --workers 2 --name yolo_hand_det --device 0
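The --data argument points at data/hand_data.yaml, which tells YOLOv7 where the split images live and what the classes are. If you need to recreate it for your own folder layout, a hedged sketch of its expected contents (single "hand" class; the paths below are assumptions based on the split step) can be written like this:

import yaml  # pip install pyyaml

# Assumed layout produced by the partition step; adjust the paths to your setup.
data_cfg = {
    "train": "data/coco/images/train",
    "val": "data/coco/images/val",
    "test": "data/coco/images/test",
    "nc": 1,              # number of classes
    "names": ["hand"],    # class names
}

with open("data/hand_data.yaml", "w") as f:
    yaml.safe_dump(data_cfg, f, sort_keys=False)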

You can also train the model on Google Colab.


Inference

To test the trained model:

python src/yolov7/detect.py --source data/sample/test --weights runs/train/yolo_hand_det/weights/best.pt --conf 0.25 --name yolo_hand_det

(Back to Top)

