
quic / sense

729 stars · 31 watchers · 104 forks · 151.15 MB

Enhance your application with the ability to see and interact with humans using any RGB camera.

Home Page: https://20bn.com/products/datasets

License: MIT License

Languages: Python 65.65%, CSS 0.31%, HTML 21.71%, Shell 0.62%, JavaScript 11.71%
Topics: pytorch, neural-networks, deep-learning, computer-vision, activity-recognition, gesture-recognition, fitness-app, calorie-estimation, video

sense's Introduction

State-of-the-art Real-time Action Recognition


Website · Blogpost · Getting Started · Build Your Own Classifier · iOS Deployment · Gallery · Datasets · SDK License



sense is an inference engine to serve powerful neural networks for action recognition, with a low computational footprint. In this repository, we provide:

  • Two models out-of-the-box pre-trained on millions of videos of humans performing actions in front of, and interacting with, a camera. Both neural networks are small, efficient, and run smoothly in real time on a CPU.
  • Demo applications showcasing the potential of our models: action recognition, gesture control, fitness activity tracking, live calorie estimation.
  • A pipeline to record and annotate your own video dataset and train a custom classifier on top of our models with an easy-to-use script to fine-tune our weights.
Action Recognition

Fitness Activity Tracker and Calorie Estimation

Gesture Control


Requirements and Installation

The following steps are confirmed to work on Linux (Ubuntu 18.04 LTS and 20.04 LTS) and macOS (Catalina 10.15.7).

Step 1: Clone the repository

To begin, clone this repository to a local directory of your choice:

git clone https://github.com/TwentyBN/sense.git
cd sense

Step 2: Install Dependencies

We recommend installing the dependencies in a fresh virtual environment created with conda or virtualenv. The following instructions create a conda environment.

conda create -y -n sense python=3.6
conda activate sense

Install Python dependencies:

pip install -r requirements.txt

Note: pip install -r requirements.txt only installs the CPU-only version of PyTorch. To run inference on your GPU, another version of PyTorch should be installed (e.g. conda install pytorch torchvision cudatoolkit=10.2 -c pytorch). See all available install commands here.
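
To check which build ended up in your environment, a minimal Python snippet like the following (an illustrative check, not part of the repository) prints the installed version and whether CUDA is visible:

import torch

print(torch.__version__)          # installed PyTorch version
print(torch.cuda.is_available())  # True only with a CUDA-enabled build and a visible GPU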

Step 3: Download the SenseKit Weights

Pre-trained weights can be downloaded from here, subject to separate terms. Follow the instructions to create an account, agree to the evaluation license, and download the weights. Once downloaded, unzip the folder and move the contents into sense/resources. In the end, your resources folder structure should look like this:

resources
├── backbone
│   ├── strided_inflated_efficientnet.ckpt
│   └── strided_inflated_mobilenet.ckpt
├── fitness_activity_recognition
│   └── ...
├── action_recognition
│   └── ...
└── ...

Note: The remaining folders in resources/ will already have the necessary files -- only some additional larger folders need to be downloaded separately.
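
As a quick sanity check (a minimal sketch, assuming the layout above and that you run it from the repository root), the following snippet verifies that the two backbone checkpoints are in place:

import os

for ckpt in ("strided_inflated_efficientnet.ckpt", "strided_inflated_mobilenet.ckpt"):
    path = os.path.join("resources", "backbone", ckpt)
    print(path, "found" if os.path.isfile(path) else "MISSING")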


Getting Started

To get started, try out the demos we've provided. Inside the sense/examples directory, you will find multiple Python scripts that each apply our pre-trained models to a specific use-case. Launching each demo is as simple as running the script in terminal as described below.

The examples will display information on the achieved frame rate in the lower left corner, so you can verify that your installation is running well.

  • Camera FPS is the rate at which frames are read from the webcam or from the provided file. By default, this is capped at 16 fps, a value chosen as a trade-off between a high input frame rate and a low computational footprint for the model. The input video stream is up- or down-sampled accordingly, so that all processing happens in real time.
  • Model FPS is the rate at which the model produces predictions. To keep computations low, our model always collects four frames before passing them through the network, so the expected output frame rate is 4 fps (see the sketch below). Thanks to temporal convolutions with striding, the model still maintains a larger receptive field.
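
To make the relationship between the two numbers concrete, here is a minimal sketch based on the values quoted above (the constants come from this description, not from the code):

CAMERA_FPS = 16            # default input frame rate described above
FRAMES_PER_PREDICTION = 4  # frames collected before each forward pass

model_fps = CAMERA_FPS / FRAMES_PER_PREDICTION
print(f"Expected model FPS: {model_fps}")  # -> 4.0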

Demo 1: Action Recognition

examples/run_action_recognition.py applies our pre-trained models to action recognition. 30 actions are supported (see full list here).

Usage:

PYTHONPATH=./ python examples/run_action_recognition.py

Demo 2: Fitness Activity Tracking

examples/run_fitness_tracker.py applies our pre-trained models to real-time fitness activity recognition and calorie estimation. In total, 80 different fitness exercises are recognized (see full list here).

Usage:

PYTHONPATH=./ python examples/run_fitness_tracker.py --weight=65 --age=30 --height=170 --gender=female

Weight, age, and height should be given in kilograms, years, and centimeters, respectively. If not provided, default values will be used.

Some additional arguments can be used to change the streaming source:

  --camera_id=CAMERA_ID           ID of the camera to stream from
  --path_in=FILENAME              Video file to stream from. This assumes that the video was encoded at 16 fps.

It is also possible to save the display window to a video file (see the combined example below) using:

  --path_out=FILENAME             Video file to stream to
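
For example, combining the two options to process a pre-recorded 16 fps video and save the annotated output (the file names are placeholders):

PYTHONPATH=./ python examples/run_fitness_tracker.py --path_in=my_workout.mp4 --path_out=my_workout_annotated.mp4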

For the best performance, the following is recommended:

  • Place your camera on the floor, angled upwards with a small portion of the floor visible
  • Ensure your body is fully visible (head-to-toe)
  • Try to be in a simple environment (with a clean background)

Demo 3: Gesture Control

examples/run_gesture_control.py applies our pre-trained models to the detection of 8 hand gesture events (6 swiping gestures + thumbs up + thumbs down). Compared to Demo 1, the model used in this case was trained to trigger the correct class for a short period of time right after the hand gesture occurred. This behavior policy makes it easier to quickly trigger multiple hand gestures in a row.

Usage:

PYTHONPATH=./ python examples/run_gesture_control.py

Demo 4: Calorie Estimation

In order to estimate burned calories, we trained a neural net to convert activity features to the corresponding MET value. We then post-process these MET values (see correction and aggregation steps performed here) and convert them to calories using the user's weight.
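
For intuition, the widely used textbook conversion from MET values to calories is sketched below; the repository's correction and aggregation steps are more involved, so treat this only as an illustration of the final conversion:

def calories_per_minute(met, weight_kg):
    # Common approximation: kcal/min = MET * 3.5 * weight (kg) / 200
    return met * 3.5 * weight_kg / 200.0

# Example: a 65 kg user exercising at 8 METs burns roughly 9.1 kcal per minute.
print(round(calories_per_minute(8.0, 65.0), 1))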

If you're only interested in the calorie estimation part, you might want to use examples/run_calorie_estimation.py which has a slightly more detailed display (see video here which compares two videos produced by that script).

Usage:

PYTHONPATH=./ python examples/run_calorie_estimation.py --weight=65 --age=30 --height=170 --gender=female

The calorie estimates are roughly in the range produced by wearable devices, though their absolute accuracy has not been verified. In our experiments, the estimates correlate well with workout intensity (intense workouts burn more calories), so, regardless of absolute accuracy, it should be fair to use this metric to compare one workout to another.

Demo 5: Repetition Counting

This demo turns our models into a repetition counter for 2 fitness exercises: jumping jacks and squats.

Usage:

PYTHONPATH=./ python examples/run_fitness_rep_counter.py

Build Your Own Classifier with SenseStudio

This section describes how to use our SenseStudio tool to build your own custom classifier on top of our models. Our models serve as powerful feature extractors that reduce the amount of data you need for your project.

Step 1: Project Setup

First, run the tools/sense_studio/sense_studio.py script and open http://127.0.0.1:5000/ in your browser. There you can set up a new project in a location of your choice and specify the classes that you want to collect.
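
For example, assuming the same PYTHONPATH convention as the other scripts in this repository, the tool can be launched with:

PYTHONPATH=./ python tools/sense_studio/sense_studio.py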

The tool will prepare the following file structure for your project:

/path/to/your/dataset/
├── videos_train
│   ├── class1
│   ├── class2
│   └── ...
├── videos_valid
│   ├── class1
│   ├── class2
│   └── ...
└── project_config.json
  • Two top-level folders: one for the training data, one for the validation data.
  • One sub-folder for each class that you specify.

Step 2: Data Collection

You can record videos for each class right in your browser by pressing the "Record" button. Make sure that you have ffmpeg installed for that.

Alternatively, you can move existing videos into the corresponding project folders. These should have a frame rate of 16 fps or higher (a command for checking this is shown below).
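
To check the frame rate of an existing video before moving it in, ffprobe (shipped with ffmpeg) can be used; the file name below is a placeholder:

ffprobe -v error -select_streams v:0 -show_entries stream=r_frame_rate -of default=noprint_wrappers=1:nokey=1 my_clip.mp4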

In the end, you should have at least one video per class in each of the train and valid splits, but preferably more. In some cases, as few as 2-5 videos per class have been enough to achieve excellent performance with our models!

Step 3: Training

Once your data is prepared, go to the training page in SenseStudio to train a custom classifier. You can specify which of our pretrained feature extractors should be used and how many of its layers should be fine-tuned. Setting this parameter to 0 means that only your new classification head will be trained.
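
To illustrate what fine-tuning 0 layers means conceptually, here is a generic PyTorch sketch (not the repository's training code): the pretrained feature extractor is frozen and only a new classification head receives gradient updates.

import torch.nn as nn

def freeze_backbone(backbone: nn.Module) -> None:
    # No gradient updates for the pretrained feature extractor.
    for param in backbone.parameters():
        param.requires_grad = False

def build_head(feature_dim: int, num_classes: int) -> nn.Module:
    # New classification head, trained from scratch on top of the frozen features.
    return nn.Linear(feature_dim, num_classes)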

Step 4: Running your model

The training script will produce a checkpoint file called best_classifier.checkpoint in the checkpoints/<your-output-folder-name>/ directory of your project. You can now run it live using the following script:

PYTHONPATH=./ python tools/run_custom_classifier.py --custom_classifier=/path/to/your/checkpoint/ [--use_gpu]
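
For example, if your project lives in /path/to/your/dataset/ and the training output folder was named my_run (both placeholders), the command could look like:

PYTHONPATH=./ python tools/run_custom_classifier.py --custom_classifier=/path/to/your/dataset/checkpoints/my_run/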

Advanced Options

You can further improve your model's performance by training on top of temporally annotated data: frames tagged individually to mark where in the video the event occurs, rather than treating every frame with the same whole-video label. For instructions on how to prepare your data with temporal annotations, refer to this page.

After preparing the temporal annotations for your dataset in SenseStudio, you can run the training with the Temporal Annotations flag enabled to train on those frame-wise tags instead of the whole-video classes.


iOS Deployment

If you're interested in mobile app development and want to run our models on iOS devices, please check out sense-iOS for step-by-step instructions on how to get our gesture demo running on an iOS device. One of the steps involves converting our PyTorch models to the TensorFlow Lite format.

Conversion to TensorFlow Lite

Our models can be converted to TensorFlow Lite using the following script:

python tools/conversion/convert_to_tflite.py --backbone_name=StridedInflatedEfficientNet --backbone_version=pro --classifier=gesture_recognition --output_name=model

If you want to convert a custom classifier, set the classifier name to "custom_classifier", and provide the path to the dataset directory used to train the classifier using the "--path_in" argument.

python tools/conversion/convert_to_tflite.py --classifier=custom_classifier --path_in=/path/to/your/checkpoint/ --output_name=model

Gallery

Our gallery lists cool external projects that were built using Sense. Check it out!

Citation

We now have a blogpost you can cite:

@misc{sense2020blogpost,
    author = {Guillaume Berger and Antoine Mercier and Florian Letsch and Cornelius Boehm and 
              Sunny Panchal and Nahua Kang and Mark Todorovich and Ingo Bax and Roland Memisevic},
    title = {Towards situated visual AI via end-to-end learning on video clips},
    howpublished = {\url{https://medium.com/twentybn/towards-situated-visual-ai-via-end-to-end-learning-on-video-clips-2832bd9d519f}},
    note = {online; accessed 23 October 2020},
    year=2020,
}

License

The code is copyright (c) 2020 Twenty Billion Neurons GmbH under an MIT License. See the file LICENSE for details. Note that this license only covers the source code of this repo. Pretrained weights come with a separate license available here.

The code also makes use of sounds from freesound.

sense's People

Contributors

antoinemrcr, corneliusboehm, dependabot[bot], guillaumebrg, ingooooooo, jlindermeir, manikd31, nabeel1234444, nahuakang, sunny-panchal, yasheshsavani, yoga-0125


sense's Issues

How to use annotation in SenseStudio?

The SenseStudio website shows videos split into individual frames.

Are there any guidelines or a tutorial for that?

The background tag is supposed to be a clip with no action, right?
What is the difference between category_tag1 and category_tag2? Do they both mark the start and end of each action?

Training code

Great work! Are you willing to provide the training code and training set?
Hope to hear your reply! Thanks.

Scripts run and crash out without an error message

ENVIRONMENT
Python = 3.6.12 (attempted on Python versions 3.6.0 - 3.6.8 with the same result)
Conda = 4.9.2
macOS Catalina 10.15.7

PROBLEM
After following the installation instructions and running PYTHONPATH=./ python3.6 scripts/run_gesture_recognition.py, the camera window is created and then torn down after about 3 seconds (no error reported). Running in verbose mode (-v) shows that the program reaches this import before the cleanup operations for teardown begin.

import 'realtimenet.downstream_tasks.nn_utils' # <_frozen_importlib_external.SourceFileLoader object at 0x123feaa90>

If you need the whole log to help with debugging, please let me know :)

paper

Hi, where can I get your paper?

Training my own dataset

1: The README doesn't explain in detail how to build your own dataset (to train on your own data). When creating one through the website, many operations don't work well. Is there a more detailed readme about the data? I would like to describe some of the problems I encountered.

2: I created a new project through the website and entered the path as /data, which is a valid path. But after creation I get a 404, which shows: 127.0.0.1 - - [03/Feb/2021 14:59:41] "GET /project//data HTTP/1.1" 404 -

3: A large number of folders are generated (features_train, features_valid, logreg, etc.), but no matter how I click the annotation button inside, no data is generated.

4: It would help to document the folder structure of the generated data.

How to use GPU correctly

Hello, could you please tell me how to enable the GPU for this project? I tried '--use_gpu' on the command line, but the GPU was not used. I hope you can answer my question as soon as you see this. Thank you very much.

Better reporting of the inference engine fps

Our scripts don't do a good job of warning the user when the script isn't running in optimal conditions. We do have a few prints (e.g. *** Unused frames ***), but these are easy to overlook, especially if you're focused on the debug window. Should the debug window report the inference engine fps? What about the camera fps?

Related issue: #16

dataset

Hello, I downloaded the Jester v1 dataset, but extracting it with cat 20bn-jester-v1-?? | tar zx keeps reporting errors. How did you extract the data?

Always shows "doing other things"!

Thanks for the model. I downloaded and tested it on my notebook (4-core CPU) using the command
python run_gesture_recognition.py

But the video always shows the result "doing other things", even though I made many gestures such as thumbs up and nodding.
The output prints "*** Frame skipped ***".
Does that mean 16 fps is too high?

device

I have tested it on my computer, but the performance is not good. What is wrong?

dataset

It is great work! Has the fitness activity dataset been released?

Gesture recognition script does not quit on Ctrl+C

Running:

(realtimenet) ➜  20bn-realtimenet git:(playground) PYTHONPATH=. python scripts/gesture_recognition.py  

and then pressing Ctrl+C in the terminal does not properly quit the program. The following output is displayed:

^CTraceback (most recent call last):
  File "scripts/gesture_recognition.py", line 78, in <module>
    path_out)
  File "/home/nahua/projects/20bn/20bn-realtimenet/realtimenet/engine.py", line 97, in run_inference_engine
    img_tuple = framegrabber.get_image()
  File "/home/nahua/projects/20bn/20bn-realtimenet/realtimenet/camera.py", line 63, in get_image
    return self.frames.get()
  File "/home/nahua/anaconda3/envs/realtimenet/lib/python3.6/queue.py", line 164, in get
    self.not_empty.wait()
  File "/home/nahua/anaconda3/envs/realtimenet/lib/python3.6/threading.py", line 295, in wait
    waiter.acquire()
KeyboardInterrupt
*** Frame skipped ***
*** Frame skipped ***
*** Frame skipped ***
*** Frame skipped ***
*** Frame skipped ***
*** Frame skipped ***
*** Frame skipped ***
*** Frame skipped ***
*** Frame skipped ***
*** Frame skipped ***
*** Frame skipped ***
^CException ignored in: <module 'threading' from '/home/nahua/anaconda3/envs/realtimenet/lib/python3.6/threading.py'>
Traceback (most recent call last):
  File "/home/nahua/anaconda3/envs/realtimenet/lib/python3.6/threading.py", line 1294, in _shutdown
    t.join()
  File "/home/nahua/anaconda3/envs/realtimenet/lib/python3.6/threading.py", line 1056, in join
    self._wait_for_tstate_lock()
  File "/home/nahua/anaconda3/envs/realtimenet/lib/python3.6/threading.py", line 1072, in _wait_for_tstate_lock
    elif lock.acquire(block, timeout):
KeyboardInterrupt

The program only terminates on a second Ctrl+C.

What method is used to expand the 2D convolutions into 3D convolutions?

As you said on your TwentyBN community blog, "We took inspiration from Carreira and Zisserman (2018) and temporally inflated 8 2D convolutions, effectively converting them to 3D convolutions". Is this 2D-to-3D conversion performed using the Temporal Shift Module (TSM)?

Add descriptions to existing gestures

For some of the supported gestures in the demos we provide, it's unclear what activity triggers them. We should add a description for each and maybe even a little gif.

Need Help, very bad performance with my own dataset

Hallöchen, and many thanks in advance for this great project.

When I tried this project with my own dataset, the performance was very bad. The task is to classify body rotation (at least 3x turning left and right) [A] versus other body poses [B].

I followed the tutorial in the README step by step, and the accuracy jumped to 100% on both train and valid almost immediately. When I checked with run_custom_classifier.py, the model showed A whenever my body made even a very tiny movement, so B appears quite rarely.

The data samples for both labels are almost balanced. The average length of the video clips is around 10 seconds.

What could be the root of my problem? How can I fix it? Which parameters need to be modified in this case?

Thank you guys.

Can you provide the format of "project_config.json"?

I ran tools/sense_studio/sense_studio.py successfully, but I don't have a camera and can't record. I moved the training and test videos into the corresponding directories, but 'classes' in 'project_config.json' has not changed, so training fails. Can you provide the format of "project_config.json"? Thanks.

Modularize run_* scripts into a controller

For the run_*.py scripts living under the 20bn-realtimenet/scripts/ directory, there seems to be some opportunity for modularization to extract some shared functions and be DRY.

I'll work on abstracting a controller.py file.

Unable to download Jester

Hello,

I can't download Jester; is there a way I could get the pretrained models from you?

Would really appreciate it.

Training 20bn data

Hello, the 20bn data has already been cut into frames. How should it be used with sense? Is there a training tutorial for this? We are working on it.

Need help!!! Very bad performance with the HMDB51 dataset!!!

Hallöchen, many thanks in advance for this great project.

When I tried this project with the HMDB51 dataset, the performance was very bad! I followed the tutorial in the README step by step, and the accuracy jumped to 100% on both train and valid almost immediately. When I checked with run_custom_classifier.py, the model predicted one class whenever my body made even a very tiny movement, so the other class appears quite rarely.

Why are my FPS rates so low?

Thank you for your amazing work!

I was trying to run examples/run_gesture_recognition.py on a CPU (i5-8400 @ 2.80GHz) on Windows 10.
I thought it would be a real-time recognizer, but the camera / model FPS seemed to be very low:
around 13 / 1.2 fps for EfficientNet
and around 14 / 3.5 fps for MobileNetV2.

I don't know if this is normal;
do you have any suggestions?

model fps

Excuse me, I have tested both your models and my own trained model, and I have a few questions!

1: On a weaker CPU device, can MobileNet be used (instead of EfficientNet) and still achieve normal results without frames being skipped?

2: With a GPU, neither your EfficientNet nor your MobileNet model skips frames on my side. Is your camera FPS simply the FPS displayed by OpenCV? Is your model FPS the inference speed? I see that it can be changed by adjusting fps and step_size.

3: Mainly, I want to ask about speed optimization. Does your "real time" refer to the camera FPS? How should we optimize to run in real time on a CPU? My deployment device is a CPU.

