
grseb9s / 3d-densenet-1


This project forked from bityangke/3d-densenet


3D Dense Connected Convolutional Network (3D-DenseNet for action recognition)

License: MIT License

Languages: Shell 7.20%, Python 92.80%


3D-DenseNet with TensorFlow

Expands Densely Connected Convolutional Networks (DenseNets) to 3D-DenseNet for action recognition (video classification):

  • 3D-DenseNet - without bottleneck layers
  • 3D-DenseNet-BC - with bottleneck layers
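The defining property of a dense block is that every layer receives the concatenation of all preceding feature maps along the channel axis, so each layer adds a fixed number of channels (the growth rate k). A minimal sketch of how the channel count evolves (illustrative only; the starting channel count of 16 is an assumed example, not taken from the repository):

```python
def dense_block_channels(in_channels, num_layers, growth_rate):
    """Channel count after each layer of a dense block.

    Each layer sees the concatenation of all previous feature maps and
    contributes `growth_rate` new channels to the running feature volume.
    """
    channels = in_channels
    history = []
    for _ in range(num_layers):
        channels += growth_rate
        history.append(channels)
    return history

# e.g. a 4-layer block with growth rate k = 12, starting from 16 channels:
print(dense_block_channels(16, 4, 12))  # [28, 40, 52, 64]
```

In the -BC variant, a 1x1x1 bottleneck convolution first reduces each layer's input before the 3x3x3 convolution, which keeps this channel growth cheap.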

Each model can be tested on video-classification datasets such as UCF101 and MERL Shopping.

The number of layers and blocks, the growth rate, video normalization, and other training parameters may be changed through the shell or inside the source code.

There are also many other implementations, which may be useful as well.

Prerequisite libraries

  • Python 2
  • TensorFlow 1.0
  • OpenCV 2 for Python 2

Step 1: Data preparation (UCF dataset example)

  1. Download the UCF101 (Action Recognition Data Set).
  2. Extract the UCF101.rar file; you will get the UCF101/<action_name>/<video_name.avi> folder structure.
  3. Use the ./data_prepare/convert_video_to_images.sh script to decode the UCF101 video files to image files.
    • run ./data_prepare/convert_video_to_images.sh ../UCF101 5 (the number 5 is the FPS used for decoding)
  4. Use the ./data_prepare/convert_images_to_list.sh script to create/update the {train,test}.list files according to the new UCF101 image folder structure generated in the last step.
    • run ./data_prepare/convert_images_to_list.sh ../UCF101 4; this updates the test.list and train.list files (the number 4 means the ratio of test to train data is 1/4)
    • train.list:
      database/ucf101/train/ApplyEyeMakeup/v_ApplyEyeMakeup_g01_c01 0
      database/ucf101/train/ApplyEyeMakeup/v_ApplyEyeMakeup_g01_c02 0
      database/ucf101/train/ApplyEyeMakeup/v_ApplyEyeMakeup_g01_c03 0
      database/ucf101/train/ApplyLipstick/v_ApplyLipstick_g01_c01 1
      database/ucf101/train/ApplyLipstick/v_ApplyLipstick_g01_c02 1
      database/ucf101/train/ApplyLipstick/v_ApplyLipstick_g01_c03 1
      database/ucf101/train/Archery/v_Archery_g01_c01 2
      database/ucf101/train/Archery/v_Archery_g01_c02 2
      database/ucf101/train/Archery/v_Archery_g01_c03 2
      database/ucf101/train/Archery/v_Archery_g01_c04 2
      database/ucf101/train/BabyCrawling/v_BabyCrawling_g01_c01 3
      database/ucf101/train/BabyCrawling/v_BabyCrawling_g01_c02 3
      database/ucf101/train/BabyCrawling/v_BabyCrawling_g01_c03 3
      database/ucf101/train/BabyCrawling/v_BabyCrawling_g01_c04 3
      database/ucf101/train/BalanceBeam/v_BalanceBeam_g01_c01 4
      database/ucf101/train/BalanceBeam/v_BalanceBeam_g01_c02 4
      database/ucf101/train/BalanceBeam/v_BalanceBeam_g01_c03 4
      database/ucf101/train/BalanceBeam/v_BalanceBeam_g01_c04 4
      ...
      
  5. Move (copy or cut) the test.list and train.list files to the data_providers folder.
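The generated list files can be read back with a small helper like the following (an illustrative sketch, not part of the repository; it assumes each line has the "<clip_directory> <label>" format shown above):

```python
def parse_list_file(path):
    """Parse a train.list / test.list file into (clip_directory, label) pairs.

    Each non-empty line is expected to look like:
        database/ucf101/train/Archery/v_Archery_g01_c01 2
    """
    samples = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            # split on the last space so directory names may contain spaces
            clip_dir, label = line.rsplit(" ", 1)
            samples.append((clip_dir, int(label)))
    return samples
```

A data provider can then iterate over these pairs to load the frame images for each clip directory.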

Step 2: Train or Test the model

  • Check the training help message

    python run_dense_net_3d.py -h

  • Train and test the program

    python run_dense_net_3d.py --train --test
    # Note that all log messages are written to the log.txt file in the root folder

Options

  • run_dense_net_3d.py -> train_params_<dataset> settings
    'num_classes': 5,               # The number of classes in the dataset
    'batch_size': 10,               # Batch size used when training the model
    'n_epochs': 100,                # Total number of epochs to run
    'crop_size': (64,64),           # The (width, height) of the images used to train the model
    'sequence_length': 16,          # The number of frames in each video clip
    'overlap_length': 8,            # The overlap (in frames) between consecutive extracted clips;
                                      this should be less than sequence_length
    'initial_learning_rate': 0.1,
    'reduce_lr_epoch_1': 50,        # epochs * 0.5
    'reduce_lr_epoch_2': 75,        # epochs * 0.75
    'validation_set': True,         # Whether to use a validation set
    'validation_split': None,       # None or a float
    'queue_size': 300,              # Size of the data queue used when reading from the dataset;
                                      set this according to your memory size
    'normalization': 'std',         # None, divide_256, divide_255, or std
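The interplay of sequence_length and overlap_length can be sketched as a sliding window over the decoded frames (an illustrative helper, not the repository's code):

```python
def clip_start_frames(num_frames, sequence_length=16, overlap_length=8):
    """Start indices of the clips extracted from a video.

    Consecutive clips of `sequence_length` frames share `overlap_length`
    frames, so the window advances by (sequence_length - overlap_length).
    """
    if overlap_length >= sequence_length:
        raise ValueError("overlap_length must be less than sequence_length")
    stride = sequence_length - overlap_length
    return list(range(0, num_frames - sequence_length + 1, stride))

# a 40-frame video with the default settings yields clips starting at:
print(clip_start_frames(40))  # [0, 8, 16, 24]
```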
    

Result

Test results on the MERL Shopping dataset. Per-channel video normalization was used.
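The per-channel 'std' normalization mentioned above amounts to standardizing each channel's values to zero mean and unit variance. A pure-Python sketch (illustrative; the repository's exact implementation may differ):

```python
import statistics

def normalize_channel(values):
    """Zero-mean, unit-variance normalization for one channel's values."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values) or 1.0  # guard against constant channels
    return [(v - mean) / std for v in values]

# applied independently to each of the R, G, B channels of a video clip
```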

Approximate training time for models on GeForce GTX TITAN X (12 GB memory):

  • 3D-DenseNet(k = 12, d = 20) - 25 hrs

Contributors

  • gudongfeng
  • ikhlestov
