
martinetoering / 3d-resnets-pytorch-timecycle


Video classification project using 3DCNNs and multiple self-supervised pretraining tasks (June, 2018).

License: MIT License

Python 100.00%
video video-classification 3d-convolutional-network 3d-convnet self-supervised-learning video-recognition pytorch


Video Classification from scratch

This repository contains the code for Video Classification from scratch, a project that aims to remove the dependency on pretraining in video classification by using self-supervised learning to improve performance.

The project is a bachelor graduation project from June 2018: Report.

Introduction

Every minute, 400 hours of video are uploaded to YouTube. The popularity of video has spurred the development of algorithms that extract semantic information from it, such as video classification. However, these models often require initial weights obtained by pretraining on large-scale datasets, which is expensive and time-consuming.

Sequential data such as video carries temporal information that still images lack. This project investigates whether it is viable to skip pretraining and train the network from scratch by making additional use of spatiotemporal information.

The proposed architecture is a multi-branch network with three components: (1) video classification, (2) a video tracking task, and (3) a video direction task. Tasks (2) and (3) learn representations of the video in a self-supervised manner, with labels obtained automatically. The model therefore uses data more efficiently, as no additional data or annotations are needed.

The proposed multi-branch network architecture, which includes two self-supervised learning tasks.
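
To illustrate how labels can be obtained automatically for the video direction task, the sketch below plays a clip either forward or reversed and uses that choice as the label. It is a minimal illustration only: the function name make_direction_sample, the 50/50 reversal probability, and the use of strings in place of frames are assumptions and are not taken from this repository.

```python
import random


def make_direction_sample(frames, rng=random):
    """Return (clip, label) for a hypothetical direction task.

    label is 1 if the clip was reversed in time, 0 if it plays forward.
    The label comes from the transformation itself, so no manual
    annotation is needed.
    """
    reverse = rng.random() < 0.5  # assumed: reverse half of the clips
    clip = list(reversed(frames)) if reverse else list(frames)
    return clip, int(reverse)


# Placeholder "frames"; a real pipeline would pass image tensors instead.
frames = ["f0", "f1", "f2", "f3"]
clip, label = make_direction_sample(frames, random.Random(0))
print(clip, label)
```

The same idea applies to real clips: the network sees only the clip and must predict the label, which forces it to pick up on temporal cues.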

Training self-supervised tasks jointly with video classification is a novel approach that contributes to progress in both fields. On the HMDB-51 dataset, our model improves significantly over random initialization.

Predictions for several samples from the HMDB-51 dataset, with ground-truth labels in blue, correct predictions in green, and incorrect predictions in red.

Instructions

Preprocess

Follow the preprocessing steps of 3D-ResNets-PyTorch, then edit utils/generate_filelist.py as needed and run it.

Run

For example, the following command trains a ResNet-50 model on split 1 of HMDB-51:

python3 main.py --timecycle_weight 25 --binary_class_weight 2 --annotation_path hmdb51_1.json --list hmdb_1.txt --result_path res50_bin_test --videoLen 3 --frame_gap 4 --predDistance 0 --gpu_id 0
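
The --timecycle_weight and --binary_class_weight flags suggest that the losses of the three branches are combined with per-task weights. The sketch below assumes a plain weighted sum; the function name combined_loss, the mapping of --binary_class_weight to the direction task, and the formula itself are assumptions for illustration, and the actual combination in main.py may differ.

```python
def combined_loss(cls_loss, timecycle_loss, direction_loss,
                  timecycle_weight=25.0, binary_class_weight=2.0):
    """Hypothetical weighted sum of the three branch losses.

    The default weights mirror the values in the example command;
    the weighted-sum form is an assumption, not the repository's code.
    """
    return (cls_loss
            + timecycle_weight * timecycle_loss
            + binary_class_weight * direction_loss)


# Made-up loss values, used only to show the call.
total = combined_loss(cls_loss=1.5, timecycle_loss=0.02, direction_loss=0.7)
print(total)
```

Under this assumption, a larger weight makes the corresponding self-supervised task contribute more to the gradient relative to the classification loss.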

Acknowledgements
