Video Frame Prediction & Image Segmentation Using Generative Adversarial Network & U-Net

Overview

In the rapidly evolving field of computer vision, the ability to accurately predict and segment future frames in video sequences presents both significant challenges and opportunities for a variety of applications, from autonomous driving to video surveillance. This repo contains the code required to replicate the work of Team 14 in the DS-GA 1008 Final Competition, where our objective was to leverage deep learning models to generate the semantic segmentation mask of the last frame of a video sequence given its first 11 frames.

Code Usage

Dataset

First, add the provided dataset folder to this directory.

FutureGAN

First, create and activate the FutureGAN conda environment:

$ conda env create -f FutureGAN.yml
$ source activate FutureGAN

To train the model on the "train" dataset:

$ python train.py --data_root='<path_to_train_folder>' --nframes_in=11 --nframes_pred=11

To train the model on the "unlabeled" dataset:

$ python train.py --data_root='<path_to_unlabeled_folder>' --nframes_in=11 --nframes_pred=11

When training on HPC, we ran into an issue where the frame images were not sorted in the chronological order of the video (this was not an issue on Lightning Studio). If you run into this, run the following script to rename the images so that they sort chronologically:

$ python rename.py --dir='<path_to_training_videos>'
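
For reference, here is a minimal sketch of what such a renaming step can look like, assuming frames are named image_<n>.png (the exact pattern handled by rename.py may differ): zero-padding the frame index makes lexicographic order match chronological order.

import argparse
import os
import re

# Hypothetical stand-in for rename.py: zero-pads frame indices so that
# lexicographic sorting (image_000.png, image_001.png, ...) matches the
# chronological order of the video. Assumes frames are named image_<n>.png.
parser = argparse.ArgumentParser()
parser.add_argument('--dir', required=True, help='folder containing the video subfolders')
args = parser.parse_args()

for root, _, files in os.walk(args.dir):
    for name in files:
        match = re.match(r'image_(\d+)\.png$', name)
        if match:
            padded = f'image_{int(match.group(1)):03d}.png'
            if padded != name:
                os.rename(os.path.join(root, name), os.path.join(root, padded))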

To evaluate the model on the "val" dataset:

$ python eval.py --model_path='<path_to_generator_ckpt>' --data_root='<path_to_val_folder>' \
    --test_dir='./validation_result' --nframes_pred=11 --nframes_in=11 --resl=128 \
    --metrics='mse' --metrics='psnr'

The paths to the trained models are:

  • 128x128, trained on the "labeled" dataset: logs/final/ckpts/gen_E121_I40020_R128x128_final.pth.tar
  • 32x32, trained on the "unlabeled" dataset: logs/final/ckpts/gen_E80_I119057_R32x32_stab.pth.tar

To duplicate frames in the "hidden" folders:

$ python dup.py
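
The exact behavior of dup.py lives in the script; the sketch below shows the idea under two assumptions: each hidden video folder holds frames image_0.png through image_10.png, and eval.py expects nframes_in + nframes_pred = 22 frames per video, so the 11 inputs are copied into slots 11-21 as placeholder ground truth.

import os
import shutil

# Hedged sketch of dup.py: copy the 11 known frames into positions 11..21 so
# eval.py finds the 22 frames it expects. The frame naming is an assumption.
HIDDEN_ROOT = '<path_to_hidden_folder>'

for video in sorted(os.listdir(HIDDEN_ROOT)):
    folder = os.path.join(HIDDEN_ROOT, video)
    if not os.path.isdir(folder):
        continue
    for i in range(11):
        src = os.path.join(folder, f'image_{i}.png')
        dst = os.path.join(folder, f'image_{i + 11}.png')
        if os.path.exists(src) and not os.path.exists(dst):
            shutil.copyfile(src, dst)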

To evaluate the model on the "hidden" dataset:

$ python eval.py --model_path='<path_to_generator_ckpt>' --data_root='<path_to_hidden_folder>' \
--test_dir='./hidden_result' --nframes_pred=11 --nframes_in=11

To resize the 22nd predicted frame of each video and save it as input for the U-Net:

$ python resize.py --source_dir='<path_to_prediction_folder>' --destination_dir='<path_to_save_22nd_frames>'
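
A hedged sketch of this resize step follows. It assumes one prediction folder per video, that the last predicted frame sorts last by filename, and that the U-Net consumes frames at the dataset's original 160x240 resolution; adjust any of these to match your setup.

import os
from PIL import Image

SOURCE_DIR = '<path_to_prediction_folder>'
DEST_DIR = '<path_to_save_22nd_frames>'
os.makedirs(DEST_DIR, exist_ok=True)

for video in sorted(os.listdir(SOURCE_DIR)):
    folder = os.path.join(SOURCE_DIR, video)
    if not os.path.isdir(folder):
        continue
    last_frame = sorted(os.listdir(folder))[-1]  # 22nd frame sorts last (assumption)
    img = Image.open(os.path.join(folder, last_frame))
    img = img.resize((240, 160), Image.BILINEAR)  # PIL takes (width, height)
    img.save(os.path.join(DEST_DIR, f'{video}.png'))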

U-Net

Training

The train.ipynb notebook contains all the code required to build and train a model. Trained model checkpoints are written to the models folder: the model used for submission is saved as submission_model.pt, while all other checkpoints appear as model_{epoch}.pt.
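
For orientation, here is a minimal, self-contained U-Net sketch in PyTorch. It illustrates only the encoder-decoder-with-skip-connection idea; the actual channel widths, depth, and number of segmentation classes (49 below is an assumption) are defined in train.ipynb.

import torch
import torch.nn as nn

def double_conv(in_ch, out_ch):
    # Two 3x3 convolutions with ReLU: the basic U-Net building block.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    def __init__(self, in_ch=3, num_classes=49):  # num_classes is a placeholder
        super().__init__()
        self.enc = double_conv(in_ch, 64)
        self.down = nn.MaxPool2d(2)
        self.mid = double_conv(64, 128)
        self.up = nn.ConvTranspose2d(128, 64, 2, stride=2)
        self.dec = double_conv(128, 64)  # 128 = 64 skip channels + 64 upsampled
        self.head = nn.Conv2d(64, num_classes, 1)

    def forward(self, x):
        e = self.enc(x)
        m = self.mid(self.down(e))
        u = self.up(m)
        return self.head(self.dec(torch.cat([e, u], dim=1)))

masks = TinyUNet()(torch.randn(1, 3, 160, 240)).argmax(dim=1)  # per-pixel class ids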

Hidden Mask Prediction

The predict_hidden.ipynb notebook is designed to process the output images generated by the FutureGAN model and apply the U-Net model to predict masks for these images. This notebook serves as the final step in evaluating the model's performance on hidden or unlabeled data.

Steps to Follow:

  1. Set Paths: Ensure that the paths to the FutureGAN output and the U-Net model weights are correctly set in the notebook. These paths must be updated to reflect the locations where the generated images and the model weights are stored on your system.
  2. Run Notebook: Run the notebook cells sequentially to generate predictions for the hidden masks. The model will process each image and apply the trained U-Net model to predict the corresponding mask.
  3. Save Output: The output will be saved in a .pt file containing all the predicted masks for the hidden dataset. The path for saving this file can be adjusted as needed within the notebook (a sketch of this prediction-and-save loop follows this list).
  4. Visualize Predictions: Additionally, the notebook provides visualizations of some predicted masks at the end, allowing for a quick qualitative assessment of the model's performance on unseen data.
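
Below is a hedged sketch of the prediction-and-save loop referenced in step 3. The input directory, the assumption that the full model object (not just a state_dict) was saved to submission_model.pt, the output filename hidden_masks.pt, and the stacked (N, H, W) tensor format are all assumptions to verify against the notebook.

import os
import torch
from PIL import Image
from torchvision import transforms

IMAGE_DIR = '<path_to_save_22nd_frames>'  # output of resize.py
# Assumes the full model object was saved; if only a state_dict was saved,
# instantiate the U-Net from train.ipynb first and call load_state_dict.
model = torch.load('models/submission_model.pt', map_location='cpu')
model.eval()

to_tensor = transforms.ToTensor()
masks = []
with torch.no_grad():
    for name in sorted(os.listdir(IMAGE_DIR)):
        x = to_tensor(Image.open(os.path.join(IMAGE_DIR, name))).unsqueeze(0)
        logits = model(x)                             # (1, num_classes, H, W)
        masks.append(logits.argmax(dim=1).squeeze(0))

torch.save(torch.stack(masks), 'hidden_masks.pt')     # (N, H, W) predicted masks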

Requirements:

Before running the notebook, ensure that the environment has been set up using the FutureGAN.yml file in the FutureGAN folder.
