Video Frame Prediction & Image Segmentation Using Generative Adversarial Network & U-Net

Overview

In the rapidly evolving field of computer vision, the ability to accurately predict and segment future frames in video sequences presents both significant challenges and opportunities for a variety of applications, from autonomous driving to video surveillance. This repo contains the code required to replicate the work of Team 14 in the DS-GA 1008 Final Competition, where our objective was to leverage deep learning models to generate the semantic segmentation mask of the last frame of a video sequence given its first 11 frames.

Code Usage

Dataset

First, add the provided dataset folder to this directory.

FutureGAN

First, create and activate the FutureGAN conda environment:

$ conda env create -f FutureGAN.yml
$ source activate FutureGAN

To train the model on the "train" dataset:

$ python train.py --data_root='<path_to_train_folder>' --nframes_in=11 --nframes_pred=11

To train the model on the "unlabeled" dataset:

$ python train.py --data_root='<path_to_unlabeled_folder>' --nframes_in=11 --nframes_pred=11

When training on HPC, we ran into an issue where the frame images were not sorted in the chronological order of the video (this was not an issue on Lightning Studio). If you run into this, run the following script to rename the images so that they sort chronologically:

$ python rename.py --dir='<path_to_training_videos>'
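
For reference, here is a minimal sketch of what such a renaming step can look like, assuming frames are named image_<n>.png (the exact pattern handled by rename.py may differ): zero-padding the frame index makes lexicographic order match chronological order.

import argparse
import os
import re

# Hypothetical stand-in for rename.py: zero-pads frame indices so that
# lexicographic sorting (image_000.png, image_001.png, ...) matches the
# chronological order of the video. Assumes frames are named image_<n>.png.
parser = argparse.ArgumentParser()
parser.add_argument('--dir', required=True, help='folder containing the video subfolders')
args = parser.parse_args()

for root, _, files in os.walk(args.dir):
    for name in files:
        match = re.match(r'image_(\d+)\.png$', name)
        if match:
            padded = f'image_{int(match.group(1)):03d}.png'
            if padded != name:
                os.rename(os.path.join(root, name), os.path.join(root, padded))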

To evaluate the model on the "val" dataset:

$ python eval.py --model_path='<path_to_generator_ckpt>' --data_root='<path_to_val_folder>' \
    --test_dir='./validation_result' --nframes_pred=11 --nframes_in=11 --resl=128 \
    --metrics='mse' --metrics='psnr'

The paths to the trained models are:

  • 128x128, trained on the "labeled" dataset: logs/final/ckpts/gen_E121_I40020_R128x128_final.pth.tar
  • 32x32, trained on the "unlabeled" dataset: logs/final/ckpts/gen_E80_I119057_R32x32_stab.pth.tar

To duplicate frames in the "hidden" folders:

$ python dup.py
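
The exact behavior of dup.py lives in the script; the sketch below shows the idea under two assumptions: each hidden video folder holds frames image_0.png through image_10.png, and eval.py expects nframes_in + nframes_pred = 22 frames per video, so the 11 inputs are copied into slots 11-21 as placeholder ground truth.

import os
import shutil

# Hedged sketch of dup.py: copy the 11 known frames into positions 11..21 so
# eval.py finds the 22 frames it expects. The frame naming is an assumption.
HIDDEN_ROOT = '<path_to_hidden_folder>'

for video in sorted(os.listdir(HIDDEN_ROOT)):
    folder = os.path.join(HIDDEN_ROOT, video)
    if not os.path.isdir(folder):
        continue
    for i in range(11):
        src = os.path.join(folder, f'image_{i}.png')
        dst = os.path.join(folder, f'image_{i + 11}.png')
        if os.path.exists(src) and not os.path.exists(dst):
            shutil.copyfile(src, dst)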

To evaluate the model on the "hidden" dataset:

$ python eval.py --model_path='<path_to_generator_ckpt>' --data_root='<path_to_hidden_folder>' \
--test_dir='./hidden_result' --nframes_pred=11 --nframes_in=11

To resize the 22nd predicted frame of each video and save it as input for the U-Net:

$ python resize.py --source_dir='<path_to_prediction_folder>' --destination_dir='<path_to_save_22nd_frames>'
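
A hedged sketch of this resize step follows. It assumes one prediction folder per video, that the last predicted frame sorts last by filename, and that the U-Net consumes frames at the dataset's original 160x240 resolution; adjust any of these to match your setup.

import os
from PIL import Image

SOURCE_DIR = '<path_to_prediction_folder>'
DEST_DIR = '<path_to_save_22nd_frames>'
os.makedirs(DEST_DIR, exist_ok=True)

for video in sorted(os.listdir(SOURCE_DIR)):
    folder = os.path.join(SOURCE_DIR, video)
    if not os.path.isdir(folder):
        continue
    last_frame = sorted(os.listdir(folder))[-1]  # 22nd frame sorts last (assumption)
    img = Image.open(os.path.join(folder, last_frame))
    img = img.resize((240, 160), Image.BILINEAR)  # PIL takes (width, height)
    img.save(os.path.join(DEST_DIR, f'{video}.png'))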

U-Net

Training

The train.ipynb notebook contains all the code required to build and train a model. Trained model checkpoints are written to the models folder: the model used for submission is saved as submission_model.pt, while all other checkpoints appear as model_{epoch}.pt.
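
For orientation, here is a minimal, self-contained U-Net sketch in PyTorch. It illustrates only the encoder-decoder-with-skip-connection idea; the actual channel widths, depth, and number of segmentation classes (49 below is an assumption) are defined in train.ipynb.

import torch
import torch.nn as nn

def double_conv(in_ch, out_ch):
    # Two 3x3 convolutions with ReLU: the basic U-Net building block.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    def __init__(self, in_ch=3, num_classes=49):  # num_classes is a placeholder
        super().__init__()
        self.enc = double_conv(in_ch, 64)
        self.down = nn.MaxPool2d(2)
        self.mid = double_conv(64, 128)
        self.up = nn.ConvTranspose2d(128, 64, 2, stride=2)
        self.dec = double_conv(128, 64)  # 128 = 64 skip channels + 64 upsampled
        self.head = nn.Conv2d(64, num_classes, 1)

    def forward(self, x):
        e = self.enc(x)
        m = self.mid(self.down(e))
        u = self.up(m)
        return self.head(self.dec(torch.cat([e, u], dim=1)))

masks = TinyUNet()(torch.randn(1, 3, 160, 240)).argmax(dim=1)  # per-pixel class ids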

Hidden Mask Prediction

The predict_hidden.ipynb notebook is designed to process the output images generated by the FutureGAN model and apply the U-Net model to predict masks for these images. This notebook serves as the final step in evaluating the model's performance on hidden or unlabeled data.

Steps to Follow:

  1. Set Paths: Ensure that the paths to the FutureGAN output and the U-Net model weights are correctly set in the notebook. These paths must be updated to reflect the locations where the generated images and the model weights are stored on your system.
  2. Run Notebook: Run the notebook cells sequentially to generate predictions for the hidden masks. The model will process each image and apply the trained U-Net model to predict the corresponding mask.
  3. Save Output: The output will be saved in a .pt file containing all the predicted masks for the hidden dataset. The path for saving this file can be adjusted as needed within the notebook (a sketch of this prediction-and-save loop follows this list).
  4. Visualize Predictions: Additionally, the notebook provides visualizations of some predicted masks at the end, allowing for a quick qualitative assessment of the model's performance on unseen data.
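
Below is a hedged sketch of the prediction-and-save loop referenced in step 3. The input directory, the assumption that the full model object (not just a state_dict) was saved to submission_model.pt, the output filename hidden_masks.pt, and the stacked (N, H, W) tensor format are all assumptions to verify against the notebook.

import os
import torch
from PIL import Image
from torchvision import transforms

IMAGE_DIR = '<path_to_save_22nd_frames>'  # output of resize.py
# Assumes the full model object was saved; if only a state_dict was saved,
# instantiate the U-Net from train.ipynb first and call load_state_dict.
model = torch.load('models/submission_model.pt', map_location='cpu')
model.eval()

to_tensor = transforms.ToTensor()
masks = []
with torch.no_grad():
    for name in sorted(os.listdir(IMAGE_DIR)):
        x = to_tensor(Image.open(os.path.join(IMAGE_DIR, name))).unsqueeze(0)
        logits = model(x)                             # (1, num_classes, H, W)
        masks.append(logits.argmax(dim=1).squeeze(0))

torch.save(torch.stack(masks), 'hidden_masks.pt')     # (N, H, W) predicted masks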

Requirements:

Before running the notebook, ensure that the environment has been set up using the FutureGAN.yml file in the FutureGAN folder.
