harskish / tlgan

Time-Lapse Disentanglement With Conditional GANs [SIGGRAPH 2022]

License: Other

Languages: Python 82.60%, C++ 4.20%, CUDA 13.19%
Topics: disentanglement, generative-adversarial-network, time-lapse, timelapse

tlgan's Introduction

Disentangling Random and Cyclic Effects in Time-Lapse Sequences

[Teaser image: valley time-lapse]

Erik Härkönen1, Miika Aittala2, Tuomas Kynkäänniemi1, Samuli Laine2, Timo Aila2, Jaakko Lehtinen1,2
1Aalto University, 2NVIDIA
https://doi.org/10.1145/3528223.3530170

Abstract: Time-lapse image sequences offer visually compelling insights into dynamic processes that are too slow to observe in real time. However, playing a long time-lapse sequence back as a video often results in distracting flicker due to random effects, such as weather, as well as cyclic effects, such as the day-night cycle. We introduce the problem of disentangling time-lapse sequences in a way that allows separate, after-the-fact control of overall trends, cyclic effects, and random effects in the images, and describe a technique based on data-driven generative models that achieves this goal. This enables us to "re-render" the sequences in ways that would not be possible with the input images alone. For example, we can stabilize a long sequence to focus on plant growth over many months, under selectable, consistent weather.
Our approach is based on Generative Adversarial Networks (GANs) that are conditioned with the time coordinate of the time-lapse sequence. Our architecture and training procedure are designed so that the networks learn to model random variations, such as weather, using the GAN's latent space, and to disentangle overall trends and cyclic variations by feeding the conditioning time label to the model using Fourier features with specific frequencies.
We show that our models are robust to defects in the training data, enabling us to amend some of the practical difficulties in capturing long time-lapse sequences, such as temporary occlusions, uneven frame spacing, and missing frames.
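The time-conditioning idea above can be sketched in a few lines: the scalar time coordinate is mapped through sine/cosine pairs at fixed frequencies, so that a frequency matching a cyclic effect (e.g. one cycle per day) produces features that repeat exactly once per cycle. The function name, frequency values, and feature layout below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def fourier_features(t, freqs):
    """Map scalar time coordinates t to sin/cos Fourier features.

    freqs are cycles per unit time -- hypothetical examples: 1.0 for a
    yearly trend when t is in years, 365.0 for the day-night cycle.
    Features at frequency f repeat exactly every 1/f time units, which
    is what lets the model tie cyclic variation to these inputs.
    """
    t = np.asarray(t, dtype=np.float64)
    angles = 2.0 * np.pi * np.outer(t, freqs)          # shape (N, F)
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)

# Two times exactly one cycle apart yield identical features:
f = fourier_features([0.0, 1.0], freqs=[1.0])
```

A conditional generator would concatenate such features with (or map them into) its other conditioning inputs; frequencies not present in `freqs` cannot be expressed through the time label, which is one intuition for why trends and cycles separate from the random latent.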

Video: https://youtu.be/UrQ3tOfpjuA

Setup

See the setup instructions.

Dataset preparation

See the dataset preprocessing instructions.

Usage

Training a model
First, go through the dataset preparation instructions above to produce a dataset zip.

# Print available options
python train.py --help

# Train TLGAN on Valley using 4 GPUs.
python train.py --outdir=~/training-runs --data=~/datasets/valley_1024x1024_2225hz.zip --gpus=4 --batch=32

# Train an unconditional StyleGAN2 on Teton using 2 GPUs.
python train.py --outdir=~/training-runs --data=~/datasets/teton_512x512_2225hz.zip --gpus=2 --batch=32 --cond=none --metrics=fid50k_full

Model visualizer
The interactive model visualizer can be used to explore the effects of the conditioning inputs and the latent space.

# Visualize a trained model
python visualize.py path/to/model.pkl

The UI can be scaled with the button in the top-right corner and made fullscreen by pressing F11.

Grid visualizer
The input grid visualizer can be used to create 2D image grids, time-lapse images (stacked strips), and videos. All exported files (JPG, PNG, MP4) contain embedded metadata recording the state of every UI element, which allows previously exported data to be loaded back into the UI via drag-and-drop.

# Open trained model pickle in grid visualizer
python grid_viz.py /path/to/model.pkl

# Reopen UI and load state from previously exported image
python grid_viz.py /path/to/image.png

Dataset visualization
Both visualizers can display dataset frames that most closely match the current conditioning variables. Set environment variable TLGAN_DATASET_ROOT or pass argument --dataset_root to specify the directory in which datasets are stored.
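The two configuration paths above (environment variable vs. command-line argument) imply a small resolution order. A minimal sketch, assuming the CLI argument takes precedence over the environment variable and that the fallback default shown here is hypothetical:

```python
import os

def resolve_dataset_root(cli_arg=None):
    """Pick the dataset directory: an explicit --dataset_root argument wins,
    then the TLGAN_DATASET_ROOT environment variable, then a guessed default.
    The precedence order and default path are assumptions for illustration."""
    if cli_arg:
        return cli_arg
    return os.environ.get('TLGAN_DATASET_ROOT', os.path.expanduser('~/datasets'))
```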

Downloads

Known issues

  • NVJPEG does not work correctly with CUDA 11.0–11.5. CPU decoding will be used instead, leading to reduced performance. Affects preproc/process_sequence.py, grid_viz.py, and visualize.py.
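The fallback such a workaround implies can be sketched as a simple version gate; the function name and the (major, minor) tuple format are assumptions for illustration, not the repository's actual check.

```python
def nvjpeg_usable(cuda_version):
    """Return False for CUDA 11.0-11.5, where NVJPEG decoding misbehaves,
    so callers can fall back to (slower) CPU JPEG decoding.

    cuda_version is a (major, minor) tuple, e.g. (11, 3).
    """
    major, minor = cuda_version
    return not (major == 11 and 0 <= minor <= 5)
```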

Citation

@article{harkonen2022tlgan,
  author    = {Erik Härkönen and Miika Aittala and Tuomas Kynkäänniemi and Samuli Laine and Timo Aila and Jaakko Lehtinen},
  title     = {Disentangling Random and Cyclic Effects in Time-Lapse Sequences},
  journal   = {{ACM} Trans. Graph.},
  volume    = {41},
  number    = {4},
  year      = {2022},
}

License

The code of this repository is based on StyleGAN3, which is released under the NVIDIA License.
All modified source files are marked separately and released under the CC BY-NC-SA 4.0 license.
The files in ./ext are provided under the MIT license.
The file mutable_zipfile.py is released under the Python License.
The included Roboto Mono font is licensed under the Apache 2.0 license.

tlgan's People

Contributors

harskish


tlgan's Issues

Which VS version is needed to compile? An error occurs when running conda env with --prune on the corresponding cuxx.yml file

cmake C:\Users\Administrator\AppData\Local\Temp\pip-install-qv_cg26i\simpleimageio_1f2a96f5db3b4f179142cbfeba783f7d -DPYLIB=C:\Users\Administrator\AppData\Local\Temp\pip-install-qv_cg26i\simpleimageio_1f2a96f5db3b4f179142cbfeba783f7d\build\lib.win-amd64-cpython-39\simpleimageio -DCMAKE_BUILD_TYPE=Release
-- Building for: Visual Studio 17 2022
-- The C compiler identification is MSVC 19.34.31942.0
-- The CXX compiler identification is MSVC 19.34.31942.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - failed
-- Check for working C compiler: D:/VS2022/VC/Tools/MSVC/14.34.31933/bin/Hostx64/x64/cl.exe
-- Check for working C compiler: D:/VS2022/VC/Tools/MSVC/14.34.31933/bin/Hostx64/x64/cl.exe - broken
CMake Error at C:/Users/Administrator/AppData/Local/Temp/pip-build-env-_9fg2wxv/overlay/Lib/site-packages/cmake/data/share/cmake-3.25/Modules/CMakeTestCCompiler.cmake:70 (message):
The C compiler

      "D:/VS2022/VC/Tools/MSVC/14.34.31933/bin/Hostx64/x64/cl.exe"

    is not able to compile a simple test program.

add model to Hugging Face Hub

Hi!

Congrats on the work being accepted at SIGGRAPH 2022! Would you be interested in sharing your models on the Hugging Face Hub? The Hub offers free hosting and would make your work more accessible and visible to the rest of the ML community.

Some of the benefits of sharing your models through the Hub would be:

  • versioning, commit history, and diffs
  • repos provide useful metadata about their tasks, languages, metrics, etc., making them discoverable
  • multiple features such as TensorBoard visualizations, PapersWithCode integration, and more
  • wider reach of your work across the ecosystem

Creating the repos and adding new models should be a relatively straightforward process if you've used Git before. This is a step-by-step guide explaining the process in case you're interested. Please let us know if you would be interested and if you have any questions.

Also, this month we have an event for SIGGRAPH: https://huggingface.co/SIGGRAPH2022

It would be great if you could join, add the models there as well, and create a Gradio demo for the model.

Here is an example Gradio demo: https://huggingface.co/spaces/SIGGRAPH2022/StyleGAN-XL

Happy to hear your thoughts,
Ahsen and the Hugging Face team
