Improved Multimodal Real-Time Data Loading

This is the repository for the project "Improved Multimodal Real-Time Data Loading" for SNU's SP2023 Big Data and AI Systems course. We implemented a dataloader for multimodal models that loads, processes, transforms, and encodes each data modality individually, so that loading can exploit parallelism across modalities and improve end-to-end performance.

Requirements

  • Python 3.10
  • PyTorch 2.0.1
  • CUDA 11.7

Quick Start

Install Requirements

To install, run the following commands in an environment where CUDA and a GPU are available:

conda create -n mmsys python=3.10
conda activate mmsys
conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
pip install -r requirements.txt
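
To confirm the environment is set up correctly before running anything, a quick sanity check using standard PyTorch calls (nothing project-specific):

import torch

# Verify that the installed PyTorch build can see the GPU.
print(torch.__version__)              # expected: 2.0.x
print(torch.cuda.is_available())      # expected: True
print(torch.cuda.get_device_name(0))  # name of the visible GPU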

How to use MultimodalDataLoadManager

  • Below is an example for an image-audio multimodal system:
from multimodal_dataloader import MultimodalDataLoadManager, RealTimeDataPipe
from realtime_dataloader import DataLoader as RealTimeDataLoader

# Your multimodal model goes here
model = model.cuda()
model.eval()

# Create RealTimeDataPipes and RealTimeDataLoaders
datapipes = {}
dataloaders = {}

datapipes['image'] = RealTimeDataPipe(read_image, preprocess_image)
dataloaders['image'] = RealTimeDataLoader(datapipes['image'], num_workers=1, batch_size=1, shuffle=False)

datapipes['audio'] = RealTimeDataPipe(read_audio, preprocess_audio)
dataloaders['audio'] = RealTimeDataLoader(datapipes['audio'], num_workers=1, batch_size=1, shuffle=False)

# Define feature extractor dictionary
feature_extractors = {}
feature_extractors['image'] = model.extract_feature_image
feature_extractors['audio'] = model.extract_feature_audio

# Create MultimodalDataLoadManager
manager = MultimodalDataLoadManager(
    dataloaders,
    feature_extractors,
)

# Inference loop
for step in range(5):
    features = manager.get_data()  # dict mapping each modality name to its extracted features

    output = model.fusion(features)
    ...
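
The example above assumes you supply the per-modality reader and preprocessing callables (read_image, preprocess_image, read_audio, preprocess_audio). A minimal sketch of what they might look like; the file paths and tensor shapes below are placeholders for illustration, not part of the repository:

import torch
import torchvision
import torchaudio

def read_image():
    # Read the most recent frame; in a real-time system this would come
    # from a camera stream rather than a file (path is a placeholder).
    return torchvision.io.read_image("frames/latest.png")  # uint8, CHW

def preprocess_image(frame):
    # Normalize to float in [0, 1].
    return frame.float().div(255.0)

def read_audio():
    # Read the most recent audio chunk; in a real-time system this would
    # come from a microphone buffer (path is a placeholder).
    waveform, sample_rate = torchaudio.load("audio/latest.wav")
    return waveform

def preprocess_audio(waveform):
    # Mix down to a mono 1-D float waveform.
    return waveform.mean(dim=0)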

Experiments

Run Synthetic Experiments

python benchmark_synthetic.py --config <config_file> [--num_modes <num_modes>]

All configs for the synthetic experiments are in the configs/synthetic folder.

  • To reproduce the results in Figure 3 of the report:
python scripts/run_synthetic_resnet.py
  • To reproduce the results in Figure 4 of the report:
python scripts/run_synthetic_nummodes.py

Run AVSR Experiments

  • To run the AVSR experiments, you'll need to download a pretrained AVSR model from the original AutoAVSR model zoo. Download the AutoAVSR model for Lip Reading Sentences 3 (LRS3) that uses audio-visual components. Then, move this model to
ASVR/Visual_Speech_Recognition_for_Multiple_Languages/benchmarks/LRS3/models

and unzip it.

Then, run the following command to generate a VizTracer log file for the original model:

viztracer --log_sparse -o logs/baseline/sequential_avsr_demo.json avsr_demo.py config_filename=configs/LRS3_AV_WER0.9.ini data_filename=../example_videos/aisystems_demo detector=mediapipe parallel=False

And for the parallel (our) version:

viztracer --log_sparse -o logs/ours/parallel_avsr_demo.json avsr_demo.py config_filename=configs/LRS3_AV_WER0.9.ini data_filename=../example_videos/aisystems_demo detector=mediapipe parallel=True

You can then check the load times using:

python check_runtime.py -f logs/baseline/sequential_avsr_demo.json

python check_runtime.py -f logs/ours/parallel_avsr_demo.json
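
If you prefer to inspect a trace directly rather than through check_runtime.py, the VizTracer output is standard Chrome Trace Event Format JSON. A minimal sketch that sums the durations of complete events whose names match a keyword; the keyword to filter on is an assumption and depends on the function names recorded in your trace:

import json
import sys

# Usage (hypothetical): python sum_durations.py logs/ours/parallel_avsr_demo.json load
trace_path, keyword = sys.argv[1], sys.argv[2]

with open(trace_path) as f:
    trace = json.load(f)

# Chrome Trace Event Format: complete events ("ph" == "X") carry their
# duration in microseconds under "dur".
total_us = sum(
    event.get("dur", 0)
    for event in trace["traceEvents"]
    if event.get("ph") == "X" and keyword in event.get("name", "")
)
print(f"total time for events matching '{keyword}': {total_us / 1e6:.3f} s")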

Note that the load times will vary between runs; our parallel version will likely be faster, but this is not guaranteed. Runtimes will also differ across GPUs, so several runs may be required to see representative average results.

You can also view the VizTracer stack traces and compare the parallel and sequential versions by running:

vizviewer logs/baseline/sequential_avsr_demo.json --port 4112

vizviewer logs/ours/parallel_avsr_demo.json --port 4112
