

Multimodal Intent Discovery from Livestream Videos

PyTorch code for the Findings of NAACL 2022 paper "Multimodal Intent Discovery from Livestream Videos"

Requirements:

This code has been tested with torch==1.9.0 and transformers==4.3.2. Splicing the videos additionally requires moviepy.

Data:

We are releasing two datasets in this paper:

  • Behance Intent Discovery Dataset
    This dataset contains ~20K sentences manually annotated for tool and creative intents (see paper), each accompanied by timestamps locating it in the livestream video it was taken from.
    The files are available in the ./data/bid/ folder.
    Use ./scripts/download_videos.py to download and splice the videos for the timestamps present in the dataset.
    We follow the HERO paper for extracting video representations; see the HERO feature extraction repository for the extraction code.
  • Behance Livestreams Corpus: This is the larger unlabelled corpus containing nearly 8K full-length videos and their respective transcripts (download scripts coming soon).
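Each sentence in the dataset comes with timestamps into its source video, which ./scripts/download_videos.py uses to splice clips. The timestamp format is not documented here, so the sketch below assumes `HH:MM:SS`-style strings; the helpers `to_seconds` and `clip_window` and the `pad` parameter are hypothetical and not part of the repository. It only computes clip boundaries clamped to the video length; the actual cutting would be done by a tool such as moviepy.

```python
from typing import Tuple


def to_seconds(ts: str) -> float:
    """Convert an 'HH:MM:SS', 'MM:SS' or 'SS' timestamp (assumed format) to seconds."""
    seconds = 0.0
    for part in ts.strip().split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def clip_window(start: str, end: str, video_duration: float, pad: float = 0.0) -> Tuple[float, float]:
    """Return (t_start, t_end) in seconds, widened by `pad` on each side and clamped to the video.

    `pad` is a hypothetical context margin, not a parameter taken from the repository.
    """
    t_start = to_seconds(start)
    t_end = to_seconds(end)
    if t_end < t_start:
        raise ValueError(f"end {end!r} precedes start {start!r}")
    return max(0.0, t_start - pad), min(video_duration, t_end + pad)


if __name__ == "__main__":
    print(clip_window("00:01:05", "00:01:12", video_duration=3600.0, pad=2.0))  # (63.0, 74.0)
```

With moviepy 1.x, for instance, the returned pair could be passed to `VideoFileClip(path).subclip(t_start, t_end)`. Clamping matters for sentences near the start or end of a stream, where padding would otherwise produce negative or out-of-range times.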

Models:

The scripts for training the models presented in the paper are available under ./model/.

To train the unimodal RoBERTa model on the Behance Intent Discovery dataset, run:

bash behance_unimodal.sh <GPU_ID>
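The training scripts are shell scripts that live under ./model/. If you prefer to queue runs from Python, a minimal sketch that builds and launches the unimodal command follows. It assumes the script takes the GPU id as its only argument, exactly as in the command above; the helper names `unimodal_command` and `launch` are hypothetical, not part of the repository.

```python
import subprocess
from pathlib import Path
from typing import List

MODEL_DIR = Path("model")  # the training scripts live under ./model/


def unimodal_command(gpu_id: int) -> List[str]:
    """Build the argument list for `bash behance_unimodal.sh <GPU_ID>` (hypothetical helper)."""
    if gpu_id < 0:
        raise ValueError("gpu_id must be a non-negative integer")
    return ["bash", "behance_unimodal.sh", str(gpu_id)]


def launch(cmd: List[str], dry_run: bool = True) -> str:
    """Print the command (dry run) or execute it from inside ./model/."""
    line = " ".join(cmd)
    if dry_run:
        print(line)
    else:
        # check=True raises CalledProcessError if the script exits with a non-zero status
        subprocess.run(cmd, cwd=MODEL_DIR, check=True)
    return line


if __name__ == "__main__":
    launch(unimodal_command(0))  # dry run: only prints "bash behance_unimodal.sh 0"
```

The same pattern extends to behance_late_fusion.sh by placing <feature_type> and <path_to_feature_directory> before the GPU id. Defaulting to `dry_run=True` lets you check the exact command before committing a GPU to it.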

To train the multimodal late fusion RoBERTa model on the Behance Intent Discovery dataset, run:

bash behance_late_fusion.sh <feature_type> <path_to_feature_directory> <GPU_ID>
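Late fusion combines modalities only after each has been encoded separately. The sketch below is a generic illustration of one common form of late fusion, not the repository's model: it assumes a sentence-level text vector (for example, a pooled RoBERTa embedding) and a sequence of per-frame video features (for example, HERO-style features read from the feature directory), mean-pools the frames, concatenates the two vectors and applies a linear layer with softmax. All dimensions and weights are made-up toy values, and plain Python is used only to keep the example dependency-free; a real implementation would use torch.nn modules.

```python
import math
from typing import List, Sequence

Vector = List[float]


def mean_pool(frames: Sequence[Sequence[float]]) -> Vector:
    """Average per-frame video features into a single clip vector."""
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    return [sum(f[i] for f in frames) / len(frames) for i in range(dim)]


def linear(x: Sequence[float], weight: Sequence[Sequence[float]], bias: Sequence[float]) -> Vector:
    """Compute W x + b, with one row of `weight` per output class."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def softmax(logits: Sequence[float]) -> Vector:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(text_vec: Sequence[float], frames: Sequence[Sequence[float]],
                weight: Sequence[Sequence[float]], bias: Sequence[float]) -> Vector:
    """Concatenate the text vector with the pooled video vector, then classify."""
    fused = list(text_vec) + mean_pool(frames)
    if any(len(row) != len(fused) for row in weight):
        raise ValueError("each weight row must match the fused dimension")
    return softmax(linear(fused, weight, bias))


if __name__ == "__main__":
    text_vec = [0.5, -0.2]              # toy 2-d text embedding
    frames = [[1.0, 0.0], [0.0, 1.0]]   # two toy 2-d frame features
    weight = [[1.0, 0.0, 0.0, 0.0],     # class 0 looks only at the text
              [0.0, 0.0, 1.0, 1.0]]     # class 1 looks only at the video
    bias = [0.0, 0.0]
    probs = late_fusion(text_vec, frames, weight, bias)
    print([round(p, 3) for p in probs])  # [0.378, 0.622]
```

Because each modality is encoded independently, the text and video encoders can be trained or swapped separately; the trade-off is that cross-modal interactions are learned only by the final classifier.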

Dockerized containers for training HERO + Late Fusion and ClipBERT + Late Fusion models are coming soon.

Acknowledgement:

The code in this repository has been adapted from the BOND and HERO codebases.
