
clealiya / multimodal-model


[FR|EN - Trio] 2023 - 2024 Centrale Méditerranée AI Master | Multimodal retranscription with text, audio and video

ai deep-learning machine-learning multimodal multimodal-fusion multimodality


Project SAM

Our project focuses on multimodal approaches for predicting turn-taking changes in natural conversations. It introduces concepts from the textual, visual, and auditory modalities, and compares different multimodal processing models and ways of fusing them.

Requirements

To run the code, you need Python (we use Python 3.9.13) and the packages listed in requirements.txt. You can run the following command to install all packages at the correct versions:

pip install -r requirements.txt

Launch the code

The main.py script is the main entry point for this project. It accepts several command-line arguments to control its behavior:

  • --mode or -m: Chooses the mode: 'train', 'test' or 'latefusion'.
  • --config_path or -c: Specifies the path to the configuration file. The default is config/config.yaml. Used only for training.
  • --path or -p: Specifies the path of the experiment to test.
  • --task or -t: Specifies the task for the model, overriding the task set in the configuration file for training. It can be text, audio, video or multi, and trains a new experiment on that type of data. The multi task uses the unimodal models listed in the 'load' parameter of the configuration file.
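The options above map naturally onto Python's argparse. The sketch below is a hypothetical reconstruction, not the project's actual main.py: the flag names, the allowed values and the config/config.yaml default come from the list, while the default mode of 'train' and the help strings are assumptions.

```python
import argparse

# Hypothetical reconstruction of the documented CLI; the real main.py may differ.
MODES = ("train", "test", "latefusion")
TASKS = ("text", "audio", "video", "multi")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing the four options documented for main.py."""
    parser = argparse.ArgumentParser(description="Multimodal turn-taking prediction")
    # Assumption: the README does not state a default mode.
    parser.add_argument("-m", "--mode", choices=MODES, default="train",
                        help="'train', 'test' or 'latefusion'")
    parser.add_argument("-c", "--config_path", default="config/config.yaml",
                        help="YAML configuration file (training only)")
    parser.add_argument("-p", "--path", default=None,
                        help="experiment directory to test")
    parser.add_argument("-t", "--task", choices=TASKS, default=None,
                        help="overrides the task set in the configuration file")
    return parser


if __name__ == "__main__":
    # Parse an explicit argument list so the example does not depend on sys.argv.
    args = build_parser().parse_args(["-m", "train", "-t", "text"])
    print(args.mode, args.config_path, args.task)
```

Using `choices` lets argparse reject an unknown mode or task before any training code runs.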

Mode

Here's what each mode does:

  • train: Trains a model using the configuration given by --config_path and the task given by --task.
  • test: Tests the model stored at --path. A path is required.
  • latefusion: Tests a late fusion of the models listed under 'load' in the configuration file. This model needs no training, as it has no learnable parameters.
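The three modes can be pictured as a small dispatcher. In this sketch the function `run` is hypothetical and returns a description instead of training or evaluating anything; only the rule that test mode requires a path comes from the README.

```python
from typing import Optional


def run(mode: str, path: Optional[str] = None, task: Optional[str] = None) -> str:
    """Hypothetical dispatcher: describe what each documented mode would do."""
    if mode == "train":
        # Without --task, the task comes from the configuration file.
        return f"train a new {task or 'config-defined'} experiment"
    if mode == "test":
        if path is None:
            # The README states that a path is mandatory in test mode.
            raise ValueError("--path is required when --mode is 'test'")
        return f"evaluate the experiment stored in {path}"
    if mode == "latefusion":
        # No training step: late fusion only combines already-trained models.
        return "evaluate late fusion of the models listed under 'load'"
    raise ValueError(f"unknown mode: {mode!r}")


print(run("test", path="logs/multi_4"))
```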

Configuration

The configuration of the model and the training process is done through a YAML file. You can specify the path to this file with the --config_path option. The default path is config/config.yaml.

The configuration file includes various parameters such as the learning rate, batch size, number of epochs, etc.
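The --task override can be pictured as merging the command line into the loaded configuration. This is a sketch under assumptions: apart from 'task' and 'load', the key names (learning_rate, batch_size, epochs), their values, the shape of 'load' and the experiment paths are all hypothetical, and the configuration is shown as the dict a YAML loader such as PyYAML's yaml.safe_load would return rather than read from disk.

```python
import copy
from typing import Any, Dict, Optional

# Hypothetical contents of config/config.yaml, as a YAML loader would return them.
DEFAULT_CONFIG: Dict[str, Any] = {
    "task": "text",
    "learning_rate": 1e-4,  # hypothetical key and value
    "batch_size": 32,       # hypothetical key and value
    "epochs": 10,           # hypothetical key and value
    # Unimodal experiments reused by the 'multi' task and by late fusion
    # (the structure and paths are illustrative only).
    "load": {"text": "logs/text_0", "audio": "logs/audio_0"},
}


def resolve_config(config: Dict[str, Any], cli_task: Optional[str]) -> Dict[str, Any]:
    """Return a copy of the config in which --task, if given, replaces 'task'."""
    resolved = copy.deepcopy(config)  # leave the loaded config untouched
    if cli_task is not None:
        resolved["task"] = cli_task
    return resolved


print(resolve_config(DEFAULT_CONFIG, "audio")["task"])
```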

Help

To get a list of all available options, you can use the -h or --help option:

python main.py --help

This will display a help message with a description of all available options.

Example

Here's an example of how to use the script to train a model:

python main.py --mode train --config_path config/config.yaml --task text

This command trains a model using the configuration in config/config.yaml with the task set to text.

Here's an example of how to test an existing experiment separately:

python main.py --mode test --path logs/multi_4

Models

Unimodal Models

TEXT

AUDIO

VIDEO

Multimodal Models

LATE FUSION
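Because late fusion has no learnable parameters, it can be sketched as a fixed rule applied to the outputs of already-trained unimodal models. The README does not say which rule the project uses; this sketch assumes the common choice of averaging each model's class probabilities and taking the argmax, with the two hypothetical probability lists standing in for the TEXT and AUDIO outputs.

```python
from typing import List, Sequence


def late_fusion(prob_per_model: Sequence[Sequence[Sequence[float]]]) -> List[int]:
    """Average class probabilities across models, then take the argmax per sample.

    prob_per_model[m][i][c] is the probability model m gives to class c for sample i.
    Nothing is learned: the fusion is a fixed, unweighted average.
    """
    n_models = len(prob_per_model)
    n_samples = len(prob_per_model[0])
    predictions = []
    for i in range(n_samples):
        n_classes = len(prob_per_model[0][i])
        mean = [sum(model[i][c] for model in prob_per_model) / n_models
                for c in range(n_classes)]
        predictions.append(max(range(n_classes), key=mean.__getitem__))
    return predictions


text_probs = [[0.9, 0.1], [0.6, 0.4]]   # hypothetical TEXT model outputs
audio_probs = [[0.7, 0.3], [0.2, 0.8]]  # hypothetical AUDIO model outputs
print(late_fusion([text_probs, audio_probs]))  # [0, 1]
```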

EARLY FUSION
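Early fusion combines the modalities before classification. A minimal sketch, assuming each unimodal encoder yields a fixed-size feature vector and that fusion is plain concatenation followed by one linear layer; the README does not describe the actual architecture, and the weights below are hand-picked placeholders, whereas a real early-fusion model would learn them during training.

```python
import math
from typing import List, Sequence


def early_fusion_logits(text_feat: Sequence[float], audio_feat: Sequence[float],
                        weights: Sequence[Sequence[float]],
                        bias: Sequence[float]) -> List[float]:
    """Concatenate modality features, then apply one shared linear layer."""
    fused = list(text_feat) + list(audio_feat)  # join features before classifying
    if any(len(row) != len(fused) for row in weights):
        raise ValueError("each weight row must match the fused feature size")
    return [sum(w * x for w, x in zip(row, fused)) + b
            for row, b in zip(weights, bias)]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical 2-d text features, 1-d audio features and a 2-class layer.
logits = early_fusion_logits([1.0, 0.0], [0.5],
                             [[0.2, 0.0, 0.0], [0.0, 1.0, 2.0]], [0.0, 0.0])
print(softmax(logits))
```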

Results

Model test results. The LATE FUSION and EARLY FUSION models do not use the VIDEO model.

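| Models       | Accuracy (%) | Precision (%) | Recall (%) | $f_1$ score (%) |
|--------------|--------------|---------------|------------|-----------------|
| TEXT         | 82.8         | 41.3          | 50.0       | 45.3            |
| AUDIO        | 47.1         | 48.5          | 47.4       | 41.5            |
| VIDEO        | 82.9         | 41.4          | 50.0       | 45.2            |
| LATE FUSION  | 78.5         | 50.6          | 50.1       | 48.8            |
| EARLY FUSION | 82.9         | 43.6          | 50.2       | 45.7            |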
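The table does not say how precision, recall and $f_1$ are averaged. The sketch below assumes a binary turn-change label (0 = no change, 1 = change, a hypothetical encoding) and macro averaging over both classes, with every score reported as a percentage and undefined ratios counted as 0.

```python
from typing import Dict, List, Sequence


def macro_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy and macro-averaged precision, recall and F1, all in percent."""
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    precisions: List[float] = []
    recalls: List[float] = []
    f1s: List[float] = []
    for c in labels:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        # Undefined ratios (class never predicted / never present) count as 0.
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": 100 * sum(1 for t, p in pairs if t == p) / len(pairs),
        "precision": 100 * sum(precisions) / n,
        "recall": 100 * sum(recalls) / n,
        "f1": 100 * sum(f1s) / n,
    }


# Hypothetical split: 828 "no change" (0) and 172 "change" (1) samples,
# scored against a predictor that always answers 0.
y_true = [0] * 828 + [1] * 172
y_pred = [0] * 1000
print({k: round(v, 1) for k, v in macro_metrics(y_true, y_pred).items()})
```

With these hypothetical labels the function gives accuracy 82.8, precision 41.4, recall 50.0 and $f_1$ ≈ 45.3: under macro averaging, a constant majority-class prediction keeps recall at exactly 50.0 however high its accuracy is.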

