
thuiar / mintrec


MIntRec: A New Dataset for Multimodal Intent Recognition (ACM MM 2022)

Home Page: https://mintrec.github.io/

License: MIT License

Python 100.00%
acm-mm acm-mm-22 artificial-intelligence multimodal-deep-learning multimodal-fusion multimodal-intent-analysis speaker-recognition

mintrec's People

Contributors

hanleizhang, mrfocusxin, tengjiayan20


mintrec's Issues

Code issues

Hello, I have followed the setup described in the README, but the run produces train_loss = nan, best_eval_score = 0.0114, eval_score = 0.0114. What could be the reason for this?
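
A NaN training loss can sometimes be traced back to NaN or Inf values in the extracted feature files; below is a minimal diagnostic sketch (the file name and dict layout are assumptions, adjust them to your setup):

    import pickle
    import numpy as np

    # Hypothetical path and layout: a dict mapping sample ids to feature arrays.
    with open("audio_feats.pkl", "rb") as f:
        feats = pickle.load(f)

    for key, arr in feats.items():
        arr = np.asarray(arr, dtype=np.float32)
        if not np.isfinite(arr).all():
            print(f"{key}: contains NaN or Inf values")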

Error when loading audio_feats

Hello! When I run your code, the line audio_feats = pickle.load(f) in audio_pre.py raises an error: ValueError: could not convert string to float. Do you know what might be causing this?
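
One basic sanity check is to confirm that the feature file really is a binary pickle written with pickle.dump and that it is opened in binary mode. A minimal round-trip sketch with hypothetical file and key names:

    import pickle
    import numpy as np

    # Hypothetical layout, purely for illustration.
    feats = {"clip_0001": np.random.rand(50, 768).astype(np.float32)}

    # Features are written as a binary pickle ...
    with open("audio_feats.pkl", "wb") as f:
        pickle.dump(feats, f)

    # ... and must be read back in binary mode ("rb").
    with open("audio_feats.pkl", "rb") as f:
        loaded = pickle.load(f)

    print(type(loaded), len(loaded))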

I find a typo.

I have been studying the cloned code to learn how the MIntRec system works.

While working on methods >> MAG_BERT >> manager.py, at line 58, I think it should be "for epoch in range" rather than "for epoch in trange".

Though it's trivial, I just wanted to let you guys know. Thank you.
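
For context, trange may be intentional here: it is tqdm's drop-in replacement for range that renders a progress bar while iterating. A minimal sketch of the idiom (illustrative names, not the repository's actual training loop):

    from tqdm import trange

    num_epochs = 10  # illustrative value, not the repository's setting
    for epoch in trange(num_epochs, desc="Epoch"):
        # trange(n) iterates exactly like range(n) but draws a progress bar.
        pass  # the training step for one epoch would go here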

Are there experimental results for the audio and visual modalities?

Hello!

In Table 4 of the paper, you report the recognition results for the text modality on its own. Did you also run experiments on the other two modalities?

I ran experiments on the audio and visual modalities separately, and the binary-classification accuracy was below 60%, so I would like to know what results you obtained on your side, to figure out where the problem is.

Thank you!

video_preprocess.py

Hello, I have recently been trying to use your video_preprocess to process my own videos, but I ran into a few problems. The first concerns TalkNet: is this something I need to obtain myself?
My second question is what the input to video_preprocess should be. Is it the videos themselves? I have already placed the videos in the folder from the screenshot, but it still has no effect.

Welcome update to OpenMMLab 2.0

I am Vansin, the technical operator of OpenMMLab. In September of last year, we announced the release of OpenMMLab 2.0 at the World Artificial Intelligence Conference in Shanghai. We invite you to upgrade your algorithm library to OpenMMLab 2.0 using MMEngine, which can be used for both research and commercial purposes. If you have any questions, please feel free to join us on the OpenMMLab Discord at https://discord.gg/amFNsyUBvm or add me on WeChat (van-sin) and I will invite you to the OpenMMLab WeChat group.

Here are the OpenMMLab 2.0 repository branches:

Repository          OpenMMLab 1.0 branch    OpenMMLab 2.0 branch
MMEngine            -                       0.x
MMCV                1.x                     2.x
MMDetection         0.x, 1.x, 2.x           3.x
MMAction2           0.x                     1.x
MMClassification    0.x                     1.x
MMSegmentation      0.x                     1.x
MMDetection3D       0.x                     1.x
MMEditing           0.x                     1.x
MMPose              0.x                     1.x
MMDeploy            0.x                     1.x
MMTracking          0.x                     1.x
MMOCR               0.x                     1.x
MMRazor             0.x                     1.x
MMSelfSup           0.x                     1.x
MMRotate            1.x                     1.x
MMYOLO              -                       0.x

Attention: please create a new virtual environment for OpenMMLab 2.0.

Error when I followed Quick start

I followed the Quick start, but when I run pip install -r requirements.txt, I always get this error:
ERROR: Could not build wheels for tokenizers, which is required to install pyproject.toml-based projects

The video_preprocess.py

Is args.TalkNet_speaker_path supposed to be an empty folder? It is set to args.TalkNet_speaker_path = "MIA/datasets/speaker_annotation/Talknet".

Question about the code logic

Hello! In MISA.py, in the line self._extract_features(video_feats, lengths, self.vrnn1, self.vrnn2, self.vlayer_norm), shouldn't the lengths argument be the video lengths before padding rather than the text lengths?
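
For context, the lengths argument matters because an RNN feature extractor should skip padded timesteps, so it needs the true per-sequence lengths of the modality being processed. A minimal sketch of the usual PyTorch idiom (illustrative shapes, not the repository's actual _extract_features):

    import torch
    import torch.nn as nn
    from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

    rnn = nn.LSTM(input_size=16, hidden_size=8, batch_first=True)  # illustrative sizes
    video_feats = torch.randn(4, 10, 16)    # (batch, max_len, feat_dim), zero-padded
    lengths = torch.tensor([10, 7, 5, 3])   # true video lengths before padding

    # Packing with the video lengths lets the RNN ignore the padded timesteps;
    # passing text lengths here would truncate or over-read the video sequences.
    packed = pack_padded_sequence(video_feats, lengths,
                                  batch_first=True, enforce_sorted=False)
    packed_out, _ = rnn(packed)
    out, _ = pad_packed_sequence(packed_out, batch_first=True)
    print(out.shape)  # torch.Size([4, 10, 8])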

Experiment Model Checkpoint Inquiry

I recently read your amazing paper and am interested in exploring improved methods for intent recognition based on its content. To do this, I believe it's crucial to first experience the end-to-end pipeline of this paper's experiment. I appreciate that the code for the experiment has been generously made available, but I also want to experience the process of feature extraction for Raw Audio, Raw Video, and Raw Text.

Would it be possible for you to share the pre-trained model checkpoint that was used in writing this paper and for the related experiment? I noticed in the paper that Bert-base-uncased was used for the Text modality's Feature Extractor, but it seems the details for other modalities have not been disclosed, hence I am leaving this issue.

Thank you once again for writing and sharing such a great paper and the implementation code.

Code usage tutorial

Hi, when will a tutorial for using this code be released? For example, the data processing details, how the results are displayed, etc. Looking forward to your update.
