jperezrua / mfas
Implementation of CVPR 2019 paper "Mfas: Multimodal fusion architecture search"
Hi, thank you for your great work!
I am a little confused about the meaning of "self.alphas = self._create_alphas()" in the Searchable_xxx_Net classes. What is the _create_alphas() function used for?
Dear Authors,
Would you mind sharing the scripts used to prepare the raw mm_imdb dataset? Or could you tell me whether my interpretation is correct?
This is the relevant part of your datasets/mm_imdb.py:
image = np.load(imagepath)
label = np.load(labelpath)
text = np.load(textpath)
The "image" is the poster image, the "label" is the "genres" field, and the "text" is the "plot". Here is a sample from the raw mm_imdb dataset:
"plot": [
"A stationary camera looks at a large anvil with a blacksmith behind it and one on either side. The smith in the middle draws a heated metal rod from the fire, places it on the anvil, and all three begin a rhythmic hammering. After several blows, the metal goes back in the fire. One smith pulls out a bottle of beer, and they each take a swig. Then, out comes the glowing metal and the hammering resumes.",
"Three men hammer on an anvil and pass a bottle of beer around."
],
"votes": 1335,
"title": "Blacksmith Scene",
"smart canonical title": "Blacksmith Scene",
"long imdb canonical title": "Blacksmith Scene (1893)",
"certificates": [
"USA:Unrated"
],
"long imdb title": "Blacksmith Scene (1893)",
"country codes": [
"us"
],
"smart long imdb canonical title": "Blacksmith Scene (1893)",
"cover url": "http://ia.media-imdb.com/images/M/MV5BNDg0ZDg0YWYtYzMwYi00ZjVlLWI5YzUtNzBkNjlhZWM5ODk5XkEyXkFqcGdeQXVyNDk0MDg4NDk@._V1._SX100_SY75_.jpg",
"sound mix": [
"Silent"
],
"genres": [
"Short"
],
I am trying to reproduce the unimodal and multimodal results reported in the paper. I got the following accuracies by running the scripts provided in this repo:
best_3_1_1_1_3_0_1_1_1_3_3_0_0.9134.checkpoint: 90.03%
conf_[[3_0_0][1_3_0][1_1_1]_[3_3_0]]_both_0.896888457572633.checkpoint: 88.64%
As you can see, the results are reasonable (still about 1% lower than the numbers you report), which implies that I have set up the dataset correctly.
On the other hand, I get very different results from the skeleton unimodal net. I used the provided pre-trained checkpoints for each modality and loaded them into the models.central.Visual and models.central.Skeleton modules. I wrote a simple script to run the forward pass and compute the accuracy of these modules. The results (especially for the skeleton net) are very different from the paper:
skeleton_32frames_85.24.checkpoint: 48.02%
rgb_8frames_83.91.checkpoint: 85.23%
Do you have any idea what I am doing wrong here? I would appreciate your comment.
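A plain top-1 evaluation loop of the kind described might look like the sketch below; here `model` stands in for models.central.Visual or models.central.Skeleton, and `loader` is assumed to yield (input, label) batches, which may not match the repo's conventions.

```python
import torch

@torch.no_grad()
def top1_accuracy(model, loader, device="cpu"):
    # Generic top-1 accuracy loop. `model` and `loader` are placeholders
    # for the repo's modules and data pipeline (an assumption here).
    model.eval().to(device)
    correct, total = 0, 0
    for x, y in loader:
        pred = model(x.to(device)).argmax(dim=1)   # predicted class per sample
        correct += (pred == y.to(device)).sum().item()
        total += y.numel()
    return correct / total
```

A preprocessing mismatch between such a loop and the training pipeline (frame sampling, normalization) is a common cause of gaps as large as the skeleton one.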
Dear Author,
Thanks for this work! I'm trying to reproduce the results. First, I want to know whether AV-MNIST is a public dataset, because I can't find it. In the meantime I'm trying to use mm_imdb, and I have some questions in addition to #8 :
Counter({'Drama': 13967, 'Comedy': 8592, 'Romance': 5364, 'Thriller': 5192, 'Crime': 3838, 'Action': 3550, 'Adventure': 2710, 'Horror': 2703, 'Documentary': 2082, 'Mystery': 2057, 'Sci-Fi': 1991, 'Fantasy': 1933, 'Family': 1668, 'Biography': 1343, 'War': 1335, 'History': 1143, 'Music': 1045, 'Animation': 997, 'Musical': 841, 'Western': 705, 'Sport': 634, 'Short': 471, 'Film-Noir': 338, 'News': 64, 'Adult': 4, 'Talk-Show': 2, 'Reality-TV': 1})
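(For reference, a histogram like the one above can be computed from the raw records roughly like this; the per-movie JSON layout follows the sample quoted in #8, and the directory layout is an assumption.)

```python
import json
from collections import Counter
from pathlib import Path

def genre_histogram(json_dir):
    # Count genre occurrences across raw mm_imdb records; assumes one
    # JSON file per movie containing a "genres" list, as in the sample.
    counts = Counter()
    for path in Path(json_dir).glob("*.json"):
        with open(path) as f:
            counts.update(json.load(f).get("genres", []))
    return counts
```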
Sincerely,
Somedaywilldo
Unexpected key(s) in state_dict: "fusion_layers.0.2.weight", "fusion_layers.0.2.bias", "fusion_layers.0.2.running_mean", "fusion_layers.0.2.running_var", "fusion_layers.0.2.num_batches_tracked", "fusion_layers.1.2.weight", "fusion_layers.1.2.bias", "fusion_layers.1.2.running_mean", "fusion_layers.1.2.running_var", "fusion_layers.1.2.num_batches_tracked", "fusion_layers.2.2.weight", "fusion_layers.2.2.bias", "fusion_layers.2.2.running_mean", "fusion_layers.2.2.running_var", "fusion_layers.2.2.num_batches_tracked", "fusion_layers.3.2.weight", "fusion_layers.3.2.bias", "fusion_layers.3.2.running_mean", "fusion_layers.3.2.running_var", "fusion_layers.3.2.num_batches_tracked".
I am testing the network you provided, and I am getting the above error about the fusion-layer weights.
Could you please provide a checkpoint file that includes the fusion-layer weights as well?
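Until such a checkpoint is available, one common workaround is to load with strict=False, which skips the unexpected fusion-layer keys (note this silently leaves those layers at their initial values, so treat the result with care). A minimal sketch of the mechanism, where the tiny model and the key names are placeholders, not the repo's actual network or checkpoint:

```python
import torch
import torch.nn as nn

# Placeholder model standing in for the real network.
model = nn.Sequential(nn.Linear(4, 2))
# Placeholder state dict with one extra fusion-layer key, mimicking
# the "Unexpected key(s)" situation above.
state = {"0.weight": torch.zeros(2, 4), "0.bias": torch.zeros(2),
         "fusion_layers.0.2.weight": torch.zeros(2)}
result = model.load_state_dict(state, strict=False)
print(result.unexpected_keys)   # the keys that were ignored
```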
Hi
Thanks for sharing your nice work,
I tried the AV-MNIST code for unimodal image classification with different hyper-parameters, but I could not get results better than 65-66%, while 75% accuracy is reported in the paper. Could you kindly guide me on how to fix that?
Thanks
Hi~
Thank you for sharing such great work!
I plan to search for an architecture on my own dataset, so I would like to know how to obtain the pretrained backbone models, such as rgb_8frames_83.91.checkpoint and skeleton_32frames_85.24.checkpoint, used in your work.
Hi juanmanpr, thank you for open-sourcing MFAS. However, I encounter a bug when I clone your repository on Windows. The details of the bug are shown below:
Cloning into 'mfas'...
remote: Enumerating objects: 63, done.
remote: Counting objects: 100% (63/63), done.
remote: Compressing objects: 100% (55/55), done.
remote: Total 63 (delta 20), reused 38 (delta 6), pack-reused 0
Unpacking objects: 100% (63/63), done.
fatal: cannot create directory at 'models/aux': Invalid argument
warning: Clone succeeded, but checkout failed.
You can inspect what was checked out with 'git status'
and retry with 'git restore --source=HEAD :/'
This bug is caused by a forbidden file name used in your repository. Here is what Microsoft says:
Do not use the following reserved names for the name of a file: CON, PRN, AUX, NUL, COM1, COM2, COM3, COM4, COM5, COM6, COM7, COM8, COM9, LPT1, LPT2, LPT3, LPT4, LPT5, LPT6, LPT7, LPT8, and LPT9. Also avoid these names followed immediately by an extension; for example, NUL.txt is not recommended. For more information, see Namespaces.
So I hope you can rename the folder models/aux. Thanks!
Hi~
Thank you for sharing such great work!
I would like to know how to use MFAS to search for an architecture on my own custom datasets, such as RGB and infrared images.
Hi, when I run the code to train the unimodal image network (the LeNet-5 structure, as depicted in the paper) on the disturbed MNIST (25% of the energy removed), I obtain an accuracy of ~53% instead of the ~74% described in the paper. I also tested the extreme case where only 1% of the energy is removed, which gives an accuracy of 95%, as expected. This implies the problem lies in my dataset rather than in the training settings, I believe.
I was wondering what the issue might be? Or have you ever come across this problem before? Thanks for your time.
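One way to phrase "X% of the energy removed" as an operation on an image is to zero the smallest-magnitude Fourier coefficients until the requested energy fraction is gone. This is only one plausible reading of the perturbation; the paper may implement it differently.

```python
import numpy as np

def remove_energy(img, frac=0.25):
    """Zero out the smallest-magnitude Fourier coefficients of `img`
    until roughly `frac` of the spectral energy has been removed.
    NOTE: a guess at the perturbation, not the paper's code."""
    f = np.fft.fft2(img.astype(np.float64)).ravel()
    order = np.argsort(np.abs(f))            # smallest coefficients first
    energy = np.abs(f[order]) ** 2
    cum = np.cumsum(energy) / energy.sum()
    f[order[cum <= frac]] = 0.0              # drop `frac` of the energy
    return np.real(np.fft.ifft2(f.reshape(img.shape)))
```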
Hi, I am unable to find the AV-MNIST dataset online. Could you kindly share a link? I am just starting out, so I am hoping to begin with a less complex dataset.
Thanks
I noticed you have a CIFAR-10 specialization. Is this being used to find the best structure of a CNN?
Hi,
Congratulations on the work. It seems really intriguing.
I came across a line in the paper:
However, the reader should consider that our fusion approach is in fact not limited to neural networks as primary feature extractors.
I was wondering if you could elaborate on this a little bit.
I was hoping to use an approach similar to the one described in the paper, but I don't want to restrict the search to pre-trained detectors. If I want to search for the pre-fusion and post-fusion layers as well, do you think the current framework can handle that? And what would be a good starting point?
I downloaded the dataset according to your instructions, but I am stuck at "change all video clips resolution to 256x256 30fps and copy them to the /ntu_rgbd_rgb/avi_256x256_30/ directory". How can I change all the video clips to 256x256 resolution at 30 fps?
Thank you in advance for your answer.
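ffmpeg can do this kind of conversion; below is a sketch that builds the command for one clip. The flags are standard ffmpeg usage, but the file names, output layout, and the decision to run one command per clip are my assumptions, not the authors' pipeline.

```python
import subprocess  # needed when actually running the command

def ffmpeg_cmd(src, dst, size=256, fps=30):
    # Build an ffmpeg invocation that rescales one clip to size x size
    # at `fps` frames per second; -y overwrites an existing output file.
    return ["ffmpeg", "-y", "-i", str(src),
            "-vf", f"scale={size}:{size}", "-r", str(fps), str(dst)]

# e.g., for each clip (requires ffmpeg on PATH; paths are examples):
# subprocess.run(ffmpeg_cmd("in.avi", "avi_256x256_30/in.avi"), check=True)
```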