as said in the journal I extracted frames from a random video (25 frames per second an

Trying to use ypreds = net (text, videos, ind) about mixture-of-embedding-experts HOT 3 CLOSED

estathop commented on September 2, 2024

Trying to use ypreds = net (text, videos, ind)

from mixture-of-embedding-experts.

Comments (3)

antoine77340 commented on September 2, 2024

Hi,
Try replacing:
audio_data = torch.from_numpy(np.empty([1,128*16]))
by
audio_data = torch.from_numpy(np.empty([1,1,128]))
The audio tensor is actually of size NxTxD, where N is the batch size, T the temporal extent of each sample, and D the dimension of the audio features. I am sorry if this was not clear!
Also, please replace np.empty with np.zeros; I do not know how PyTorch behaves when converting an uninitialized NumPy array to a torch tensor.
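
As a minimal sketch of the NxTxD convention described above, a placeholder audio input could be built as follows. The names `batch_size`, `time_steps`, and `feat_dim` and the float32 dtype are assumptions for illustration, not taken from the repository.

```python
import numpy as np
import torch

# Hypothetical sizes: one sample, one time step, 128-dimensional audio features.
batch_size, time_steps, feat_dim = 1, 1, 128

# np.zeros gives a well-defined placeholder; np.empty would leave the memory uninitialized.
audio_np = np.zeros((batch_size, time_steps, feat_dim), dtype=np.float32)

# torch.from_numpy shares memory with the array and keeps its dtype (float32 here).
audio_data = torch.from_numpy(audio_np)

print(audio_data.shape)  # torch.Size([1, 1, 128])
print(audio_data.dtype)  # torch.float32
```

Creating the array as float32 up front avoids a later cast, since NumPy defaults to float64 while most PyTorch layers hold float32 weights.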

One last thing: when you extract the ResNet features, please make sure the model is in evaluation mode by adding the line:
model.eval()
I happened to forget it once, and the features were extracted with batch norm in training mode (which you do not want).
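
To illustrate why evaluation mode matters for batch norm, here is a small sketch. The two-layer `model` is a hypothetical stand-in, not the actual ResNet; it only shows that batch norm updates its running statistics in training mode and leaves them untouched in evaluation mode.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for a network containing batch norm (not the actual ResNet).
model = nn.Sequential(nn.Linear(8, 4), nn.BatchNorm1d(4))
bn = model[1]
x = torch.randn(16, 8)

# Training mode: the forward pass normalizes with batch statistics
# and updates the running mean and variance.
model.train()
before = bn.running_mean.clone()
model(x)
changed_in_train = not torch.equal(before, bn.running_mean)

# Evaluation mode: the stored running statistics are used and not modified.
model.eval()
before = bn.running_mean.clone()
with torch.no_grad():
    out_a = model(x)
    out_b = model(x[:1])  # a single sample gives the same result as within a batch
changed_in_eval = not torch.equal(before, bn.running_mean)

print(changed_in_train, changed_in_eval)
```

In evaluation mode each sample is normalized independently of the rest of the batch, which is what makes extracted features reproducible.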

Hope this helps!

estathop commented on September 2, 2024

Thanks for the immediate feedback.
You must be right about np.empty: when I checked the contents of the NumPy array, it held arbitrary values.

empt = np.empty([1,128])

print(empt)
[[  4.66275287e-310   4.66277221e-310   0.00000000e+000   0.00000000e+000
    6.90121489e-310   6.90121492e-310   6.90121492e-310   6.90121491e-310
    6.90121492e-310   6.90121491e-310   6.90121494e-310   6.90121489e-310
    6.90121490e-310   6.90121491e-310   6.90121495e-310   6.90121490e-310
    6.90121495e-310   6.90121490e-310   6.90172708e-310   6.90121494e-310
    6.90121495e-310   6.90180965e-310   6.90180965e-310   6.90121493e-310
    6.90121492e-310   6.90121491e-310   6.90121492e-310   6.90121490e-310
    6.90121490e-310   6.90121494e-310   6.90121490e-310   6.90121494e-310
    6.90121494e-310   6.90121492e-310   6.90121492e-310   6.90121492e-310
    6.90121493e-310   6.90121491e-310   6.90121494e-310   6.90121489e-310
    6.90121491e-310   6.90121494e-310   6.90121492e-310   6.90121494e-310
    6.90121489e-310   6.90121493e-310   6.90121492e-310   6.90121492e-310
    6.90121491e-310   6.90121549e-310   6.90121493e-310   6.90121491e-310
    6.90121494e-310   6.90121489e-310   6.90121493e-310   6.90121489e-310
    6.90121489e-310   6.90121492e-310   6.90121492e-310   6.90121492e-310
    6.90121490e-310   6.90121491e-310   6.90121495e-310   6.90121495e-310
    6.90121490e-310   6.90121491e-310   6.90121494e-310   6.90121494e-310
    6.90121494e-310   6.90121491e-310   6.90121495e-310   6.90121492e-310
    6.90121490e-310   6.90121495e-310   6.90121493e-310   6.90121491e-310
    6.90121490e-310   6.90121493e-310   6.90121495e-310   6.90121490e-310
    6.90121493e-310   6.90121490e-310   6.90121493e-310   6.90121493e-310
    6.90121489e-310   6.90121489e-310   6.90121492e-310   6.90121492e-310
    6.90121489e-310   6.90121493e-310   6.90121495e-310   6.90121493e-310
    6.90121491e-310   6.90121494e-310   6.90121493e-310   6.90121491e-310
    6.90121489e-310   6.90121493e-310   6.90121489e-310   6.90121492e-310
    6.90121489e-310   6.90121494e-310   6.90121490e-310   6.90121494e-310
    6.90121494e-310   6.90121494e-310   6.90121489e-310   6.90121495e-310
    6.90121495e-310   6.90121493e-310   6.90121494e-310   6.90121493e-310
    6.90121492e-310   6.90121489e-310   6.90121495e-310   6.90121493e-310
    6.90121492e-310   6.90121493e-310   6.90121493e-310   6.90121495e-310
    6.90121494e-310   6.90121494e-310   6.90121549e-310   6.90121489e-310
    6.90121492e-310   6.90121491e-310   6.90121491e-310   6.90121492e-310]]

Just as the audio tensor includes N as the batch size, the same applies to the word embeddings: they need the shape [1,6,300] to make it work.

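# Load the pickled word2vec features and take the first entry.
with open("picklew2vbeforeNetVLAD.p", "rb") as f:
    wordsw2v = pickle.load(f)
word_feature_1 = wordsw2v[0]
# Reshape to (1, 6, 300) so the first axis is the batch dimension.
wf2 = word_feature_1.reshape(1,6,300)
testix2 = torch.from_numpy(np.array(wf2))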
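
A size-independent way to add the batch dimension might look like the sketch below. The random `word_feature` array is a placeholder standing in for one entry of the pickled features, and its (6, 300) shape is an assumption.

```python
import numpy as np

# Placeholder for one caption: 6 words, each a 300-dimensional embedding (assumed sizes).
word_feature = np.random.rand(6, 300).astype(np.float32)

# Add a leading batch dimension so the shape becomes (N, num_words, dim) with N = 1.
batched = word_feature[np.newaxis, ...]

# Equivalent to the explicit reshape, but without hard-coding the sizes.
same = np.array_equal(batched, word_feature.reshape(1, 6, 300))

print(batched.shape)  # (1, 6, 300)
print(same)           # True
```

Indexing with `np.newaxis` keeps working if the number of words per caption changes, whereas `reshape(1,6,300)` has to be edited by hand.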

The ResNet model was built and run with Keras 2.0.0 using weights imported from the original paper, so model.eval() is not a concern.


"""ResNet152 model for Keras.

# Reference:

- [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385)

Adaptation of code from flyyufelix, mvoelk, BigMoyan, fchollet

"""

Another thing I came across was the type of tensor being used. When I used

audio_data = audio_data.type(torch.FloatTensor)
audio_data = Variable(audio_data, requires_grad=False)

face_data = face_data.type(torch.FloatTensor)
face_data = Variable(face_data, requires_grad = False)

motion_data = motion_data.type(torch.FloatTensor)
motion_data = Variable(motion_data, requires_grad = False)

visual_data = visual_data.type(torch.FloatTensor)
visual_data = Variable(visual_data, requires_grad = False)

an error occurred asking for torch.cuda.FloatTensor. When I changed each type to torch.cuda.FloatTensor, the script threw an error asking for torch.FloatTensor. So in model.py I removed the two .cuda() calls you had, and everything worked with torch.FloatTensor on the CPU instead of the GPU. I guess this will not be a problem for inference time.
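
One way to avoid this kind of back-and-forth, assuming the errors come from the model and its inputs living on different devices, is to pick a single device and move both to it. In this sketch `net` is a hypothetical stand-in layer, not the model defined in model.py.

```python
import torch
import torch.nn as nn

# Pick the GPU when available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical stand-in for the network; the real model is defined in model.py.
net = nn.Linear(128, 16).to(device)
net.eval()

# Inputs must be float32 and on the same device as the model parameters.
audio_data = torch.zeros(1, 1, 128, dtype=torch.float32).to(device)

# no_grad replaces the old Variable(..., requires_grad=False) pattern for inference.
with torch.no_grad():
    out = net(audio_data)

print(out.shape, out.device.type == device.type)
```

Since PyTorch 0.4 tensors and Variables are merged, so wrapping inputs in `Variable` is no longer needed. Hard-coded `.cuda()` calls inside a model can be replaced with `.to(device)` so the same script runs on either CPU or GPU.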

After those tweaks I finally got my predictions; the model ran without errors. Now that the visual-text feature extraction works, I am planning to fully implement your model with optical flow, face descriptors and audio. I hope everything goes well.
Thanks for your time.

antoine77340 commented on September 2, 2024

I am not familiar with the Keras pretrained ResNet model, but to reproduce the same features, you might want to use the ImageNet-pretrained ResNet-152 from the PyTorch model zoo instead (the one I used):
https://pytorch.org/docs/stable/torchvision/models.html#id3

Then do not forget model.eval().
