Comments (3)

antoine77340 avatar antoine77340 commented on September 2, 2024

Hi,
Try replacing:
audio_data = torch.from_numpy(np.empty([1,128*16]))
with:
audio_data = torch.from_numpy(np.empty([1,1,128]))
The audio tensor actually has size NxTxD, where N is the batch size, T is the temporal extent of each sample, and D is the dimension of the audio features. I am sorry if this was not clear!
Also, please replace np.empty with np.zeros. I do not know how PyTorch behaves when converting an np.empty array (whose contents are uninitialized) to a torch tensor.
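
A minimal sketch of the suggested fix, assuming a single audio sample with one time step and 128-dimensional features. The float32 dtype is an assumption made here to match torch.FloatTensor; it is not stated in the reply above.

```python
import numpy as np
import torch

# Audio input is expected as N x T x D:
# N = batch size, T = temporal extent, D = audio feature dimension.
batch_size, time_steps, feat_dim = 1, 1, 128

# np.zeros gives a well-defined placeholder; np.empty would leave the
# memory uninitialized.
audio_np = np.zeros((batch_size, time_steps, feat_dim), dtype=np.float32)
audio_data = torch.from_numpy(audio_np)

print(audio_data.shape)  # torch.Size([1, 1, 128])
```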

One last thing: when you extract the ResNet features, please make sure the model is in evaluation mode by adding the line:
model.eval()
I forgot it once, and the features were extracted with batch norm still in training mode (which you do not want).
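
To illustrate why this matters, here is a small sketch using a tiny stand-in network (not the actual ResNet; the layer sizes are arbitrary assumptions). In training mode batch norm normalizes with the current batch's statistics, while in evaluation mode it uses the stored running statistics, so the same input can produce different features.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Tiny stand-in for a network that contains batch norm.
model = nn.Sequential(nn.Linear(4, 4), nn.BatchNorm1d(4))
x = torch.randn(8, 4)

model.train()
with torch.no_grad():
    # Normalizes with this batch's statistics and updates the running stats.
    out_train = model(x)

model.eval()
with torch.no_grad():
    # Normalizes with the stored running statistics.
    out_eval_a = model(x)
    out_eval_b = model(x)

# Train-mode and eval-mode outputs differ for the same input.
print(torch.allclose(out_train, out_eval_a))
# Eval mode gives identical outputs for identical inputs.
print(torch.equal(out_eval_a, out_eval_b))
```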

Hope this helps!

from mixture-of-embedding-experts.

estathop avatar estathop commented on September 2, 2024

Thanks for the immediate feedback.
You must be right about np.empty: when I checked what the NumPy array contains, it holds arbitrary (uninitialized) values.

empt = np.empty([1,128])

print empt
[[  4.66275287e-310   4.66277221e-310   0.00000000e+000   0.00000000e+000
    6.90121489e-310   6.90121492e-310   6.90121492e-310   6.90121491e-310
    6.90121492e-310   6.90121491e-310   6.90121494e-310   6.90121489e-310
    6.90121490e-310   6.90121491e-310   6.90121495e-310   6.90121490e-310
    6.90121495e-310   6.90121490e-310   6.90172708e-310   6.90121494e-310
    6.90121495e-310   6.90180965e-310   6.90180965e-310   6.90121493e-310
    6.90121492e-310   6.90121491e-310   6.90121492e-310   6.90121490e-310
    6.90121490e-310   6.90121494e-310   6.90121490e-310   6.90121494e-310
    6.90121494e-310   6.90121492e-310   6.90121492e-310   6.90121492e-310
    6.90121493e-310   6.90121491e-310   6.90121494e-310   6.90121489e-310
    6.90121491e-310   6.90121494e-310   6.90121492e-310   6.90121494e-310
    6.90121489e-310   6.90121493e-310   6.90121492e-310   6.90121492e-310
    6.90121491e-310   6.90121549e-310   6.90121493e-310   6.90121491e-310
    6.90121494e-310   6.90121489e-310   6.90121493e-310   6.90121489e-310
    6.90121489e-310   6.90121492e-310   6.90121492e-310   6.90121492e-310
    6.90121490e-310   6.90121491e-310   6.90121495e-310   6.90121495e-310
    6.90121490e-310   6.90121491e-310   6.90121494e-310   6.90121494e-310
    6.90121494e-310   6.90121491e-310   6.90121495e-310   6.90121492e-310
    6.90121490e-310   6.90121495e-310   6.90121493e-310   6.90121491e-310
    6.90121490e-310   6.90121493e-310   6.90121495e-310   6.90121490e-310
    6.90121493e-310   6.90121490e-310   6.90121493e-310   6.90121493e-310
    6.90121489e-310   6.90121489e-310   6.90121492e-310   6.90121492e-310
    6.90121489e-310   6.90121493e-310   6.90121495e-310   6.90121493e-310
    6.90121491e-310   6.90121494e-310   6.90121493e-310   6.90121491e-310
    6.90121489e-310   6.90121493e-310   6.90121489e-310   6.90121492e-310
    6.90121489e-310   6.90121494e-310   6.90121490e-310   6.90121494e-310
    6.90121494e-310   6.90121494e-310   6.90121489e-310   6.90121495e-310
    6.90121495e-310   6.90121493e-310   6.90121494e-310   6.90121493e-310
    6.90121492e-310   6.90121489e-310   6.90121495e-310   6.90121493e-310
    6.90121492e-310   6.90121493e-310   6.90121493e-310   6.90121495e-310
    6.90121494e-310   6.90121494e-310   6.90121549e-310   6.90121489e-310
    6.90121492e-310   6.90121491e-310   6.90121491e-310   6.90121492e-310]]

Just as the audio tensor includes N as the batch size, the same applies to the word embeddings: there you also need a shape of [1,6,300] to make it work.

wordsw2v = pickle.load(open( "picklew2vbeforeNetVLAD.p", "rb" ))
word_feature_1 = wordsw2v[0]
wf2 = word_feature_1.reshape(1,6,300)
testix2 = torch.from_numpy(np.array(wf2))
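
The same idea, sketched with a placeholder array instead of the pickled file. The (6, 300) input shape and the float32 dtype are assumptions for illustration; the point is only that a leading batch axis is added before conversion.

```python
import numpy as np
import torch

# Hypothetical stand-in for one pickled sample: 6 words x 300-dim vectors.
word_feature = np.zeros((6, 300), dtype=np.float32)

# Add the leading batch axis: (6, 300) -> (1, 6, 300).
batched = word_feature[np.newaxis, ...]
text_data = torch.from_numpy(batched)

print(text_data.shape)  # torch.Size([1, 6, 300])
```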

The ResNet model was built and used in Keras 2.0.0 with weights imported from the original paper, so model.eval() is not a concern there.


"""ResNet152 model for Keras.

# Reference:

- [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385)

Adaptation of code from flyyufelix, mvoelk, BigMoyan, fchollet

"""

Another thing I came across was the type of tensor being used. When I used

audio_data = audio_data.type(torch.FloatTensor)
audio_data = Variable(audio_data, requires_grad=False)

face_data = face_data.type(torch.FloatTensor)
face_data = Variable(face_data, requires_grad = False)

motion_data = motion_data.type(torch.FloatTensor)
motion_data = Variable(motion_data, requires_grad = False)

visual_data = visual_data.type(torch.FloatTensor)
visual_data = Variable(visual_data, requires_grad = False)

an error occurred asking for torch.cuda.FloatTensor. When I changed each type to torch.cuda.FloatTensor, the script threw an error asking for torch.FloatTensor instead. So in model.py I removed the two .cuda() calls you had, and everything worked with torch.FloatTensor on the CPU instead of the GPU. I guess this will not be a problem for inference times.
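
This kind of mismatch usually means the model's weights and the input tensors live on different devices. One possible way to avoid editing model.py is to pick a device once and move both the model and the inputs to it. The sketch below uses a hypothetical stand-in layer rather than the real network, and assumes PyTorch 0.4 or later, where the Variable wrapper is no longer needed.

```python
import torch
import torch.nn as nn

# Use the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical stand-in model; the real network lives in model.py.
model = nn.Linear(128, 16).to(device)
model.eval()

# Inputs must be on the same device as the model's parameters.
audio_data = torch.zeros(1, 128, dtype=torch.float32).to(device)

# No gradients are needed for inference.
with torch.no_grad():
    output = model(audio_data)

print(output.shape, output.device)
```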

After those tweaks I finally got my predictions; the model ran without errors. Once the visual-text feature extraction and usage are done, I am planning to fully implement your model with optical flow, face descriptors and audio. I hope everything goes well.
Thanks for your time.

antoine77340 avatar antoine77340 commented on September 2, 2024

I am not familiar with the Keras pretrained ResNet model, but to reproduce the same features you might want to use the ImageNet-pretrained ResNet-152 from the PyTorch model zoo instead (the one I used):
https://pytorch.org/docs/stable/torchvision/models.html#id3

Then do not forget model.eval()
