zhegan27 / scn_for_video_captioning

Using Semantic Compositional Networks for Video Captioning

Python 100.00%

scn_for_video_captioning's Issues

About processing raw features

Hi zhegan,
When I extract the C3D features, the output is a binary file (.pool5). How can I convert it into a .mat file to use in the experiment? Can you give me some advice?

Thank you for your patience!
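
A minimal sketch of reading such a binary feature file, assuming it follows the layout commonly produced by the C3D feature extractor: five int32 header values (num, channels, length, height, width) followed by float32 data. That layout is an assumption and should be checked against the extractor actually used. The file name `clip.pool5` and the synthetic array are placeholders.

```python
import os
import tempfile

import numpy as np


def read_c3d_blob(path):
    """Read a C3D-style binary blob: five int32 header values, then float32 data."""
    with open(path, "rb") as f:
        # Assumed header order: num, channels, length, height, width.
        header = np.fromfile(f, dtype=np.int32, count=5)
        shape = tuple(int(v) for v in header)
        count = int(np.prod(shape, dtype=np.int64))
        data = np.fromfile(f, dtype=np.float32, count=count)
    return data.reshape(shape)


# Round-trip check on a tiny synthetic file (not real C3D output).
fake = np.arange(8, dtype=np.float32).reshape(1, 2, 1, 2, 2)
path = os.path.join(tempfile.mkdtemp(), "clip.pool5")
with open(path, "wb") as f:
    f.write(np.array(fake.shape, dtype=np.int32).tobytes())
    f.write(fake.tobytes())

feat = read_c3d_blob(path)
print(feat.shape)  # (1, 2, 1, 2, 2)
```

Once the per-video vectors are stacked into one array, `scipy.io.savemat` can write them to a .mat file. The variable name the repository expects inside that file is not stated here, so it would need to be read from the training script.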

What does x[5] mean, and how is it created?

x = cPickle.load(open("./data/youtube2text/corpus.p", "rb"))
train, val, test = x[0], x[1], x[2]
wordtoix, ixtoword = x[3], x[4]
W = x[5]
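
One way to find out what each entry holds is to print its type and shape. The sketch below uses a made-up stand-in for the corpus, since the real contents of `corpus.p` are not shown here; the helper `describe` and every value in the stand-in list, including the shape of the last entry, are hypothetical.

```python
import numpy as np


def describe(obj):
    """Return a short summary of one corpus entry."""
    if isinstance(obj, np.ndarray):
        return "ndarray shape=%s dtype=%s" % (obj.shape, obj.dtype)
    if isinstance(obj, dict):
        return "dict with %d keys" % len(obj)
    if isinstance(obj, (list, tuple)):
        return "%s of length %d" % (type(obj).__name__, len(obj))
    return type(obj).__name__


# Stand-in data. With the real file on Python 3 one would load it with:
#   import pickle
#   x = pickle.load(open("./data/youtube2text/corpus.p", "rb"), encoding="latin1")
x = [["a man"], ["a dog"], ["a cat"], {"man": 0}, {0: "man"},
     np.zeros((1, 300), dtype=np.float32)]

for i, entry in enumerate(x):
    print("x[%d]: %s" % (i, describe(entry)))
```

The name `W` hints at a weight or embedding matrix, but the snippet alone does not confirm that; inspecting the shape against the vocabulary size in `wordtoix` would be a reasonable first check.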

What is the storage order of the tag features?

How could I feed in a video for captioning?

Hi, I walked through all the steps, and it is a great project!

How could I feed in a video for captioning? Thanks.

How to extract captioning features

Thank you for providing the code for the paper. I want to apply this technique to a new dataset, so I need to know how the captioning features were extracted.
Any help is appreciated.

Regards
Kamal

Can you tell me about your test set, "img_feats[1300:]"?

Which 670 test videos did you choose? Or were they drawn after a shuffle?
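
For reference, the arithmetic behind the slice can be checked directly. The sketch below assumes a feature array with 1970 rows and 2048 columns, the shape reported for the ResNet features elsewhere in these issues, filled with placeholder zeros. It only shows what the slice itself does; whether the rows were shuffled before being stored is not something the slice can reveal.

```python
import numpy as np

n_videos = 1970  # row count reported for the Youtube2Text feature files
img_feats = np.zeros((n_videos, 2048), dtype=np.float32)  # placeholder features

# The slice quoted in the question keeps rows 1300..1969 in stored order.
test_feats = img_feats[1300:]
print(test_feats.shape[0])  # 670
```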

ZeroDivisionError: float division by zero

Traceback (most recent call last):
File "SCN_training.py", line 244, in <module>
n_words=n_words)
File "SCN_training.py", line 162, in train_model
valid_negll = calu_negll(f_cost, prepare_data, valid, img_feats, tag_feats, kf_valid)
File "SCN_training.py", line 53, in calu_negll
return totalcost/totallen
ZeroDivisionError: float division by zero
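
The traceback shows `totalcost/totallen` failing because `totallen` is zero, which suggests (as an assumption, since the rest of `calu_negll` is not shown) that no validation samples were counted. A guarded version of that final division, written as a hypothetical helper `mean_negll`, makes the cause explicit instead of surfacing as a bare ZeroDivisionError:

```python
def mean_negll(total_cost, total_len):
    """Average negative log-likelihood per token, with an explicit empty-set check."""
    if total_len == 0:
        raise ValueError(
            "no tokens were counted; check that the validation split is non-empty"
        )
    return total_cost / total_len


print(mean_negll(10.0, 4))  # 2.5
```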

overflow

D:\Anaconda2\envs\Theano\python.exe "D:/PythonCode1/Semantic Compositional Networks/SCN_for_video_captioning-master/SCN_decode.py"
loading data...
loading learned params...
count how many captions we have generated...
start decoding @ 18:10:53.925000
D:/PythonCode1/Semantic Compositional Networks/SCN_for_video_captioning-master/SCN_decode.py:39: RuntimeWarning: overflow encountered in exp
return 1/(1+np.exp(-x))
Thank you!
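
The warning comes from `np.exp(-x)` when `x` is a large negative number. A common way to avoid it is to only ever exponentiate a non-positive value. The function below is a sketch of that idea and is not taken from the repository; `stable_sigmoid` is a hypothetical name.

```python
import numpy as np


def stable_sigmoid(x):
    """Sigmoid that never calls exp on a positive argument."""
    x = np.asarray(x, dtype=np.float64)
    z = np.exp(-np.abs(x))  # always in (0, 1], so it cannot overflow
    # For x >= 0: 1 / (1 + exp(-x)); for x < 0: exp(x) / (1 + exp(x)).
    return np.where(x >= 0, 1.0 / (1.0 + z), z / (1.0 + z))


print(stable_sigmoid(np.array([-1000.0, 0.0, 1000.0])))
```

The original expression still returns a usable value when it overflows (the result rounds to 0), so the warning is often harmless; the rewrite simply keeps the log clean.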

About corpus.p file

Hi,
What are the files corpus.p and youtube2text_nbest.p, and how can I generate them when I want to caption my own raw video?
Thanks!

TypeError

Traceback (most recent call last):
File "SCN_training.py", line 243, in <module>
n_words=n_words)
File "SCN_training.py", line 109, in train_model
f_grad_shared, f_update = Adam(tparams, cost, [x, mask, y, z], lr)
File "/home/sk/SCN/model_scn_v2/optimizers.py", line 216, in Adam
grads = tensor.grad(cost, tparams.values())
File "/home/anaconda3/lib/python3.5/site-packages/theano/gradient.py", line 502, in grad
" of type " + str(type(elem)))
TypeError: Expected Variable, got odict_values
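
The paths in this traceback point at Python 3.5, where `dict.values()` returns a view object rather than a list. The sketch below shows the difference with placeholder values instead of Theano variables; the key `Wemb` is hypothetical.

```python
from collections import OrderedDict

# Placeholder parameters; in the real script these would be Theano shared variables.
tparams = OrderedDict([("Wemb", 1.0), ("bhid", 2.0)])

print(type(tparams.values()).__name__)  # odict_values on Python 3

# Converting the view to a list gives the sequence type that the error asks for.
params = list(tparams.values())
print(params)  # [1.0, 2.0]
```

Applied to the line in the traceback, that would mean passing `list(tparams.values())` to `tensor.grad`. Other Python 2 idioms in the code, such as `cPickle` and the `print` statement, would likely need similar changes.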

OSError: [Errno 2] No such file or directory

Traceback (most recent call last):
File "SCN_evaluation.py", line 52, in <module>
print score(refs, hypo)
File "SCN_evaluation.py", line 25, in score
(Meteor(),"METEOR"),
File "/root/SCN_for_video_captioning/pycocoevalcap/meteor/meteor.py", line 24, in __init__
stderr=subprocess.PIPE)
File "/usr/lib/python2.7/subprocess.py", line 710, in __init__
errread, errwrite)
File "/usr/lib/python2.7/subprocess.py", line 1327, in _execute_child
raise child_exception
OSError: [Errno 2] No such file or directory
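
The error is raised inside `subprocess` while starting a child process, which is what happens when the program to launch cannot be found. Assuming the METEOR wrapper starts a `java` process (an assumption worth confirming in `meteor.py`), a quick check for the executable looks like this. The helper name `java_available` is hypothetical, and `shutil.which` requires Python 3.

```python
import shutil


def java_available():
    """Return True if a `java` executable is found on PATH."""
    return shutil.which("java") is not None


if not java_available():
    print("java not found on PATH; install a Java runtime before running METEOR")
```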

Feature dimension and missing corpus files

Hi,

I tried to run the first step (run 1_obtain_tags_youtube2text.py to obtain the ground-truth 300 tags for the Youtube2Text dataset) to obtain the tag file, but there is no file "./data/corpus_youtube2text.p". Can you upload it? And how is this file formatted?

For the Youtube2Text dataset, the dimensions of the C3D, ResNet, and tag features are (1970, 512), (1970, 2048), and (1970, 300). What does each of the 1970 rows represent in this case? Is it a single frame or one video?

Also, can you give a more detailed guide on how to train on other datasets?
Thanks!

How to extract the video features

How do you extract the video features from the raw video data of the YouTube dataset?
Please help me, thank you very much!

problems running SCN_training.py

ImportError: No module named version
Traceback (most recent call last):
File "SCN_training.py", line 243, in <module>
n_words=n_words)
File "SCN_training.py", line 137, in train_model
cost = f_grad_shared(x, mask,y,z)
File "/home/koeldc23/anaconda2/envs/theano/lib/python2.7/site-packages/theano/compile/function_module.py", line 917, in __call__
storage_map=getattr(self.fn, 'storage_map', None))
File "/home/koeldc23/anaconda2/envs/theano/lib/python2.7/site-packages/theano/gof/link.py", line 325, in raise_with_op
reraise(exc_type, exc_value, exc_trace)
File "/home/koeldc23/anaconda2/envs/theano/lib/python2.7/site-packages/theano/compile/function_module.py", line 903, in __call__
self.fn() if output_subset is None else
File "pygpu/gpuarray.pyx", line 693, in pygpu.gpuarray.pygpu_empty
File "pygpu/gpuarray.pyx", line 301, in pygpu.gpuarray.array_empty
pygpu.gpuarray.GpuArrayException: cuMemAlloc: CUDA_ERROR_OUT_OF_MEMORY: out of memory
Apply node that caused the error: GpuSoftmaxWithBias(GpuDot22.0, bhid)
Toposort index: 449
Inputs types: [GpuArrayType(float32, matrix), GpuArrayType(float32, vector)]
Inputs shapes: [(2944, 12594), (12594,)]
Inputs strides: [(50376, 4), (4,)]
Inputs values: ['not shown', 'not shown']
Outputs clients: [[GpuAdvancedSubtensor(GpuSoftmaxWithBias.0, ARange{dtype='int64'}.0, HostFromGpu(gpuarray).0), HostFromGpu(gpuarray)(GpuSoftmaxWithBias.0)]]

Backtrace when the node is created(use Theano flag traceback.limit=N to make it longer):
File "SCN_training.py", line 243, in <module>
n_words=n_words)
File "SCN_training.py", line 104, in train_model
(use_noise, x, mask, y, z, cost) = build_model(tparams,options)
File "/home/koeldc23/SCN_for_video_captioning/model_scn_v2/video_cap.py", line 91, in build_model
pred = tensor.nnet.softmax(pred_x)

HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.
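
The input shapes in the log give a sense of how much memory the failing node needs. The short calculation below uses only the numbers printed above: a (2944, 12594) float32 matrix.

```python
rows, vocab = 2944, 12594  # from "Inputs shapes" in the log
bytes_per_float32 = 4

# Row stride reported in the log is 50376 bytes, i.e. vocab * 4.
row_stride = vocab * bytes_per_float32

softmax_bytes = rows * vocab * bytes_per_float32
print(softmax_bytes)                    # 148306944
print(round(softmax_bytes / 2**20, 1))  # 141.4 (MiB)
```

A single buffer of this size is about 141 MiB, and training holds several of them at once for the forward and backward passes. If the row count is the number of time steps multiplied by the batch size (an assumption, since the model code is not shown), then a smaller batch size or a shorter maximum caption length would shrink it proportionally.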

How to predict semantic attributes

How did you predict the semantic attributes? Could you release the code for predicting them?

c3d feature vector of size 512

Hello,

I'm trying to replicate what you've done in the article by extracting C3D and ResNet feature vectors and then running SCN_decode.py. The article says the C3D vectors should be of size 4096, since they are extracted from layer fc7. However, I've noticed that the feature dimension in your dataset (the c3d.mat files provided) is 512. Which step am I missing?

Thanks in advance.

Feature Extraction

Hi, if I want to apply the model for other videos, how can I extract the features?
