Hi, I am planning to do research based on your model. The paper you cite (Look, Listen and Learn) lists 34 classes in Kinetics-Sound, and your work uses 32 of them. Could you provide the list of the 32 categories you used? Many thanks for considering my request.
Hello, I am planning to do research based on this model. Could you release the complete code? Will it be available this month? Thank you in advance.
Hi Pritam, thank you very much for your amazing work. I have some questions about the datasets used in this work. For the pretraining datasets (K400, AudioSet, and Kinetics-Sound), do you always use both audio and visual information, and does every video contain an audio stream? I am trying K400 and found that some videos are missing the audio stream. In addition, for the downstream datasets such as UCF-101 and HMDB-51, do you use audio-visual pairs or only visual information for evaluation? It seems that the video files in UCF-101 do not always contain an audio stream. Thank you very much.