
Comments (3)

TengdaHan commented on August 20, 2024

You could read some papers about 'self-supervised learning' on either images or videos, as well as the references in our paper such as OPN (Lee et al.) and 3D-ST-Puzzle (Kim et al.).
The 98% and 88% results you mentioned come from finetuning a network that was pretrained with supervision on a larger dataset such as Kinetics (for I3D) or ImageNet (for two-stream), which requires expensive annotation.
Self-supervised learning doesn't require labels to learn the representation.
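To illustrate the distinction, here is a minimal sketch (not the actual DPC code; the toy encoder, tensor shapes, and dummy data are placeholders) contrasting a supervised loss, which needs human-annotated labels, with a self-supervised objective whose targets come from the video itself, in the spirit of future-prediction methods like DPC:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-in for a video/frame encoder (placeholder, not the real backbone).
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 16 * 16, 128))

# --- Supervised pretraining: requires a human-annotated class label per clip ---
clips = torch.randn(8, 3, 16, 16)            # batch of 8 dummy "frames"
labels = torch.randint(0, 10, (8,))          # annotation (e.g., Kinetics classes)
logits = nn.Linear(128, 10)(encoder(clips))
supervised_loss = F.cross_entropy(logits, labels)

# --- Self-supervised pretraining: the target is derived from the data itself ---
# Predict the embedding of the frames that actually follow, and contrast it
# against the other items in the batch (an InfoNCE-style loss).
current = torch.randn(8, 3, 16, 16)
future = torch.randn(8, 3, 16, 16)           # the true future frames
pred = nn.Linear(128, 128)(encoder(current)) # predicted future embedding
target = encoder(future)                     # actual future embedding
sim = pred @ target.t()                      # similarity of every pred/target pair
# The "label" is just the index of the matching pair -- no annotation needed.
self_supervised_loss = F.cross_entropy(sim, torch.arange(8))
```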


PGogo commented on August 20, 2024

Thanks for your prompt reply! But I think the downstream classification task may still require labels, since the model has to output the highest score for the correct class. Does this part require labels in the same way as supervised learning? If so, isn't it the same as supervised learning? Then the reduced performance may only be due to the pretrained model? I'm curious :)


TengdaHan commented on August 20, 2024

Evaluating feature quality by finetuning on an action classification task (which requires labels) on smaller datasets is the conventional evaluation protocol for video representations. Yes, the downstream-task performance reflects the quality of the pretrained feature.
When comparing the self-supervised feature against the fully-supervised feature, the performance does not necessarily 'reduce'. If you check Table 4 of the two-stream paper (Simonyan and Zisserman, 2014), the spatial stream on UCF101 reaches 73% with ImageNet pretraining, while our self-supervised pretraining gets 76%. Self-supervised learning is very promising because you have virtually unlimited unlabelled data available from the internet.
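A rough sketch of this evaluation protocol (not the repo's actual finetuning script; the checkpoint path, toy encoder, and tensor shapes are placeholders): load the self-supervised pretrained backbone, attach a fresh classifier head, and finetune on a smaller labelled dataset such as UCF101.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-in for the pretrained video encoder (placeholder architecture).
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 16 * 16, 128))
# backbone.load_state_dict(torch.load('self_supervised_pretrain.pth'))  # hypothetical checkpoint path

classifier = nn.Linear(128, 101)  # UCF101 has 101 action classes
optimizer = torch.optim.SGD(
    list(backbone.parameters()) + list(classifier.parameters()), lr=1e-3)

# Finetuning uses the downstream labels; the self-supervision only happened upstream.
clips = torch.randn(8, 3, 16, 16)          # a labelled UCF101 mini-batch (dummy tensors here)
labels = torch.randint(0, 101, (8,))
loss = F.cross_entropy(classifier(backbone(clips)), labels)
loss.backward()
optimizer.step()
# The resulting downstream accuracy is what the tables discussed above report,
# and it serves as a proxy for the quality of the pretrained feature.
```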

