
Comments (3)

TengdaHan commented on August 20, 2024

You could read some papers about 'self-supervised learning' on either images or videos, as well as the references in our paper such as OPN (Lee et al.) and 3D-ST-Puzzle (Kim et al.).
The 98% and 88% results you mentioned come from finetuning a network that was pretrained with supervision on a larger dataset such as Kinetics (for I3D) or ImageNet (for two-stream), which requires expensive annotation.
Self-supervised learning doesn't require labels to learn the representation.
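To illustrate the distinction, here is a minimal sketch (not the actual DPC code; the toy encoder, tensor shapes, and dummy data are placeholders) contrasting a supervised loss, which needs human-annotated labels, with a self-supervised objective whose targets come from the video itself, in the spirit of future-prediction methods like DPC:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-in for a video/frame encoder (placeholder, not the real backbone).
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 16 * 16, 128))

# --- Supervised pretraining: requires a human-annotated class label per clip ---
clips = torch.randn(8, 3, 16, 16)            # batch of 8 dummy "frames"
labels = torch.randint(0, 10, (8,))          # annotation (e.g., Kinetics classes)
logits = nn.Linear(128, 10)(encoder(clips))
supervised_loss = F.cross_entropy(logits, labels)

# --- Self-supervised pretraining: the target is derived from the data itself ---
# Predict the embedding of the frames that actually follow, and contrast it
# against the other items in the batch (an InfoNCE-style loss).
current = torch.randn(8, 3, 16, 16)
future = torch.randn(8, 3, 16, 16)           # the true future frames
pred = nn.Linear(128, 128)(encoder(current)) # predicted future embedding
target = encoder(future)                     # actual future embedding
sim = pred @ target.t()                      # similarity of every pred/target pair
# The "label" is just the index of the matching pair -- no annotation needed.
self_supervised_loss = F.cross_entropy(sim, torch.arange(8))
```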


PGogo commented on August 20, 2024

Thanks for your prompt reply! But I think the downstream classification task may still require labels, since the model has to output the highest score for the correct class. Does this part require labels in the same way as supervised learning? If so, isn't it the same as supervised learning? Then the reduced performance may only be due to the pretrained model? I'm curious :)


TengdaHan commented on August 20, 2024

Evaluating feature quality by finetuning on an action classification task (which requires labels) on smaller datasets is the conventional evaluation protocol for video representations. Yes, the downstream-task performance reflects the quality of the pretrained feature.
When comparing the self-supervised feature against the fully-supervised feature, the performance does not necessarily 'reduce'. If you check Table 4 of the two-stream paper (Simonyan and Zisserman, 2014), the spatial stream on UCF101 reaches 73% with ImageNet pretraining, while our self-supervised pretraining gets 76%. Self-supervised learning is very promising because you have virtually unlimited unlabelled data available from the internet.
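A rough sketch of this evaluation protocol (not the repo's actual finetuning script; the checkpoint path, toy encoder, and tensor shapes are placeholders): load the self-supervised pretrained backbone, attach a fresh classifier head, and finetune on a smaller labelled dataset such as UCF101.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-in for the pretrained video encoder (placeholder architecture).
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 16 * 16, 128))
# backbone.load_state_dict(torch.load('self_supervised_pretrain.pth'))  # hypothetical checkpoint path

classifier = nn.Linear(128, 101)  # UCF101 has 101 action classes
optimizer = torch.optim.SGD(
    list(backbone.parameters()) + list(classifier.parameters()), lr=1e-3)

# Finetuning uses the downstream labels; the self-supervision only happened upstream.
clips = torch.randn(8, 3, 16, 16)          # a labelled UCF101 mini-batch (dummy tensors here)
labels = torch.randint(0, 101, (8,))
loss = F.cross_entropy(classifier(backbone(clips)), labels)
loss.backward()
optimizer.step()
# The resulting downstream accuracy is what the tables discussed above report,
# and it serves as a proxy for the quality of the pretrained feature.
```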

