
Training/Validation Data Split (disco issue, closed)

wangt-cn commented on August 22, 2024
Training/Validation Data Split

from disco.

Comments (6)

Wangt-CN commented on August 22, 2024

Hi @Boese0601, thanks for your nice comment.

For training, we use sequences 000-334 of the TikTok dataset. For testing, we found a potential risk of person-ID leakage in the TikTok dataset. Therefore, we collected 10 short videos from both the 335-340 sequences and the Internet, to make sure that no person ID coincides with the training data, for a fair comparison.
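For readers who want to reproduce the sequence-level split described above, here is a minimal sketch. It is not the authors' code: the helper name `split_sequences` and the assumption that sequences are identified by zero-padded numeric names such as `000` are hypothetical. Note that the 335-340 range only yields a pool of candidate test sequences; the actual 10-video evaluation set also includes Internet videos that are not covered here.

```python
# Hypothetical sketch: split TikTok sequence names by ID range.
# Assumption: each sequence is named by a zero-padded integer, e.g. "000".
TRAIN_IDS = range(0, 335)       # sequences 000-334, used for training
HELD_OUT_IDS = range(335, 341)  # sequences 335-340, pool for test videos


def split_sequences(seq_names):
    """Return (train, held_out) lists of sequence names."""
    train, held_out = [], []
    for name in seq_names:
        seq_id = int(name)
        if seq_id in TRAIN_IDS:
            train.append(name)
        elif seq_id in HELD_OUT_IDS:
            held_out.append(name)
        # Names outside both ranges are ignored.
    return train, held_out


names = [f"{i:03d}" for i in range(341)]
train, held_out = split_sequences(names)
# The two sets must not share any sequence.
assert not set(train) & set(held_out)
```

Splitting by whole sequence rather than by frame keeps every frame of a video on one side of the split, which is a necessary (though not sufficient) condition for avoiding person-ID leakage.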

Boese0601 commented on August 22, 2024

Thanks for your kind reply! That makes things clear. By the way, could you please also upload the additional video sequences collected from the Internet to Google Drive?

Wangt-CN commented on August 22, 2024

Hi @Boese0601, I have submitted a request to the corporation to open-source the additional TikTok-style data. Since the data was collected by the corporation, we need to get permission first.
For now, if you want to make a fair comparison, you can follow the penultimate line of Table 1, which does not use the additional data for training.

notorious-eric commented on August 22, 2024

Hi, I downloaded the tsv file and found that it contains additional data. So for the penultimate line of Table 1, you did not use the tsv file you provided and only used the 335-340 sequences for evaluation, is that correct?

Wangt-CN commented on August 22, 2024

@notorious-eric Hi, do you mean the evaluation data? All models are evaluated on the same data, i.e., 10 videos that combine the original TikTok test sequences with the additional data.

Kelu007 commented on August 22, 2024

Hi @Boese0601, thanks for your nice comment.

For training, we use sequences 000-334 of the TikTok dataset. For testing, we found a potential risk of person-ID leakage in the TikTok dataset. Therefore, we collected 10 short videos from both the 335-340 sequences and the Internet, to make sure that no person ID coincides with the training data, for a fair comparison.

Which videos were collected from the Internet for evaluation?
