YouwikiHow

This repository contains the YouwikiHow dataset and the original scripts used to build it.

Download the Processed Datasets

Dataset layout

YouwikiHow/
|   +-- annotations/
|       +-- wikihow_data.pkl
|       +-- crosstask_test.pkl
|   +-- features/
|       +-- train.csv
|       +-- test.csv
|       +-- train_s3d_features_from_ht100m
|       +-- test_s3d_features_fr16_sz256_nf16
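After downloading, it can help to confirm that your local copy matches the layout above before running anything. The sketch below is a hypothetical helper, not part of this repository. It hard-codes the entries listed in the layout and assumes they sit under a root such as `YouwikiHow/`.

```python
from pathlib import Path

# Entries taken from the dataset layout listing; paths are relative to the root.
EXPECTED_PATHS = [
    "annotations/wikihow_data.pkl",
    "annotations/crosstask_test.pkl",
    "features/train.csv",
    "features/test.csv",
    "features/train_s3d_features_from_ht100m",
    "features/test_s3d_features_fr16_sz256_nf16",
]


def missing_paths(root):
    """Return the expected entries that do not exist under `root` (hypothetical helper)."""
    root = Path(root)
    return [p for p in EXPECTED_PATHS if not (root / p).exists()]


if __name__ == "__main__":
    # An empty list means every expected file or directory is present.
    print(missing_paths("YouwikiHow"))
```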

Training Set

The training set wikihow_data.pkl is a dictionary whose keys are wikiHow IDs. Each value contains the following fields:

  • task_full_name: The full name of the wikiHow task.
  • task_text: The wikiHow article, with the following fields:
    • simplified_headline_list: List of all high-level summary sentences (headlines).
    • simplified_article_list: List of all low-level article sentences.
    • head2sent: Mapping of sentence IDs from high-level sentences (keys) to low-level sentences (values).
    • sent2head: Mapping of sentence IDs from low-level sentences (keys) to high-level sentences (values).
  • task_video: The list of all YouTube videos corresponding to this wikiHow task.
  • video_duration: The total duration of all YouTube videos.
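A minimal sketch of reading the training set and walking the headline-to-sentence hierarchy. It assumes the pickle loads into plain Python dicts, that `task_text` is itself a dict with the fields listed above, and that `head2sent` maps an integer index into `simplified_headline_list` to a list of integer indices into `simplified_article_list`. These index conventions are assumptions, not confirmed by this README. `headline_groups` is a hypothetical helper and the toy entry uses made-up text.

```python
import pickle


def load_split(path):
    """Load a YouwikiHow annotation pickle, e.g. annotations/wikihow_data.pkl."""
    with open(path, "rb") as f:
        return pickle.load(f)


def headline_groups(task_text):
    """Pair each high-level headline with its low-level sentences.

    Assumes head2sent maps a headline index to a list of sentence indices.
    """
    headlines = task_text["simplified_headline_list"]
    sentences = task_text["simplified_article_list"]
    groups = []
    for head_id, sent_ids in sorted(task_text["head2sent"].items()):
        groups.append((headlines[head_id], [sentences[i] for i in sent_ids]))
    return groups


# Toy task_text with made-up sentences, shaped like the fields described above.
example_task_text = {
    "simplified_headline_list": ["Prepare the pan.", "Cook the eggs."],
    "simplified_article_list": ["Heat the pan.", "Add butter.", "Crack the eggs.", "Stir gently."],
    "head2sent": {0: [0, 1], 1: [2, 3]},
    "sent2head": {0: 0, 1: 0, 2: 1, 3: 1},
}

if __name__ == "__main__":
    for headline, sents in headline_groups(example_task_text):
        print(headline, "->", sents)
```

With the real data, the same helper would be applied per task, for example `data = load_split("YouwikiHow/annotations/wikihow_data.pkl")` followed by `headline_groups(entry["task_text"])` for each `entry` in `data.values()`.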

Test Set

The test set crosstask_test.pkl has the same format as the training set, with two additional keys:

  • step2headline: The manual mapping between the CrossTask steps (from the original dataset) and the wikiHow article headlines.
  • crosstask_annotations: The ground truth for evaluation. Keys are video IDs, and values are all possible ground-truth annotations propagated from CrossTask.
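Because the test set only adds two keys, one loader can serve both splits. The sketch below, with hypothetical helper names, checks that an entry has the expected fields and looks up the ground truth for one video. It assumes each test-set value is a dict carrying both extra keys and that `crosstask_annotations` is a dict keyed by video ID; the structure of an individual annotation is left opaque.

```python
# Field names taken from the training/test set descriptions in this README.
TRAIN_KEYS = {"task_full_name", "task_text", "task_video", "video_duration"}
TEST_EXTRA_KEYS = {"step2headline", "crosstask_annotations"}


def missing_fields(entry, is_test=False):
    """Return the expected fields absent from one dataset entry (hypothetical helper)."""
    expected = (TRAIN_KEYS | TEST_EXTRA_KEYS) if is_test else TRAIN_KEYS
    return expected - set(entry)


def ground_truth_for(entry, video_id):
    """Return the CrossTask-propagated annotations for one video, or None if absent."""
    return entry["crosstask_annotations"].get(video_id)
```

A test-set entry that lacks either extra key would show up as a non-empty set from `missing_fields(entry, is_test=True)`.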

Download Visual Features

Available on OneDrive:

  • train_s3d_features_from_ht100m
  • test_s3d_features_fr16_sz256_nf16

Build from Scratch (Original Datasets)

Download Original Datasets

  • wikiHow: Replace the path in dataloader/wikiHow_text.py with the path to your local copy of the wikiHow dataset.
  • CrossTask: Replace the path in dataloader/CrossTask.py with the path to your local copy of the CrossTask dataset.
  • HowTo100M: Replace the path in datasetloader/HowTo100M.py with the path to your local copy of the HowTo100M dataset.

YouwikiHow Preprocessing

  1. Use manual rules to filter HowTo100M tasks for the training set:
python task_selection.py
  2. Save the manually mapped CrossTask annotations as the test set:
python test_set_crosstask_reader.py
  3. Generate S3D features using S3D_Feature_Extractors with the CSV files features/train.csv and features/test.csv.
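The two Python steps above can be chained in a small wrapper. This is a convenience sketch, not a script shipped with the repository. It assumes it is run from the repository root, where `task_selection.py` and `test_set_crosstask_reader.py` live, and it does not cover the S3D feature extraction step, which uses the separate S3D_Feature_Extractors tool.

```python
import subprocess
import sys

# Preprocessing scripts named in this README, in the order they are run.
PREPROCESSING_SCRIPTS = ["task_selection.py", "test_set_crosstask_reader.py"]


def run_preprocessing(dry_run=False):
    """Run each preprocessing script with the current interpreter.

    Returns the list of commands; with dry_run=True nothing is executed.
    check=True raises CalledProcessError if a script exits with a non-zero
    status, so a failed step stops the pipeline.
    """
    commands = [[sys.executable, script] for script in PREPROCESSING_SCRIPTS]
    for cmd in commands:
        if not dry_run:
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Print the commands first; call run_preprocessing() to actually execute them.
    for cmd in run_preprocessing(dry_run=True):
        print(" ".join(cmd))
```

Invoking the scripts through `sys.executable` keeps them on the same Python environment as the wrapper, which avoids picking up a different `python` from `PATH`.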

Citations

@inproceedings{chen2022weakly,
  title={Weakly-Supervised Temporal Article Grounding},
  author={Chen, Long and Niu, Yulei and Chen, Brian and Lin, Xudong and Han, Guangxing and Thomas, Christopher and Ayyubi, Hammad and Ji, Heng and Chang, Shih-Fu},
  booktitle={Empirical Methods in Natural Language Processing (EMNLP), 2022},
  year={2022}
}
