pku-alignment / safe-sora

SafeSora is a human preference dataset designed to support safety alignment research in the text-to-video generation field, aiming to enhance the helpfulness and harmlessness of Large Vision Models (LVMs).

Home Page: https://sites.google.com/view/safe-sora

Makefile 2.10% Python 97.90%
alignment human-preferences large-vision-models text-to-video-generation

safe-sora's Introduction

Towards Safety Alignment of Text2Video Generation

[🏠 Project Homepage] [📕 Paper] [🤗 SafeSora Datasets] [🤗 SafeSora Label] [🤗 SafeSora Evaluation] [BibTeX]

SafeSora is a human preference dataset designed to support safety alignment research in the text-to-video generation field, aiming to enhance the helpfulness and harmlessness of Large Vision Models (LVMs). It currently contains three types of data:

  • A classification dataset of 57k+ Text-Video pairs, including multi-label classification of 12 harm labels for their text prompts and text-video pairs.
  • A human preference dataset of 51k+ instances in the text-to-video generation task, containing comparative relationships in terms of helpfulness and harmlessness, as well as four sub-dimensions of helpfulness.
  • An evaluation dataset containing 600 human-written prompts, with 300 being safety-neutral and another 300 constructed according to 12 harm categories as red-team prompts.

In the future, we will also open-source some baseline alignment algorithms that utilize these datasets.

Dataset Release

Multi-label Classification Dataset

The multi-label classification dataset contains 57k+ text-video pairs, each annotated against 12 harm tags. We perform multi-label classification on the individual prompts as well as on the combination of each prompt and the video generated from it. The 12 harm tags are defined as:

  • S1: Adult, Explicit Sexual Content
  • S2: Animal Abuse
  • S3: Child Abuse
  • S4: Crime
  • S5: Debated Sensitive Social Issue
  • S6: Drug, Weapons, Substance Abuse
  • S7: Insulting, Hateful, Aggressive Behavior
  • S8: Violence, Injury, Gory Content
  • S9: Racial Discrimination
  • S10: Other Discrimination (Excluding Racial)
  • S11: Terrorism, Organized Crime
  • S12: Other Harmful Content

The distribution of these 12 categories is shown below:

[Figure: Data Ratio]

In our dataset, nearly half of the prompts are safety-critical, and the rest are safety-neutral. Some prompts come from real online users; the remainder were added by researchers to balance the dataset.

For more information, please refer to Hugging Face Page: PKU-Alignment/SafeSora-Label.
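
As a rough illustration of what multi-label annotation over the 12 harm tags means, the sketch below encodes a set of active tags as a 12-element 0/1 vector and decodes it again. The tag names are taken from the list above; the function names `encode_labels` and `decode_labels` are hypothetical, and the vector layout is an assumption made for illustration, not necessarily how SafeSora-Label stores its annotations.

```python
# The 12 harm tags, in the order S1..S12 given in the list above.
HARM_TAGS = [
    "Adult, Explicit Sexual Content",
    "Animal Abuse",
    "Child Abuse",
    "Crime",
    "Debated Sensitive Social Issue",
    "Drug, Weapons, Substance Abuse",
    "Insulting, Hateful, Aggressive Behavior",
    "Violence, Injury, Gory Content",
    "Racial Discrimination",
    "Other Discrimination (Excluding Racial)",
    "Terrorism, Organized Crime",
    "Other Harmful Content",
]


def encode_labels(active_tags):
    """Return a 12-element 0/1 list with a 1 for every active tag (hypothetical helper)."""
    active = set(active_tags)
    unknown = active - set(HARM_TAGS)
    if unknown:
        raise ValueError(f"unknown harm tags: {sorted(unknown)}")
    return [1 if tag in active else 0 for tag in HARM_TAGS]


def decode_labels(vector):
    """Return the tag names whose position in the vector is set (hypothetical helper)."""
    if len(vector) != len(HARM_TAGS):
        raise ValueError("expected one entry per harm tag")
    return [tag for tag, flag in zip(HARM_TAGS, vector) if flag]


# A sample can carry several tags at once, or none (safety-neutral).
vec = encode_labels(["Crime", "Violence, Injury, Gory Content"])
print(vec)
print(decode_labels(vec))
```

Because a prompt and its text-video pair are classified separately, one sample would carry two such vectors under this layout: one for the prompt alone and one for the pair.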

Human Preference Dataset

The human preference dataset contains over 51,000 comparisons. Each data point comprises a user input and two generated videos. Human preferences along the helpfulness and harmlessness dimensions were obtained through the heuristic-based annotation process described below.

In addition, a pre-annotation step yields human preferences on four helpfulness sub-dimensions. These sub-dimensions are:

  • Instruction Following
  • Correctness
  • Informativeness
  • Aesthetics

The annotation process is shown in the figure below:

[Figure: Annotation Process]

For more information, please refer to Hugging Face Page: PKU-Alignment/SafeSora.
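
To make the shape of a comparison concrete, here is a minimal sketch of one preference record: a prompt, two videos, an overall preference on each of the two dimensions, and one preference per helpfulness sub-dimension. Every field name and the "a"/"b" encoding are assumptions for illustration; the actual schema is documented on the Hugging Face page.

```python
from dataclasses import dataclass, field

# The four helpfulness sub-dimensions named above (identifier spelling is assumed).
SUB_DIMENSIONS = (
    "instruction_following",
    "correctness",
    "informativeness",
    "aesthetics",
)


@dataclass
class PreferenceRecord:
    """One comparison between two videos generated for the same prompt (hypothetical schema)."""

    prompt: str
    video_a: str
    video_b: str
    helpfulness: str   # "a" or "b": which video is more helpful
    harmlessness: str  # "a" or "b": which video is more harmless
    sub_preferences: dict = field(default_factory=dict)  # sub-dimension -> "a" or "b"


def agreement_rate(records):
    """Fraction of comparisons where the more helpful video is also the more harmless one."""
    if not records:
        return 0.0
    agree = sum(r.helpfulness == r.harmlessness for r in records)
    return agree / len(records)


records = [
    PreferenceRecord("a cat on a sofa", "v1", "v2", "a", "a",
                     {name: "a" for name in SUB_DIMENSIONS}),
    PreferenceRecord("a street fight", "v3", "v4", "a", "b",
                     {name: "a" for name in SUB_DIMENSIONS}),
]
print(agreement_rate(records))
```

Keeping helpfulness and harmlessness as separate fields matters because the two can disagree: the video that follows the prompt more faithfully may also be the more harmful one.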

Evaluation Dataset

The evaluation dataset contains 600 human-written prompts, including 300 safety-neutral prompts and 300 red-teaming prompts. The 300 red-teaming prompts are constructed based on 12 harmful categories. These prompts will not appear in the training set and are reserved for researchers to generate videos for model evaluation.

For more information, please refer to Hugging Face Page: PKU-Alignment/SafeSora-Eval.

Data Access

The datasets are available on the Hugging Face Datasets Hub. The recommended way to download them is with huggingface-cli:

# Multi-label Classification Dataset: SafeSora-Label
huggingface-cli download --repo-type dataset --local-dir-use-symlinks False --resume-download PKU-Alignment/SafeSora-Label --local-dir ./SafeSora-Label

# Human Preference Dataset: SafeSora
huggingface-cli download --repo-type dataset --local-dir-use-symlinks False --resume-download PKU-Alignment/SafeSora --local-dir ./SafeSora

# Evaluation Dataset: SafeSora-Eval
huggingface-cli download --repo-type dataset --local-dir-use-symlinks False --resume-download PKU-Alignment/SafeSora-Eval --local-dir ./SafeSora-Eval

The downloaded data consists of two parts: the data configurations (config-train.json.gz and config-test.json.gz) and videos.tar.gz, a compressed archive of the videos. Extract the archive before use.

tar -xzvf videos.tar.gz
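
The config files are gzip-compressed JSON, so they can also be inspected directly with the Python standard library. The sketch below is one way to do that; `load_config` is a hypothetical helper, not part of the safe_sora package, and it makes no assumption about the structure of the JSON inside.

```python
import gzip
import json


def load_config(path):
    """Read a gzip-compressed JSON file and return the parsed object (hypothetical helper)."""
    # "rt" opens the gzip stream in text mode so json can read it directly.
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return json.load(f)
```

For example, `load_config("./SafeSora-Label/config-train.json.gz")` would return the parsed training configuration, assuming the download command above was run with that local directory.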

Each data point in the dataset includes a user prompt, the potential harm categories of the user prompt, a generated video, and the harm category annotations for the Text-Video pair. In the config, each video entry includes a video_path pointing to its relative location in the videos folder. This relative location follows a fixed rule: videos/prompt_id/video_id.
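
The path rule above can be sketched as a pair of small helpers. This assumes that the extracted videos folder sits under some root directory and that video_id is the final path component as it appears in the config (whether it carries a file extension is not stated here); the function names are hypothetical.

```python
from pathlib import Path


def resolve_video_path(root_dir, prompt_id, video_id):
    """Build root_dir/videos/prompt_id/video_id following the stated rule (hypothetical helper)."""
    return Path(root_dir) / "videos" / prompt_id / video_id


def split_video_path(video_path):
    """Recover (prompt_id, video_id) from a path that ends in prompt_id/video_id."""
    parts = Path(video_path).parts
    if len(parts) < 2:
        raise ValueError(f"path too short to contain prompt_id/video_id: {video_path}")
    return parts[-2], parts[-1]


path = resolve_video_path("data", "prompt123", "video456")
print(path.as_posix())
print(split_video_path(path))
```

Grouping videos by prompt_id keeps the videos generated from the same prompt in one folder, which is convenient when assembling pairs for comparison.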

Note: The videos.tar.gz file in the SafeSora-Label and SafeSora preference datasets is the same, so if you have previously downloaded videos.tar.gz, you can use the same video folder and only need to download the config files separately.

We also provide a script that loads each dataset as a Torch Dataset:

from safe_sora.datasets import VideoDataset, PairDataset, PromptDataset

# Multi-label Classification Dataset
label_data = VideoDataset.load("path/to/config", video_dir="path/to/video_dir")

# Human Preference Dataset
pref_data = PairDataset.load("path/to/config", video_dir="path/to/video_dir")

# Evaluation Dataset
eval_data = PromptDataset.load("path/to/config", video_dir="path/to/video_dir")

Citation

If you find the SafeSora dataset family useful in your research, please cite the following paper:

@misc{dai2024safesora,
      title={SafeSora: Towards Safety Alignment of Text2Video Generation via a Human Preference Dataset},
      author={Josef Dai and Tianle Chen and Xuyao Wang and Ziran Yang and Taiye Chen and Jiaming Ji and Yaodong Yang},
      year={2024},
      eprint={2406.14477},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

License

The SafeSora dataset and its family are released under the CC BY-NC 4.0 License. The code is released under the Apache License 2.0.

safe-sora's Issues

[Feature Request] Add traditional methods for comparison

Motivation

While some traditional methods may not precisely capture human value preferences, they are usually faster and cheaper to compute. Therefore, we will integrate some traditional metrics into the SafeSora library, expecting they will complement feedback-based methods and enable comparative analysis.
