Awesome-LM-SSP

Introduction

This repository collects resources on the trustworthiness of large models (LMs) across multiple dimensions (e.g., safety, security, and privacy), with a special focus on multi-modal LMs (e.g., vision-language models and diffusion models).

This repo is a work in progress 🌱 (entries are currently collected manually).
Badges:
- Model
- Comment
- Venue (continuously updated)
🌻 You are welcome to recommend resources to us via Issues, using the following format (please fill in this table):

Title	Link	Code	Venue	Classification	Model	Comment
aa	arxiv	github	bb'23	A1. Jailbreak	LLM	Agent
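
The submission format above can be checked before opening an issue. The following is a minimal sketch, not part of the repository: `format_recommendation` is a hypothetical helper, the classification labels are copied from the Collections list in this README, and treating `Code` and `Comment` as optional is an assumption based on the example rows shown in this README.

```python
# Hypothetical helper for preparing a recommendation row; not part of the repo.
FIELDS = ["Title", "Link", "Code", "Venue", "Classification", "Model", "Comment"]

# Assumption: Code and Comment may be left empty; the other fields are required.
REQUIRED = ["Title", "Link", "Venue", "Classification", "Model"]

# Classification labels copied from the Collections list in this README.
CLASSIFICATIONS = {
    "A0. General", "A1. Jailbreak", "A2. Alignment", "A3. Deepfake",
    "A4. Ethics", "A5. Fairness", "A6. Hallucination",
    "A7. Prompt Injection", "A8. Toxicity",
    "B0. General", "B1. Adversarial Examples", "B2. Poison & Backdoor",
    "B3. System",
    "C0. General", "C1. Contamination", "C2. Copyright",
    "C3. Data Reconstruction", "C4. Membership Inference Attacks",
    "C5. Model Extraction", "C6. Privacy-Preserving Computation",
    "C7. Unlearning",
}


def format_recommendation(entry: dict) -> str:
    """Validate an entry and return it as one tab-separated row."""
    missing = [name for name in REQUIRED if not entry.get(name)]
    if missing:
        raise ValueError(f"missing required fields: {', '.join(missing)}")
    if entry["Classification"] not in CLASSIFICATIONS:
        raise ValueError(f"unknown classification: {entry['Classification']!r}")
    # Emit the columns in the order used by the table header.
    return "\t".join(entry.get(name, "") for name in FIELDS)


if __name__ == "__main__":
    print("\t".join(FIELDS))
    print(format_recommendation({
        "Title": "aa", "Link": "arxiv", "Code": "github", "Venue": "bb'23",
        "Classification": "A1. Jailbreak", "Model": "LLM", "Comment": "Agent",
    }))
```

Checking the label against the category list catches near-miss labels (for example, a label that abbreviates or rewords a category name) before the row is submitted.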

News

[2024.04.27] We adjusted the categories.
[2024.01.20] We collected 3 related papers from NDSS'24!
[2024.01.17] We collected 108 related papers from ICLR'24!
[2024.01.09] 🚀 LM-SSP is released!

Collections

Book (1)
Competition (5)
Leaderboard (3)
Toolkit (6)
Survey (23)
Paper (710)
- A. Safety (435)
  - A0. General (6)
  - A1. Jailbreak (134)
  - A2. Alignment (47)
  - A3. Deepfake (36)
  - A4. Ethics (5)
  - A5. Fairness (49)
  - A6. Hallucination (97)
  - A7. Prompt Injection (13)
  - A8. Toxicity (48)
- B. Security (115)
  - B0. General (1)
  - B1. Adversarial Examples (61)
  - B2. Poison & Backdoor (46)
  - B3. System (7)
- C. Privacy (160)
  - C0. General (14)
  - C1. Contamination (8)
  - C2. Copyright (52)
  - C3. Data Reconstruction (18)
  - C4. Membership Inference Attacks (9)
  - C5. Model Extraction (7)
  - C6. Privacy-Preserving Computation (22)
  - C7. Unlearning (30)

Star History

Acknowledgement

Organizers: Tianshuo Cong (丛天硕), Xinlei He (何新磊), Zhengyu Zhao (赵正宇), Yugeng Liu (刘禹更), Delong Ran (冉德龙)
This project is inspired by LLM Security, Awesome LLM Security, LLM Security & Privacy, UR2-LLMs, PLMpapers, and EvaluationPapers4ChatGPT.

Title	Link	Code	Venue	Classification	Model	Comment
Towards More Effective Protection Against Diffusion-Based Mimicry with Score Distillation	https://arxiv.org/abs/2311.12832	https://github.com/xavihart/Diff-Protect	ICLR 2024	C2. Copyright	Diffusion Model	protective perturbation of diffusion model
Diffusion-Based Adversarial Sample Generation for Improved Stealthiness and Controllability	https://arxiv.org/abs/2305.16494	https://github.com/xavihart/Diff-PGD	NeurIPS 2023	B1. Adversarial Examples	Diffusion Model	generate stealthy adversarial samples

Title	Link	Code	Venue	Classification	Model
Query-Relevant Images Jailbreak Large Multi-Modal Models	https://arxiv.org/abs/2311.17600	https://github.com/isXinLiu/MM-SafetyBench	arXiv'23	A1. Jailbreak	VLM
GUARD: Role-playing to Generate Natural-language Jailbreakings to Test Guideline Adherence of Large Language Models	https://arxiv.org/abs/2402.03299		arXiv'24	A1. Jailbreak	LLM
On the Robustness of Large Multimodal Models Against Image Adversarial Attacks	https://arxiv.org/abs/2312.03777		arXiv'23	B1. Adversarial Examples	VLM
VL-Trojan: Multimodal Instruction Backdoor Attacks against Autoregressive Visual Language Models	https://arxiv.org/abs/2402.13851		arXiv'24	B2. Poison & Backdoor	VLM

Repository: thuccslab/awesome-lm-ssp
