Project for the PKU 2024 Summer Course "Large Models: From fundamental to frontier", by the Deep Explorer Team
For multimodal LLMs, we proposed a method that combines an image attack with a text prompt attack, which increases the attack success rate and shows good transferability.
For text-only LLMs, we proposed an automated attack procedure that tries multiple text prompt attack methods.
Based on JailGuard, we implemented a similar method to detect adversarial image and text prompts.
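
The detection idea can be illustrated with a minimal sketch. It is not this project's implementation: it only assumes the general JailGuard approach of mutating an untrusted input several times, querying the model on each variant, and flagging the input when the responses diverge strongly. The names `mutate`, `max_divergence`, `is_adversarial`, and `query_model`, the character-dropping mutator, the smoothed bag-of-words KL divergence, and the default threshold are all hypothetical stand-ins chosen for illustration, and only text prompts are covered (image prompts would need image mutators instead).

```python
import math
import random
from collections import Counter
from itertools import permutations
from typing import Callable, List


def mutate(prompt: str, rng: random.Random, rate: float = 0.1) -> str:
    """Randomly drop characters (hypothetical stand-in for a text mutator)."""
    return "".join(ch for ch in prompt if rng.random() >= rate)


def word_distribution(text: str, vocab: List[str]) -> List[float]:
    """Add-one smoothed word frequencies of `text` over a shared vocabulary."""
    counts = Counter(text.lower().split())
    total = sum(counts[w] for w in vocab) + len(vocab)
    return [(counts[w] + 1) / total for w in vocab]


def kl_divergence(p: List[float], q: List[float]) -> float:
    """KL(p || q); both inputs are strictly positive thanks to smoothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def max_divergence(responses: List[str]) -> float:
    """Largest pairwise KL divergence between the response distributions."""
    vocab = sorted({w for r in responses for w in r.lower().split()})
    if not vocab or len(responses) < 2:
        return 0.0
    dists = [word_distribution(r, vocab) for r in responses]
    return max(kl_divergence(p, q) for p, q in permutations(dists, 2))


def is_adversarial(
    prompt: str,
    query_model: Callable[[str], str],  # assumed: returns the model's reply
    n_variants: int = 8,
    threshold: float = 0.05,  # illustrative value, would need tuning
    seed: int = 0,
) -> bool:
    """Flag the prompt if responses to its mutated variants disagree strongly."""
    rng = random.Random(seed)
    responses = [query_model(mutate(prompt, rng)) for _ in range(n_variants)]
    return max_divergence(responses) > threshold
```

In this sketch, a prompt whose variants all produce the same reply has a divergence of zero and is treated as benign, while a prompt whose variants flip between refusals and compliance is flagged. A real detector would likely compare responses with semantic embeddings rather than word counts.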