Inaudible Adversarial Perturbation: Manipulating the Recognition of User Speech in Real Time

This repository contains the core implementation of VRifle, presented in "Inaudible Adversarial Perturbation: Manipulating the Recognition of User Speech in Real Time", in Proceedings of the Network and Distributed System Security Symposium 2024 (NDSS 2024).

We would like to thank the author of deepspeech2-pytorch-adversarial-attack for providing an excellent foundation for our code, which targets the DeepSpeech2 model.

We also extend our gratitude to the contributors of deepspeech.pytorch for developing an easy-to-use DeepSpeech framework.

Citation

If you find this repository helpful, please consider citing it in the following format.

@inproceedings{li2024vrifle,
  title={Inaudible Adversarial Perturbation: Manipulating the Recognition of User Speech in Real Time},
  author={Li, Xinfeng and Yan, Chen and Lu, Xuancun and Zeng, Zihan and Ji, Xiaoyu and Xu, Wenyuan},
  booktitle={NDSS},
  year={2024}
}

Getting Started

Several dependencies must be installed first. Please follow the instructions in DeepSpeech 2 PyTorch to set up the environment.
We recommend arranging this repository and DeepSpeech 2 PyTorch in the following structure.

ROOT_FOLDER/
├── this_repo/
│   ├── main_vrifle.py
│   └── ...
└── deepspeech.pytorch/
    ├── models/
    │   └── librispeech/
    │       └── librispeech_pretrained_v2.pth
    └── ...

Then download the pretrained DeepSpeech model (librispeech_pretrained_v2.pth) from the link provided by DeepSpeech 2 PyTorch and place it at the location shown in the structure above.
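
Before running anything, it can help to confirm that the layout is in place. The following is a minimal sketch, not part of this repository: the helper name missing_paths is hypothetical, and "this_repo" stands for whatever folder name this repository was cloned into.

```python
from pathlib import Path

# Relative locations taken from the folder structure described above.
# "this_repo" is a placeholder for the actual clone directory name.
EXPECTED = [
    Path("this_repo") / "main_vrifle.py",
    Path("deepspeech.pytorch") / "models" / "librispeech" / "librispeech_pretrained_v2.pth",
]


def missing_paths(root, expected=EXPECTED):
    """Return the expected files that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).is_file()]
```

Calling missing_paths("/path/to/ROOT_FOLDER") returns an empty list when both files are where the scripts expect them, and otherwise lists the relative paths that still need to be created or downloaded.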

Introduction

Deep Speech 2 [1] is a state-of-the-art Automatic Speech Recognition (ASR) system, notable for its end-to-end training, in which spectrograms are used directly to generate predicted sentences.

In this work, we implement the first completely inaudible (ultrasonic) adversarial perturbation attack against this ASR system. In this setting, the classical PGD (Projected Gradient Descent) algorithm still provides efficient optimization.

[1] Amodei, D., Ananthanarayanan, S., Anubhai, R., Bai, J., Battenberg, E., Case, C., ... & Zhu, Z. (2016). Deep Speech 2: End-to-end speech recognition in English and Mandarin. In International Conference on Machine Learning (pp. 173-182).
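
For readers unfamiliar with PGD, the sketch below shows the generic update rule on a toy problem. It is an illustration only and not the code in vrifle_attack.py: the linear "model", the squared-error loss, and the function names are assumptions made for the example, and only the parameter names epsilon, alpha, and pgd_iter echo the hyperparameters mentioned in this README.

```python
def sign(v):
    """Return -1, 0, or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)


def predict(x, w, delta):
    """Toy linear model: f(x + delta) = sum_i w_i * (x_i + delta_i)."""
    return sum(wi * (xi + di) for wi, xi, di in zip(w, x, delta))


def pgd(x, w, target, epsilon, alpha, pgd_iter):
    """Targeted L-infinity PGD on the toy model.

    Minimises (f(x + delta) - target)^2 subject to |delta_i| <= epsilon.
    """
    delta = [0.0] * len(x)
    for _ in range(pgd_iter):
        error = predict(x, w, delta) - target
        # Gradient of the squared error with respect to each delta_i.
        grad = [2.0 * error * wi for wi in w]
        # Signed gradient step (descent, because the attack is targeted) ...
        delta = [di - alpha * sign(g) for di, g in zip(delta, grad)]
        # ... followed by projection back onto the epsilon-ball.
        delta = [max(-epsilon, min(epsilon, di)) for di in delta]
    return delta
```

Each iteration moves every coordinate by at most alpha, and the projection guarantees that the perturbation never exceeds epsilon in magnitude, which is why these values together with the iteration count govern the trade-off between attack strength and perturbation size.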

Preparation

Download the Fluent Speech Commands dataset.
If you want to speed up the optimization on an NVIDIA 3090 GPU, see the section "Support DeepSpeech on 3090 GPUs (NVIDIA)".

Usage

You can perturb the original raw wave file to generate the desired sentence with main_vrifle.py:

python main_vrifle.py --attack_type Mute_robust --device 0

python main_vrifle.py --attack_type Universal_robust --device 0

Several parameters are available to improve the adversarial attack. You can tune hyperparameters such as epsilon, alpha, and PGD_iter for better results. For details, please refer to main_vrifle.py and vrifle_attack.py.
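
If you want to run both attack types back to back, the two invocations shown in this section can be scripted. This is a minimal sketch that uses only the --attack_type and --device flags from the commands in this section; the helper names build_command and run_all are hypothetical and not part of the repository.

```python
import subprocess
import sys

# The two attack types used in the example commands in this section.
ATTACK_TYPES = ["Mute_robust", "Universal_robust"]


def build_command(attack_type, device=0, script="main_vrifle.py"):
    """Assemble the same command line as the examples in this section."""
    return [sys.executable, script, "--attack_type", attack_type, "--device", str(device)]


def run_all(device=0):
    """Run every attack type in sequence, stopping at the first failure."""
    for attack_type in ATTACK_TYPES:
        cmd = build_command(attack_type, device)
        print("running:", " ".join(cmd))
        subprocess.run(cmd, check=True)
```

Call run_all() from the repository folder so that main_vrifle.py is found. Using check=True makes a failed run raise an exception instead of silently continuing with the next attack type.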

Support DeepSpeech on 3090 GPUs (NVIDIA)

After numerous attempts and extensive research, we arrived at the following setup :)

Install Deepspeech.pytorch

1. Download deepspeech.pytorch
2. cd into the folder and then pip install -r requirements.txt
3. pip install -e . # Dev install
4. pip install adversarial-robustness-toolbox[pytorch]
5. pip install torchaudio
6. Install Warp-CTC

git clone https://github.com/SeanNaren/warp-ctc.git

Before building the PyTorch binding, you need to replace the #include <THC/THC.h> and extern THCState* state lines in binding.cpp, as described in the section "Modifying binding.cpp" (based on https://blog.csdn.net/weixin_41868417/article/details/123819183).

Edit CMakeLists.txt so that it targets the compute capability of the 3090 (sm_86):

# Before replacement
set(CUDA_NVCC_FLAGS "${CUDA_NVCC_FLAGS} -gencode arch=compute_30,code=sm_30 -O2")
set(CUDA_NVCC_FLAGS "${CUDA_NVCC_FLAGS} -gencode arch=compute_35,code=sm_35")

set(CUDA_NVCC_FLAGS "${CUDA_NVCC_FLAGS} -gencode arch=compute_50,code=sm_50")
set(CUDA_NVCC_FLAGS "${CUDA_NVCC_FLAGS} -gencode arch=compute_52,code=sm_52")

# After
set(CUDA_NVCC_FLAGS "${CUDA_NVCC_FLAGS} -gencode arch=compute_86,code=sm_86")
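
The value 86 corresponds to CUDA compute capability 8.6. For a different GPU, the flag can be derived from the (major, minor) capability pair. The following is a small sketch with hypothetical helper names, shown only to make the naming pattern explicit:

```python
def gencode_flag(capability):
    """Format an nvcc -gencode flag from a (major, minor) compute capability."""
    major, minor = capability
    arch = f"{major}{minor}"
    return f"-gencode arch=compute_{arch},code=sm_{arch}"


def cmake_line(capability):
    """Build the matching CMakeLists.txt line in the style used above."""
    return 'set(CUDA_NVCC_FLAGS "${CUDA_NVCC_FLAGS} ' + gencode_flag(capability) + '")'
```

With PyTorch installed, torch.cuda.get_device_capability(0) returns such a pair for device 0, so cmake_line(torch.cuda.get_device_capability(0)) would produce the line to paste into CMakeLists.txt.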

Compilation

cd warp-ctc
mkdir build
cd build
cmake ..
make
cd ../pytorch_binding

Modifying binding.cpp

## replace

#include <THC/THC.h>
extern THCState* state;
void* gpu_workspace = THCudaMalloc(state, gpu_size_bytes);

## into

#include <c10/cuda/CUDACachingAllocator.h>  // declares the c10 caching allocator
void* gpu_workspace = c10::cuda::CUDACachingAllocator::raw_alloc(gpu_size_bytes);

## replace

THCudaFree(state, (void *) gpu_workspace);

## into

c10::cuda::CUDACachingAllocator::raw_delete((void *) gpu_workspace);
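
The substitutions above can also be applied mechanically. The sketch below is one possible way to do so and is not part of this repository; the function name patch_binding is hypothetical, and swapping the THC header for c10/cuda/CUDACachingAllocator.h is an assumption added here so that the new allocator calls are declared.

```python
import re

ALLOC = "c10::cuda::CUDACachingAllocator::raw_alloc("
FREE = "c10::cuda::CUDACachingAllocator::raw_delete("


def patch_binding(source):
    """Apply the THC -> c10 substitutions described above to C++ source text."""
    # Swap the removed THC header for the header that declares the c10 allocator
    # (assumption: this include is not listed in the steps above).
    source = source.replace("#include <THC/THC.h>",
                            "#include <c10/cuda/CUDACachingAllocator.h>")
    # Drop the global THCState declaration, which is no longer needed.
    source = re.sub(r"^[ \t]*extern\s+THCState\s*\*\s*state\s*;[ \t]*\n", "",
                    source, flags=re.M)
    # THCudaMalloc(state, n) -> raw_alloc(n); THCudaFree(state, p) -> raw_delete(p).
    source = re.sub(r"THCudaMalloc\(\s*state\s*,\s*", ALLOC, source)
    source = re.sub(r"THCudaFree\(\s*state\s*,\s*", FREE, source)
    return source
```

To use it, read binding.cpp into a string, pass it through patch_binding, and write the result back. Keeping a backup copy of the original file is advisable, since the patterns only cover the lines listed above.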

Finally, install the binding:

python setup.py install

Note that --recursive is required to obtain a working ctcdecode dependency:

git clone --recursive git@github.com:parlance/ctcdecode.git
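
Once everything is installed, a quick import check can reveal a missing or failed build before a long attack run. This is a minimal sketch: the helper missing_modules is hypothetical, and the import names in REQUIRED_MODULES are assumptions about how the packages installed above are exposed in Python (for example, art for adversarial-robustness-toolbox and warpctc_pytorch for the Warp-CTC binding).

```python
from importlib.util import find_spec

# Assumed import names for the packages installed in the steps above.
REQUIRED_MODULES = [
    "torch",
    "torchaudio",
    "art",
    "deepspeech_pytorch",
    "warpctc_pytorch",
    "ctcdecode",
]


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]
```

An empty result from missing_modules() means every package can be located. Note that find_spec only locates a module without importing it, so a binding that was found can still fail at import time if it was compiled for the wrong GPU architecture.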
