event-ahu / hardvs

[AAAI-2024] HARDVS: Revisiting Human Activity Recognition with Dynamic Vision Sensors

License: Apache License 2.0

Shell 0.65% Python 99.18% Dockerfile 0.17%

deep-learning dynamic-vision-sensors event-camera human-action-recognition human-activity-recognition spatiotemporal-features transformer

hardvs's Introduction

HARDVS: Revisiting Human Activity Recognition with Dynamic Vision Sensors

Paper

Wang, Xiao and Wu, Zongzhen and Jiang, Bo and Bao, Zhimin and Zhu, Lin and Li, Guoqi and Wang, Yaowei and Tian, Yonghong. "HARDVS: Revisiting Human Activity Recognition with Dynamic Vision Sensors." arXiv preprint arXiv:2211.09648 (2022). [arXiv] [Demovideo] [Poster]

Abstract

Mainstream human activity recognition (HAR) algorithms are developed for RGB cameras, which suffer from illumination changes, fast motion, privacy concerns, and large energy consumption. Meanwhile, biologically inspired event cameras have attracted great interest due to their unique features, such as high dynamic range, dense temporal but sparse spatial resolution, low latency, and low power. Because the sensor is newly arising, there is not yet a realistic large-scale dataset for HAR. Considering its great practical value, in this paper we propose a large-scale benchmark dataset to bridge this gap, termed HARDVS, which contains 300 categories and more than 100K event sequences. We evaluate and report the performance of multiple popular HAR algorithms, which provides extensive baselines for future work to compare against. More importantly, we propose a novel spatial-temporal feature learning and fusion framework, termed ESTF, for event-stream-based human activity recognition. It first projects the event streams into spatial and temporal embeddings using StemNet, then encodes and fuses the dual-view representations using Transformer networks. Finally, the dual features are concatenated and fed into a classification head for activity prediction. Extensive experiments on multiple datasets validate the effectiveness of our model.

News

🔥 [2023.12.09] Our paper has been accepted by AAAI-2024!
🔥 [2023.05.29] The class labels (i.e., category names) are available in [HARDVS_300_class.txt]
🔥 [2022.12.14] HARDVS dataset is integrated into the SNN toolkit [SpikingJelly]

Demo Videos

A demo video for the HARDVS dataset can be found by clicking the image below:

Video Tutorial for this work can be found by clicking the image below:

Representative samples of HARDVS can be found below:

Dataset Download

Download from Baidu Disk:

  [Event Images] Link: https://pan.baidu.com/s/1OhlhOBHY91W2SwE6oWjDwA?pwd=1234    Extraction code: 1234
  [Compact Event file] Link: https://pan.baidu.com/s/1iw214Aj5ugN-arhuxjmfOw?pwd=1234 Extraction code: 1234
  [Raw Event file] To be updated

Download from DropBox:

  To be updated ...

Environment

conda create -n event python=3.8 pytorch=1.10 cudatoolkit=11.3 torchvision -c pytorch -y
conda activate event
pip3 install openmim
mim install mmcv-full
mim install mmdet  # optional
mim install mmpose  # optional
pip3 install -e .

Our Proposed Approach

An overview of our proposed ESTF framework for event-based human action recognition. It transforms the event streams into spatial and temporal tokens and learns the dual features using multi-head self-attention layers. A FusionFormer then passes messages between the spatial and temporal features. The aggregated features are added to the dual features and used as input to the subsequent TF and SF blocks, respectively. The outputs are concatenated and fed into MLP layers for action prediction.
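
The data flow described above can be sketched in PyTorch roughly as follows. This is an illustrative sketch, not the authors' implementation: the class name `ESTFSketch`, the layer sizes, the use of `nn.TransformerEncoderLayer` for every block, the joint self-attention over concatenated tokens as the fusion step, and the mean pooling before the MLP are all assumptions. The StemNet is omitted, so the inputs are assumed to be spatial and temporal tokens that have already been embedded.

```python
import torch
from torch import nn


class ESTFSketch(nn.Module):
    """Illustrative dual-branch model with a fusion step (hypothetical layout)."""

    def __init__(self, dim=64, heads=4, num_classes=300):
        super().__init__()

        def block():
            # One multi-head self-attention block over (batch, tokens, dim).
            return nn.TransformerEncoderLayer(
                dim, heads, dim_feedforward=2 * dim, batch_first=True
            )

        self.spatial_in, self.temporal_in = block(), block()
        self.fusion = block()  # stand-in for the FusionFormer
        self.spatial_out, self.temporal_out = block(), block()  # SF / TF blocks
        self.head = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, num_classes)
        )

    def forward(self, spatial_tokens, temporal_tokens):
        s = self.spatial_in(spatial_tokens)    # (B, Ns, D)
        t = self.temporal_in(temporal_tokens)  # (B, Nt, D)

        # Message passing: attend over both token sets jointly, then split back.
        fused = self.fusion(torch.cat([s, t], dim=1))
        fused_s, fused_t = fused[:, : s.size(1)], fused[:, s.size(1):]

        # Add the aggregated features to each branch before its final block.
        s = self.spatial_out(s + fused_s)
        t = self.temporal_out(t + fused_t)

        # Pool each branch (assumed mean pooling), concatenate, and classify.
        features = torch.cat([s.mean(dim=1), t.mean(dim=1)], dim=-1)
        return self.head(features)
```

Calling `ESTFSketch()(torch.randn(2, 16, 64), torch.randn(2, 8, 64))` yields one row of 300 class logits per sample, matching the 300 HARDVS categories.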

Train & Test & Evaluation

# train
  CUDA_VISIBLE_DEVICES=0 python tools/train.py configs/recognition/hardvs_ESTF/hardvs_ESTF.py --work-dir path_to_checkpoint --validate --seed 0 --deterministic --gpu-ids=0

# test
  CUDA_VISIBLE_DEVICES=0 python tools/test.py configs/recognition/hardvs_ESTF/hardvs_ESTF.py  path_to_checkpoint --eval top_k_accuracy

Citation

If you find this work useful for your research, please cite the following paper and give us a 🌟.

@article{wang2022hardvs,
  title={HARDVS: Revisiting Human Activity Recognition with Dynamic Vision Sensors},
  author={Wang, Xiao and Wu, Zongzhen and Jiang, Bo and Bao, Zhimin and Zhu, Lin and Li, Guoqi and Wang, Yaowei and Tian, Yonghong},
  journal={arXiv preprint arXiv:2211.09648},
  url={https://arxiv.org/abs/2211.09648}, 
  year={2022}
}

Acknowledgement and Other Useful Materials

MMAction2: https://github.com/open-mmlab/mmaction2
SpikingJelly: https://github.com/fangwei123456/spikingjelly

hardvs's People

Forkers

zeyuxiao1997 zfd-1

hardvs's Issues

Hello, how can I use two GPUs to run it?

How can I use two GPUs to run it?
Looking forward to your reply, thank you

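Environment problem: cannot import name 'Config' from 'mmcv'

Setting up the environment strictly according to the README raises the error: cannot import name 'Config' from 'mmcv'. I looked into it and found that mmcv no longer provides Config from 2.0.0 onward.
After switching to <2.0.0, mmaction reports many errors: many classes and functions in mmaction are missing when it is installed as the README describes. Could you provide a more complete and detailed environment?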
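
Before debugging further, it can help to confirm which mmcv distribution is actually installed. The sketch below uses only the standard library; the helper names `major_version` and `report_mmcv` are hypothetical, and the 2.0.0 threshold is taken from the issue text above rather than from the repository.

```python
from importlib import metadata


def major_version(version_string):
    """Return the leading integer of a version string such as '1.7.1'."""
    return int(version_string.split(".")[0])


def report_mmcv():
    """Return (distribution, version, is_pre_2_0) for the first mmcv package found.

    Checks both distribution names, since 1.x releases were commonly
    installed as "mmcv-full". Returns None when neither is installed.
    """
    for dist in ("mmcv-full", "mmcv"):
        try:
            version = metadata.version(dist)
        except metadata.PackageNotFoundError:
            continue
        return dist, version, major_version(version) < 2
    return None


if __name__ == "__main__":
    print(report_mmcv())
```

If the last element of the returned tuple is `False`, the installed mmcv is 2.x or newer, which is the case the issue reports as failing on the `Config` import.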

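Dataset problem

The net disk provides list, MINIHARDVS_EVENT_files, and rawframes.
Is the dataset the rawframes folder on the net disk, and does it need further processing? The dataset from the net disk does not run as downloaded:
FileNotFoundError: [Errno 2] No such file or directory: '.../wzz_300HarDvs/rawframes/action_130/dvSave-2021_09_28_19_29_23/00000000.png'. The dataset on the net disk has an extra path level, dvSave-2021_09_28_19_29_23_dvs. After fixing that:
FileNotFoundError: [Errno 2] No such file or directory: '.../wzz_300HarDvs/rawframes/action_130/dvSave-2021_09_28_19_29_23/00000001.png'. The provided frames are named '00000000.png', '00000005.png', '00000010.png'. After fixing that as well, a new error appears: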
Assertion t >= 0 && t < n_classes failed.
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Could you provide usage instructions for the dataset?
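
Both failures above (missing frame files and the `t >= 0 && t < n_classes` assertion) can be checked offline before training. The sketch below assumes each annotation line has the form `<frame_dir> <total_frames> <label>`, which is the layout MMAction2 uses for raw-frame lists; whether the lists on the net disk follow it, and the `{:08d}.png` file name template, are assumptions inferred from the paths in this issue. The function name `check_rawframe_list` is hypothetical.

```python
import os


def check_rawframe_list(list_path, frame_root, num_classes=300,
                        filename_tmpl="{:08d}.png"):
    """Return (bad_labels, missing_first_frames) for an annotation list.

    Assumed line format: "<frame_dir> <total_frames> <label>".
    A label outside [0, num_classes) would trigger the device-side assert
    in the classification loss; a missing first frame would raise
    FileNotFoundError in the data loader.
    """
    bad_labels, missing = [], []
    with open(list_path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            parts = line.split()
            if len(parts) < 3:
                continue  # skip blank or malformed lines
            frame_dir, label = parts[0], int(parts[-1])
            if not 0 <= label < num_classes:
                bad_labels.append((line_no, frame_dir, label))
            first_frame = os.path.join(frame_root, frame_dir,
                                       filename_tmpl.format(0))
            if not os.path.isfile(first_frame):
                missing.append((line_no, first_frame))
    return bad_labels, missing
```

If `bad_labels` contains entries equal to `num_classes`, one possible cause is that the list uses 1-based labels while the loss expects 0-based ones; this is a guess and should be verified against the class list.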

result error

{"mode": "train", "epoch": 50, "iter": 2300, "lr": 1e-05, "memory": 5808, "data_time": 1.81207, "top1_acc": 0.2528, "top5_acc": 0.2696, "loss_cls": 4.24143, "loss": 4.24143, "grad_norm": 1.43265, "time": 2.03297}
{"mode": "train", "epoch": 50, "iter": 2400, "lr": 1e-05, "memory": 5808, "data_time": 1.4463, "top1_acc": 0.254, "top5_acc": 0.2668, "loss_cls": 4.24071, "loss": 4.24071, "grad_norm": 1.53618, "time": 1.66408}
{"mode": "train", "epoch": 50, "iter": 2500, "lr": 1e-05, "memory": 5808, "data_time": 1.61799, "top1_acc": 0.2372, "top5_acc": 0.258, "loss_cls": 4.32051, "loss": 4.32051, "grad_norm": 1.59426, "time": 1.83132}
{"mode": "val", "epoch": 50, "iter": 430, "lr": 1e-05, "top1_acc": 0.67431, "top5_acc": 0.77064, "mean_class_accuracy": 0.71983}

I have successfully run your code, but the accuracy is very low. Is this correct?
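
To compare training and validation accuracy across a whole log file rather than by eye, the JSON lines can be summarized per mode. This is a minimal sketch using only the standard library; the function name `mean_top1_by_mode` is hypothetical, and the sample records are the lines above trimmed to the fields the function reads.

```python
import json
from collections import defaultdict


def mean_top1_by_mode(log_lines):
    """Average the top1_acc field separately for each "mode" value."""
    values = defaultdict(list)
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if "top1_acc" in record:
            values[record.get("mode", "unknown")].append(record["top1_acc"])
    return {mode: sum(accs) / len(accs) for mode, accs in values.items()}


# Records from the log above, trimmed to the fields used here.
sample = [
    '{"mode": "train", "epoch": 50, "iter": 2300, "top1_acc": 0.2528}',
    '{"mode": "train", "epoch": 50, "iter": 2400, "top1_acc": 0.254}',
    '{"mode": "train", "epoch": 50, "iter": 2500, "top1_acc": 0.2372}',
    '{"mode": "val", "epoch": 50, "iter": 430, "top1_acc": 0.67431}',
]

if __name__ == "__main__":
    print(mean_top1_by_mode(sample))
```

For a real run, pass the lines of the training log file instead of `sample`, for example `mean_top1_by_mode(open(path, encoding="utf-8"))`, where `path` points to the JSON log in the work directory.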

Access HARDVS dataset from outside of China

I'm trying to download the HARDVS dataset which is hosted on pan.baidu. Unfortunately, it's very difficult to get access as I am located outside of China. Is there a mirror for the data on another cloud service (such as google drive)?

Some files are missing from MINI_HARDVS_files.zip on Baidu Netdisk

Some samples are listed in train_label.txt but are not in MINI_HARDVS_files.zip, so we do not have their .npz files:
dvSave-2021_07_30_10_44_40
dvSave-2021_07_30_10_45_19
dvSave-2021_07_30_10_45_34
dvSave-2021_07_30_10_45_54
dvSave-2021_07_30_10_46_07
dvSave-2021_07_30_10_46_20
dvSave-2021_07_30_10_46_36
How should this be solved? Should we delete these entries from train_label.txt, or could you provide the corresponding .npz files again?
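
Until the missing files are provided, one workaround is to filter the label file down to samples that exist on disk. The sketch below assumes the first whitespace-separated token on each line of train_label.txt is the sample name (or a path ending in it), such as `dvSave-2021_07_30_10_44_40`; that format and the helper names `available_samples` and `split_label_entries` are assumptions, not something the repository documents.

```python
import os


def available_samples(data_dir, suffix=".npz"):
    """Collect the stems of all files ending in `suffix` under data_dir."""
    found = set()
    for _, _, files in os.walk(data_dir):
        for file_name in files:
            if file_name.endswith(suffix):
                found.add(file_name[: -len(suffix)])
    return found


def split_label_entries(label_path, data_dir, suffix=".npz"):
    """Split label lines into (kept, dropped) by whether the sample exists.

    Assumption: the first token on each line names the sample, optionally
    with a directory prefix and/or the suffix itself.
    """
    present = available_samples(data_dir, suffix)
    kept, dropped = [], []
    with open(label_path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            name = os.path.basename(line.split()[0])
            if name.endswith(suffix):
                name = name[: -len(suffix)]
            (kept if name in present else dropped).append(line.rstrip("\n"))
    return kept, dropped
```

Writing `kept` to a new file, rather than overwriting train_label.txt, keeps the original list intact in case the missing .npz files are uploaded later.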

RGB images

Where can the RGB images corresponding to the events be downloaded? I could not find them in the dataset.

Access the illumination type of each sample in HARDVS

Hi,

I am interested in your research on event cameras. In particular, I noticed that the HARDVS dataset was captured in multiple lighting scenarios. However, I was not able to get the illumination information for the HARDVS dataset from the provided GitHub repository, e.g., what type of illumination (strong or low light) the samples were captured under.

I would greatly appreciate it if you could provide access to the illumination type of each sample or direct me to the appropriate location where I can find it.

Thank you for your time and consideration.

Best regards,
fisher.

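Open-source dataset

Is there a link that can be used with wget to fetch the dataset directly? Downloading 220G from Baidu Netdisk is too slow.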