Research Code for RGBDGaze

This is the research repository for RGBDGaze: Gaze Tracking on Smartphones with RGB and Depth Data, presented at ACM ICMI 2022.

It contains the training code and a link to the dataset.

Environment

  • docker
  • docker-compose
  • nvidia-docker
  • nvidia-driver

How to use

1. Download dataset and pretrained RGB model

2. Clone

$ git clone https://github.com/FIGLAB/RGBDGaze

3. Setup

$ cp .env{.example,}

In .env, set the path to your data directory.

4. Docker build & run

$ DOCKER_BUILDKIT=1 docker build -t rgbdgaze --ssh default .
$ docker-compose run --rm experiment

5. Run

Prepare the following files in the Docker container

  • /root/datadrive/RGBDGaze/dataset/RGBDGaze_dataset
  • /root/datadrive/RGBDGaze/models/SpatialWeightsCNN_gazecapture/pretrained_rgb.pth

Create the tensors to be used for training

$ cd preprocess
$ python format.py
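
If it helps to picture what this step produces, here is a minimal sketch of turning one RGB frame into a training tensor; the function name, crop logic, and tensor layout below are illustrative assumptions, not the actual contents of format.py.

import torch
import numpy as np
from PIL import Image

def frame_to_tensor(rgb_path, bbox, size=224):
    # bbox is the detected face bounding box (x, y, w, h) in pixel coordinates
    img = Image.open(rgb_path).convert("RGB")
    x, y, w, h = bbox
    face = img.crop((x, y, x + w, y + h)).resize((size, size))
    # HWC uint8 image -> CHW float tensor in [0, 1]
    return torch.from_numpy(np.asarray(face).copy()).permute(2, 0, 1).float() / 255.0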

To train the RGB+D model, run

$ python lopo.py --config ./config/rgbd.yml

To train the RGB model, run

$ python lopo.py --config ./config/rgb.yml
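
The name lopo refers to leave-one-person-out cross-validation: each participant is held out as the test set in turn while the model is trained on the remaining participants. A minimal sketch of that split logic is shown below; the participant IDs and loop structure are illustrative only, not the actual interface of lopo.py.

# illustrative leave-one-person-out folds (hypothetical, not lopo.py's API)
participants = [f"P{i}" for i in range(1, 46)]  # 45 participants in the public dataset

for test_pid in participants:
    train_pids = [p for p in participants if p != test_pid]
    # train on train_pids, evaluate on test_pid, then average errors across folds
    print(f"fold {test_pid}: train on {len(train_pids)} participants, test on 1")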

Dataset description

Overview

The data is organized in the following manner:

  • 45 participants (*1)

  • synchronized RGB + Depth images for four different contexts

    • standing, sitting, walking, and lying
  • metadata

    • corresponding gaze target on the screen
    • detected face bounding box
    • acceleration data
    • device ID
    • intrinsic camera parameters of the device
  • *1: We used data from 50 participants in the paper; however, five of them did not agree to be included in the public dataset.

Structure

The folder structure is organized like this:

RGBDGaze_dataset
│   README.txt
│   iphone_spec.csv   
│
└───P1
│   │   intrinsic.json
│   │
│   └───decoded
│       │   
│       └───standing
│       │       │   label.csv
│       │       │
│       │       └───rgb
│       │       │   1.jpg
│       │       │   2.jpg ...
│       │       │
│       │       └───depth
│       │       
│       └───sitting
│       └───walking
│       └───lying
│   
└───P2 ...
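
As a rough illustration of how this layout can be traversed, the sketch below pairs each context's label.csv with its rgb and depth folders. It assumes label.csv has a header row and that folder names match the tree above, so treat it as a starting point rather than an official loader.

import os, json, csv

root = "RGBDGaze_dataset"
contexts = ["standing", "sitting", "walking", "lying"]

for pid in sorted(os.listdir(root)):
    pdir = os.path.join(root, pid)
    if not os.path.isdir(pdir):
        continue  # skip README.txt and iphone_spec.csv
    with open(os.path.join(pdir, "intrinsic.json")) as f:
        intrinsics = json.load(f)
    for ctx in contexts:
        cdir = os.path.join(pdir, "decoded", ctx)
        with open(os.path.join(cdir, "label.csv")) as f:
            labels = list(csv.DictReader(f))
        rgb_dir = os.path.join(cdir, "rgb")
        depth_dir = os.path.join(cdir, "depth")
        print(pid, ctx, len(labels), "labeled frames")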

Reference

Download the paper here: https://doi.org/10.1145/3536221.3556568

Riku Arakawa, Mayank Goel, Chris Harrison, and Karan Ahuja. 2022. RGBDGaze: Gaze Tracking on Smartphones with RGB and Depth Data. In Proceedings of the 2022 International Conference on Multimodal Interaction (ICMI '22). Association for Computing Machinery, New York, NY, USA, 329–336.
@inproceedings{DBLP:conf/icmi/ArakawaG0A22,
  author    = {Riku Arakawa and
               Mayank Goel and
               Chris Harrison and
               Karan Ahuja},
  title     = {RGBDGaze: Gaze Tracking on Smartphones with {RGB} and Depth Data},
  booktitle = {International Conference on Multimodal Interaction, {ICMI} 2022, Bengaluru,
               India, November 7-11, 2022},
  pages     = {329--336},
  publisher = {{ACM}},
  year      = {2022},
  doi       = {10.1145/3536221.3556568},
  address   = {New York},
}

License

GPL v2.0. The license file is present in the repo. Please contact [email protected] if you would like another license for your use.
