Research Code for RGBDGaze

This is the research repository for RGBDGaze: Gaze Tracking on Smartphones with RGB and Depth Data, presented at ACM ICMI 2022.

It contains the training code and a link to the dataset.

Environment

  • docker
  • docker-compose
  • nvidia-docker
  • nvidia-driver

How to use

1. Download dataset and pretrained RGB model

2. Clone

$ git clone https://github.com/FIGLAB/RGBDGaze

3. Setup

$ cp .env{.example,}

In .env, set the path to your data directory.

4. Docker build & run

$ DOCKER_BUILDKIT=1 docker build -t rgbdgaze --ssh default .
$ docker-compose run --rm experiment

5. Run

Prepare the following files in the Docker container

  • /root/datadrive/RGBDGaze/dataset/RGBDGaze_dataset
  • /root/datadrive/RGBDGaze/models/SpatialWeightsCNN_gazecapture/pretrained_rgb.pth

Create the tensors to be used for training

$ cd preprocess
$ python format.py
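
If it helps to picture what this step produces, here is a minimal sketch of turning one RGB frame into a training tensor; the function name, crop logic, and tensor layout below are illustrative assumptions, not the actual contents of format.py.

import torch
import numpy as np
from PIL import Image

def frame_to_tensor(rgb_path, bbox, size=224):
    # bbox is the detected face bounding box (x, y, w, h) in pixel coordinates
    img = Image.open(rgb_path).convert("RGB")
    x, y, w, h = bbox
    face = img.crop((x, y, x + w, y + h)).resize((size, size))
    # HWC uint8 image -> CHW float tensor in [0, 1]
    return torch.from_numpy(np.asarray(face).copy()).permute(2, 0, 1).float() / 255.0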

To train the RGB+D model, run

$ python lopo.py --config ./config/rgbd.yml

To train the RGB model, run

$ python lopo.py --config ./config/rgb.yml
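
The name lopo refers to leave-one-person-out cross-validation: each participant is held out as the test set in turn while the model is trained on the remaining participants. A minimal sketch of that split logic is shown below; the participant IDs and loop structure are illustrative only, not the actual interface of lopo.py.

# illustrative leave-one-person-out folds (hypothetical, not lopo.py's API)
participants = [f"P{i}" for i in range(1, 46)]  # 45 participants in the public dataset

for test_pid in participants:
    train_pids = [p for p in participants if p != test_pid]
    # train on train_pids, evaluate on test_pid, then average errors across folds
    print(f"fold {test_pid}: train on {len(train_pids)} participants, test on 1")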

Dataset description

Overview

The data is organized in the following manner:

  • 45 participants (*1)

  • synchronized RGB + Depth images for four different contexts

    • standing, sitting, walking, and lying
  • metadata

    • corresponding gaze target on the screen
    • detected face bounding box
    • acceleration data
    • device ID
    • intrinsic camera parameters of the device
  • *1: We used data from 50 participants in the paper; however, five of them did not agree to be included in the public dataset.

Structure

The folder structure is organized like this:

RGBDGaze_dataset
│   README.txt
│   iphone_spec.csv   
│
└───P1
│   │   intrinsic.json
│   │
│   └───decoded
│       │   
│       └───standing
│       │       │   label.csv
│       │       │
│       │       └───rgb
│       │       │   1.jpg
│       │       │   2.jpg ...
│       │       │
│       │       └───depth
│       │       
│       └───sitting
│       └───walking
│       └───lying
│   
└───P2 ...
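
As a rough illustration of how this layout can be traversed, the sketch below pairs each context's label.csv with its rgb and depth folders. It assumes label.csv has a header row and that folder names match the tree above, so treat it as a starting point rather than an official loader.

import os, json, csv

root = "RGBDGaze_dataset"
contexts = ["standing", "sitting", "walking", "lying"]

for pid in sorted(os.listdir(root)):
    pdir = os.path.join(root, pid)
    if not os.path.isdir(pdir):
        continue  # skip README.txt and iphone_spec.csv
    with open(os.path.join(pdir, "intrinsic.json")) as f:
        intrinsics = json.load(f)
    for ctx in contexts:
        cdir = os.path.join(pdir, "decoded", ctx)
        with open(os.path.join(cdir, "label.csv")) as f:
            labels = list(csv.DictReader(f))
        rgb_dir = os.path.join(cdir, "rgb")
        depth_dir = os.path.join(cdir, "depth")
        print(pid, ctx, len(labels), "labeled frames")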

Reference

Download the paper here: https://doi.org/10.1145/3536221.3556568

Riku Arakawa, Mayank Goel, Chris Harrison, and Karan Ahuja. 2022. RGBDGaze: Gaze Tracking on Smartphones with RGB and Depth Data. In Proceedings of the 2022 International Conference on Multimodal Interaction (ICMI '22). Association for Computing Machinery, New York, NY, USA, 329–336.
@inproceedings{DBLP:conf/icmi/ArakawaG0A22,
  author    = {Riku Arakawa and
               Mayank Goel and
               Chris Harrison and
               Karan Ahuja},
  title     = {RGBDGaze: Gaze Tracking on Smartphones with {RGB} and Depth Data},
  booktitle = {International Conference on Multimodal Interaction, {ICMI} 2022, Bengaluru,
               India, November 7-11, 2022},
  pages     = {329--336},
  publisher = {{ACM}},
  year      = {2022},
  doi       = {10.1145/3536221.3556568},
  address   = {New York},
}

License

GPL v2.0. The license file is present in the repo. Please contact [email protected] if you would like another license for your use.
