david-wb / gaze-estimation

A deep learning based gaze estimation framework implemented with PyTorch

Jupyter Notebook 92.85% Python 7.07% Shell 0.08%

gaze-estimation eye-tracking deep-learning artificial-intelligence artificial-neural-networks computer-vision pytorch cnn machine-learning

gaze-estimation's Issues

Reason of np.fliplr usage

Hello David,

I hope everything is okay with you.
I noticed that run_with_webcam.py applies np.fliplr to left-eye images before passing them to eyenet, even though no flip is applied during training, where the images are given to the model as-is. What is the reason for this? I'd appreciate it if you could let me know.

Different results were obtained using onnx and pt

@david-wb I have converted the model to ONNX format. The script is as follows:

    import argparse

    import torch

    from models.eyenet import EyeNet  # assumed import path; adjust to your checkout

    parser = argparse.ArgumentParser()
    parser.add_argument('--weights', type=str, default='./weights/checkpoint.pt', help='weights path')
    parser.add_argument('--img-size', nargs='+', type=int, default=[96, 160], help='image size')  # height, width
    parser.add_argument('--batch-size', type=int, default=1, help='batch size')
    opt = parser.parse_args()
    opt.img_size *= 2 if len(opt.img_size) == 1 else 1  # expand a single value to [size, size]

    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    checkpoint = torch.load(opt.weights, map_location=device)

    # Rebuild the network with the hyperparameters stored in the checkpoint
    nstack = checkpoint['nstack']
    nfeatures = checkpoint['nfeatures']
    nlandmarks = checkpoint['nlandmarks']
    eyenet = EyeNet(nstack=nstack, nfeatures=nfeatures, nlandmarks=nlandmarks).to(device)

    eyenet.load_state_dict(checkpoint['model_state_dict'])
    # Dummy input used to trace the model for export
    eyenet.eval()
    img = torch.zeros(1, *opt.img_size).to(device)
    _, landmarks, gaze = eyenet(img)
    f = opt.weights.replace('.pt', '.onnx')  # output filename
    torch.onnx.export(eyenet, img, f, verbose=False, opset_version=12, input_names=['inputs'])

But the results of the exported ONNX model are all wrong.

Gaze Values

Could you tell me how to use these gaze values to determine where the person is looking? When I compared the model's gaze values with the gaze in the training data, they looked quite different. The model outputs a gaze value, but how can we use it to tell the person's looking direction?
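As a rough sketch of one way to read such values: assuming the gaze output is a (pitch, yaw) pair in radians, which is the convention used by the landmark-based work of Park et al. linked in the Reference Paper issue but is not confirmed here for this repository, the looking direction can be drawn as a 2D arrow from the eye center. The function name `gaze_arrow_endpoint`, the sign convention, and the default arrow length are illustrative assumptions.

```python
import math

def gaze_arrow_endpoint(eye_center, pitch, yaw, length=60.0):
    """Project a (pitch, yaw) gaze in radians to a 2D arrow endpoint in pixels.

    Assumed convention (hypothetical, check against your data): positive
    pitch points up in the image, positive yaw points toward the image
    left, and image y grows downward.
    """
    dx = -length * math.sin(yaw)
    dy = -length * math.sin(pitch)
    return (eye_center[0] + dx, eye_center[1] + dy)

# A gaze of (0, 0) points straight at the camera, so the arrow collapses
# onto the eye center.
print(gaze_arrow_endpoint((100.0, 50.0), 0.0, 0.0))
```

If the arrows point the wrong way on known samples, the sign convention or the (pitch, yaw) ordering is the first thing to check.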

warpAffine get a inverse point

Hi David,

I have a quick question about warpAffine. The full frame is passed to the segment_eyes() method, where transform_mat and inv_transform_mat are calculated and cv2.warpAffine is called to produce eye_image (160, 96). My question is how to map the pupil center point (x, y) that I predict in eye_image back into full-frame coordinates.

If I understand correctly, remapping a point back to the full frame will probably give me coordinates relative to the 5_point facial landmarks, from which I can then get the full-frame coordinates of eye_center. How can I map pupil_center from eye_image (160, 96) coordinates back to full-image coordinates? I have tried a few approaches, but none of them gives sensible points.

1)  #eye_center_full_frame = cv2.transformPerspective(eye_center_point, eye.eye_sample.transform_inv[:2, :])                                                                                                    
2)  #eye_center_full_frame = cv2.transform(eye_center_point, eye.eye_sample.transform_original[:2, :], (2,1), cv2.WARP_INVERSE_MAP)

eye_center_array = np.array([[eye_center[0], eye_center[1]]], dtype=np.float32)
transformed_points = eye_center_array * eye.eye_sample.transform_inv[:2, :2]

Thanks

Edit:
I found a solution, though I am not sure it is intended to be used like this. Note that if the frame is not flipped for display, the eye.is_left case needs a different adjustment.

for eye in [left_eye, right_eye]:
    ...

    # The left-eye crop is flipped horizontally, so mirror x before mapping back
    if eye.eye_sample.is_left:
        eye_center_array = np.array([[160 - eye_center[0], eye_center[1], 1.0]], dtype=np.float32)
    else:
        eye_center_array = np.array([[eye_center[0], eye_center[1], 1.0]], dtype=np.float32)

    # Apply the inverse transform to the homogeneous point and keep (x, y)
    transformed_points = np.asarray(np.matmul(eye_center_array, eye.eye_sample.transform_inv.T))[:, :2]
    center = np.array(transformed_points).flatten()

    # Draw a dot on the eye center in the full frame
    cv2.circle(orig_frame, (int(center[0]), int(center[1])), 3, (0, 255, 0), -1)
    cv2.imshow("Webcam", cv2.flip(orig_frame, 1))
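The mapping in the post above can be checked in isolation. The following is a minimal sketch, assuming transform_inv is the 3x3 inverse of the homogeneous matrix used for the warp and that left-eye crops were mirrored with np.fliplr. The helper name `eye_point_to_frame` and the toy matrix are hypothetical. np.fliplr moves column i to column W - 1 - i, so the sketch mirrors with `crop_width - 1 - x`; the post uses `160 - x`, which differs by one pixel, and which one is right depends on the pixel-coordinate convention in use.

```python
import numpy as np

def eye_point_to_frame(point_xy, transform_inv, is_left, crop_width=160):
    """Map a point from eye-crop pixel coordinates to full-frame coordinates.

    transform_inv: assumed 3x3 inverse of the frame-to-crop transform.
    is_left: if True, undo a horizontal np.fliplr of the crop first.
    """
    x, y = point_xy
    if is_left:
        x = crop_width - 1 - x  # np.fliplr sends column i to column W - 1 - i
    p = np.array([x, y, 1.0])  # homogeneous column vector
    q = np.asarray(transform_inv, dtype=np.float64) @ p
    return q[:2] / q[2]

# Toy frame-to-crop transform: scale by 0.5, then shift by (-20, -10).
frame_to_crop = np.array([[0.5, 0.0, -20.0],
                          [0.0, 0.5, -10.0],
                          [0.0, 0.0, 1.0]])
crop_to_frame = np.linalg.inv(frame_to_crop)

# The frame point (100, 60) lands at (30, 20) in the crop, so mapping
# (30, 20) back should recover (100, 60).
print(eye_point_to_frame((30.0, 20.0), crop_to_frame, is_left=False))
```

Multiplying the matrix by a column vector here is equivalent to the row-vector form `point @ transform_inv.T` used in the post.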

Why using torch.no_grad() instead of eval() mode?

Hi David,

I had the same idea of training an hourglass PyTorch model on the UnityEyes dataset, but you have done it really well. I just have a question: why are you not using .eval() mode for inference? When I tried eval() mode I got very different results, probably because of how the BatchNorm statistics are calculated.
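For context, torch.no_grad() only disables gradient tracking, while eval() changes layer behavior: BatchNorm switches from the statistics of the current batch to its stored running statistics. The NumPy sketch below illustrates that difference under stated assumptions. It is not the repository's code, the helper `batchnorm2d` is hypothetical, and the running statistics are left at PyTorch's initial values (mean 0, variance 1) to stand in for poorly estimated ones.

```python
import numpy as np

def batchnorm2d(x, running_mean, running_var, training, eps=1e-5):
    """Normalize an (N, C, H, W) array per channel, without scale or shift."""
    if training:
        # Train mode: statistics of the current batch (biased variance)
        mean = x.mean(axis=(0, 2, 3), keepdims=True)
        var = x.var(axis=(0, 2, 3), keepdims=True)
    else:
        # Eval mode: statistics accumulated during training
        mean = running_mean.reshape(1, -1, 1, 1)
        var = running_var.reshape(1, -1, 1, 1)
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(1, 4, 8, 8))
running_mean = np.zeros(4)  # stand-in for badly estimated running stats
running_var = np.ones(4)

out_train = batchnorm2d(x, running_mean, running_var, training=True)
out_eval = batchnorm2d(x, running_mean, running_var, training=False)
print("max difference:", np.abs(out_train - out_eval).max())
```

When the running statistics match the data well, the two modes give similar outputs; a large gap like the one described above suggests they do not.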

How to get [Yaw, Pitch, Roll] or gaze vector [x, y, z]

Thank you very much for sharing your code.
Is it possible to get the gaze vector? I mean [Yaw, Pitch, Roll] or [x, y, z] values of gaze vector with respect to eyes.
Thank you very much.
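A possible conversion, sketched under the assumption that the model outputs gaze as (pitch, yaw) in radians with the camera looking along the z axis (an assumption, not something stated in this thread): the pair maps to a unit [x, y, z] vector and back as follows. The function names are illustrative. Roll cannot be recovered, because a gaze direction has only two degrees of freedom.

```python
import math

def pitchyaw_to_vector(pitch, yaw):
    """Convert (pitch, yaw) in radians to a unit 3D direction vector."""
    x = math.cos(pitch) * math.sin(yaw)
    y = math.sin(pitch)
    z = math.cos(pitch) * math.cos(yaw)
    return (x, y, z)

def vector_to_pitchyaw(v):
    """Convert a 3D direction vector (any nonzero length) to (pitch, yaw)."""
    x, y, z = v
    norm = math.sqrt(x * x + y * y + z * z)
    pitch = math.asin(y / norm)
    yaw = math.atan2(x, z)
    return (pitch, yaw)

# Round trip for a small downward-left gaze
vec = pitchyaw_to_vector(-0.1, 0.25)
print(vec, vector_to_pitchyaw(vec))
```

The axis orientation (which way x and y point) depends on the dataset's convention, so it is worth verifying against a few labeled samples.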

Reference Paper

Hello David,

Would you please share the reference paper for this gaze-estimation work?
I have these references:

https://ait.ethz.ch/projects/2018/landmarks-gaze/downloads/park2018etra.pdf
https://www.cl.cam.ac.uk/research/rainbow/projects/unityeyes/
https://rahimentezari.github.io/GAN/gan-gaze.html
https://openaccess.thecvf.com/content_ICCV_2019/papers/He_Photo-Realistic_Monocular_Gaze_Redirection_Using_Generative_Adversarial_Networks_ICCV_2019_paper.pdf
https://openaccess.thecvf.com/content_cvpr_2018/papers/Wang_A_Hierarchical_Generative_CVPR_2018_paper.pdf
https://www.mdpi.com/1424-8220/20/17/4935/htm

But I need the exact reference for the GAN part.

Several months ago I tested the work in the first link, which you mentioned as the basis for this project. The performance was not satisfactory, and there was lag in both the stream and the detection. I would like to know exactly what improvements or modifications were made to that work to arrive at this one.

Waiting for your response.

Thx,
