david-wb / gaze-estimation

141 stars · 6 watchers · 31 forks · 2.33 MB

A deep learning based gaze estimation framework implemented with PyTorch

Languages: Jupyter Notebook 92.85%, Python 7.07%, Shell 0.08%
Topics: gaze-estimation, eye-tracking, deep-learning, artificial-intelligence, artificial-neural-networks, computer-vision, pytorch, cnn, machine-learning

gaze-estimation's People

Contributors

david-wb


gaze-estimation's Issues

Reason for np.fliplr usage

Hello David,

I hope everything is okay with you.
I noticed that np.fliplr is applied to the left-eye images in run_with_webcam.py before running eyenet, even though it is not used during training; there the images are fed to the model without flipping. What is the reason for this? I'd appreciate it if you could let me know.
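
A common reason for this pattern in gaze models is to mirror left eyes so the network only ever sees one eye orientation, and then mirror the predictions back afterwards; whether that is the intent here is exactly the question, but as a sketch of the idea (predict_eye, EYE_WIDTH, and the output unpacking below are assumptions for illustration, not the repo's exact API):

    import numpy as np
    import torch

    EYE_WIDTH = 160  # assumed width of the eye patch fed to the model

    def predict_eye(eyenet, eye_img, is_left):
        """Run the network on one eye patch, mirroring left eyes first."""
        img = np.fliplr(eye_img) if is_left else eye_img
        with torch.no_grad():
            # assumed output layout: heatmaps, landmarks with (x, y) columns, gaze
            _, landmarks, gaze = eyenet(torch.from_numpy(img.copy()).unsqueeze(0).float())
        landmarks = landmarks.squeeze(0).numpy()
        if is_left:
            # mirror the predicted x-coordinates back into the un-flipped patch
            landmarks[:, 0] = EYE_WIDTH - landmarks[:, 0]
        return landmarks, gaze.squeeze(0).numpy()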

Different results obtained using the ONNX and .pt models

@david-wb I have converted the model to ONNX format. The script is as follows:

    import argparse

    import torch

    from models.eyenet import EyeNet  # EyeNet import path assumed here

    parser = argparse.ArgumentParser()
    parser.add_argument('--weights', type=str, default='./weights/checkpoint.pt', help='weights path')
    parser.add_argument('--img-size', nargs='+', type=int, default=[96, 160], help='image size (height, width)')
    parser.add_argument('--batch-size', type=int, default=1, help='batch size')
    opt = parser.parse_args()
    opt.img_size *= 2 if len(opt.img_size) == 1 else 1  # expand a single value to (h, w)

    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    checkpoint = torch.load(opt.weights, map_location=device)

    # Rebuild EyeNet with the hyperparameters stored in the checkpoint
    nstack = checkpoint['nstack']
    nfeatures = checkpoint['nfeatures']
    nlandmarks = checkpoint['nlandmarks']
    eyenet = EyeNet(nstack=nstack, nfeatures=nfeatures, nlandmarks=nlandmarks).to(device)
    eyenet.load_state_dict(checkpoint['model_state_dict'])
    eyenet.eval()

    # Dummy input for tracing: one grayscale eye patch of shape (1, 96, 160)
    img = torch.zeros(1, *opt.img_size).to(device)
    _, landmarks, gaze = eyenet(img)

    f = opt.weights.replace('.pt', '.onnx')  # output filename
    torch.onnx.export(eyenet, img, f, verbose=False, opset_version=12, input_names=['inputs'])

But the results from the exported ONNX model are all wrong.
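
One way to narrow this down is to compare ONNX Runtime against PyTorch on the same input; a minimal sketch, assuming the export script above and that the forward pass returns (heatmaps, landmarks, gaze) as in the unpacking there:

    import numpy as np
    import onnxruntime as ort
    import torch

    # eyenet and the .onnx file come from the export script above; run on CPU for simplicity
    eyenet = eyenet.cpu().eval()
    x = torch.rand(1, 96, 160)  # same dummy shape used for the export

    with torch.no_grad():
        _, pt_landmarks, pt_gaze = eyenet(x)

    sess = ort.InferenceSession('./weights/checkpoint.onnx')
    onnx_landmarks, onnx_gaze = sess.run(None, {'inputs': x.numpy()})[-2:]

    # Large differences here point at the export itself; small ones point at pre/post-processing
    print(np.abs(onnx_landmarks - pt_landmarks.numpy()).max())
    print(np.abs(onnx_gaze - pt_gaze.numpy()).max())

Note also that if inference is normally run without calling eval() (see the torch.no_grad() question below), the exported graph's BatchNorm layers will still use running statistics, which by itself can produce large differences from the live PyTorch pipeline.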

Gaze Values

Could you tell me how we can use these gaze values to determine the direction a person is looking? When I compare the model's gaze values with the gaze values in the training data, they look quite different. The gaze value comes from a vector, but how can we use it to tell which direction the person is looking?
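
For reference, gaze in this family of models is typically expressed as (pitch, yaw) angles in radians, and a looking direction can be read off by converting them to a unit 3D vector. A minimal sketch under that assumption (the sign/axis convention below is one common choice and may not match the repo exactly):

    import numpy as np

    def pitchyaw_to_vector(pitch, yaw):
        """Convert (pitch, yaw) in radians to a unit 3D gaze direction.

        With this convention, (0, 0) maps to (0, 0, 1): looking straight
        ahead along +z; positive yaw turns the gaze toward +x and positive
        pitch toward +y.
        """
        x = np.cos(pitch) * np.sin(yaw)
        y = np.sin(pitch)
        z = np.cos(pitch) * np.cos(yaw)
        return np.array([x, y, z])

    # Example: a slight upward-left gaze
    print(pitchyaw_to_vector(pitch=0.1, yaw=-0.2))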

warpAffine: getting an inverse-transformed point

Hi David,

I have a quick question about warpAffine. The full frame goes into the segment_eyes() method, where transform_mat and inv_transform_mat are calculated and cv2.warpAffine is called to get eye_image (160, 96). My question is about mapping the pupil center point (x, y) that I predict in the eye_image back into full-frame coordinates.

If I understand this correctly, when I remap a point back to the full frame I will probably get coordinates relative to the 5-point facial landmarks, which can then give me the full-frame coordinates for eye_center. How can I map pupil_center from eye_image (160, 96) coordinates back to full-image coordinates? I have tried a few approaches, but nothing gives me the right points.

1)  #eye_center_full_frame = cv2.transformPerspective(eye_center_point, eye.eye_sample.transform_inv[:2, :])                                                                                                    
2)  #eye_center_full_frame = cv2.transform(eye_center_point, eye.eye_sample.transform_original[:2, :], (2,1), cv2.WARP_INVERSE_MAP)

or

eye_center_array = np.array([[eye_center[0], eye_center[1]]], dtype=np.float32)
transformed_points = eye_center_array * eye.eye_sample.transform_inv[:2, :2]

Thanks

Edit:
I found a solution, though I'm not sure if it is intended to be used like this. If I don't flip the displayed image, a different adjustment is needed for eye.is_left.

    for eye in [left_eye, right_eye]:
        ...

        # Build the point in homogeneous coordinates; left-eye patches were
        # mirrored horizontally (width 160), so mirror x back first.
        if eye.eye_sample.is_left:
            eye_center_array = np.array([[160 - eye_center[0], eye_center[1], 1.0]], dtype=np.float32)
        else:
            eye_center_array = np.array([[eye_center[0], eye_center[1], 1.0]], dtype=np.float32)

        # Apply the inverse affine transform to get full-frame coordinates
        transformed_points = np.asarray(np.matmul(eye_center_array, eye.eye_sample.transform_inv.T))[:, :2]
        center = np.array(transformed_points).flatten()

        # Draw a dot on the eye center in the full frame
        cv2.circle(orig_frame, (int(center[0]), int(center[1])), 3, (0, 255, 0), -1)
        cv2.imshow("Webcam", cv2.flip(orig_frame, 1))

Why use torch.no_grad() instead of eval() mode?

Hi David,

I had the same idea for an hourglass PyTorch model trained on the UnityEyes dataset, but you made it really good. I just have a question about why you are not using .eval() mode for inference. When I try eval() mode I get very different results, probably because of the BatchNorm running statistics.
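
For context, torch.no_grad() only turns off gradient tracking, while model.eval() switches layers such as BatchNorm and Dropout to inference behavior (running statistics instead of per-batch statistics). A toy sketch of the difference, using a stand-in model rather than EyeNet:

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    model = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.BatchNorm2d(8))
    x = torch.randn(1, 1, 96, 160)

    # Train mode: BatchNorm normalizes with the statistics of this batch
    with torch.no_grad():
        out_train_mode = model(x)

    # Eval mode: BatchNorm uses its accumulated running mean/var instead
    model.eval()
    with torch.no_grad():
        out_eval_mode = model(x)

    print((out_train_mode - out_eval_mode).abs().max())  # nonzero: the two modes differ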

Reference Paper

Hello David,

Would you please share the reference paper for this gaze-estimation work?
I have these references:

https://ait.ethz.ch/projects/2018/landmarks-gaze/downloads/park2018etra.pdf
https://www.cl.cam.ac.uk/research/rainbow/projects/unityeyes/
https://rahimentezari.github.io/GAN/gan-gaze.html
https://openaccess.thecvf.com/content_ICCV_2019/papers/He_Photo-Realistic_Monocular_Gaze_Redirection_Using_Generative_Adversarial_Networks_ICCV_2019_paper.pdf
https://openaccess.thecvf.com/content_cvpr_2018/papers/Wang_A_Hierarchical_Generative_CVPR_2018_paper.pdf
https://www.mdpi.com/1424-8220/20/17/4935/htm

But I need the exact reference to the GAN part.

I actually tested the first link, which you mentioned is the basis for this work, several months ago; the performance was not satisfying, and there was lag in both the stream and the detection. I want to know exactly what improvements, developments, or modifications were made to that work to arrive at this one.

Waiting for your response.

Thx,

Query regarding the "run_with_webcam.py" file

I am getting an error on the following line:
torch.load('checkpoint.pt', map_location=device)
The error states that 'checkpoint.pt' is missing.
Can you guide us on how to get the 'checkpoint.pt' file, or how to get rid of the error?

Thank You
Himani
