david-wb / gaze-estimation
A deep learning based gaze estimation framework implemented with PyTorch
Hello David,
I hope everything is okay with you.
I noticed that run_with_webcam.py applies np.fliplr to left-eye images before running eyenet, even though no flip is applied during training; there the images are given to the model without flipping. What is the reason for this? I'd appreciate it if you could let me know.
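For readers following the question, here is a minimal sketch of what mirroring an eye crop does to pixel coordinates, and how an x index predicted on the mirrored crop maps back. It only illustrates the mechanics of np.fliplr; the helper names and the toy 3x8 crop are hypothetical, and it does not explain why the repository flips only at inference time.

```python
import numpy as np


def flip_eye_image(eye_image):
    """Mirror an eye crop left-to-right (the same operation as np.fliplr)."""
    return np.fliplr(eye_image)


def unflip_x(x, width):
    """Map an x pixel index found on the mirrored crop back to the unmirrored crop."""
    return (width - 1) - x


# Toy 3x8 "eye crop" with a single bright pixel at column 3 (hypothetical data).
crop = np.zeros((3, 8), dtype=np.float32)
crop[1, 3] = 1.0

mirrored = flip_eye_image(crop)
col_in_mirrored = int(np.argmax(mirrored[1]))             # column of the bright pixel after mirroring
col_restored = unflip_x(col_in_mirrored, crop.shape[1])   # back in the original crop's coordinates
```

Any landmark predicted on a mirrored crop needs this kind of x correction before it is drawn on the unmirrored image.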
@david-wb I have converted the model to ONNX format. The script is as follows:
import argparse

import torch

from models.eyenet import EyeNet  # assumed module path; adjust to where EyeNet is defined

parser = argparse.ArgumentParser()
parser.add_argument('--weights', type=str, default='./weights/checkpoint.pt', help='weights path')
parser.add_argument('--img-size', nargs='+', type=int, default=[96, 160], help='image size')  # height, width
parser.add_argument('--batch-size', type=int, default=1, help='batch size')
opt = parser.parse_args()
opt.img_size *= 2 if len(opt.img_size) == 1 else 1  # expand a single value to [size, size]

# Rebuild the network from the hyperparameters stored in the checkpoint
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
checkpoint = torch.load(opt.weights, map_location=device)
nstack = checkpoint['nstack']
nfeatures = checkpoint['nfeatures']
nlandmarks = checkpoint['nlandmarks']
eyenet = EyeNet(nstack=nstack, nfeatures=nfeatures, nlandmarks=nlandmarks).to(device)
eyenet.load_state_dict(checkpoint['model_state_dict'])

# Dummy input of shape (batch, height, width) used to trace the model
eyenet.eval()
img = torch.zeros(opt.batch_size, *opt.img_size).to(device)
_, landmarks, gaze = eyenet(img)  # dry run to confirm the forward pass works

# Write the ONNX file next to the weights, e.g. checkpoint.pt -> checkpoint.onnx
f = opt.weights.replace('.pt', '.onnx')
torch.onnx.export(eyenet, img, f, verbose=False, opset_version=12, input_names=['inputs'])
But the results of the exported model are all wrong.
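One way to narrow down a report like this is to feed the same preprocessed input to both the PyTorch model and the exported model, then compare the outputs numerically. The sketch below covers only the comparison step; it assumes both outputs have already been converted to NumPy arrays. The function name compare_outputs and the tolerance are hypothetical, and the arrays are placeholders rather than real model outputs.

```python
import numpy as np


def compare_outputs(reference, candidate, atol=1e-4):
    """Compare two model outputs and report the largest element-wise difference."""
    reference = np.asarray(reference, dtype=np.float64)
    candidate = np.asarray(candidate, dtype=np.float64)
    if reference.shape != candidate.shape:
        # A shape mismatch usually points at a different input layout or output head.
        return {"match": False, "max_abs_diff": None,
                "reason": f"shape mismatch: {reference.shape} vs {candidate.shape}"}
    max_abs_diff = float(np.max(np.abs(reference - candidate))) if reference.size else 0.0
    match = max_abs_diff <= atol
    return {"match": match, "max_abs_diff": max_abs_diff,
            "reason": "within tolerance" if match else "values differ"}


# Placeholder arrays standing in for a gaze output from each runtime.
same = compare_outputs([0.10, -0.20], [0.10, -0.20])
different = compare_outputs([0.10, -0.20], [0.10, 0.10])
wrong_shape = compare_outputs([0.10, -0.20], [[0.10, -0.20]])
```

If the outputs differ for an identical input, the export itself is suspect; if they agree, the difference is more likely in preprocessing or in how the model was run before export.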
Could you explain how to use the predicted gaze values to determine which direction the person is looking? When I compared the model's gaze values with the gaze values in the training data, they looked quite different. The model outputs a gaze vector, but how can we use it to tell where the person is looking?
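As a rough illustration, a gaze output can be drawn as a 2D arrow on the image if it is interpreted as a (pitch, yaw) pair in radians. That interpretation and the sign convention below are assumptions, not something confirmed in this thread, and gaze_to_arrow is a hypothetical helper name.

```python
import math


def gaze_to_arrow(pitch, yaw, length=40.0):
    """Project a (pitch, yaw) gaze in radians to an image-plane offset (dx, dy) in pixels.

    Assumed convention: image y grows downward, and the signs are chosen so that
    positive pitch points the arrow up. Flip the signs if your data uses another one.
    """
    dx = -length * math.sin(yaw)
    dy = -length * math.sin(pitch)
    return dx, dy


# Looking straight at the camera gives a zero-length arrow.
straight = gaze_to_arrow(0.0, 0.0)
# A pure yaw of 90 degrees gives a horizontal arrow of full length.
sideways = gaze_to_arrow(0.0, math.pi / 2)
```

Drawing a line from the eye center to (eye_x + dx, eye_y + dy) then gives a visual check of whether the predicted direction is plausible.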
Hi David,
I have a quick question about warpAffine. The full frame is passed to the segment_eyes() method, where transform_mat and inv_transform_mat are calculated and cv2.warpAffine is called to get eye_image (160, 96). My question is how to map the pupil center point (x, y) that I predict in eye_image back into full-frame coordinates.
If I understand this correctly, remapping a point back to the full frame will probably give coordinates relative to the 5-point facial landmarks, which can then give me full-frame coordinates for eye_center. How can I map pupil_center from eye_image (160, 96) coordinates back to full-image coordinates? I have tried a few approaches, but none of them gives sensible points.
1) #eye_center_full_frame = cv2.transformPerspective(eye_center_point, eye.eye_sample.transform_inv[:2, :])
2) #eye_center_full_frame = cv2.transform(eye_center_point, eye.eye_sample.transform_original[:2, :], (2,1), cv2.WARP_INVERSE_MAP)
or
eye_center_array = np.array([[eye_center[0], eye_center[1]]], dtype=np.float32)
transformed_points = eye_center_array * eye.eye_sample.transform_inv[:2, :2]
Thanks
Edit:
I found a solution, though I am not sure it is intended to be used like this. If I don't flip the image for display, a different adjustment is needed for eye.is_left.
for eye in [left_eye, right_eye]:
    ...
    if eye.eye_sample.is_left:
        eye_center_array = np.array([[160 - eye_center[0], eye_center[1], 1.0]], dtype=np.float32)
    else:
        eye_center_array = np.array([[eye_center[0], eye_center[1], 1.0]], dtype=np.float32)
    transformed_points = np.asarray(np.matmul(eye_center_array, eye.eye_sample.transform_inv.T))[:, :2]
    center = np.array(transformed_points).flatten()
    # Draw dots on eye center in full frame
    cv2.circle(orig_frame, (int(center[0]), int(center[1])), 3, (0, 255, 0), -1)
cv2.imshow("Webcam", cv2.flip(orig_frame, 1))
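The idea behind the solution above can be checked in isolation: write the point in homogeneous form (x, y, 1), multiply by the inverse of the 3x3 transform, and keep the first two components. The sketch below uses a made-up forward transform (scale by 2, then shift) rather than the repository's actual matrices, and the helper names are hypothetical.

```python
import numpy as np


def to_homogeneous_3x3(affine_2x3):
    """Extend a 2x3 affine matrix (as used by cv2.warpAffine) to a 3x3 matrix."""
    matrix = np.eye(3, dtype=np.float64)
    matrix[:2, :] = affine_2x3
    return matrix


def map_point(matrix_3x3, point_xy):
    """Apply a 3x3 affine transform to a single (x, y) point."""
    x, y = point_xy
    mapped = matrix_3x3 @ np.array([x, y, 1.0])
    return mapped[:2]


# Hypothetical full-frame -> eye-crop transform: x' = 2x - 200, y' = 2y - 100.
forward = to_homogeneous_3x3(np.array([[2.0, 0.0, -200.0],
                                       [0.0, 2.0, -100.0]]))
inverse = np.linalg.inv(forward)

full_frame_point = np.array([130.0, 80.0])
crop_point = map_point(forward, full_frame_point)   # where the point lands in the crop
recovered = map_point(inverse, crop_point)          # mapped back to the full frame
```

Multiplying a row vector by transform_inv.T, as in the snippet above, is the same computation as multiplying transform_inv by a column vector here.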
Hi David,
I had the same idea of training an hourglass PyTorch model on the UnityEyes dataset, but you did it really well. I just have a question: why are you not using .eval() mode at inference? When I tried eval() mode I got very different results, probably because of how the BatchNorm statistics are calculated.
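For context, batch normalization behaves differently in the two modes: in training mode it normalizes with the statistics of the current batch, while in eval mode it uses stored running statistics. The NumPy sketch below shows that general mechanism only. It omits the learnable scale and shift and the running-statistics update, and it does not confirm what happens inside this repository's checkpoint.

```python
import numpy as np


def batch_norm(x, running_mean, running_var, training, eps=1e-5):
    """Simplified batch norm over axis 0, without the learnable scale and shift."""
    if training:
        # Training mode: normalize with the statistics of the current batch.
        mean = x.mean(axis=0)
        var = x.var(axis=0)
    else:
        # Eval mode: normalize with the stored running statistics.
        mean, var = running_mean, running_var
    return (x - mean) / np.sqrt(var + eps)


# Toy activations far from the running statistics (hypothetical values).
x = np.array([[10.0], [12.0], [14.0]])
running_mean = np.array([0.0])
running_var = np.array([1.0])

train_out = batch_norm(x, running_mean, running_var, training=True)
eval_out = batch_norm(x, running_mean, running_var, training=False)
```

When the running statistics do not match the data seen at inference, the two modes can give very different activations, which is one possible reason for the discrepancy described above.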
Thank you very much for sharing your code.
Is it possible to get the gaze vector? I mean the [yaw, pitch, roll] or [x, y, z] values of the gaze vector with respect to the eyes.
Thank you very much.
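A gaze direction has only two degrees of freedom, so it is usually expressed as pitch and yaw without roll, or equivalently as a 3D unit vector. The sketch below converts between the two forms. It assumes the angles are in radians and uses one common camera-facing sign convention; both are assumptions to check against how the model's output is defined, and the function names are hypothetical.

```python
import math


def pitchyaw_to_vector(pitch, yaw):
    """Convert (pitch, yaw) in radians to a 3D unit gaze vector (x, y, z).

    Assumed convention: zero pitch and yaw point along -z, toward the camera.
    """
    x = -math.cos(pitch) * math.sin(yaw)
    y = -math.sin(pitch)
    z = -math.cos(pitch) * math.cos(yaw)
    return (x, y, z)


def vector_to_pitchyaw(vector):
    """Inverse of pitchyaw_to_vector for a non-zero vector."""
    x, y, z = vector
    norm = math.sqrt(x * x + y * y + z * z)
    pitch = math.asin(-y / norm)
    yaw = math.atan2(-x, -z)
    return (pitch, yaw)


gaze_vector = pitchyaw_to_vector(0.2, -0.3)
round_trip = vector_to_pitchyaw(gaze_vector)
```

The round trip returning the original angles is a quick check that the two functions use the same convention.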
Hello David,
Would you please share the reference paper for this gaze-estimation work?
I have these references:
https://ait.ethz.ch/projects/2018/landmarks-gaze/downloads/park2018etra.pdf
https://www.cl.cam.ac.uk/research/rainbow/projects/unityeyes/
https://rahimentezari.github.io/GAN/gan-gaze.html
https://openaccess.thecvf.com/content_ICCV_2019/papers/He_Photo-Realistic_Monocular_Gaze_Redirection_Using_Generative_Adversarial_Networks_ICCV_2019_paper.pdf
https://openaccess.thecvf.com/content_cvpr_2018/papers/Wang_A_Hierarchical_Generative_CVPR_2018_paper.pdf
https://www.mdpi.com/1424-8220/20/17/4935/htm
But I need the exact reference for the GAN part.
Several months ago I tested the first link, which you mentioned is the basis of this work. The performance was not satisfying, and there was lag in both the stream and the detection. I would like to know exactly what improvements or modifications were made to that work to arrive at this one.
I look forward to your response.
Thanks,
Can you send another link to the pretrained model?
Thanks!
I am getting an error on the following line:
torch.load('checkpoint.pt', map_location=device)
The error states that 'checkpoint.pt' is missing.
Can you guide us on how to get the 'checkpoint.pt' file, or how to get rid of the error?
Thank you,
Himani
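A small guard before loading can make this failure easier to diagnose by reporting the absolute path that was checked. This is a sketch only: require_checkpoint is a hypothetical helper, and it does not say where the pretrained weights can be downloaded, which this thread does not state.

```python
from pathlib import Path


def require_checkpoint(path):
    """Return the checkpoint path if the file exists, otherwise raise a descriptive error."""
    checkpoint_path = Path(path).expanduser()
    if not checkpoint_path.is_file():
        raise FileNotFoundError(
            f"Checkpoint not found at '{checkpoint_path.resolve()}'. "
            "Place the pretrained weights there or pass the correct path."
        )
    return checkpoint_path
```

Calling require_checkpoint('checkpoint.pt') before torch.load shows which directory the relative path was resolved against, which is often the working directory rather than the script's folder.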