lmb-freiburg / hand3d
Network estimating 3D Handpose from single color images
License: GNU General Public License v2.0
Asking because sometimes it can take a while for newer work to show prominently in google results
Hi! When there is no hand in the image, the script returns weird coordinates. I think this can be resolved by setting a threshold on the result. Can you please tell me how?
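One possible way to do this, sketched under assumptions: threshold the HandSegNet output before trusting the keypoints. The sketch assumes that `hand_scoremap_v` from run.py has shape (1, H, W, 2) with channel 1 holding the hand logits. The function name `hand_present` and both threshold values are hypothetical and would need tuning on real images.

```python
import numpy as np

def hand_present(hand_scoremap, prob_threshold=0.5, min_pixels=100):
    """Return True if enough pixels are classified as hand.

    hand_scoremap: array of shape (1, H, W, 2) with raw logits
    (assumed layout: channel 0 = background, channel 1 = hand).
    """
    logits = np.squeeze(hand_scoremap, axis=0).astype(np.float64)
    # Numerically stable softmax over the channel axis.
    logits -= logits.max(axis=-1, keepdims=True)
    exp = np.exp(logits)
    prob_hand = exp[..., 1] / exp.sum(axis=-1)
    # Count pixels whose hand probability exceeds the threshold.
    return int((prob_hand > prob_threshold).sum()) >= min_pixels
```

If the function returns False, the keypoint coordinates for that image could simply be discarded.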
In create_db.m, I think you should also add "anno_uv_l = anno_uv_l(1:2, :);" in order to get a 2x21 coordinate matrix for each sample, just like anno_uv_r. If the third dimension indicates the visibility of each keypoint, shouldn't anno_uv_r be of the same shape?
Hi!
Thanks for your great work!
I am confused about how you obtained the weights folder, which I downloaded directly from the data linked in the README. When I run run.py, it shows "Loaded 102 variables from weights/posenet3d-rhd-stb-slr-finetuned.pickle" and "Loaded 37 variables from weights/handsegnet-rhd.pickle". I don't understand how you created those pickle files, because after I finish training, I cannot find any saved model like them.
Thank you
Hi @zimmerm,
Thanks for your paper,
There is no pretrained model. Could you please provide one?
I tried your create_db.m and found that the uv coordinates are wrong in 'BB'. What should I do?
Is there a way to extract your predicted keypoints?
hand3d-master/nets/ColorHandPose3DNetwork.py", line 52, in init
assert os.path.exists(file_name), "File not found."
AssertionError: File not found.
Missing weight files: './weights/handsegnet-rhd.pickle', './weights/posenet3d-rhd-stb-slr-finetuned.pickle'
Hi,
I extracted data in the root folder. I am getting the following error.
File "run.py", line 47, in
keypoints_scoremap_tf, keypoint_coord3d_tf = net.inference(image_tf, hand_side_tf, evaluation)
File "/home/alex/dev/projects/hand3d-master/net.py", line 37, in inference
hand_mask = single_obj_scoremap(hand_scoremap)
File "/home/alex/dev/projects/hand3d-master/utils.py", line 246, in single_obj_scoremap
max_loc = find_max_location(scoremap_fg)
File "/home/alex/dev/projects/hand3d-master/utils.py", line 228, in find_max_location
xy_loc.append(tf.concat(0, [x_loc, y_loc]))
File "/home/alex/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/array_ops.py", line 1062, in concat
).assert_is_compatible_with(tensor_shape.scalar())
File "/home/alex/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/tensor_shape.py", line 737, in assert_is_compatible_with
raise ValueError("Shapes %s and %s are incompatible" % (self, other))
ValueError: Shapes (2, 1) and () are incompatible
Hello, thank you for your great work.
I have a question about the STB dataset. The dataset has left and right RGB images. Your paper describes 3000 frames for evaluation and 15000 frames for training, but if I follow that setting, I end up with 3000 * 2 images for evaluation and 15000 * 2 for training (because there are left and right images). So I want to make sure: is this (3000 * 2, 15000 * 2) correct?
Hoping for your answer. Best wishes.
Hello!
I'm working on an academic project (bachelor) where I need to estimate a hand's skeleton in 3D!
Your work looks very interesting and I would like to use it as my backend for estimating hand skeletons.
I see it is built on top of TensorFlow, so how difficult would it be to export to Java, e.g. with TensorFlow Lite? I guess it still calls some of the native C++ functions.
Best regards,
Christian!
I downloaded the data files, weights_HandSegNet.pickle and weights_Pose3D.pickle.
I encountered the error "ValueError: unsupported pickle protocol: 3" when using Python 2.7.
TensorFlow is installed on my Python 2.7, and I know that reinstalling TensorFlow on Python 3.5 would probably solve the issue.
If you could provide the weight files dumped with pickle protocol 2, it would help me a lot.
Thanks
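A possible workaround, as a sketch only: re-save the files with protocol 2 from a Python 3 interpreter, using the standard `pickle` module. The helper name `convert_pickle_protocol` and the output file name in the usage note are hypothetical. Whether the NumPy arrays inside the converted file then load cleanly under Python 2.7 is not verified here.

```python
import pickle

def convert_pickle_protocol(src_path, dst_path, protocol=2):
    """Re-save a pickle file with an older protocol (run under Python 3)."""
    with open(src_path, "rb") as f:
        data = pickle.load(f)
    with open(dst_path, "wb") as f:
        # Protocol 2 is the highest protocol that Python 2.7 can read.
        pickle.dump(data, f, protocol=protocol)
```

For example: `convert_pickle_protocol("weights_Pose3D.pickle", "weights_Pose3D_py2.pickle")`.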
Hello,
What do the forward pass weights contain? Were they trained on the whole RHD dataset for HandSegNet and PoseNet? Since it says "minimal example", should I download the dataset and retrain everything?
Hello,
Thanks for sharing.
I am working in an environment with only four 1080 titian GPUs, and I am not sure whether the network can be trained while other tasks are running.
Could lmb-freiburg provide a pre-trained model?
Thanks
Hi! I have run your demo code, which provides both 2D and normalized 3D coordinates. The former can easily be translated to pixel coordinates and overlaid on the original image. Is there any way to do the same for the 3D coordinates, i.e. translate the coordinates to pixel scale so that the x, y values can be overlaid on the original image?
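One way to approximate such an overlay, as a sketch only: fit a single scale and 2D offset that map the x, y components of the normalized 3D output onto the detected 2D keypoints. This is a weak-perspective assumption and not something the repository documents. The function name `fit_xy_to_pixels` is hypothetical, and the inputs are assumed to be (21, 3) and (21, 2) arrays with the same x, y axis order (run.py may return 2D coordinates in row, column order, so the columns may need swapping first).

```python
import numpy as np

def fit_xy_to_pixels(coord3d, coord2d_xy):
    """Least-squares scale s and offset t so that s * xy3d + t ~ coord2d_xy."""
    xy = np.asarray(coord3d, dtype=np.float64)[:, :2]
    uv = np.asarray(coord2d_xy, dtype=np.float64)
    # Center both point sets, then solve for the single scale factor.
    xy_c = xy - xy.mean(axis=0)
    uv_c = uv - uv.mean(axis=0)
    scale = (xy_c * uv_c).sum() / (xy_c ** 2).sum()
    offset = uv.mean(axis=0) - scale * xy.mean(axis=0)
    return scale * xy + offset, scale, offset
```

The first return value holds the projected x, y positions in pixels; the depth component could be multiplied by the same scale if a relative depth in pixel units is wanted.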
I would like to start porting it to Caffe (my goal is to implement just the forward pass, not training).
I have read your paper roughly, and I wonder whether there are custom layers that are not included in the existing TF or Caffe layers.
Also, I wonder whether HandSegNet can be replaced with an SSD hand detector. As I understand it, HandSegNet is just for detecting hands, after which the hand-cropped patches from the original image (not from the feature map of the last layer of HandSegNet) are passed to PoseNet.
Hi, thanks for sharing your code.
I'm confused that there are 42 keypoints in a frame even though the corresponding image in the 'color' directory only shows one hand.
In fact, I'm creating my own data, but how could I get 42 points from an image that contains only one hand?
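For context, a sketch under the assumption that the 42 entries are 21 keypoints for the left hand followed by 21 for the right hand, with a visibility flag in the third column; under that layout, the entries of a hand that is not in the image would simply be marked invisible. The helper `split_hands` and the rule for picking the dominant hand are hypothetical.

```python
import numpy as np

def split_hands(uv_vis):
    """Split a (42, 3) array of [u, v, visible] rows into two (21, 3) hands."""
    uv_vis = np.asarray(uv_vis)
    assert uv_vis.shape == (42, 3)
    # Assumed layout: rows 0-20 left hand, rows 21-41 right hand.
    left, right = uv_vis[:21], uv_vis[21:]
    # Treat the hand with more visible keypoints as the one shown in the image.
    dominant = "left" if left[:, 2].sum() >= right[:, 2].sum() else "right"
    return left, right, dominant
```

For a custom single-hand dataset, one option would be to fill the rows of the missing hand with zeros and a visibility of 0.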
Does it take video as input and yield output accordingly?
Can I get any inference speed measurements? Is it possible to run on a TX2 in real time?
I tried to run on grey channel-averaged images and got nonsense results compared to the color versions (my use case has only grey images). Does this mean the model is fine-tuned for Caucasian hue, and won't work on grey images or on people of color?
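One thing worth checking is how the grey image is fed to the network. A minimal sketch, assuming the grey image is an (H, W) array with values in 0..255 and that the network expects a (1, H, W, 3) input scaled as `image / 255.0 - 0.5`; the scaling is an assumption about run.py, and `grey_to_network_input` is a hypothetical helper. Replicating the channel only fixes the input shape and would not by itself make a color-trained model work on grey input.

```python
import numpy as np

def grey_to_network_input(grey):
    """Replicate a grey (H, W) image to 3 channels and add a batch axis."""
    grey = np.asarray(grey, dtype=np.float64)
    rgb = np.repeat(grey[:, :, np.newaxis], 3, axis=2)
    # Assumed preprocessing: map 0..255 to -0.5..0.5.
    return np.expand_dims(rgb / 255.0 - 0.5, 0)
```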
When running "python training_handsegnet.py", it reported this error:
"tensorflow.python.framework.errors_impl.InvalidArgumentError: Unsuccessful TensorSliceReader constructor: Failed to get matching files on ./weights/cpm-model-mpii: Not found: ./weights; No such file or directory."
According to the paper, some design ideas come from OpenPose. Could you point me to the pre-downloaded model that is required for training, e.g. cpm-model-mpii?
Thanks & Regards!
Neo
Thanks for your work.
I have trained successfully with TensorFlow. Do you have a trained Caffe model?
If you do not plan to convert the model to Caffe, do you have suggestions on what I should pay attention to when training with Caffe?
I wanted to play around with your work but I'm getting the above error when executing run.py.
Any suggestions? It's TF 1.1.
tia
I used another hand detector and keypoint detector. After that, I want to use your training_lifting.py to lift 2D to 3D coordinates. So, can I use only your PosePrior network for 3D pose estimation? How can I do that?
And in your PoseNet network, what kind of output is keypoints_map? For example, does it hold 21 keypoints for one image or for all images? Can you show me a keypoints_map result as an example?
With your RHD DB, you have mask images for the color images. Could you tell me how to make the mask images?
I would like to make a similar dataset from a custom dataset as well.
How do I run this code to detect my hands in real time?
Hello Team,
I ran run.py on various images, including the images you provide in the "data" folder, but the execution time per image is high: processing an image and getting the result takes close to 6 seconds per image.
I tried both tensorflow and tensorflow-gpu on a G3 AWS instance (it has the graphics cards below), but the execution time did not improve.
Graphics card details:
00:02.0 VGA compatible controller: Cirrus Logic GD 5446
00:1e.0 VGA compatible controller: NVIDIA Corporation GM204GL [Tesla M60] (rev a1)
(The main goal is to use run.py on live webcam video, so please help me reduce the execution time.)
Installed packages:
absl-py==0.7.1
bleach==1.5.0
cycler==0.10.0
html5lib==0.9999999
Markdown==3.0.1
matplotlib==1.5.3
numpy==1.16.2
Pillow==5.4.1
pkg-resources==0.0.0
protobuf==3.7.0
pyparsing==2.3.1
python-dateutil==2.8.0
pytz==2018.9
scipy==0.18.1
six==1.12.0
tensorflow-gpu==1.5.0
tensorflow-tensorboard==1.5.1
Werkzeug==0.15.1
Code where I am checking execution time in run.py:
t = time.time()
print("Intial taken : {:.3f}".format(time.time() - t))
hand_scoremap_v, image_crop_v, scale_v, center_v,\
keypoints_scoremap_v, keypoint_coord3d_v = sess.run([hand_scoremap_tf, image_crop_tf, scale_tf, center_tf,
keypoints_scoremap_tf, keypoint_coord3d_tf],
feed_dict={image_tf: image_v})
print("time taken by network : {:.3f}".format(time.time() - t))
Result for the images below:
image_list.append('./data/img3.png')
image_list.append('./data/img4.png')
image_list.append('./data/img5.png')
Intial taken : 0.000
time taken by network : 5.537
Intial taken : 0.000
time taken by network : 5.338
Intial taken : 0.000
time taken by network : 5.340
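When timing like this, it can help to separate any one-off start-up cost from the steady per-image cost, since the first call into a session may include initialization. A framework-independent sketch, where `run_once` is a hypothetical zero-argument callable that wraps the `sess.run(...)` call above:

```python
import time

def time_calls(run_once, warmup=1, repeats=5):
    """Return (warm-up seconds, list of per-call seconds after warm-up)."""
    start = time.perf_counter()
    for _ in range(warmup):
        run_once()
    warmup_time = time.perf_counter() - start

    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_once()
        timings.append(time.perf_counter() - start)
    return warmup_time, timings
```

If the per-call timings stay near 5 seconds after the warm-up, the cost is in the forward pass itself, and it would be worth checking whether TensorFlow is actually placing the graph on the GPU.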
Hey, I read the HandSegNet part of your code, and I have a question. In your paper, you say:
"Our HandSegNet is a smaller version of the network from Wei et al. [19] trained on our hand pose dataset."
But in your code, I think you only implemented the first half, like the method in "Fully Convolutional Networks for Semantic Segmentation" (FCN). Hand masks are not the same as heatmaps, at least in my opinion. Is my understanding right?
I saw that your training code only uses the first batch of the dataset for training. Why?
Or is the code just a sample to show the idea, and we need to write our own training loop? @zimmerm
Hi @zimmerm
I have read your paper, and in the supplementary material you describe the training procedure for each section.
Does that mean you trained each section separately, not end to end?
For example, for training the pose prior net, do you use the ground-truth heatmaps as input to the network to get the canonical pose and rotation matrix?
For training the PoseNet, did you use a ground-truth bounding box around the hand?
I mean, you did not train end to end; you trained each module separately (handseg, posenet, poseprior) with ground-truth input (not predictions from other modules as input), right?
I would like to compute a bounding box for the hand and visualize it on the full-size input image. Is there an option to achieve that?
Thank you!
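A possible sketch, assuming the `center_v` and `scale_v` values returned by `sess.run` in run.py describe the hand crop: the crop center in row, column order, and the factor by which the crop was magnified to the 256 pixel crop size. These semantics are an assumption inferred from how crop coordinates are mapped back to the image, and `crop_to_bbox` is a hypothetical helper.

```python
import numpy as np

def crop_to_bbox(center_hw, scale, crop_size=256, image_hw=None):
    """Return (top, left, bottom, right) of the crop in full-image pixels."""
    center = np.asarray(center_hw, dtype=np.float64).reshape(-1)[:2]
    # Half the crop side length, expressed in original image pixels.
    half = (crop_size / 2.0) / float(np.squeeze(scale))
    top, left = center - half
    bottom, right = center + half
    if image_hw is not None:
        # Clip to the image borders if the image size is given.
        top, bottom = np.clip([top, bottom], 0, image_hw[0] - 1)
        left, right = np.clip([left, right], 0, image_hw[1] - 1)
    return float(top), float(left), float(bottom), float(right)
```

The resulting corners can then be drawn on the full-size image with any plotting library.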
Hello, thank you very much for your outstanding work!
My question is: the GT root node of the RHD dataset in your paper is at the wrist, but the root node of the predicted hand pose is at the center of the palm.
So, how do you ensure the fairness of quantitative comparison?
Hi.
Thanks for the great work!
Can this network detect both hands at once?
I couldn't find code in run.py that estimates the pose of both hands simultaneously.
So I manually masked one side of the image when both hands are present.
Is this the only option for detecting two hands, or is there a more convenient way?
Thank you
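The masking workaround described above can be sketched as follows, assuming the two hands are separated horizontally. `estimate_pose` is a hypothetical callable that wraps the single-hand network for one image; it is not a function from the repository.

```python
import numpy as np

def estimate_both_halves(image, estimate_pose, split_col=None):
    """Run a single-hand estimator twice, once per masked image half."""
    image = np.asarray(image)
    if split_col is None:
        split_col = image.shape[1] // 2
    left_only = image.copy()
    left_only[:, split_col:] = 0   # hide the right half
    right_only = image.copy()
    right_only[:, :split_col] = 0  # hide the left half
    return estimate_pose(left_only), estimate_pose(right_only)
```

A fixed split column is fragile when the hands overlap or cross the middle of the image, so a separate hand detector that proposes one crop per hand may be more robust.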
Hello!
I'm trying to reproduce the results you describe in your paper for the PoseNet stage when adding the STB dataset. However, my results are far from what you achieved, and I cannot find the reason why. I was hoping you could enlighten me on this step.
After training with the RHD dataset using the pipeline you've published in posenet_training.py, I load BinaryDbReaderSTB with the following parameters:
dataset = BinaryDbReaderSTB(mode='training', batch_size=train_para['BATCH_SIZE'], shuffle=True, coord_uv_noise=True, hand_crop=True, crop_center_noise=True, use_wrist_coord=True)
And proceed to run the session passing the tensors:
_, loss_v = sess.run([train_op, loss])
The BinaryDbReaderSTB class was not modified and I've processed the data using the scripts you provided.
I then proceed to evaluate the training, using:
dataset = BinaryDbReaderSTB(mode='evaluation', shuffle=False, use_wrist_coord=True)
When executing with USE_RETRAINED=False, the metrics are as expected:
Average mean EPE: 18.581 pixels
However, when using my model trained with RHD+STB, the lowest mean EPE I got was ~40 pixels. Could you please point me to what I am missing?
I tried some ideas, such as different epoch combinations, tweaking the lr decay, and different configurations of the data loader, but with no effect.
Thank you for your attention
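For reference when comparing numbers, a sketch of how a mean end-point error (EPE) in pixels is commonly computed. Whether the repository's evaluation averages in exactly this way, or treats visibility like this, is an assumption; `mean_epe` is a hypothetical helper.

```python
import numpy as np

def mean_epe(pred_uv, gt_uv, visible=None):
    """Mean Euclidean distance between predicted and ground-truth keypoints."""
    pred = np.asarray(pred_uv, dtype=np.float64)
    gt = np.asarray(gt_uv, dtype=np.float64)
    dist = np.linalg.norm(pred - gt, axis=-1)
    if visible is not None:
        # Only keypoints flagged as visible contribute to the average.
        dist = dist[np.asarray(visible, dtype=bool)]
    return float(dist.mean())
```

Computing this on a few STB samples with both models, and plotting the predictions, may show whether the gap comes from a systematic offset (for example a different wrist or palm convention) or from generally poor predictions.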
run.py contains code like this:
hand_scoremap_v, image_crop_v, scale_v, center_v, keypoints_scoremap_v, keypoint_coord3d_v = sess.run([hand_scoremap_tf, image_crop_tf, scale_tf, center_tf, keypoints_scoremap_tf, keypoint_coord3d_tf], feed_dict={image_tf: image_v})
The input image is ./data/img.png, but I get keypoint_coord3d_v like this:
[[ 1.44893011e-06 2.47310300e-06 8.19431716e-06]
[ 1.90374464e-01 -2.14477921e+00 -1.78279579e-01]
[ -3.34568620e-02 -1.62475693e+00 4.05245125e-02]
[ -3.15526843e-01 -1.11640537e+00 3.22329104e-01]
[ -5.08553386e-01 -3.76516193e-01 5.07125676e-01]
[ -3.13133001e-01 -1.18032885e+00 -1.30266607e+00]
[ 1.37096226e-01 -1.25639629e+00 -1.34058976e+00]
[ 5.93820870e-01 -1.23831999e+00 -1.13210297e+00]
[ 5.98206878e-01 -9.30948436e-01 -4.23936307e-01]
[ -4.06365603e-01 -5.10450840e-01 -9.82764661e-01]
[ -1.59013331e-01 -7.13905573e-01 -1.39288712e+00]
[ 5.09743333e-01 -8.13335598e-01 -1.47191596e+00]
[ 7.26554811e-01 -4.88295704e-01 -6.81376576e-01]
[ -3.66737843e-01 -9.66296196e-02 -9.01745081e-01]
[ -2.10868478e-01 -2.65226990e-01 -1.37259626e+00]
[ 3.51663291e-01 -2.92807043e-01 -1.53445566e+00]
[ 6.96920574e-01 -4.46554348e-02 -8.14203858e-01]
[ -3.76463950e-01 2.41669282e-01 -1.09918964e+00]
[ -1.30924404e-01 1.84271917e-01 -1.45671225e+00]
[ 2.94114619e-01 2.42203340e-01 -1.51309526e+00]
[ 4.97256935e-01 4.71493840e-01 -8.21017146e-01]]
The distance between the wrist and the thumb tip (keypoint_coord3d_v[0] and keypoint_coord3d_v[1]) is 2.160583, a weird number. What is the unit? Neither millimeters nor meters can be correct.
Does it need a conversion?
Thanks
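The reported number can be reproduced from the rows printed above. The paper describes the 3D output as normalized for scale and relative to a root keypoint, so the values carry no metric unit; converting to millimeters would need a known reference length. In the sketch below, the choice of keypoints 11 and 12 as the reference bone and the 30 mm length are purely hypothetical placeholders.

```python
import numpy as np

# Rows 0, 1, 11 and 12 of keypoint_coord3d_v as printed above.
kp0 = np.array([1.44893011e-06, 2.47310300e-06, 8.19431716e-06])
kp1 = np.array([1.90374464e-01, -2.14477921e+00, -1.78279579e-01])
kp11 = np.array([5.09743333e-01, -8.13335598e-01, -1.47191596e+00])
kp12 = np.array([7.26554811e-01, -4.88295704e-01, -6.81376576e-01])

dist_normalized = np.linalg.norm(kp1 - kp0)  # about 2.1606, unitless

def to_metric(length_normalized, ref_normalized, ref_metric_mm):
    """Rescale a normalized length using a reference bone of known size."""
    return length_normalized * ref_metric_mm / ref_normalized

# Hypothetical reference: bone between keypoints 11 and 12, assumed 30 mm long.
ref_normalized = np.linalg.norm(kp12 - kp11)
dist_mm = to_metric(dist_normalized, ref_normalized, ref_metric_mm=30.0)
```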
--
When I use only the model './weights/posenet3d-rhd-stb-slr-finetuned.pickle',
I get the error "ValueError: cannot reshape array of size 1049600 into shape (2562,512)".
Hmmm, 2562 x 512 is 1311744 and 1049600 = 2050 x 512. What should I do?
It seems to be related to the shape of PosePrior/fc_rel0/weights: 2050 x 512 = 1049600.
I think I should call _inference_pose3d(), since the PosePrior and ViewPointNet parameters are contained in the pickle and were already loaded.
I don't use HandSegNet, as shown below:
def inference(self, image, hand_side, evaluation):
    # detect keypoints in 2D
    keypoints_scoremap = self.inference_pose2d(image)
    keypoints_scoremap = keypoints_scoremap[-1]
    # estimate most likely 3D pose
    keypoint_coord3d = self._inference_pose3d(keypoints_scoremap, hand_side, evaluation)
    # upsample keypoint scoremap
    s = image.get_shape().as_list()
    keypoints_scoremap = tf.image.resize_images(keypoints_scoremap, (s[1], s[2]))
    return image, keypoints_scoremap
Can anybody give me any hint? Please help me ;/
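A possible explanation, offered as an assumption and not a confirmed diagnosis: if PosePrior reduces the scoremap with three stride-2 convolutions ('SAME' padding) to 128 channels and then appends the 2-element hand side vector, the flattened size depends on the scoremap resolution. A 32 x 32 scoremap (256 x 256 crop at 1/8 resolution) gives 2050, while a 30 x 40 scoremap (240 x 320 image without the HandSegNet crop) gives 2562, which matches both numbers in the error. The layer counts below are assumptions about the architecture, and `poseprior_flat_size` is a hypothetical helper.

```python
import math

def poseprior_flat_size(height, width, num_stride2=3, channels=128, hand_side_dim=2):
    """Size of the flattened feature vector fed to the first FC layer."""
    for _ in range(num_stride2):
        # 'SAME' padding with stride 2 halves each side, rounding up.
        height = math.ceil(height / 2)
        width = math.ceil(width / 2)
    return height * width * channels + hand_side_dim

print(poseprior_flat_size(32, 32))  # 2050
print(poseprior_flat_size(30, 40))  # 2562
```

If that is the cause, feeding the network a 256 x 256 image (so that the scoremap is 32 x 32) would be one thing to try.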
I saved a frozen model, then converted it to a transformed model optimized for inference using TensorFlow's TransformGraph, but when I now try to inspect the .pb file created by TransformGraph, I get the following error:
Traceback (most recent call last):
File "/Users/jyoti/code/hand3d/venv-latest/lib/python3.5/site-packages/tensorflow/python/framework/importer.py", line 427, in import_graph_def
graph._c_graph, serialized, options) # pylint: disable=protected-access
tensorflow.python.framework.errors_impl.InvalidArgumentError: Shape must be rank 1 but is rank 0 for 'import/single_obj_scoremap/Slice' (op: 'Slice') with input shapes: [4], [], [1].
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "../tf-coreml/utils/inspect_pb.py", line 58, in <module>
inspect(sys.argv[1], sys.argv[2])
File "../tf-coreml/utils/inspect_pb.py", line 12, in inspect
tf.import_graph_def(graph_def)
File "/Users/jyoti/code/hand3d/venv-latest/lib/python3.5/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/Users/jyoti/code/hand3d/venv-latest/lib/python3.5/site-packages/tensorflow/python/framework/importer.py", line 431, in import_graph_def
raise ValueError(str(e))
ValueError: Shape must be rank 1 but is rank 0 for 'import/single_obj_scoremap/Slice' (op: 'Slice') with input shapes: [4], [], [1].
How can the program detect two hands?
Thanks
Hi @zimmerm,
In the RHD dataset, what does "visible" mean? Does it indicate whether the joint is occluded or not,
or does it indicate whether the hand (joint) exists in the image?
I'd like to run hand3d using deeplearnjs, but it accepts .ckpt files; see https://deeplearnjs.org/demos/mnist/mnist.html
Do you have those files or a script to convert the pickle files to ckpt?
Can this model be used to analyze a video? Also, is the snapshot linked in the README equivalent in performance to what I would get by training the model myself?