Issue Type Bug OS Mac OS

BodyPix on MacOS - Dilation not supported for AutoPadType::SAME_UPPER or AutoPadType::SAME_LOWER about pinto_model_zoo HOT 8 CLOSED

cansik commented on June 11, 2024

BodyPix on MacOS - Dilation not supported for AutoPadType::SAME_UPPER or AutoPadType::SAME_LOWER

from pinto_model_zoo.

Comments (8)

cansik commented on June 11, 2024 1

OK, I will check whether the OpenVINO runtime has the same issue (I should be able to drop the ONNX file directly into it); maybe it works with an alternative backend. In any case, thanks a lot for the model conversion and the example script you've provided!

It would be really helpful to be able to run BodyPix on various machines, since it's one of the only pretrained body-part segmentation models. The python-tf-bodypix version has become quite difficult to install lately, which is why I wanted to make a clean rewrite based on ONNX or OpenVINO.


PINTO0309 commented on June 11, 2024 1

That is a great initiative. I've done some miscellaneous ONNX conversions as a hobby and will implement a way to eliminate the above error when I get around to it. It's not very difficult.


cansik commented on June 11, 2024 1

I was able to add OpenVINO as an additional runtime. The output seems correct as far as I can tell (the result is somewhat oddly translated, but I experienced that with DirectML as well). It's of course not as fast, but at least it's a solution that runs on any OS (on CPU).

Would you be open to a PR with the OpenVINO runtime and some additional cleanups of the demo script?


cansik commented on June 11, 2024 1

Affine Transformation and Resize to Original Size

OK, the translation problem occurred because I did not resize the output maps to the original image size before applying the affine transform. It seems to work now, but I am not sure whether this is the correct way. For poses detected farther from the center, I still get masks that do not match the original image (they are shifted slightly to the left or right).

I am applying the affine transformation (# Fine-tune position of mask image) and then resize the output to the original image size. Am I missing something?

I've already added padding for the input image, but the offset is still visible.
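One way to reason about the order of "affine transform" and "resize" is to look at how an affine matrix changes when the pixel grid is rescaled. The sketch below is not taken from the demo script; the function name `rescale_affine` and the scale factors are assumptions for illustration. It only shows that a translation expressed in model-resolution pixels has to be multiplied by the resize factor before it is valid at the original resolution.

```python
import numpy as np

def rescale_affine(affine_2x3: np.ndarray, scale_x: float, scale_y: float) -> np.ndarray:
    """Re-express a 2x3 affine matrix, defined at model resolution,
    in the pixel coordinates of an image scaled by (scale_x, scale_y)."""
    m = np.vstack([affine_2x3, [0.0, 0.0, 1.0]])          # 3x3 homogeneous form
    s = np.diag([scale_x, scale_y, 1.0])                   # model -> original pixels
    s_inv = np.diag([1.0 / scale_x, 1.0 / scale_y, 1.0])   # original -> model pixels
    return (s @ m @ s_inv)[:2, :]

# A pure translation of (3, -2) pixels at model resolution.
shift_small = np.array([[1.0, 0.0, 3.0],
                        [0.0, 1.0, -2.0]])

# Hypothetical resize factors: the original is 2x wider and 3x taller.
shift_full = rescale_affine(shift_small, scale_x=2.0, scale_y=3.0)
```

If the matrix is applied unchanged after resizing, the mask moves by the model-resolution offset instead of the scaled one, which would look like a residual shift that grows with the resize factor.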

Unique Keypoints

I also noticed, that sometimes too many keypoints are returned from extract_max_score_points_unique. I've added the following line of code to only extract the unique indices to always be able to create a valid pose.

# only get unique values
unique_first_values, unique_indices = np.unique(keypoints_classidscorexy[:, 0], return_index=True)
keypoints_classidscorexy = keypoints_classidscorexy[unique_indices]
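Note that `np.unique(..., return_index=True)` keeps the first row it encounters for each class ID. If the rows are not already ordered by score, a variant that keeps the highest-scoring row per class ID could look like the sketch below. The column layout `[class_id, score, x, y]` is an assumption inferred from the variable name, and `keep_best_keypoint_per_class` is a hypothetical helper, not part of the demo script.

```python
import numpy as np

def keep_best_keypoint_per_class(keypoints_classidscorexy: np.ndarray) -> np.ndarray:
    """Keep one row per class id, preferring the highest score.

    Assumes columns are [class_id, score, x, y].
    """
    if keypoints_classidscorexy.shape[0] == 0:
        return keypoints_classidscorexy
    # Sort by score, descending, so the best row of each class comes first.
    order = np.argsort(-keypoints_classidscorexy[:, 1], kind="stable")
    sorted_kps = keypoints_classidscorexy[order]
    # return_index gives the first occurrence of each class id, which is now
    # the highest-scoring one. Rows come back ordered by class id.
    _, first_idx = np.unique(sorted_kps[:, 0], return_index=True)
    return sorted_kps[first_idx]

example = np.array([
    [0, 0.40, 10, 12],
    [1, 0.90, 30, 31],
    [0, 0.80, 11, 13],  # duplicate of class 0 with a higher score
], dtype=np.float32)

deduped = keep_best_keypoint_per_class(example)
```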

Thresholds / Constants

I am wondering whether it would make sense not to fix the thresholds inside the graph, but to expose them as inputs.

Of course it would be possible to use onnx and get the specific node and adjust the value by code, but wouldn't an input make more sense?

Part Color Overlapping

Maybe related to the threshold thing, but it seems that the colored part map overlaps at the edge and creates a rainbow of parts. Do you have an idea why this is happening?


PINTO0309 commented on June 11, 2024

I know; it fails on Linux as well as on Mac.


cansik commented on June 11, 2024

It doesn't seem to be a problem with the OS, but with the execution provider. I tried it on Windows with the CPU provider and the same error happened. With DirectML or CUDA, it isn't a problem.


PINTO0309 commented on June 11, 2024

You're right. I had already confirmed that as well. So far, the runtime seems to be buggy except with the TensorRT provider.
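Given these reports, a script could prefer the providers that were observed to run the model and treat "none of them available" as a signal to switch runtimes. The sketch below is a plain-Python helper under that assumption; `pick_providers` and `PREFERRED_PROVIDERS` are hypothetical names, while the provider strings are the ones ONNX Runtime uses.

```python
from typing import Iterable, List

# Providers reported in this thread to run the model. The default CPU
# provider is left out on purpose because it raised the dilation error.
PREFERRED_PROVIDERS = (
    "TensorrtExecutionProvider",
    "CUDAExecutionProvider",
    "DmlExecutionProvider",
)

def pick_providers(
    available: Iterable[str],
    preferred: Iterable[str] = PREFERRED_PROVIDERS,
) -> List[str]:
    """Return the preferred providers that are actually available, in order."""
    available_set = set(available)
    return [p for p in preferred if p in available_set]

# With onnxruntime installed, `available` would come from
# onnxruntime.get_available_providers(), and the result would be passed as
# onnxruntime.InferenceSession(model_path, providers=chosen).
chosen = pick_providers(["CUDAExecutionProvider", "CPUExecutionProvider"])
cpu_only = pick_providers(["CPUExecutionProvider"])
```

An empty result, as in `cpu_only`, means only the failing CPU path is left, so falling back to another runtime such as the OpenVINO one mentioned elsewhere in this thread would be the remaining option.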


PINTO0309 commented on June 11, 2024

> I am applying the affine transformation (# Fine-tune position of mask image) and then resize the output to the original image size. Am I missing something?

I have made significant changes to the processing flow, keeping only the minimum necessary processing in order to optimize the model. Concretely, every part of the computational graph that would not have had a fixed shape was reworked so that it has a fixed shape. (This is my own tuning technique and may be difficult to understand.)

> I also noticed, that sometimes too many keypoints are returned from extract_max_score_points_unique. I've added the following line of code to only extract the unique indices to always be able to create a valid pose.

I don't think you are wrong. My implementation is pretty messy because I was modifying and optimizing the model and testing single-person inference while also thinking of ways to perform multi-person detection efficiently. I agree with your implementation.

> Of course it would be possible to use onnx and get the specific node and adjust the value by code, but wouldn't an input make more sense?

Yes, it would. I was quite torn about that part, too. Many of the models I have committed are very rarely used by good engineers like yourself, so it seemed more likely that people are looking for a demonstration that works quickly and without any thought. That was the only factor in the final decision. In fact, all of the models I use in my research are processed so that thresholds can be supplied externally.

> Maybe related to the threshold thing, but it seems that the colored part map overlaps at the edge and creates a rainbow of parts. Do you have an idea why this is happening?

I think the cause may be the Resize (Bilinear) and Sigmoid near the last layer, together with the part of the mask generation that forces a division by 255 and treats the resulting 0 or 1 as a boolean. I haven't looked into it in much detail, but I can't deny that there are such discrepancies, especially since the post-processing is implemented in a very forced manner. The stride is 16, so a very small ROI is linearly stretched by a factor of 16 both vertically and horizontally. This is supposed to be compensated by an offset, but that does not seem to be working. In Google's design this offset was an output of the model; in my design it is calculated and embedded in the model.

Both Google's official implementation from five years ago and the tf-bodypix repository I cited originally handled all post-processing programmatically. However, that post-processing was designed in a way that could not use GPUs and accelerators effectively. In other words, my post-processing is designed to be as efficient as possible on hardware, in exchange for accepting a small loss of accuracy.
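As a possible illustration of how linear upscaling by a factor of 16 can produce a "rainbow" at part boundaries, the sketch below interpolates a one-dimensional slice of part IDs. It is a toy example with made-up IDs (2 and 10), not a diagnosis of the actual graph: if part IDs, or values later mapped to part IDs, are interpolated linearly, the transition passes through every ID in between, and each of those gets its own color.

```python
import numpy as np

# Hypothetical 1-D slice through a low-resolution part-id map:
# part 2 on the left, part 10 on the right.
low_res_ids = np.array([2.0, 2.0, 10.0, 10.0])
scale = 16  # upscaling factor matching the stride mentioned above

src_x = np.arange(low_res_ids.size)
dst_x = np.linspace(0, low_res_ids.size - 1, low_res_ids.size * scale)

# Linear interpolation of the ids themselves invents in-between part ids.
linear_ids = np.rint(np.interp(dst_x, src_x, low_res_ids)).astype(int)

# Nearest-neighbour upsampling only ever returns ids that were present.
nearest_ids = low_res_ids[np.rint(dst_x).astype(int)].astype(int)

# Part ids that appear only because of the interpolation.
spurious = sorted(set(linear_ids.tolist()) - {2, 10})
```

Under this assumption, upsampling the per-part scores and taking the argmax afterwards, or upsampling the ID map with nearest-neighbour, would avoid the extra IDs at the cost of blockier edges.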
