In multi_hand_tracker.py, lines 884-889 transform the 2D keypoint coordinates from the cropped image back to the original image using the 2D similarity transform Minv. However, the z coordinates are left in the cropped image's scale and are not transformed.
Current code:
kp_orig_0 = (self._pad1(joints[:,:2]) @ Minv.T)[:,:2]
kp_orig_0 -= pad[::-1]
# Add back the 3D data
kp_orig = joints[:,:]
kp_orig[:,:2] = kp_orig_0[:,:2]
Proposed fix, which also scales the z coordinates:
kp_orig_0 = (self._pad1(joints[:,:2]) @ Minv.T)[:,:2]
kp_orig_0 -= pad[::-1]
scale = np.linalg.norm(Minv[0, :2])
# Add back the 3D data
kp_orig = joints[:,:]
kp_orig[:,:2] = kp_orig_0[:,:2]
# also scale the z coordinates
kp_orig[:, 2] *= scale
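
For reference, here is a minimal standalone sketch of the same idea, not the code from multi_hand_tracker.py itself. The helper names `pad1` and `joints_to_original` are hypothetical, and it assumes Minv is a 3x3 homogeneous similarity transform (uniform scale plus rotation and translation) and that `pad` is ordered as in the snippet above. Under that assumption, the norm of the first row of the 2x2 linear part equals the scale factor. The sketch also copies `joints` before writing, because `joints[:,:]` is a NumPy view and in-place edits to it would modify the input array.

```python
import numpy as np


def pad1(x):
    """Append a column of ones to turn Nx2 points into homogeneous Nx3 points."""
    return np.hstack([x, np.ones((x.shape[0], 1))])


def joints_to_original(joints, Minv, pad):
    """Map Nx3 joints (x, y, z) from crop space back to original image space.

    Hypothetical helper: assumes Minv is a similarity transform, so a single
    scale factor applies to x, y and z alike.
    """
    joints = np.asarray(joints, dtype=float)
    # Transform x, y with the full similarity transform, then undo the padding.
    xy = (pad1(joints[:, :2]) @ Minv.T)[:, :2]
    xy = xy - pad[::-1]
    # For a similarity transform, the first row of the linear part is
    # scale * (cos, -sin), so its norm is the scale factor.
    scale = np.linalg.norm(Minv[0, :2])
    out = joints.copy()  # copy so the caller's array is not modified
    out[:, :2] = xy
    out[:, 2] *= scale  # z has no translation or rotation here, only scale
    return out


# Example: scale 2, rotation 90 degrees, translation (10, 20).
Minv_example = np.array([[0.0, -2.0, 10.0],
                         [2.0, 0.0, 20.0],
                         [0.0, 0.0, 1.0]])
pad_example = np.array([1.0, 2.0])
joints_example = np.array([[1.0, 0.0, 3.0],
                           [0.0, 1.0, -1.0]])
result = joints_to_original(joints_example, Minv_example, pad_example)
```

With this example transform, the z values are doubled along with the x, y distances, so the relative depth of the keypoints stays consistent with the in-plane geometry.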