
Comments (7)

Nicholasli1995 commented on May 13, 2024

Hi, I appreciate your excellent work and the comprehensive release of technical details! I believe this is a great contribution to the 3D HPE field.

One question: I encountered some issues when running inference with the HRNet model (I mean the 2D detector) loaded with the pretrained weights, given cropped H36M images.

  1. The accuracy in the first screenshot shows that the average 2D error is around 7 pixels, which is inconsistent with the reported 4.4.
  2. Meanwhile, I printed the 2D pose prediction for the frame (9, 'Directions', 'Directions 1.54138969.h5-sh') from the model inference result and from the released twoDPose_HRN_test.npy.
    The inconsistency appears again, as shown in the second uploaded image.

Could you help me sort out this unexpected situation? Did I miss something, or could you release another higher-accuracy pretrained HRNet model?

Many thanks !

[Ref1 screenshot]

The released model was the one used to generate the 2D predictions. How did you pre-process the images before feeding them to the model? What about sequences other than (9, 'Directions', 'Directions 1.54138969.h5-sh')? I think the difference may come from the way you crop the input patch.


kyang-06 commented on May 13, 2024

Thank you for the quick reply!
Yes, the cropping strategy is a confusing point for me. Sorry for forgetting to mention it.

I first crop the 1000x1002 (or 1000x1000) image into a person bounding patch using the ground-truth bounding box provided by the official H36M release, then resize it to 384x288 with a black border.
Did I do it the right way?
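In code, what I do is roughly the following (a minimal sketch of my procedure; the function and variable names are just illustrative):

import cv2
import numpy as np

def letterbox_crop(img, bbox_xyxy, out_h=384, out_w=288):
    # crop the person patch from the full frame using the GT box
    x1, y1, x2, y2 = [int(v) for v in bbox_xyxy]
    patch = img[y1:y2 + 1, x1:x2 + 1]
    # resize with preserved aspect ratio, pad the remainder with black
    scale = min(out_h / patch.shape[0], out_w / patch.shape[1])
    new_h = int(round(patch.shape[0] * scale))
    new_w = int(round(patch.shape[1] * scale))
    resized = cv2.resize(patch, (new_w, new_h), interpolation=cv2.INTER_LINEAR)
    canvas = np.zeros((out_h, out_w, 3), dtype=img.dtype)
    canvas[:new_h, :new_w] = resized
    # keep scale and offset so 2D joints can be mapped between frames
    return canvas, scale, (x1, y1)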

As for other sequences, I have not checked their consistency.
But since the mean 2D error is around 7 pixels on the whole test set, I guess they are in a similar situation.


kyang-06 commented on May 13, 2024

[screenshot]
Hi, I visualized the inference results.
It seems that even easy poses are predicted with quite a high error (e.g., a T-pose predicted at about 4 pixels of error).

The crop border size in the figure looks similar to the ones you provided in the instructions, so I guess the issue is not caused by the cropping.
On the other hand, as for the normalization, I scale the image from [0, 255] to [0, 1] and then normalize it with mean=[0.485, 0.456, 0.406] and std=[0.229, 0.224, 0.225].
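Roughly, my normalization looks like this (a sketch assuming torchvision transforms; rgb_patch stands for the cropped RGB patch):

import torchvision.transforms as transforms

# ToTensor() converts HxWx3 uint8 [0, 255] (RGB) into CxHxW float [0, 1],
# then Normalize applies the mean/std values above
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
input_tensor = transform(rgb_patch)  # rgb_patch: the cropped 384x288 RGB patch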

Any hint would be helpful :) I appreciate it.


Nicholasli1995 commented on May 13, 2024

Your results do not seem correct to me either. Maybe you can paste your code here so that I can check your data pre-processing, model inference, and error calculation. Did you use the original code, which uses affine-transformation warping? Showing your code would help me understand your process of "cropping the 1000x1002 (or 1000x1000) image into a person bounding patch using the official H36M ground-truth bounding box, then resizing it to 384x288 with a black border."
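By "affine transformation warping" I mean roughly the sketch below (illustrative code, not the repository's exact get_affine_transform/_xywh2cs implementation): the box is first expanded to the 288x384 aspect ratio and then warped onto the network input in one step, with no manual slicing or padding.

import cv2
import numpy as np

def affine_crop(img, bbox_xyxy, out_w=288, out_h=384):
    # expand the box around its center so it matches the target aspect ratio
    x1, y1, x2, y2 = bbox_xyxy
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + 0.5 * w, y1 + 0.5 * h
    if w / h > out_w / out_h:
        h = w * out_h / out_w
    else:
        w = h * out_w / out_h
    # three corresponding points define the affine map: box corners -> patch corners
    src = np.float32([[cx - 0.5 * w, cy - 0.5 * h],
                      [cx + 0.5 * w, cy - 0.5 * h],
                      [cx - 0.5 * w, cy + 0.5 * h]])
    dst = np.float32([[0, 0], [out_w - 1, 0], [0, out_h - 1]])
    trans = cv2.getAffineTransform(src, dst)  # 2x3 matrix, also maps the GT joints
    patch = cv2.warpAffine(img, trans, (out_w, out_h), flags=cv2.INTER_LINEAR)
    return patch, trans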


kyang-06 commented on May 13, 2024

Many thanks for your continuous follow-up on this issue.

I found one bug: I previously forgot to run cv2.cvtColor(img, cv2.COLOR_BGR2RGB) when loading image samples.
After fixing the bug, I get a 2D error of 5.85, still worse than 4.4 but tolerable, as shown in the screenshot below:
[screenshot]

The result visualization now looks like the figure below. Does it look normal to you?
[figure]

Data processing:

    def __getitem__(self, idx):
        image_file, db = tuple(self._db[idx])
        joints_3d = db['joint_label']
        bbox = db['bbox_2d'].astype(int)

        data_numpy = cv2.imread(image_file, cv2.IMREAD_COLOR | cv2.IMREAD_IGNORE_ORIENTATION)
        if data_numpy is None:
            logger.error('=> fail to read {}'.format(image_file))
            raise ValueError('Fail to read {}'.format(image_file))

        # the fix: OpenCV loads BGR, while the pretrained model expects RGB
        data_numpy = cv2.cvtColor(data_numpy, cv2.COLOR_BGR2RGB)

        #### Here: 1000x1000 -> cropped image with ground-truth bbox
        data_numpy = data_numpy[bbox[0, 1]:bbox[1, 1] + 1, bbox[0, 0]:bbox[1, 0] + 1]

        img_trans_mat = np.eye(3)
        img_trans_mat[:2, -1] = -bbox[0]

        joints_2d = db['joint_2d']
        joints_original = joints_2d.copy()
        joints_2d = joints_2d - bbox[0]
        joints_vis = np.ones(joints_2d.shape, dtype=np.float32)
        c, s = self._xywh2cs(0, 0, data_numpy.shape[1], data_numpy.shape[0])
        score = 1
        r = 0

        trans = get_affine_transform(c, s, r, self.image_size)
        input = cv2.warpAffine(
            data_numpy,
            trans,
            (int(self.image_size[0]), int(self.image_size[1])),
            flags=cv2.INTER_LINEAR)

        img_trans_mat = np.matmul(trans, img_trans_mat)
        img_trans_mat = np.concatenate([img_trans_mat, np.array([[0,0,1]])])

        if self.transform:
            input = self.transform(input)

        for i in range(self.num_joints):
            if joints_vis[i, 0] > 0.0:
                joints_2d[i, 0:2] = affine_transform(joints_2d[i, 0:2], trans)
                # set joints to invisible if they fall outside of the image
                if joints_2d[i, 0] >= self.image_width or joints_2d[i, 1] >= self.image_height:
                    joints_vis[i, 0] = 0.0

        target, target_weight = self.generate_target(joints_2d, joints_vis)

        target = torch.from_numpy(target)
        target_weight = torch.from_numpy(target_weight)

        meta = {
            'image': image_file,
            'joints_2d': joints_2d,
            'joints_vis': joints_vis,
            'j_original_2d': joints_original,  # original coordinates
            'joints_3d': joints_3d,
            'center': c,
            'scale': s,
            'rotation': r,
            'score': score,
            'trans': img_trans_mat,     # 3x3
            'trans_inv': np.linalg.inv(img_trans_mat),
            'bbox': bbox
        }

        return input, target, target_weight, meta

Evaluation (I want to jointly train 2D+3D):

def validate(config, val_loader, val_dataset, model, criterion, output_dir,
                   tb_log_dir, writer_dict=None, total_batches=-1, save=False, split=None):
    batch_time = AverageMeter()
    losses = AverageMeter()
    acc = AverageMeter()
    error = AverageMeter()

    # switch to evaluate mode
    model.eval()

    num_iters = 0
    with torch.no_grad():
        end = time.time()
        for i, (input, target, target_weight, meta) in enumerate(val_loader):
            num_iters += 1
            if total_batches > 0 and num_iters > total_batches and not save:
                break
            batch_size = len(input)
            # compute output
            # move the batch to the same device as the model before the forward pass
            input = input.cuda(non_blocking=True)
            out_kpt_3d, out_kpt_2d, outputs, out_kpt_2d_orig = model(
                input,
                img_trans_mat_inv=meta['trans_inv'].float().cuda(),
                kpt_2d_gt=meta['j_original_2d'].float().cuda())
            
            if isinstance(outputs, list):
                output = outputs[-1]
            else:
                output = outputs

            target = target.cuda(non_blocking=True)
            target_weight = target_weight.cuda(non_blocking=True)
            target_3d = meta['joints_3d'].float().cuda() / 1.e3
            loss_3d = criterion['3d'](out_kpt_3d, target_3d)
            loss = loss_3d

            num_images = input.size(0)
            # measure accuracy and record loss
            losses.update(loss.item(), num_images)
            avg_acc = torch.norm(out_kpt_2d - meta['joints_2d'].cuda().float(), dim=-1).mean()
            acc.update(avg_acc, batch_size)

            err_cur = torch.norm((out_kpt_3d - target_3d) * 1e3, dim=-1).mean()
            error.update(err_cur, batch_size)

            batch_time.update(time.time() - end)
            end = time.time()
            if i % config.PRINT_FREQ == 0 or (i+1) == len(val_loader):
                msg = 'Test: [{0}/{1}]\t' \
                      'Time {batch_time.val:.3f} ({batch_time.avg:.3f})\t' \
                      'Loss {loss.val:.4f} ({loss.avg:.4f})\t' \
                      'Accuracy {acc.val:.3f} ({acc.avg:.3f})\t Error ({error.avg:.3f})'.format(
                    i, len(val_loader), batch_time=batch_time, acc=acc,
                    loss=losses, error=error)
                logger.info(msg)

                prefix = '{}_{}'.format(
                    os.path.join(output_dir, 'val'), i
                )

Model inference:


class JointTrainingModel(nn.Module):
    def __init__(self, cfg, is_train, **kwargs):
        super(JointTrainingModel, self).__init__()
        self.model_2d = PoseHighResolutionNet(cfg, **kwargs)
        self.model_3d = MyLiftingModel(cfg, **kwargs)
        self.re_order = [3, 12, 14, 16, 11, 13, 15, 1, 2, 0, 4, 5, 7, 9, 6, 8, 10]

        if is_train and cfg.MODEL.INIT_WEIGHTS:
            self.model_2d.init_weights(cfg.MODEL.PRETRAINED_HRNET)
            self.model_3d.init_weights(cfg.MODEL.PRETRAINED_LIFTING)

    def forward(self, x, img_trans_mat_inv, kpt_2d_gt=None):
        # kpt_2d_gt is accepted to match the call in validate(); it is not used here
        output_heatmaps = self.model_2d(x)
        kpt_2d, maxvals = get_max_preds_soft_pt(output_heatmaps)
        kpt_2d = kpt_2d[:, self.re_order]
        ### Here: 2D coordinates in the 384x288 patch -> 1000x1000 original frame
        kpt_2d_homo = torch.nn.functional.pad(kpt_2d, (0, 1), mode='constant', value=1.)
        kpt_2d_original = torch.bmm(img_trans_mat_inv,
                                    kpt_2d_homo.transpose(-2, -1)).transpose(-2, -1)[:, :, :2]
        kpt_2d_normalized = (kpt_2d_original - 500.) / 500.
        out_kpt_3d = self.model_3d(kpt_2d_normalized)

        return out_kpt_3d, kpt_2d, output_heatmaps, kpt_2d_original

Thanks in advance for any possible reason that comes to mind. :)


Nicholasli1995 commented on May 13, 2024


Hi, I notice you are computing joint distance in the local patch:
avg_acc = torch.norm(out_kpt_2d - meta['joints_2d'].cuda().float(), dim=-1).mean()

In contrast, I compute such distances in the original image before affine transformation:

distance_list.append(get_distance(joints_original, pred_src_coordinates))

Please use consistent code for evaluation.
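For example, a minimal sketch of computing the 2D pixel error in the original frame (assuming your predictions are already in patch coordinates and trans_inv is the 3x3 inverse transform from your dataset code):

import torch

def pixel_error_in_original_frame(kpt_2d_patch, trans_inv, joints_original):
    # kpt_2d_patch:    (B, J, 2) predictions in the cropped 384x288 patch
    # trans_inv:       (B, 3, 3) inverse patch transform (meta['trans_inv'])
    # joints_original: (B, J, 2) ground-truth joints in the full frame
    ones = torch.ones_like(kpt_2d_patch[..., :1])
    homo = torch.cat([kpt_2d_patch, ones], dim=-1)  # homogeneous coordinates
    pred_src = torch.bmm(trans_inv, homo.transpose(1, 2)).transpose(1, 2)[..., :2]
    return torch.norm(pred_src - joints_original, dim=-1).mean()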


kyang-06 commented on May 13, 2024

Thank you for the patient help! I will give it a try.
This issue is mostly solved, so I am closing it.

