
Comments (7)

Nicholasli1995 commented on May 13, 2024

Hi, I appreciate your excellent work and the comprehensive release of technical details! I believe this is a great contribution to the 3D HPE field.

One question: I encountered some issues when running inference with the HRNet model (I mean the 2D detector) loaded with the pretrained weights, given cropped H36M images.

  1. The accuracy in the first screenshot shows that the average 2D error is around 7 pixels, which is inconsistent with the reported 4.4.
  2. Meanwhile, I printed the 2D pose prediction for the frame (9, 'Directions', 'Directions 1.54138969.h5-sh') from the model inference result and from the released twoDPose_HRN_test.npy.
    The inconsistency appears again, as shown in the second uploaded image.

Could you help me sort out this unexpected situation? Did I miss something, or could you release another higher-accuracy pretrained HRNet model?

Many thanks !

[Ref1 screenshot]

The released model was the one used to generate the 2D predictions. How did you pre-process the images before feeding them to the model? What about sequences other than (9, 'Directions', 'Directions 1.54138969.h5-sh')? I think the difference may come from the way you crop the input patch.


kyang-06 commented on May 13, 2024

Thank you for the quick reply!
Yes, the cropping strategy is a confusing point for me. Sorry for forgetting to mention it.

I first crop the 1000x1002 (or 1000x1000) image into a person bounding patch using the ground-truth bounding box provided by the official H36M release, then resize it to 384x288 with a black border.
Did I do it the right way?
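In code, what I do is roughly the following (a minimal sketch of my procedure; the function and variable names are just illustrative):

import cv2
import numpy as np

def letterbox_crop(img, bbox_xyxy, out_h=384, out_w=288):
    # crop the person patch from the full frame using the GT box
    x1, y1, x2, y2 = [int(v) for v in bbox_xyxy]
    patch = img[y1:y2 + 1, x1:x2 + 1]
    # resize with preserved aspect ratio, pad the remainder with black
    scale = min(out_h / patch.shape[0], out_w / patch.shape[1])
    new_h = int(round(patch.shape[0] * scale))
    new_w = int(round(patch.shape[1] * scale))
    resized = cv2.resize(patch, (new_w, new_h), interpolation=cv2.INTER_LINEAR)
    canvas = np.zeros((out_h, out_w, 3), dtype=img.dtype)
    canvas[:new_h, :new_w] = resized
    # keep scale and offset so 2D joints can be mapped between frames
    return canvas, scale, (x1, y1)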

As for other sequences, I have not checked their consistency.
But since the mean 2D error is around 7 pixels on the whole test set, I guess they are in a similar situation.


kyang-06 commented on May 13, 2024

[screenshot]
Hi, I visualized the inference results.
It seems that even easy poses are predicted with quite a high error (e.g., a T-pose predicted at about 4 pixels of error).

The crop border size in the figure looks similar to the ones you provided in the instructions, so I guess the issue is not caused by the cropping.
On the other hand, as for the normalization, I scale the image from [0, 255] to [0, 1] and then normalize it with mean=[0.485, 0.456, 0.406] and std=[0.229, 0.224, 0.225].
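Roughly, my normalization looks like this (a sketch assuming torchvision transforms; rgb_patch stands for the cropped RGB patch):

import torchvision.transforms as transforms

# ToTensor() converts HxWx3 uint8 [0, 255] (RGB) into CxHxW float [0, 1],
# then Normalize applies the mean/std values above
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
input_tensor = transform(rgb_patch)  # rgb_patch: the cropped 384x288 RGB patch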

Any hint would be helpful :) I appreciate it.


Nicholasli1995 commented on May 13, 2024

Your results do not seem correct to me either. Maybe you can paste your code here so that I can check your data pre-processing, model inference, and error calculation. Did you use the original code, which uses affine-transformation warping? Showing your code would help me understand your process of "cropping the 1000x1002 (or 1000x1000) image into a person bounding patch using the official H36M ground-truth bounding box, then resizing it to 384x288 with a black border."
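By "affine transformation warping" I mean roughly the sketch below (illustrative code, not the repository's exact get_affine_transform/_xywh2cs implementation): the box is first expanded to the 288x384 aspect ratio and then warped onto the network input in one step, with no manual slicing or padding.

import cv2
import numpy as np

def affine_crop(img, bbox_xyxy, out_w=288, out_h=384):
    # expand the box around its center so it matches the target aspect ratio
    x1, y1, x2, y2 = bbox_xyxy
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + 0.5 * w, y1 + 0.5 * h
    if w / h > out_w / out_h:
        h = w * out_h / out_w
    else:
        w = h * out_w / out_h
    # three corresponding points define the affine map: box corners -> patch corners
    src = np.float32([[cx - 0.5 * w, cy - 0.5 * h],
                      [cx + 0.5 * w, cy - 0.5 * h],
                      [cx - 0.5 * w, cy + 0.5 * h]])
    dst = np.float32([[0, 0], [out_w - 1, 0], [0, out_h - 1]])
    trans = cv2.getAffineTransform(src, dst)  # 2x3 matrix, also maps the GT joints
    patch = cv2.warpAffine(img, trans, (out_w, out_h), flags=cv2.INTER_LINEAR)
    return patch, trans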


kyang-06 commented on May 13, 2024

Many thanks for your continuous follow-up on this issue.

I found one bug: I previously forgot to run cv2.cvtColor(img, cv2.COLOR_BGR2RGB) when loading image samples.
After fixing the bug, I get a 2D error of 5.85, still worse than 4.4 but tolerable, as shown in the screenshot below:
[screenshot]

The result visualization now looks like the figure below. Does it look normal to you?
[figure]

Data processing:

    def __getitem__(self, idx):
        image_file, db = tuple(self._db[idx])
        joints_3d = db['joint_label']
        bbox = db['bbox_2d'].astype(int)

        data_numpy = cv2.imread(image_file, cv2.IMREAD_COLOR | cv2.IMREAD_IGNORE_ORIENTATION)
        if data_numpy is None:
            logger.error('=> fail to read {}'.format(image_file))
            raise ValueError('Fail to read {}'.format(image_file))

        # the fix: OpenCV loads BGR, while the pretrained model expects RGB
        data_numpy = cv2.cvtColor(data_numpy, cv2.COLOR_BGR2RGB)

        #### Here: 1000x1000 -> cropped image with ground-truth bbox
        data_numpy = data_numpy[bbox[0, 1]:bbox[1, 1] + 1, bbox[0, 0]:bbox[1, 0] + 1]

        img_trans_mat = np.eye(3)
        img_trans_mat[:2, -1] = -bbox[0]

        joints_2d = db['joint_2d']
        joints_original = joints_2d.copy()
        joints_2d = joints_2d - bbox[0]
        joints_vis = np.ones(joints_2d.shape, dtype=np.float32)
        c, s = self._xywh2cs(0, 0, data_numpy.shape[1], data_numpy.shape[0])
        score = 1
        r = 0

        trans = get_affine_transform(c, s, r, self.image_size)
        input = cv2.warpAffine(
            data_numpy,
            trans,
            (int(self.image_size[0]), int(self.image_size[1])),
            flags=cv2.INTER_LINEAR)

        img_trans_mat = np.matmul(trans, img_trans_mat)
        img_trans_mat = np.concatenate([img_trans_mat, np.array([[0,0,1]])])

        if self.transform:
            input = self.transform(input)

        for i in range(self.num_joints):
            if joints_vis[i, 0] > 0.0:
                joints_2d[i, 0:2] = affine_transform(joints_2d[i, 0:2], trans)
                # set joints to invisible if they fall outside of the image
                if joints_2d[i, 0] >= self.image_width or joints_2d[i, 1] >= self.image_height:
                    joints_vis[i, 0] = 0.0

        target, target_weight = self.generate_target(joints_2d, joints_vis)

        target = torch.from_numpy(target)
        target_weight = torch.from_numpy(target_weight)

        meta = {
            'image': image_file,
            'joints_2d': joints_2d,
            'joints_vis': joints_vis,
            'j_original_2d': joints_original,  # original coordinates
            'joints_3d': joints_3d,
            'center': c,
            'scale': s,
            'rotation': r,
            'score': score,
            'trans': img_trans_mat,     # 3x3
            'trans_inv': np.linalg.inv(img_trans_mat),
            'bbox': bbox
        }

        return input, target, target_weight, meta

Evaluation (I want to jointly train 2D+3D):

def validate(config, val_loader, val_dataset, model, criterion, output_dir,
                   tb_log_dir, writer_dict=None, total_batches=-1, save=False, split=None):
    batch_time = AverageMeter()
    losses = AverageMeter()
    acc = AverageMeter()
    error = AverageMeter()

    # switch to evaluate mode
    model.eval()

    num_iters = 0
    with torch.no_grad():
        end = time.time()
        for i, (input, target, target_weight, meta) in enumerate(val_loader):
            num_iters += 1
            if total_batches > 0 and num_iters > total_batches and not save:
                break
            batch_size = len(input)
            # compute output
            # move the batch to the same device as the model before the forward pass
            input = input.cuda(non_blocking=True)
            out_kpt_3d, out_kpt_2d, outputs, out_kpt_2d_orig = model(
                input,
                img_trans_mat_inv=meta['trans_inv'].float().cuda(),
                kpt_2d_gt=meta['j_original_2d'].float().cuda())
            
            if isinstance(outputs, list):
                output = outputs[-1]
            else:
                output = outputs

            target = target.cuda(non_blocking=True)
            target_weight = target_weight.cuda(non_blocking=True)
            target_3d = meta['joints_3d'].float().cuda() / 1.e3
            loss_3d = criterion['3d'](out_kpt_3d, target_3d)
            loss = loss_3d

            num_images = input.size(0)
            # measure accuracy and record loss
            losses.update(loss.item(), num_images)
            avg_acc = torch.norm(out_kpt_2d - meta['joints_2d'].cuda().float(), dim=-1).mean()
            acc.update(avg_acc, batch_size)

            err_cur = torch.norm((out_kpt_3d - target_3d) * 1e3, dim=-1).mean()
            error.update(err_cur, batch_size)

            batch_time.update(time.time() - end)
            end = time.time()
            if i % config.PRINT_FREQ == 0 or (i+1) == len(val_loader):
                msg = 'Test: [{0}/{1}]\t' \
                      'Time {batch_time.val:.3f} ({batch_time.avg:.3f})\t' \
                      'Loss {loss.val:.4f} ({loss.avg:.4f})\t' \
                      'Accuracy {acc.val:.3f} ({acc.avg:.3f})\t Error ({error.avg:.3f})'.format(
                    i, len(val_loader), batch_time=batch_time, acc=acc,
                    loss=losses, error=error)
                logger.info(msg)

                prefix = '{}_{}'.format(
                    os.path.join(output_dir, 'val'), i
                )

Model inference:


class JointTrainingModel(nn.Module):
    def __init__(self, cfg, is_train, **kwargs):
        super(JointTrainingModel, self).__init__()
        self.model_2d = PoseHighResolutionNet(cfg, **kwargs)
        self.model_3d = MyLiftingModel(cfg, **kwargs)
        self.re_order = [3, 12, 14, 16, 11, 13, 15, 1, 2, 0, 4, 5, 7, 9, 6, 8, 10]

        if is_train and cfg.MODEL.INIT_WEIGHTS:
            self.model_2d.init_weights(cfg.MODEL.PRETRAINED_HRNET)
            self.model_3d.init_weights(cfg.MODEL.PRETRAINED_LIFTING)

    def forward(self, x, img_trans_mat_inv, kpt_2d_gt=None):
        # kpt_2d_gt is accepted to match the call in validate(); it is not used here
        output_heatmaps = self.model_2d(x)
        kpt_2d, maxvals = get_max_preds_soft_pt(output_heatmaps)
        kpt_2d = kpt_2d[:, self.re_order]
        ### Here: 2D coordinates in the 384x288 patch -> 1000x1000 original frame
        kpt_2d_homo = torch.nn.functional.pad(kpt_2d, (0, 1), mode='constant', value=1.)
        kpt_2d_original = torch.bmm(img_trans_mat_inv,
                                    kpt_2d_homo.transpose(-2, -1)).transpose(-2, -1)[:, :, :2]
        kpt_2d_normalized = (kpt_2d_original - 500.) / 500.
        out_kpt_3d = self.model_3d(kpt_2d_normalized)

        return out_kpt_3d, kpt_2d, output_heatmaps, kpt_2d_original

Thanks in advance for any possible reason that comes to mind. :)


Nicholasli1995 commented on May 13, 2024


Hi, I notice you are computing joint distance in the local patch:
avg_acc = torch.norm(out_kpt_2d - meta['joints_2d'].cuda().float(), dim=-1).mean()

In contrast, I compute such distances in the original image before affine transformation:

distance_list.append(get_distance(joints_original, pred_src_coordinates))

Please use consistent code for evaluation.
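For example, a minimal sketch of computing the 2D pixel error in the original frame (assuming your predictions are already in patch coordinates and trans_inv is the 3x3 inverse transform from your dataset code):

import torch

def pixel_error_in_original_frame(kpt_2d_patch, trans_inv, joints_original):
    # kpt_2d_patch:    (B, J, 2) predictions in the cropped 384x288 patch
    # trans_inv:       (B, 3, 3) inverse patch transform (meta['trans_inv'])
    # joints_original: (B, J, 2) ground-truth joints in the full frame
    ones = torch.ones_like(kpt_2d_patch[..., :1])
    homo = torch.cat([kpt_2d_patch, ones], dim=-1)  # homogeneous coordinates
    pred_src = torch.bmm(trans_inv, homo.transpose(1, 2)).transpose(1, 2)[..., :2]
    return torch.norm(pred_src - joints_original, dim=-1).mean()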


kyang-06 commented on May 13, 2024

Thank you for the patient help! I will give it a try.
This issue is mostly solved, so I am closing it.

