Comments (7)
Hi, thank you for the excellent work and for releasing such comprehensive technical detail! I believe this is a great contribution to the 3D HPE field.
I have one question. I ran into some issues when running inference with the HRNet model (the 2D detector) loaded from the pretrained weights, given cropped H36M images.
- The accuracy in the first screenshot shows an average 2D error of around 7 pixels, which is inconsistent with the reported 4.4.
- I also printed the 2D pose prediction for the frame (9, 'Directions', 'Directions 1.54138969.h5-sh') from both my model inference result and the released twoDPose_HRN_test.npy.
The inconsistency appears again, as shown in the second uploaded image. Could you help me resolve this? Did I miss something, or could you release another, more accurate pretrained HRNet model?
Many thanks!
The released model was the one used to generate the 2D predictions. How did you pre-process the images before feeding them to the model? What about sequences other than (9, 'Directions', 'Directions 1.54138969.h5-sh')? I think the difference may come from the way you crop the input patch.
from evoskeleton.
Thank you for the quick reply!
Yes, the cropping strategy is the part that confuses me. Sorry for not mentioning it earlier.
I first crop the 1000x1002 (or 1000x1000) image to the person patch using the ground-truth bounding box provided officially with H36M, then resize it to 384x288 with a black border.
Did I do this correctly?
As for other sequences, I have not checked their consistency.
But since the mean 2D error is around 7 pixels over the whole test set, I guess they are in a similar situation.
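For reference, the crop-then-pad step described above can be sketched roughly as follows. This is only an illustration of that description, not the repository's preprocessing (the maintainer's reply points to an affine-warp based pipeline instead). The function name `crop_and_letterbox`, the inclusive `((x0, y0), (x1, y1))` bbox layout, and the reading of 384x288 as height x width are all assumptions.

```python
import numpy as np

def crop_and_letterbox(image, bbox, out_h=384, out_w=288):
    """Crop `image` to `bbox`, then pad with black to the out_w:out_h aspect ratio.

    bbox is ((x0, y0), (x1, y1)) with inclusive corners (an assumed layout).
    Returns the padded patch and the (left, top) offset of the crop inside it.
    """
    (x0, y0), (x1, y1) = bbox
    patch = image[y0:y1 + 1, x0:x1 + 1]
    h, w = patch.shape[:2]
    target_ratio = out_w / out_h
    if w / h < target_ratio:
        # Patch is too narrow: widen the canvas.
        new_w, new_h = int(round(h * target_ratio)), h
    else:
        # Patch is too flat: make the canvas taller.
        new_w, new_h = w, int(round(w / target_ratio))
    canvas = np.zeros((new_h, new_w) + patch.shape[2:], dtype=patch.dtype)
    top = (new_h - h) // 2
    left = (new_w - w) // 2
    canvas[top:top + h, left:left + w] = patch
    return canvas, (left, top)

# Hypothetical 1000x1000 white frame with a 200x600 person box.
frame = np.full((1000, 1000, 3), 255, dtype=np.uint8)
padded, offset = crop_and_letterbox(frame, ((400, 200), (599, 799)))
```

The padded patch would then be resized to the network input size. Any keypoint ground truth has to go through the same shift, padding offset, and resize scale, otherwise the measured 2D error includes a systematic offset.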
Hi, I visualized the inference results.
It seems that even easy poses are predicted with quite a high error (e.g. a T-pose predicted with a 4-pixel error).
The crop border size in the figure is similar to the ones you provided in the instructions, so I guess the issue is not caused by cropping.
On the other hand, for normalization, I scale the image from [0, 255] to [0, 1] and then normalize it with mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225].
Any hint would be helpful :) Thanks!
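The normalization described above can be written out as a small sketch. The mean and std values are the ones quoted in the comment (the usual ImageNet statistics, listed in RGB order). The function name `normalize_rgb` and the channel-first output layout are assumptions made for illustration.

```python
import numpy as np

# Per-channel statistics in RGB order, as quoted in the comment above.
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def normalize_rgb(image_uint8):
    """Convert an HxWx3 uint8 RGB image to a 3xHxW normalized float32 array."""
    scaled = image_uint8.astype(np.float32) / 255.0   # [0, 255] -> [0, 1]
    normalized = (scaled - MEAN) / STD                # broadcast over the channel axis
    return normalized.transpose(2, 0, 1)              # HWC -> CHW

black = np.zeros((2, 2, 3), dtype=np.uint8)
out = normalize_rgb(black)
```

Because the statistics are listed in RGB order, the input array must also be RGB for each channel to be paired with its own mean and std.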
Your results do not seem correct to me either. Maybe you can paste your code here so that I can check your data pre-processing, model inference, and error calculation. Did you use the original code that uses an affine transformation warping? Showing your code can help me understand your process of "crop 1000x1002 (or 1000x1000) image into person bounding patch by ground truth bounding box provided by h36m official, then resize it into 384x288 with black border."
Many thanks for continuing to follow up on this issue.
I found one bug: I had previously forgotten to run cv2.cvtColor(img, cv2.COLOR_BGR2RGB) when loading image samples.
After fixing the bug, I get a 2D error of 5.85, still worse than 4.4 but tolerable, as shown in the screenshot below:
The result visualization now looks like the figure below. Does it look normal to you?
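To illustrate the channel-order bug: cv2.imread returns images in BGR order, so without a conversion the red and blue channels are swapped relative to what an RGB-trained model expects. The sketch below uses plain NumPy so it runs without OpenCV. The helper name `bgr_to_rgb` is hypothetical, and for 3-channel images it is intended to have the same effect as cv2.cvtColor(img, cv2.COLOR_BGR2RGB).

```python
import numpy as np

def bgr_to_rgb(image_bgr):
    """Reverse the last (channel) axis of an HxWx3 image: BGR -> RGB."""
    return np.ascontiguousarray(image_bgr[..., ::-1])

# A single pure-blue pixel as OpenCV would load it (B=255, G=0, R=0).
pixel_bgr = np.array([[[255, 0, 0]]], dtype=np.uint8)
pixel_rgb = bgr_to_rgb(pixel_bgr)
```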
Data processing:
    def __getitem__(self, idx):
        image_file, db = tuple(self._db[idx])
        joints_3d = db['joint_label']
        bbox = db['bbox_2d'].astype(int)
        data_numpy = cv2.imread(image_file, 1 | 128)
        # Check for a failed read before converting; cvtColor would raise on None
        if data_numpy is None:
            logger.error('=> fail to read {}'.format(image_file))
            raise ValueError('Fail to read {}'.format(image_file))
        data_numpy = cv2.cvtColor(data_numpy, cv2.COLOR_BGR2RGB)
        #### Here: 1000x1000 -> cropped image with ground-truth bbox
        data_numpy = data_numpy[bbox[0,1]:bbox[1,1]+1, bbox[0,0]:bbox[1,0]+1]
        img_trans_mat = np.eye(3)
        img_trans_mat[:2, -1] = -bbox[0]
        joints_2d = db['joint_2d']
        joints_original = joints_2d.copy()
        joints_2d = joints_2d - bbox[0]
        joints_vis = np.ones(joints_2d.shape, dtype=np.float32)
        c, s = self._xywh2cs(0, 0, data_numpy.shape[1], data_numpy.shape[0])
        score = 1
        r = 0
        trans = get_affine_transform(c, s, r, self.image_size)
        input = cv2.warpAffine(
            data_numpy,
            trans,
            (int(self.image_size[0]), int(self.image_size[1])),
            flags=cv2.INTER_LINEAR)
        img_trans_mat = np.matmul(trans, img_trans_mat)
        img_trans_mat = np.concatenate([img_trans_mat, np.array([[0,0,1]])])
        if self.transform:
            input = self.transform(input)
        for i in range(self.num_joints):
            if joints_vis[i, 0] > 0.0:
                joints_2d[i, 0:2] = affine_transform(joints_2d[i, 0:2], trans)
                # set joints to invisible if they are outside of the image
                if joints_2d[i, 0] >= self.image_width or joints_2d[i, 1] >= self.image_height:
                    joints_vis[i, 0] = 0.0
        target, target_weight = self.generate_target(joints_2d, joints_vis)
        target = torch.from_numpy(target)
        target_weight = torch.from_numpy(target_weight)
        meta = {
            'image': image_file,
            'joints_2d': joints_2d,
            'joints_vis': joints_vis,
            'j_original_2d': joints_original,  # original coordinates
            'joints_3d': joints_3d,
            'center': c,
            'scale': s,
            'rotation': r,
            'score': score,
            'trans': img_trans_mat,  # 3x3
            'trans_inv': np.linalg.inv(img_trans_mat),
            'bbox': bbox
        }
        return input, target, target_weight, meta
Evaluation (I want to jointly train 2D+3D):
    def validate(config, val_loader, val_dataset, model, criterion, output_dir,
                 tb_log_dir, writer_dict=None, total_batches=-1, save=False, split=None):
        batch_time = AverageMeter()
        losses = AverageMeter()
        acc = AverageMeter()
        error = AverageMeter()
        # switch to evaluate mode
        model.eval()
        num_iters = 0
        with torch.no_grad():
            end = time.time()
            for i, (input, target, target_weight, meta) in enumerate(val_loader):
                num_iters += 1
                if total_batches > 0 and num_iters > total_batches and not save:
                    break
                batch_size = len(input)
                # compute output
                out_kpt_3d, out_kpt_2d, outputs, out_kpt_2d_orig = model(
                    input,
                    img_trans_mat_inv=meta['trans_inv'].float().cuda(),
                    kpt_2d_gt=meta['j_original_2d'].float().cuda())
                if isinstance(outputs, list):
                    output = outputs[-1]
                else:
                    output = outputs
                target = target.cuda(non_blocking=True)
                target_weight = target_weight.cuda(non_blocking=True)
                target_3d = meta['joints_3d'].float().cuda() / 1.e3
                loss_3d = criterion['3d'](out_kpt_3d, target_3d)
                loss = loss_3d
                num_images = input.size(0)
                # measure accuracy and record loss
                losses.update(loss.item(), num_images)
                avg_acc = torch.norm(out_kpt_2d - meta['joints_2d'].cuda().float(), dim=-1).mean()
                acc.update(avg_acc, batch_size)
                err_cur = torch.norm((out_kpt_3d - target_3d) * 1e3, dim=-1).mean()
                error.update(err_cur, batch_size)
                batch_time.update(time.time() - end)
                end = time.time()
                if i % config.PRINT_FREQ == 0 or (i+1) == len(val_loader):
                    msg = 'Test: [{0}/{1}]\t' \
                          'Time {batch_time.val:.3f} ({batch_time.avg:.3f})\t' \
                          'Loss {loss.val:.4f} ({loss.avg:.4f})\t' \
                          'Accuracy {acc.val:.3f} ({acc.avg:.3f})\t Error ({error.avg:.3f})'.format(
                              i, len(val_loader), batch_time=batch_time, acc=acc,
                              loss=losses, error=error)
                    logger.info(msg)
                    prefix = '{}_{}'.format(
                        os.path.join(output_dir, 'val'), i
                    )
Model inference:
    class JointTrainingModel(nn.Module):
        def __init__(self, cfg, is_train, **kwargs):
            super(JointTrainingModel, self).__init__()
            self.model_2d = PoseHighResolutionNet(cfg, **kwargs)
            self.model_3d = MyLiftingModel(cfg, **kwargs)
            self.re_order = [3, 12, 14, 16, 11, 13, 15, 1, 2, 0, 4, 5, 7, 9, 6, 8, 10]
            if is_train and cfg.MODEL.INIT_WEIGHTS:
                self.model_2d.init_weights(cfg.MODEL.PRETRAINED_HRNET)
                self.model_3d.init_weights(cfg.MODEL.PRETRAINED_LIFTING)

        def forward(self, x, img_trans_mat_inv):
            output_heatmaps = self.model_2d(x)
            kpt_2d, maxvals = get_max_preds_soft_pt(output_heatmaps)
            kpt_2d = kpt_2d[:, self.re_order]
            ### Here: 2D coordinate 384x288 -> 1000x1000
            kpt_2d_homogeneous = torch.nn.functional.pad(kpt_2d, (0,1), mode='constant', value=1.)
            kpt_2d_original = torch.bmm(
                img_trans_mat_inv,
                kpt_2d_homogeneous.transpose(-2, -1)).transpose(-2, -1)[:, :, :2]
            kpt_2d_normalized = (kpt_2d_original - 500.) / 500.
            out_kpt_3d = self.model_3d(kpt_2d_normalized)
            return out_kpt_3d, kpt_2d, output_heatmaps, kpt_2d_original
Thanks in advance for any possible cause that comes to mind. :)
Hi, I notice you are computing joint distance in the local patch:
avg_acc = torch.norm(out_kpt_2d - meta['joints_2d'].cuda().float(), dim=-1).mean()
In contrast, I compute such distances in the original image before affine transformation:
EvoSkeleton/libs/hhr/core/evaluate.py, line 120 in b2b355f
Please use consistent code for evaluation.
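To make the point about consistent evaluation concrete, here is a minimal sketch of why an error measured in the warped patch is not directly comparable with one measured in the original image. The transform values and the helper names (`apply_transform`, `mean_joint_error`) are hypothetical and are not taken from evaluate.py. The only assumption is a 3x3 homogeneous transform from original to patch coordinates, like the 'trans' matrix in the posted dataset code.

```python
import numpy as np

def apply_transform(points, matrix):
    """Apply a 3x3 homogeneous transform to an Nx2 array of 2D points."""
    ones = np.ones((points.shape[0], 1))
    homogeneous = np.concatenate([points, ones], axis=1)  # N x 3
    return (homogeneous @ matrix.T)[:, :2]

def mean_joint_error(pred, gt):
    """Mean Euclidean distance over joints, in the units of the inputs."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

# Hypothetical original->patch transform: shift by (-100, -50), then scale by 0.5.
trans = np.array([[0.5, 0.0, -50.0],
                  [0.0, 0.5, -25.0],
                  [0.0, 0.0, 1.0]])
trans_inv = np.linalg.inv(trans)

gt_original = np.array([[300.0, 400.0], [500.0, 600.0]])
pred_original = gt_original + np.array([3.0, 4.0])  # 5 px off in the original image

# Error measured inside the patch.
gt_patch = apply_transform(gt_original, trans)
pred_patch = apply_transform(pred_original, trans)
err_patch = mean_joint_error(pred_patch, gt_patch)

# Error measured after mapping the prediction back to the original image.
pred_back = apply_transform(pred_patch, trans_inv)
err_original = mean_joint_error(pred_back, gt_original)
```

With this made-up scale of 0.5, the same prediction gives 2.5 px in the patch and 5 px in the original image. In the posted code, comparing out_kpt_2d_orig with meta['j_original_2d'] would put the metric in original-image coordinates.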
Thank you for the patient help! I will give it a try.
This issue is mostly solved, so I am closing it.