thodan / epos
Code for "EPOS: Estimating 6D Pose of Objects with Symmetries", CVPR 2020.
Home Page: http://cmp.felk.cvut.cz/epos/
License: MIT License
When I train, I get NaN for the loss:
python train.py --model=lmo
step: 0 total_loss: 9.5576973 obj_cls: 2.77258897 frag_cls: 4.15888262 frag_loc: 2.37503433
step: 100 total_loss: nan obj_cls: nan frag_cls: nan frag_loc: nan
INFO:tensorflow:global_step/sec: 2.1272
step: 200 total_loss: nan obj_cls: nan frag_cls: nan frag_loc: nan
INFO:tensorflow:global_step/sec: 2.22972
step: 300 total_loss: nan obj_cls: nan frag_cls: nan frag_loc: nan
INFO:tensorflow:global_step/sec: 2.22798
step: 400 total_loss: nan obj_cls: nan frag_cls: nan frag_loc: 2.52651024
INFO:tensorflow:global_step/sec: 2.22882
step: 500 total_loss: nan obj_cls: nan frag_cls: nan frag_loc: nan
INFO:tensorflow:global_step/sec: 2.23132
step: 600 total_loss: nan obj_cls: nan frag_cls: nan frag_loc: 2.38968158
INFO:tensorflow:global_step/sec: 2.22965
step: 700 total_loss: nan obj_cls: nan frag_cls: nan frag_loc: nan
INFO:tensorflow:global_step/sec: 2.23278
step: 800 total_loss: nan obj_cls: nan frag_cls: nan frag_loc: nan
INFO:tensorflow:global_step/sec: 2.22892
step: 900 total_loss: nan obj_cls: nan frag_cls: nan frag_loc: 2.19273663
INFO:tensorflow:global_step/sec: 2.22798
step: 1000 total_loss: nan obj_cls: nan frag_cls: nan frag_loc: nan
INFO:tensorflow:global_step/sec: 2.22868
step: 1100 total_loss: nan obj_cls: nan frag_cls: nan frag_loc: nan
INFO:tensorflow:global_step/sec: 2.22729
So I think the following error is caused by it:
Caused by op 'logits/pred_frag_conf/weights_1', defined at:
File "train.py", line 559, in <module>
tf.app.run()
File "/home/default/anaconda3/envs/epos/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "train.py", line 485, in main
freeze_regex_list=FLAGS.freeze_regex_list)
File "train.py", line 355, in _train_epos_model
reuse_variable=(i != 0))
File "train.py", line 267, in _tower_loss
outputs_to_num_channels)
File "train.py", line 239, in _build_epos_model
tf.summary.histogram(model_var.op.name, model_var)
File "/home/default/anaconda3/envs/epos/lib/python3.6/site-packages/tensorflow/python/summary/summary.py", line 187, in histogram
tag=tag, values=values, name=scope)
File "/home/default/anaconda3/envs/epos/lib/python3.6/site-packages/tensorflow/python/ops/gen_logging_ops.py", line 284, in histogram_summary
"HistogramSummary", tag=tag, values=values, name=name)
File "/home/default/anaconda3/envs/epos/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/default/anaconda3/envs/epos/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
return func(*args, **kwargs)
File "/home/default/anaconda3/envs/epos/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
op_def=op_def)
File "/home/default/anaconda3/envs/epos/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
self._traceback = tf_stack.extract_stack()
InvalidArgumentError (see above for traceback): Nan in summary histogram for: logits/pred_frag_conf/weights_1
[[node logits/pred_frag_conf/weights_1 (defined at train.py:239) = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](logits/pred_frag_conf/weights_1/tag, logits/pred_frag_conf/weights/read/_9035)]]
[[{{node xception_65/middle_flow/block1/unit_3/xception_module/separable_conv2_depthwise/BatchNorm/moving_mean/read/_9950}} = _SendT=DT_FLOAT, client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_1856_..._mean/read", _device="/job:localhost/replica:0/task:0/device:GPU:0"]]
What can I do to get training to work?
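A first step when chasing this kind of problem is to find out when each loss term first became NaN. The sketch below parses loss lines in the format shown in the log above and reports the first NaN step per term. It uses only the Python standard library; the helper names `parse_loss_line` and `first_nan_steps` are made up for this example and are not part of the EPOS code.

```python
import math
import re

# Matches lines such as:
#   step: 100 total_loss: nan obj_cls: nan frag_cls: nan frag_loc: nan
STEP_RE = re.compile(r"^step:\s*(\d+)\s+(.*)$")
TERM_RE = re.compile(r"(\w+):\s*(\S+)")


def parse_loss_line(line):
    """Return (step, {term: value}) for a loss line, or None for other lines."""
    match = STEP_RE.match(line.strip())
    if match is None:
        return None
    step = int(match.group(1))
    terms = {name: float(value) for name, value in TERM_RE.findall(match.group(2))}
    return step, terms


def first_nan_steps(lines):
    """Map each loss term to the first logged step at which it was NaN."""
    first = {}
    for line in lines:
        parsed = parse_loss_line(line)
        if parsed is None:
            continue  # e.g. "INFO:tensorflow:global_step/sec: ..." lines
        step, terms = parsed
        for name, value in terms.items():
            if math.isnan(value) and name not in first:
                first[name] = step
    return first


# A few lines copied from the log in this issue.
log = """\
step: 0 total_loss: 9.5576973 obj_cls: 2.77258897 frag_cls: 4.15888262 frag_loc: 2.37503433
step: 100 total_loss: nan obj_cls: nan frag_cls: nan frag_loc: nan
INFO:tensorflow:global_step/sec: 2.1272
step: 400 total_loss: nan obj_cls: nan frag_cls: nan frag_loc: 2.52651024
""".splitlines()

result = first_nan_steps(log)
print(result)
```

Since the log is only printed every 100 steps, this narrows the divergence down to the first 100 iterations; logging more often over that window would likely localize it further.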
Hi, thodan. First of all, thanks for sharing. However, when I tried to compile OSMesa, I found that the link "ftp://ftp.freedesktop.org/pub/mesa/older-versions/17.x/mesa-${mesaversion}.tar.gz" didn't work. The file I downloaded always fails to unpack. I guess the downloaded file is defective because it is only 324 bytes. Could you share a new link or the file?
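A file of only a few hundred bytes saved under a `.tar.gz` name is often an error page rather than an archive. Below is a minimal sketch for checking a download before trying to build from it, using only the Python standard library. The function name `looks_like_tar_gz` and the demo file names are made up for this example, and the 324-byte dummy file only stands in for the broken download described above.

```python
import io
import os
import tarfile
import tempfile
import zlib

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def looks_like_tar_gz(path):
    """Return True if the file starts with the gzip magic and reads as a tar archive."""
    with open(path, "rb") as f:
        if f.read(2) != GZIP_MAGIC:
            return False
    try:
        with tarfile.open(path, "r:gz") as archive:
            archive.getmembers()  # forces a full pass over the archive headers
    except (tarfile.TarError, OSError, EOFError, zlib.error):
        return False
    return True


with tempfile.TemporaryDirectory() as tmp:
    # A small but valid archive.
    good = os.path.join(tmp, "good.tar.gz")
    with tarfile.open(good, "w:gz") as archive:
        payload = b"hello"
        info = tarfile.TarInfo(name="hello.txt")
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))

    # A 324-byte non-archive saved under an archive name (hypothetical stand-in).
    bad = os.path.join(tmp, "bad.tar.gz")
    with open(bad, "wb") as f:
        f.write(b"x" * 324)

    good_ok = looks_like_tar_gz(good)
    bad_ok = looks_like_tar_gz(bad)
    bad_size = os.path.getsize(bad)

print(good_ok, bad_ok, bad_size)
```

If the check fails, opening the downloaded file in a text editor usually shows what the server actually returned.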
Hello, your paper reports good experimental results compared to other state-of-the-art methods, but the evaluation metrics used in your paper are not reported in the other papers. How did you obtain the AR scores of the other methods that appear in your paper? Thank you for your reply!
I am trying to use the repository to do pose estimation on my own dataset.
First of all, everything seems to work fine in check_train_input.py, train.py, eval.py, and infer.py with the following parameters in params.yml:
#Dataset.
dataset: "sphere"
#Model.
model_variant: "xception_65"
atrous_rates: [12, 24, 36]
encoder_output_stride: 8
decoder_output_stride: [4]
upsample_logits: false
frag_seg_agnostic: false
frag_loc_agnostic: false
num_frags: 64
#Establishing correspondences.
corr_min_obj_conf: 0.1
corr_min_frag_rel_conf: 0.5
corr_project_to_model: false
#Training.
train_tfrecord_names: ["sphere_train-blender"]
train_max_height_before_crop: 128
train_crop_size: "128,128"
optimizer: "AdamOptimizer"
save_interval_steps: 10000
initialize_last_layer: false
fine_tune_batch_norm: false
train_steps: 4500000
train_batch_size: 4
base_learning_rate: 0.0001
obj_cls_loss_weight: 1.0
frag_cls_loss_weight: 1.0
frag_loc_loss_weight: 100.0
train_knn_frags: 1
data_augmentations:
  random_adjust_brightness:
    min_delta: -0.15
    max_delta: 0.15
  random_adjust_contrast:
    min_delta: 0.85
    max_delta: 1.15
  random_adjust_saturation:
    min_delta: 0.85
    max_delta: 1.15
  random_adjust_hue:
    max_delta: 1.0
  random_blur:
    max_sigma: 1.5
  random_gaussian_noise:
    max_sigma: 0.03
  jpeg_artifacts:
    min_quality: 85
However, when I enable the upsample_logits flag, I get the following error:
Traceback (most recent call last):
File "/home/user/miniconda/envs/eposaidev/lib/python3.8/site-packages/tensorflow_core/python/framework/tensor_shape.py", line 928, in merge_with
self.assert_same_rank(other)
File "/home/user/miniconda/envs/eposaidev/lib/python3.8/site-packages/tensorflow_core/python/framework/tensor_shape.py", line 982, in assert_same_rank
raise ValueError("Shapes %s and %s must have the same rank" %
ValueError: Shapes (?, 128, 128) and (?, ?, ?, ?) must have the same rank
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/user/miniconda/envs/eposaidev/lib/python3.8/site-packages/tensorflow_core/python/framework/tensor_shape.py", line 1013, in with_rank
return self.merge_with(unknown_shape(rank=rank))
File "/home/user/miniconda/envs/eposaidev/lib/python3.8/site-packages/tensorflow_core/python/framework/tensor_shape.py", line 934, in merge_with
raise ValueError("Shapes %s and %s are not compatible" % (self, other))
ValueError: Shapes (?, 128, 128) and (?, ?, ?, ?) are not compatible
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "train.py", line 584, in <module>
tf.app.run()
File "/home/user/miniconda/envs/eposaidev/lib/python3.8/site-packages/tensorflow_core/python/platform/app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "/home/user/miniconda/envs/eposaidev/lib/python3.8/site-packages/absl/app.py", line 312, in run
_run_main(main, args)
File "/home/user/miniconda/envs/eposaidev/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main
sys.exit(main(argv))
File "train.py", line 505, in main
train_tensor, summary_op = _train_epos_model(
File "train.py", line 374, in _train_epos_model
loss = _tower_loss(
File "train.py", line 285, in _tower_loss
_build_epos_model(
File "train.py", line 202, in _build_epos_model
loss.add_obj_cls_loss(
File "/home/user/phd/epos/epos_lib/loss.py", line 131, in add_obj_cls_loss
targets_shape = misc.resolve_shape(targets, 4)[1:3]
File "/home/user/phd/epos/epos_lib/misc.py", line 44, in resolve_shape
shape = tensor.get_shape().with_rank(rank).as_list()
File "/home/user/miniconda/envs/eposaidev/lib/python3.8/site-packages/tensorflow_core/python/framework/tensor_shape.py", line 1015, in with_rank
raise ValueError("Shape %s must have rank %d" % (self, rank))
ValueError: Shape (?, 128, 128) must have rank 4
I tried multiple sources of data, including the tfrecord file ycbv_test_targets-bop19.tfrecord provided by the authors, so I at least have some confidence that the data is not the source of the issue. However, I am not an in-depth expert on this repository and have not yet traced the entire path of the data through the code up until this point of failure.
Any clues or insights as to what the shapes should be like at this point of failure? Appreciate the help.
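For reference, the last frame of the traceback shows `misc.resolve_shape(targets, 4)` being called on a tensor of shape `(?, 128, 128)`, so the code path taken with `upsample_logits` enabled expects rank-4 targets while the targets arriving there are rank 3. The sketch below reproduces that check on plain shape tuples, without TensorFlow. The helpers `require_rank` and `add_channel_axis` are made up for this illustration, and the idea that a trailing channel axis is what is missing is only a hypothesis, not a confirmed fix for the repository.

```python
def require_rank(shape, rank):
    """Mimic the rank check: raise ValueError unless `shape` has exactly `rank` dims."""
    if len(shape) != rank:
        raise ValueError("Shape %s must have rank %d" % (tuple(shape), rank))
    return list(shape)


def add_channel_axis(shape):
    """Append a trailing singleton axis, e.g. (N, H, W) -> (N, H, W, 1)."""
    return tuple(shape) + (1,)


# None stands for the unknown batch size that TensorFlow prints as "?".
targets_shape = (None, 128, 128)

message = None
try:
    require_rank(targets_shape, 4)
except ValueError as err:
    message = str(err)  # same kind of failure as in the traceback

# With a trailing channel axis the rank-4 check passes and [1:3] gives (H, W).
spatial = require_rank(add_channel_axis(targets_shape), 4)[1:3]

print(message)
print(spatial)
```

In TensorFlow, `tf.expand_dims(targets, axis=-1)` would add such an axis, but whether that is the right place to change things in `loss.py` is untested here.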
Are depth images really necessary for training? I am quite confused because in your paper you say you only use RGB images, but when I try to train on my own dataset, it requires depth images (when running calc_gt_info).
Hello, thodan.
First of all, thanks for sharing.
I'm trying to train a model with 'python train.py --model='ycbv_custom'', but I get NaN values in the losses, like this:
step: 0 total_loss: 9.91135 obj_cls: 3.09097505 frag_cls: 4.15891361 frag_loc: 2.40971112
step: 100 total_loss: nan obj_cls: nan frag_cls: nan frag_loc: nan
INFO:tensorflow:global_step/sec: 1.57567
step: 200 total_loss: nan obj_cls: nan frag_cls: nan frag_loc: nan
INFO:tensorflow:global_step/sec: 1.66842
I checked the input data by visualizing it, and it looks correct. I have been trying to fix this for days, but I don't know where the error is.
Can you give me some advice?
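One sanity check on the step-0 values: a classifier that predicts a uniform distribution over K classes has a cross-entropy of ln(K). The sketch below compares the logged values with ln(K) for class counts that are assumptions on my side: 22 object classes (21 YCB-V objects plus background) and 64 fragments (the `num_frags: 64` setting quoted in another issue on this page).

```python
import math


def uniform_cross_entropy(num_classes):
    """Cross-entropy (in nats) of a uniform prediction over `num_classes` classes."""
    return math.log(num_classes)


# Step-0 values copied from the log above; the class counts are assumptions.
logged = {
    "obj_cls": (3.09097505, 22),   # assumed: 21 YCB-V objects + background
    "frag_cls": (4.15891361, 64),  # assumed: 64 fragments per object
}

diffs = {}
for name, (value, num_classes) in logged.items():
    expected = uniform_cross_entropy(num_classes)
    diffs[name] = abs(value - expected)
    print(f"{name}: logged={value:.5f} uniform={expected:.5f} diff={diffs[name]:.5f}")
```

If these counts are right, both terms start at the value expected for an untrained classifier, which suggests the model and inputs are sane at initialization and that the loss diverges somewhere in the first 100 steps. That would point more towards the learning rate, a bad sample, or a numerically unstable loss term than towards the data pipeline as a whole.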
Hi, thodan. First of all, thanks for sharing; your README file is very detailed. However, after I configured the environment successfully and ran infer.py with the pre-trained models you provide for the T-LESS and YCB-V datasets, the visualized results are bad on nearly all test images. I look forward to your help.
Can you tell me how to get the pre-trained models with a ResNet backbone? I only found the model for Xception-65.
Looking forward to your reply!
Hello, first of all, thank you for sharing. I found that all the links in the project are dead. Could you please tell me how to get the files now?
Also, I could not compile OSMesa and Progressive-X successfully. Will this affect my training?
I've now got my own dataset in BOP format.
Looking forward to your reply!
Thank you!
Hi~
Thanks for your wonderful work.
After reading your paper, I have a question whose answer I could not find in the paper.
Your paper says, 'if the ground-truth one-hot distribution $\bar{b}_i(u)$ indicates a different fragment at pixels with similar appearance, the network is expected to learn at such pixels the same probability $b_{ij}(u)$ for all the indicated fragments.'
It also says, 'Vectors $\bar{a}(u)$, $\bar{b}_i(u)$, and $\bar{r}_{ij}(u)$ are obtained by rendering the 3D object models in the ground-truth poses with a custom OpenGL shader.' But I cannot find how to obtain the ground truth $\bar{b}_i(u)$, especially for partial symmetries. Could you please give me some suggestions?
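Going by the sentence quoted above, the ground-truth fragment label is a plain one-hot vector and symmetries are not encoded in it; the network is expected to pick them up from seeing similar-looking pixels labeled with different fragments. A minimal sketch of how such a one-hot label could be produced for a single 3D surface point is given below. It assumes that a point belongs to the fragment with the nearest center, which is my reading and not something confirmed in this thread. The function names and the toy centers are hypothetical, and the actual code renders these labels with a shader rather than computing them point by point.

```python
def nearest_fragment(point, centers):
    """Return the index of the fragment center closest to `point` (squared L2)."""
    sq_dists = [sum((p - c) ** 2 for p, c in zip(point, center)) for center in centers]
    return sq_dists.index(min(sq_dists))


def one_hot(index, size):
    """One-hot vector of length `size` with a 1.0 at `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]


# Toy fragment centers in model coordinates (hypothetical values).
centers = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]

# 3D model point visible at some pixel u under the ground-truth pose.
point = (0.9, 0.1, 0.0)

frag_index = nearest_fragment(point, centers)
label = one_hot(frag_index, len(centers))
print(frag_index, label)
```

Under this reading, a partially symmetric object needs no special treatment when the labels are generated: each training image simply gets the labels of its own ground-truth pose.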