Comments (24)

joe-siyuan-qiao commented on September 25, 2024

Hello @HarborYuan

Thanks for reaching out. Could you share more details about the error you observed? For example, the TF version, the script you were using for evaluation, the config file, and whether any changes were made to the repo.

Please refer to the config for the learning rate and batch size originally used in the paper. The number of GPUs depends on the devices you have access to and will also limit the batch size for training. As long as the model fits in memory, the number of GPUs shouldn't affect the results.

Hope this helps.

HarborYuan commented on September 25, 2024

Hi @joe-siyuan-qiao ,

Thanks for your response.

My TF version is 2.6 (CUDA 11.4, driver 470, NVIDIA P40 GPU).

The config I used is

deeplab2/configs/semkitti_dvps/vip_deeplab/resnet50_beta_os32.textproto

which has batch_size = 4. Does that mean the total batch size is 4, rather than 4 per GPU/TPU? By the way, I modified "initial_checkpoint" (the Panoptic-DeepLab checkpoint for training and the checkpoint dumped by ViP-DeepLab for evaluation), "experiment_name", "file_pattern" (train and val), and "merge_semantic_and_instance_with_tf_op" (set to true) to match my environment.

Thanks again for your help.

The full output is printed below; I guess it does not give much information about the error:

python deeplab2/trainer/train.py --config_file=deeplab2/configs/semkitti_dvps/vip_deeplab/resnet50_beta_os32.textproto --mode=eval --model_dir=the_loggers/ --num_gpus=1 

2021-09-24 12:31:51.770914: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero                                                                                                
2021-09-24 12:31:51.772761: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero                                                                                                
2021-09-24 12:31:51.774510: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero                                                                                                
2021-09-24 12:31:51.776269: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero                                                                                                
2021-09-24 12:31:51.788422: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero                                                                                                
2021-09-24 12:31:51.790178: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero                                                                                                
2021-09-24 12:31:51.791917: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero                                                                                                
2021-09-24 12:31:51.793655: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero                                                                                                
2021-09-24 12:31:51.795382: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero                                                                                                
2021-09-24 12:31:51.797099: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero                                                                                                
2021-09-24 12:31:51.798811: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero                                                                                                
2021-09-24 12:31:51.800526: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero                                                                                                
I0924 12:31:51.802383 140357833512768 train.py:65] Reading the config file.                                                                                                                                                                                                                                                
I0924 12:31:51.806485 140357833512768 train.py:69] Starting the experiment.                                                                                                                                                                                                                                                
2021-09-24 12:31:51.807014: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA                                                                  
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.                                                                                                                                                                                                                                
2021-09-24 12:31:52.730912: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero                                                                                                
2021-09-24 12:31:52.732469: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero                                                                                                
2021-09-24 12:31:52.737392: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero                                                                                                
2021-09-24 12:31:52.738864: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero                                                                                                
2021-09-24 12:31:52.740358: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero                                                                                                
2021-09-24 12:31:52.741794: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero                                                                                                
2021-09-24 12:31:52.743223: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero                                                                                                
2021-09-24 12:31:52.744669: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero                                                                                                
2021-09-24 12:31:52.745990: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero                                                                                                
2021-09-24 12:31:52.747336: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero                                                                                                
2021-09-24 12:31:52.748791: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero                            
2021-09-24 12:31:52.750222: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-24 12:31:54.611764: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-24 12:31:54.613360: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-24 12:31:54.614868: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-24 12:31:54.616391: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-24 12:31:54.617839: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-24 12:31:54.619291: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-24 12:31:54.620667: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-24 12:31:54.621939: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-24 12:31:54.623393: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-24 12:31:54.624845: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 22151 MB memory:  -> device: 0, name: Tesla P40, pci bus id: 0000:00:0b.0, compute capability: 6.1
2021-09-24 12:31:54.625350: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-24 12:31:54.626768: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 22151 MB memory:  -> device: 1, name: Tesla P40, pci bus id: 0000:00:0c.0, compute capability: 6.1
2021-09-24 12:31:54.627146: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-24 12:31:54.628596: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:2 with 22151 MB memory:  -> device: 2, name: Tesla P40, pci bus id: 0000:00:0d.0, compute capability: 6.1
2021-09-24 12:31:54.628926: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-24 12:31:54.630366: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:3 with 22151 MB memory:  -> device: 3, name: Tesla P40, pci bus id: 0000:00:0e.0, compute capability: 6.1
I0924 12:31:54.633826 140357833512768 train_lib.py:104] Using strategy <class 'tensorflow.python.distribute.one_device_strategy.OneDeviceStrategy'> with 1 replicas
I0924 12:31:55.110026 140357833512768 vip_deeplab.py:52] Synchronized Batchnorm is used.
I0924 12:31:55.111364 140357833512768 axial_resnet_instances.py:144] Axial-ResNet final config: {'num_blocks': [3, 4, 6, 3], 'backbone_layer_multiplier': 1.0, 'width_multiplier': 1.0, 'stem_width_multiplier': 1.0, 'output_stride': 32, 'classification_mode': True, 'backbone_type': 'resnet_beta', 'use_axial_beyond_stride': 0, 'backbone_use_transformer_beyond_stride': 0, 'extra_decoder_use_transformer_beyond_stride': 32, 'backbone_decoder_num_stacks': 0, 'backbone_decoder_blocks_per_stage': 1, 'extra_decoder_num_stacks': 0, 'extra_decoder_blocks_per_stage': 1, 'max_num_mask_slots': 128, 'num_mask_slots': 128, 'memory_channels': 256, 'base_transformer_expansion': 1.0, 'global_feed_forward_network_channels': 256, 'high_resolution_output_stride': 4, 'activation': 'relu', 'block_group_config': {'attention_bottleneck_expansion': 2, 'drop_path_keep_prob': 1.0, 'drop_path_beyond_stride': 16, 'drop_path_schedule': 'constant', 'positional_encoding_type': None, 'use_global_beyond_stride': 0, 'use_sac_beyond_stride': -1, 'use_squeeze_and_excite': False, 'conv_use_recompute_grad': False, 'axial_use_recompute_grad': True, 'recompute_within_stride': 0, 'transformer_use_recompute_grad': False, 'axial_layer_config': {'query_shape': (129, 129), 'key_expansion': 1, 'value_expansion': 2, 'memory_flange': (32, 32), 'double_global_attention': False, 'num_heads': 8, 'use_query_rpe_similarity': True, 'use_key_rpe_similarity': True, 'use_content_similarity': True, 'retrieve_value_rpe': True, 'retrieve_value_content': True, 'initialization_std_for_query_key_rpe': 1.0, 'initialization_std_for_value_rpe': 1.0, 'self_attention_activation': 'softmax'}, 'dual_path_transformer_layer_config': {'num_heads': 8, 'bottleneck_expansion': 2, 'key_expansion': 1, 'value_expansion': 2, 'feed_forward_network_channels': 2048, 'use_memory_self_attention': True, 'use_pixel2memory_feedback_attention': True, 'transformer_activation': 'softmax'}}, 'bn_layer': functools.partial(<class 'keras.layers.normalization.batch_normalization.SyncBatchNormalization'>, momentum=0.9900000095367432, epsilon=0.0010000000474974513), 'conv_kernel_weight_decay': 0.0}
I0924 12:31:55.377276 140357833512768 vip_deeplab.py:80] Setting pooling size to (13, 41)
I0924 12:31:55.377476 140357833512768 aspp.py:135] Global average pooling in the ASPP pooling layer was replaced with tiled average pooling using the provided pool_size. Please make sure this behavior is intended.
I0924 12:31:55.377614 140357833512768 aspp.py:135] Global average pooling in the ASPP pooling layer was replaced with tiled average pooling using the provided pool_size. Please make sure this behavior is intended.
I0924 12:31:55.377737 140357833512768 aspp.py:135] Global average pooling in the ASPP pooling layer was replaced with tiled average pooling using the provided pool_size. Please make sure this behavior is intended.
2021-09-24 12:32:00.573922: W tensorflow/python/util/util.cc:348] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.
I0924 12:32:00.581362 140357833512768 controller.py:391] restoring or initializing model...
restoring or initializing model...
I0924 12:32:00.615649 140357833512768 controller.py:395] restored model from the_loggers/vipdeeplab-kitti/ckpt-60000.
restored model from the_loggers/vipdeeplab-kitti/ckpt-60000.
I0924 12:32:00.615767 140357833512768 controller.py:217] restored from checkpoint: the_loggers/vipdeeplab-kitti/ckpt-60000
restored from checkpoint: the_loggers/vipdeeplab-kitti/ckpt-60000
I0924 12:32:00.616183 140357833512768 controller.py:277]  eval | step:  60000 | running complete evaluation...
 eval | step:  60000 | running complete evaluation...
2021-09-24 12:32:00.776425: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)
I0924 12:32:02.913952 140357833512768 api.py:446] Eval with scales ListWrapper([1.0])
I0924 12:32:04.491621 140357833512768 api.py:446] Global average pooling in the ASPP pooling layer was replaced with tiled average pooling using the provided pool_size. Please make sure this behavior is intended.
I0924 12:32:04.535114 140357833512768 api.py:446] Global average pooling in the ASPP pooling layer was replaced with tiled average pooling using the provided pool_size. Please make sure this behavior is intended.
I0924 12:32:04.578203 140357833512768 api.py:446] Global average pooling in the ASPP pooling layer was replaced with tiled average pooling using the provided pool_size. Please make sure this behavior is intended.
I0924 12:32:04.618946 140357833512768 api.py:446] Eval scale 1.0; setting pooling size to [13, 41]
I0924 12:32:11.837154 140357833512768 api.py:446] Global average pooling in the ASPP pooling layer was replaced with tiled average pooling using the provided pool_size. Please make sure this behavior is intended.
I0924 12:32:11.881083 140357833512768 api.py:446] Global average pooling in the ASPP pooling layer was replaced with tiled average pooling using the provided pool_size. Please make sure this behavior is intended.
I0924 12:32:11.923661 140357833512768 api.py:446] Global average pooling in the ASPP pooling layer was replaced with tiled average pooling using the provided pool_size. Please make sure this behavior is intended.
I0924 12:32:11.976976 140357833512768 api.py:446] Eval with scales ListWrapper([1.0])
I0924 12:32:12.019443 140357833512768 api.py:446] Global average pooling in the ASPP pooling layer was replaced with tiled average pooling using the provided pool_size. Please make sure this behavior is intended.
I0924 12:32:12.062103 140357833512768 api.py:446] Global average pooling in the ASPP pooling layer was replaced with tiled average pooling using the provided pool_size. Please make sure this behavior is intended.
I0924 12:32:12.104485 140357833512768 api.py:446] Global average pooling in the ASPP pooling layer was replaced with tiled average pooling using the provided pool_size. Please make sure this behavior is intended.
I0924 12:32:12.144400 140357833512768 api.py:446] Eval scale 1.0; setting pooling size to [13, 41]
I0924 12:32:14.274583 140357833512768 api.py:446] Global average pooling in the ASPP pooling layer was replaced with tiled average pooling using the provided pool_size. Please make sure this behavior is intended.
I0924 12:32:14.317781 140357833512768 api.py:446] Global average pooling in the ASPP pooling layer was replaced with tiled average pooling using the provided pool_size. Please make sure this behavior is intended.
I0924 12:32:14.359796 140357833512768 api.py:446] Global average pooling in the ASPP pooling layer was replaced with tiled average pooling using the provided pool_size. Please make sure this behavior is intended.
I0924 12:32:18.023258 140357833512768 api.py:446] Eval with scales ListWrapper([1.0])
I0924 12:32:18.067005 140357833512768 api.py:446] Global average pooling in the ASPP pooling layer was replaced with tiled average pooling using the provided pool_size. Please make sure this behavior is intended.
I0924 12:32:18.109616 140357833512768 api.py:446] Global average pooling in the ASPP pooling layer was replaced with tiled average pooling using the provided pool_size. Please make sure this behavior is intended.
I0924 12:32:18.152309 140357833512768 api.py:446] Global average pooling in the ASPP pooling layer was replaced with tiled average pooling using the provided pool_size. Please make sure this behavior is intended.
I0924 12:32:18.192392 140357833512768 api.py:446] Eval scale 1.0; setting pooling size to [13, 41]
I0924 12:32:20.645295 140357833512768 api.py:446] Global average pooling in the ASPP pooling layer was replaced with tiled average pooling using the provided pool_size. Please make sure this behavior is intended.
I0924 12:32:20.689916 140357833512768 api.py:446] Global average pooling in the ASPP pooling layer was replaced with tiled average pooling using the provided pool_size. Please make sure this behavior is intended.
I0924 12:32:20.731779 140357833512768 api.py:446] Global average pooling in the ASPP pooling layer was replaced with tiled average pooling using the provided pool_size. Please make sure this behavior is intended.
I0924 12:32:20.784102 140357833512768 api.py:446] Eval with scales ListWrapper([1.0])
I0924 12:32:20.826084 140357833512768 api.py:446] Global average pooling in the ASPP pooling layer was replaced with tiled average pooling using the provided pool_size. Please make sure this behavior is intended.
I0924 12:32:20.868396 140357833512768 api.py:446] Global average pooling in the ASPP pooling layer was replaced with tiled average pooling using the provided pool_size. Please make sure this behavior is intended.
I0924 12:32:20.910218 140357833512768 api.py:446] Global average pooling in the ASPP pooling layer was replaced with tiled average pooling using the provided pool_size. Please make sure this behavior is intended.
I0924 12:32:20.949800 140357833512768 api.py:446] Eval scale 1.0; setting pooling size to [13, 41]
I0924 12:32:23.106387 140357833512768 api.py:446] Global average pooling in the ASPP pooling layer was replaced with tiled average pooling using the provided pool_size. Please make sure this behavior is intended.
I0924 12:32:23.149852 140357833512768 api.py:446] Global average pooling in the ASPP pooling layer was replaced with tiled average pooling using the provided pool_size. Please make sure this behavior is intended.
I0924 12:32:23.191850 140357833512768 api.py:446] Global average pooling in the ASPP pooling layer was replaced with tiled average pooling using the provided pool_size. Please make sure this behavior is intended.
2021-09-24 12:32:25.774627: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:801] layout failed: Invalid argument: Size of values 3 does not match size of permutation 4 @ fanin shape inViPDeepLab/PostProcessor/StatefulPartitionedCall/while/body/_166/while/SelectV2_1-1-TransposeNHWCToNCHW-LayoutOptimizer
2021-09-24 12:32:30.081096: I tensorflow/stream_executor/cuda/cuda_dnn.cc:369] Loaded cuDNN version 8204

HarborYuan commented on September 25, 2024

@joe-siyuan-qiao

joe-siyuan-qiao commented on September 25, 2024

Hello @HarborYuan

Yes, the batch size is the total batch size, not the size per device.
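
For reference, here is a minimal sketch (not the deeplab2 training loop, just a plain tf.distribute illustration) of how the configured batch size relates to the number of replicas:

```python
import tensorflow as tf

# Minimal sketch, not deeplab2 code: under tf.distribute, the batch size in the
# config is the *global* batch size, which is split evenly across replicas.
strategy = tf.distribute.MirroredStrategy()  # one replica per visible GPU
global_batch_size = 4  # e.g. the value from the textproto config
per_replica_batch = global_batch_size // strategy.num_replicas_in_sync
print(strategy.num_replicas_in_sync, "replicas,", per_replica_batch, "examples each")
```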

We didn't see similar errors on our end. According to the output, it is likely that something is wrong with the post-processor that ViP-DeepLab uses. Could you please set merge_semantic_and_instance_with_tf_op back to false (the default setting) and see whether the error remains?

Thanks.

HarborYuan commented on September 25, 2024

Thanks for your help. @joe-siyuan-qiao

However, the same error still appears even after setting it back to false.

lxtGH commented on September 25, 2024

Hi, I'm hitting the same error. @joe-siyuan-qiao @aquariusjay

HarborYuan commented on September 25, 2024

@YknZhu

joe-siyuan-qiao commented on September 25, 2024

Hello @HarborYuan and @lxtGH

We tried again to reproduce the error, but the evaluations were fine on our end. I think we need to narrow down the error step by step. Could you please first try TF 2.5 + CPU evaluation on your end? If that works, move to TF 2.6 and/or GPU to see which part is not working. Hope this helps you locate the error.

Thanks!

joe-siyuan-qiao commented on September 25, 2024

@HarborYuan @lxtGH

Here's another tip: based on this solution, the error might come from tf.where. That solution proposes doing the transpose manually. Maybe you can try something similar for the tf.where calls in post-processor/panoptic_deeplab.py.
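
For anyone hitting the same layout error, two workarounds along these lines may be worth trying. This is a sketch of our own (the `safe_where` helper is hypothetical, not code from the repo, and not necessarily the exact fix from the linked solution):

```python
import tensorflow as tf

# Sketch of a possible workaround, not code from post-processor/panoptic_deeplab.py.

# Option 1: broadcast the condition to the full shape of the operands before
# calling tf.where, so the grappler layout optimizer does not have to
# re-permute a lower-rank condition for SelectV2.
def safe_where(condition, x, y):
    condition = tf.broadcast_to(tf.cast(condition, tf.bool), tf.shape(x))
    return tf.where(condition, x, y)

# Option 2 (heavier hammer): turn off the grappler layout optimizer entirely.
tf.config.optimizer.set_experimental_options({'layout_optimizer': False})
```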

Hope this helps.

HarborYuan commented on September 25, 2024

Thanks for your help @joe-siyuan-qiao ,

We have tried TF 2.4, 2.5, and 2.6 with GPU, and none of them work. As for CPU inference, I set num_gpus=0 following the suggestion here, and it does not work either. I will try the manual transpose around tf.where.

Thanks again for your tips.

HarborYuan commented on September 25, 2024

Thanks for your help @joe-siyuan-qiao ,

Finally, with the transpose from the solution you mentioned, I successfully evaluated ViP-DeepLab. However, with the training and evaluation configs and the checkpoint, I only got the evaluation results below. Did I do something wrong? Does the training process really only need a batch size of 4?
[results screenshot]

Thanks.

lxtGH commented on September 25, 2024

Hi! We are confused: all the other DeepLab configs use batch size 32, so why does ViP-DeepLab use batch size 4? Also, could you share your DVPS configs for reference?

joe-siyuan-qiao commented on September 25, 2024

@lxtGH @HarborYuan

Good to see the evaluation works on your end.

As mentioned earlier in this thread, please refer to the config for the learning rate and batch size originally used in the paper. Hope this helps.

Thanks!

lxtGH commented on September 25, 2024

@joe-siyuan-qiao Thanks for your reply. According to that config, the only change is the batch size. I will set the batch size to 32 and report the results here.

joe-siyuan-qiao commented on September 25, 2024

@lxtGH I think the base learning rate is also different. You will also need to change that part.

lxtGH commented on September 25, 2024

@joe-siyuan-qiao Thanks for the reminder. Should I lower the learning rate? Also, what about the number of training steps? Will it overfit on KITTI, since the model already starts from a large pretrained checkpoint?

lxtGH commented on September 25, 2024

@joe-siyuan-qiao Hi! Are you sure you can obtain reasonable results with the master-branch code? Our AQ is near zero.
After changing the batch size to 32, the results are still much weaker.

joe-siyuan-qiao commented on September 25, 2024

@lxtGH Please save the predictions and use the offline evaluation code to compute DSTQ.

HarborYuan commented on September 25, 2024

@joe-siyuan-qiao Thanks for your help.
I also want to run the evaluation offline. What should I do to save the predictions with the current codebase?

Currently, I use

save_raw_predictions: True
convert_raw_to_eval_ids: False

but it does not output the track IDs.

Thanks.

joe-siyuan-qiao commented on September 25, 2024

@HarborYuan The output includes temporally consistent predictions for two frames. To get the predictions for the entire sequence, please propagate the IDs using predictions on the overlapping frames.
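
As a rough illustration of that ID propagation (a sketch of our own with made-up helper names and thresholds, not the repo's stitching code):

```python
import numpy as np

# ViP-DeepLab predicts each pair of frames (t, t+1) with IDs that are
# consistent within the pair. Consecutive pairs share frame t+1, so instances
# predicted for that frame by both pairs can be matched and the earlier IDs
# carried forward through the sequence.
def match_overlapping_frame(prev_overlap, curr_overlap, iou_threshold=0.5):
    """Return a {current-pair ID: previous-pair ID} mapping from the shared frame."""
    mapping = {}
    for inst_id in np.unique(curr_overlap):
        mask = curr_overlap == inst_id
        cand_ids, counts = np.unique(prev_overlap[mask], return_counts=True)
        best = cand_ids[np.argmax(counts)]
        union = np.logical_or(mask, prev_overlap == best).sum()
        if counts.max() / union >= iou_threshold:
            mapping[inst_id] = best
    return mapping

# The mapping is then applied to both frames of the current pair; instances with
# no match would need fresh IDs that do not collide with previously used ones.
```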

aquariusjay commented on September 25, 2024

Hi @HarborYuan
Would you please share your fix for the evaluation issue (the transpose workaround)?
This would be helpful for other users as well.
Thanks,

HarborYuan commented on September 25, 2024

Hi @aquariusjay,

Thanks for your response.

We would be happy to share our solution, but we have not actually obtained meaningful results (low IoU / AQ / depth abs rel, and poor visualizations) with the current code. We are wondering whether we did something wrong.

We will let you know or open a PR once we manage to get meaningful results.

Best,

lxtGH commented on September 25, 2024

@aquariusjay @joe-siyuan-qiao Dear authors, we hope you can provide the results of R-50 on the DVPS tasks for a fair comparison with existing works, since many people at universities in China (PKU, THU, WHU) cannot run your code.

aquariusjay commented on September 25, 2024

Thanks for the request.
We are currently preparing the model zoo for ViP-DeepLab.
Please stay tuned.
We will close this issue and use the newer issue #78 to keep track of the progress.
