RuntimeWarning: invalid value encountered in sqrt about upsnet HOT 10 CLOSED

uber-research commented on May 28, 2024

RuntimeWarning: invalid value encountered in sqrt

from upsnet.

Comments (10)

insop commented on May 28, 2024

Maybe this is the main cause, rather than the runtime warning from sqrt:

 ValueError: Expected input batch_size (510) to match target batch_size (512).

JoyHuYY1412 commented on May 28, 2024

Maybe this is the main cause, rather than the runtime warning from sqrt:
 ValueError: Expected input batch_size (510) to match target batch_size (512).

I ran into this problem as well, but it does not happen when I use multiple GPUs.

YuwenXiong commented on May 28, 2024

When you change the number of GPUs from 4 to 1, you also need to reduce lr by 4x and increase the number of iterations by 4x.

insop commented on May 28, 2024

With the change below, I was able to run training after my earlier post.

You have now confirmed it; thank you for the quick update.

--- upsnet/experiments/upsnet_resnet50_coco_4gpu.yaml	2019-05-11 15:21:57.000000000 -0700
+++ upsnet/experiments/upsnet_resnet50_coco_1gpu.yaml	2019-05-12 00:37:26.000000000 -0700
@@ -2,7 +2,7 @@
 output_path: "./output/upsnet/coco"
 model_prefix: "upsnet_resnet_50_coco_"
 symbol: resnet_50_upsnet
-gpus: '0,1,2,3'
+gpus: '0'
 dataset:
   num_classes: 81
   num_seg_classes: 133
@@ -32,12 +32,12 @@
   snapshot_step: 2000
   resume: false
   begin_iteration: 0
-  max_iteration: 360000
+  max_iteration: 720000
   decay_iteration:
-  - 240000
-  - 320000
+  - 480000
+  - 640000
   warmup_iteration: 1500
-  lr: 0.005
+  lr: 0.0025
   wd: 0.0001
   momentum: 0.9
   batch_size: 1
@@ -54,7 +54,7 @@
   - 800
   max_size: 1333
   batch_size: 1
-  test_iteration: 360000
+  test_iteration: 720000
   panoptic_stuff_area_limit: 4096
   vis_mask: false

YuwenXiong commented on May 28, 2024

Changing #iter and lr by 2x may not reproduce the result I reported: you changed the batch size from 4 to 1, and batch size, lr and #iter should be scaled together.

insop commented on May 28, 2024

Right, thank you. I have updated the config to scale by 4:

--- upsnet/experiments/upsnet_resnet50_coco_4gpu.yaml	2019-05-11 15:21:57.000000000 -0700
+++ upsnet/experiments/upsnet_resnet50_coco_1gpu.yaml	2019-05-12 05:58:38.000000000 -0700
@@ -2,7 +2,7 @@
 output_path: "./output/upsnet/coco"
 model_prefix: "upsnet_resnet_50_coco_"
 symbol: resnet_50_upsnet
-gpus: '0,1,2,3'
+gpus: '0'
 dataset:
   num_classes: 81
   num_seg_classes: 133
@@ -32,12 +32,12 @@
   snapshot_step: 2000
   resume: false
   begin_iteration: 0
-  max_iteration: 360000
+  max_iteration: 1440000
   decay_iteration:
-  - 240000
-  - 320000
+  - 960000
+  - 1280000
   warmup_iteration: 1500
-  lr: 0.005
+  lr: 0.00125
   wd: 0.0001
   momentum: 0.9
   batch_size: 1
@@ -54,7 +54,7 @@
   - 800
   max_size: 1333
   batch_size: 1
-  test_iteration: 360000
+  test_iteration: 1440000
   panoptic_stuff_area_limit: 4096
   vis_mask: false
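
The 4x values in the diff above can be double-checked with a few lines of Python. This is only a sketch of the arithmetic: `rescale_schedule` and the dictionary layout are hypothetical helpers for illustration, not part of UPSNet, and the numbers are the ones from the 4-GPU config quoted in this thread.

```python
def rescale_schedule(base, gpu_ratio):
    """Divide lr and multiply every iteration count by the same factor.

    `base` mirrors the relevant keys of the 4-GPU YAML config;
    `gpu_ratio` is old_gpu_count / new_gpu_count (4 when going 4 -> 1).
    Hypothetical helper, not taken from the UPSNet code base.
    """
    return {
        "lr": base["lr"] / gpu_ratio,
        "max_iteration": base["max_iteration"] * gpu_ratio,
        "decay_iteration": [it * gpu_ratio for it in base["decay_iteration"]],
        "test_iteration": base["test_iteration"] * gpu_ratio,
    }


# Values from upsnet_resnet50_coco_4gpu.yaml as shown in the diff.
base_4gpu = {
    "lr": 0.005,
    "max_iteration": 360000,
    "decay_iteration": [240000, 320000],
    "test_iteration": 360000,
}

one_gpu = rescale_schedule(base_4gpu, gpu_ratio=4)
print(one_gpu)
```

Keeping the three quantities tied to one factor makes it harder to end up with a mismatched schedule such as the earlier 2x attempt.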

andyhahaha commented on May 28, 2024

Hi, I encountered a similar error.
I changed the backbone to PeleeNet and trained with 4 GPUs,
but some elements of feat_id are NaN.

UPSNet/upsnet/operators/modules/fpn_roi_align.py

Line 38 in 3218581

feat_id = np.clip(np.floor(2 + np.log2(np.sqrt(w * h) / 224 + 1e-6)), 0, 3)

This happens because some proposed RoIs have x1 > x2 or y1 > y2, which makes w < 0 or h < 0.
When w * h is negative, np.sqrt returns NaN (hence the RuntimeWarning), and the NaN propagates through np.log2.

I have tried smaller learning rates (0.0025 and 0.00125), but it still happens.
Does anyone know how to solve this problem?
Thanks!
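
A small reproduction may help when debugging this. The sketch below reuses the formula from the quoted line, but everything around it is assumed for illustration: the `[x1, y1, x2, y2]` RoI layout, the way `w` and `h` are derived, and the helper names `fpn_level` and `find_degenerate_rois` are not taken from the UPSNet source.

```python
import numpy as np


def fpn_level(w, h):
    """Formula from the quoted line 38 of fpn_roi_align.py."""
    return np.clip(np.floor(2 + np.log2(np.sqrt(w * h) / 224 + 1e-6)), 0, 3)


def find_degenerate_rois(rois):
    """Boolean mask of boxes with x1 > x2 or y1 > y2.

    Assumes rois is an (N, 4) array of [x1, y1, x2, y2] (hypothetical layout).
    """
    return (rois[:, 2] < rois[:, 0]) | (rois[:, 3] < rois[:, 1])


rois = np.array([
    [10.0, 10.0, 110.0, 210.0],  # valid box
    [50.0, 20.0, 30.0, 120.0],   # x1 > x2, so w < 0
])
w = rois[:, 2] - rois[:, 0]
h = rois[:, 3] - rois[:, 1]

# Silence the "invalid value encountered in sqrt" warning for the demo only.
with np.errstate(invalid="ignore"):
    feat_id = fpn_level(w, h)

bad = find_degenerate_rois(rois)
print(feat_id, bad)
```

A mask like `bad` only locates the offending proposals. It does not explain why the RPN produces them, which in this thread turned out to be related to image preprocessing.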

weixianghong commented on May 28, 2024

Hi, I encountered a similar error.
I changed the backbone to PeleeNet and trained with 4 GPUs,
but some elements of feat_id are NaN.

UPSNet/upsnet/operators/modules/fpn_roi_align.py

Line 38 in 3218581

feat_id = np.clip(np.floor(2 + np.log2(np.sqrt(w * h) / 224 + 1e-6)), 0, 3)

This happens because some proposed RoIs have x1 > x2 or y1 > y2, which makes w < 0 or h < 0.
When w * h is negative, np.sqrt returns NaN (hence the RuntimeWarning), and the NaN propagates through np.log2.
I have tried smaller learning rates (0.0025 and 0.00125), but it still happens.
Does anyone know how to solve this problem?
Thanks!

I ran into the same issue. After I changed the backbone to ResNeXt-101, the RPN produces boxes with negative width or height, which causes NaN.
Have you solved it? Could you point me in the right direction?

YuwenXiong commented on May 28, 2024

@andyhahaha @weixianghong Please note that we used pretrained weights converted from Caffe, which expect different image preprocessing from torchvision models. Set use_caffe_model to false if you want to use models with torchvision-style preprocessing.
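
To illustrate why the flag matters, here is a rough sketch of the two preprocessing conventions. The constants are the values commonly used for Caffe-converted ResNets and for torchvision ImageNet models; they are assumptions here and were not read from the UPSNet config. The `preprocess` function is a hypothetical stand-in for whatever the data loader actually does.

```python
import numpy as np

# Commonly used constants; assumed, not read from the UPSNet config.
CAFFE_BGR_MEAN = np.array([102.9801, 115.9465, 122.7717], dtype=np.float32)
TORCHVISION_RGB_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
TORCHVISION_RGB_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def preprocess(image_rgb_uint8, use_caffe_model):
    """Hypothetical preprocessing switch for an HxWx3 RGB uint8 image."""
    img = image_rgb_uint8.astype(np.float32)
    if use_caffe_model:
        # Caffe style: BGR channel order, 0-255 range, mean subtraction only.
        return img[:, :, ::-1] - CAFFE_BGR_MEAN
    # torchvision style: RGB order, scaled to [0, 1], mean/std normalised.
    return (img / 255.0 - TORCHVISION_RGB_MEAN) / TORCHVISION_RGB_STD


image = np.full((2, 2, 3), 128, dtype=np.uint8)  # flat grey test image
caffe_in = preprocess(image, use_caffe_model=True)
tv_in = preprocess(image, use_caffe_model=False)
print(caffe_in[0, 0], tv_in[0, 0])
```

The two outputs differ in scale by roughly two orders of magnitude, so feeding Caffe-style inputs to a torchvision-pretrained backbone (or the reverse) gives the network activations far outside the range it was trained on.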

weixianghong commented on May 28, 2024

@andyhahaha @weixianghong Please note that we used pretrained weights converted from Caffe, which expect different image preprocessing from torchvision models. Set use_caffe_model to false if you want to use models with torchvision-style preprocessing.

It works, thank you!
