Comments (10)
Maybe this is the main cause, rather than the runtime warning from sqrt:
ValueError: Expected input batch_size (510) to match target batch_size (512).
from upsnet.
I ran into the same problem. It does not happen when I train on multiple GPUs.
from upsnet.
When you change the number of GPUs from 4 to 1, you also need to reduce the learning rate by 4x and increase the number of iterations by 4x.
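As a rough illustration of that rule, here is a minimal sketch that rescales a schedule when the GPU count changes. The function name `scale_schedule` and the dict layout are hypothetical (they are not part of UPSNet); the input numbers are the 4-GPU values quoted in this thread, and the sketch assumes the effective batch size shrinks by the same factor as the GPU count.

```python
def scale_schedule(base, gpus_from, gpus_to):
    """Rescale lr and iteration counts when the number of GPUs changes.

    Hypothetical helper: assumes one image per GPU, so the effective
    batch size changes by the same factor as the GPU count.
    """
    factor = gpus_from / gpus_to  # e.g. 4 GPUs -> 1 GPU gives factor 4
    return {
        "lr": base["lr"] / factor,
        "max_iteration": int(base["max_iteration"] * factor),
        "decay_iteration": [int(step * factor) for step in base["decay_iteration"]],
    }


# Values taken from the 4-GPU config discussed in this thread.
four_gpu = {
    "lr": 0.005,
    "max_iteration": 360000,
    "decay_iteration": [240000, 320000],
}

one_gpu = scale_schedule(four_gpu, gpus_from=4, gpus_to=1)
print(one_gpu)
```

With a factor of 4 this gives lr 0.00125, max_iteration 1440000 and decay steps at 960000 and 1280000.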
from upsnet.
With the change below, I was able to run training after my earlier post.
And you confirmed it, thank you for the quick update.
--- upsnet/experiments/upsnet_resnet50_coco_4gpu.yaml 2019-05-11 15:21:57.000000000 -0700
+++ upsnet/experiments/upsnet_resnet50_coco_1gpu.yaml 2019-05-12 00:37:26.000000000 -0700
@@ -2,7 +2,7 @@
output_path: "./output/upsnet/coco"
model_prefix: "upsnet_resnet_50_coco_"
symbol: resnet_50_upsnet
-gpus: '0,1,2,3'
+gpus: '0'
dataset:
num_classes: 81
num_seg_classes: 133
@@ -32,12 +32,12 @@
snapshot_step: 2000
resume: false
begin_iteration: 0
- max_iteration: 360000
+ max_iteration: 720000
decay_iteration:
- - 240000
- - 320000
+ - 480000
+ - 640000
warmup_iteration: 1500
- lr: 0.005
+ lr: 0.0025
wd: 0.0001
momentum: 0.9
batch_size: 1
@@ -54,7 +54,7 @@
- 800
max_size: 1333
batch_size: 1
- test_iteration: 360000
+ test_iteration: 720000
panoptic_stuff_area_limit: 4096
vis_mask: false
from upsnet.
Changing the number of iterations and the learning rate by 2x may not reproduce the result I reported, because you changed the batch size from 4 to 1. Batch size, learning rate and number of iterations should be scaled together.
from upsnet.
Right, thank you. I have updated the config to scale by 4:
--- upsnet/experiments/upsnet_resnet50_coco_4gpu.yaml 2019-05-11 15:21:57.000000000 -0700
+++ upsnet/experiments/upsnet_resnet50_coco_1gpu.yaml 2019-05-12 05:58:38.000000000 -0700
@@ -2,7 +2,7 @@
output_path: "./output/upsnet/coco"
model_prefix: "upsnet_resnet_50_coco_"
symbol: resnet_50_upsnet
-gpus: '0,1,2,3'
+gpus: '0'
dataset:
num_classes: 81
num_seg_classes: 133
@@ -32,12 +32,12 @@
snapshot_step: 2000
resume: false
begin_iteration: 0
- max_iteration: 360000
+ max_iteration: 1440000
decay_iteration:
- - 240000
- - 320000
+ - 960000
+ - 1280000
warmup_iteration: 1500
- lr: 0.005
+ lr: 0.00125
wd: 0.0001
momentum: 0.9
batch_size: 1
@@ -54,7 +54,7 @@
- 800
max_size: 1333
batch_size: 1
- test_iteration: 360000
+ test_iteration: 1440000
panoptic_stuff_area_limit: 4096
vis_mask: false
from upsnet.
Hi, I encountered a similar error.
I changed the backbone to PeleeNet and trained with 4 GPUs.
Some elements of feat_id end up as NaN.
This happens because some proposed RoIs have x1 > x2 or y1 > y2, which makes w < 0 or h < 0.
np.log2 of a negative number then produces NaN.
I have tried smaller learning rates (0.0025 and 0.00125), but it still happens.
Does anyone know how to solve this problem?
Thanks!
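To make the failure mode concrete, here is a minimal NumPy sketch of how one inverted box turns a feature-level index into NaN. It is an illustration, not UPSNet's code: the function names are hypothetical, the `[x1, y1, x2, y2]` layout is assumed, and the level formula is the standard FPN heuristic (`k0 = 4`, canonical size 224), which may differ from what the repository computes for feat_id.

```python
import numpy as np


def find_degenerate_rois(rois):
    """Boolean mask of boxes with x1 > x2 or y1 > y2 (assumed [x1, y1, x2, y2])."""
    rois = np.asarray(rois, dtype=np.float64)
    w = rois[:, 2] - rois[:, 0]
    h = rois[:, 3] - rois[:, 1]
    return (w < 0) | (h < 0)


def naive_fpn_level(rois, k0=4, canonical_size=224.0):
    """FPN-style level assignment with no validation of the boxes."""
    rois = np.asarray(rois, dtype=np.float64)
    w = rois[:, 2] - rois[:, 0]
    h = rois[:, 3] - rois[:, 1]
    # A negative area makes sqrt return NaN (NumPy emits a RuntimeWarning,
    # silenced here), and NaN then propagates through log2 and floor.
    with np.errstate(invalid="ignore"):
        scale = np.sqrt(w * h)
        return np.floor(k0 + np.log2(scale / canonical_size))


rois = np.array([
    [10.0, 10.0, 50.0, 60.0],  # valid box
    [80.0, 20.0, 40.0, 90.0],  # inverted box: x1 > x2
])

bad = find_degenerate_rois(rois)
levels = naive_fpn_level(rois)
print(bad, levels)
```

A mask like `bad` can help locate the offending proposals while debugging. It only exposes the symptom; it does not explain why the RPN emits inverted boxes in the first place.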
from upsnet.
I ran into the same issue. After I changed the backbone to ResNeXt-101, the RPN produces proposals with negative width or height, which causes NaN.
Have you solved it? Could you point me in the right direction?
from upsnet.
@andyhahaha @weixianghong Please note that we used pretrained weights converted from Caffe, which expect different image preprocessing than torchvision models. Please set use_caffe_model to false if you want to use models with torchvision-style preprocessing.
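For readers unfamiliar with the difference, the sketch below contrasts the two preprocessing conventions. It is an assumption-laden illustration, not UPSNet's implementation: the function names are hypothetical, the Caffe-style mean is the BGR pixel mean commonly used with Caffe-converted ResNet weights, and the torchvision-style mean and std are the ImageNet values documented for torchvision's pretrained models. The exact values behind use_caffe_model should be checked in the repository.

```python
import numpy as np

# Assumed constants, see the note above.
CAFFE_BGR_MEAN = np.array([102.9801, 115.9465, 122.7717])
TORCHVISION_RGB_MEAN = np.array([0.485, 0.456, 0.406])
TORCHVISION_RGB_STD = np.array([0.229, 0.224, 0.225])


def preprocess_caffe_style(img_rgb):
    """BGR channel order, 0-255 range, per-channel mean subtraction only."""
    bgr = img_rgb[..., ::-1].astype(np.float64)
    return bgr - CAFFE_BGR_MEAN


def preprocess_torchvision_style(img_rgb):
    """RGB channel order, scaled to 0-1, then normalized by mean and std."""
    rgb = img_rgb.astype(np.float64) / 255.0
    return (rgb - TORCHVISION_RGB_MEAN) / TORCHVISION_RGB_STD


# A uniform gray test image in H x W x C layout, RGB order.
img = np.full((2, 2, 3), 128, dtype=np.uint8)
caffe_input = preprocess_caffe_style(img)
torchvision_input = preprocess_torchvision_style(img)
print(caffe_input[0, 0], torchvision_input[0, 0])
```

The two outputs differ in both channel order and value range, so weights trained with one convention receive badly scaled activations under the other. That is consistent with the NaN losses reported above after swapping in a different backbone.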
from upsnet.
It works, thank you!
from upsnet.