Hello Sir Paul, I have already converted the LIDC database. However, after I run python exec.py --mode train --exp_source experiments/lidc_exp/ --exp_dir LIDC-Retina-model, the training gets stuck (it shows "starting validation") on fold 1. Note: I changed num_epoch to 50 and num_trainbatches to 10, since I am only using a 10-sample dataset.
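For reference, the two config changes look roughly like this in my copy of experiments/lidc_exp/configs.py (a sketch only; the attribute names are the ones from my setup and the surrounding config class is omitted):

```python
# configs.py (excerpt): reduced schedule for a 10-sample smoke test.
# These lines replace the repo's defaults inside the configs class __init__.
self.num_epoch = 50        # shortened from the default number of epochs
self.num_trainbatches = 10 # one training batch per sample, since only 10 samples were converted
```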
starting training epoch 50
tr. batch 1/10 (ep. 50) fw 2.251s / bw 0.743s / total 2.993s || loss: 1.03, class: 0.89, bbox: 0.14
tr. batch 2/10 (ep. 50) fw 2.532s / bw 0.744s / total 3.276s || loss: 0.89, class: 0.66, bbox: 0.23
tr. batch 3/10 (ep. 50) fw 2.392s / bw 0.742s / total 3.134s || loss: 0.74, class: 0.73, bbox: 0.01
tr. batch 4/10 (ep. 50) fw 2.535s / bw 0.517s / total 3.053s || loss: 0.47, class: 0.47, bbox: 0.00
tr. batch 5/10 (ep. 50) fw 3.106s / bw 0.744s / total 3.850s || loss: 0.78, class: 0.71, bbox: 0.08
tr. batch 6/10 (ep. 50) fw 2.920s / bw 0.742s / total 3.662s || loss: 0.52, class: 0.49, bbox: 0.03
tr. batch 7/10 (ep. 50) fw 2.220s / bw 0.747s / total 2.967s || loss: 0.67, class: 0.56, bbox: 0.11
tr. batch 8/10 (ep. 50) fw 2.164s / bw 0.758s / total 2.921s || loss: 0.57, class: 0.51, bbox: 0.06
tr. batch 9/10 (ep. 50) fw 2.333s / bw 0.750s / total 3.082s || loss: 0.80, class: 0.70, bbox: 0.10
tr. batch 10/10 (ep. 50) fw 2.390s / bw 0.760s / total 3.150s || loss: 0.70, class: 0.66, bbox: 0.03
evaluating in mode train
evaluating with match_iou: 0.1
starting validation in mode val_sampling.
evaluating in mode val_sampling
evaluating with match_iou: 0.1
non none scores: [0.00000000e+00 0.00000000e+00 0.00000000e+00 1.33691776e-04
1.12577370e-05 0.00000000e+00 0.00000000e+00 0.00000000e+00
0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
3.19541394e-06 0.00000000e+00 0.00000000e+00 0.00000000e+00
6.34394073e-05 3.46760788e-04 0.00000000e+00 6.57964466e-05
6.30265885e-06 1.83419772e-04 0.00000000e+00 0.00000000e+00
3.13401814e-05 0.00000000e+00 0.00000000e+00 8.20894272e-05
4.21034540e-06 1.00719716e-03 7.65382661e-07 1.39219383e-05
7.98896203e-04 0.00000000e+00 2.30329873e-04 2.08085640e-04
1.10898187e-06 0.00000000e+00 0.00000000e+00 1.11219310e-05
1.91517091e-04 1.70706726e-04 1.07269665e-06 0.00000000e+00
0.00000000e+00 4.47997328e-05 0.00000000e+00 1.04838946e-06
1.86664529e-03 5.89871320e-06 1.97787268e-04]
trained epoch 50: took 212.29711294174194 sec. (41.897600412368774 train / 170.39951252937317 val)
plotting predictions from validation sampling.
starting testing model of fold 0 in exp LIDC-Retina-TrainTest
feature map shapes: [[32 32 64]
[16 16 32]
[ 8 8 16]
[ 4 4 8]]
anchor scales: {'z': [[2, 2.5198420997897464, 3.1748021039363987], [4, 5.039684199579493, 6.3496042078727974], [8, 10.079368399158986, 12.699208415745595], [16, 20.15873679831797, 25.39841683149119]], 'xy': [[8, 10.079368399158986, 12.699208415745595], [16, 20.15873679831797, 25.39841683149119], [32, 40.31747359663594, 50.79683366298238], [64, 80.63494719327188, 101.59366732596476]]}
level 0: built anchors (589824, 6) / expected anchors 589824 ||| total build (589824, 6) / total expected 673920
level 1: built anchors (73728, 6) / expected anchors 73728 ||| total build (663552, 6) / total expected 673920
level 2: built anchors (9216, 6) / expected anchors 9216 ||| total build (672768, 6) / total expected 673920
level 3: built anchors (1152, 6) / expected anchors 1152 ||| total build (673920, 6) / total expected 673920
using default pytorch weight init
subset: selected 2 instances from df
data set loaded with: 2 test patients
tmp ensembling over rank_ix:0 epoch:LIDC-Retina-TrainTest/fold_0/48_best_params.pth
evaluating patient 0009a for fold 0
forwarding (patched) patient with shape: (180, 1, 128, 128, 64)
forwarding (patched) patient with shape: (180, 1, 128, 128, 64)
forwarding (patched) patient with shape: (180, 1, 128, 128, 64)
forwarding (patched) patient with shape: (180, 1, 128, 128, 64)
evaluating patient 0003a for fold 0
forwarding (patched) patient with shape: (216, 1, 128, 128, 64)
forwarding (patched) patient with shape: (216, 1, 128, 128, 64)
forwarding (patched) patient with shape: (216, 1, 128, 128, 64)
forwarding (patched) patient with shape: (216, 1, 128, 128, 64)
tmp ensembling over rank_ix:1 epoch:LIDC-Retina-TrainTest/fold_0/29_best_params.pth
evaluating patient 0009a for fold 0
forwarding (patched) patient with shape: (180, 1, 128, 128, 64)
forwarding (patched) patient with shape: (180, 1, 128, 128, 64)
forwarding (patched) patient with shape: (180, 1, 128, 128, 64)
forwarding (patched) patient with shape: (180, 1, 128, 128, 64)
evaluating patient 0003a for fold 0
forwarding (patched) patient with shape: (216, 1, 128, 128, 64)
forwarding (patched) patient with shape: (216, 1, 128, 128, 64)
forwarding (patched) patient with shape: (216, 1, 128, 128, 64)
forwarding (patched) patient with shape: (216, 1, 128, 128, 64)
tmp ensembling over rank_ix:2 epoch:LIDC-Retina-TrainTest/fold_0/32_best_params.pth
evaluating patient 0009a for fold 0
forwarding (patched) patient with shape: (180, 1, 128, 128, 64)
forwarding (patched) patient with shape: (180, 1, 128, 128, 64)
forwarding (patched) patient with shape: (180, 1, 128, 128, 64)
forwarding (patched) patient with shape: (180, 1, 128, 128, 64)
evaluating patient 0003a for fold 0
forwarding (patched) patient with shape: (216, 1, 128, 128, 64)
forwarding (patched) patient with shape: (216, 1, 128, 128, 64)
forwarding (patched) patient with shape: (216, 1, 128, 128, 64)
forwarding (patched) patient with shape: (216, 1, 128, 128, 64)
tmp ensembling over rank_ix:3 epoch:LIDC-Retina-TrainTest/fold_0/17_best_params.pth
evaluating patient 0009a for fold 0
forwarding (patched) patient with shape: (180, 1, 128, 128, 64)
forwarding (patched) patient with shape: (180, 1, 128, 128, 64)
forwarding (patched) patient with shape: (180, 1, 128, 128, 64)
forwarding (patched) patient with shape: (180, 1, 128, 128, 64)
evaluating patient 0003a for fold 0
forwarding (patched) patient with shape: (216, 1, 128, 128, 64)
forwarding (patched) patient with shape: (216, 1, 128, 128, 64)
forwarding (patched) patient with shape: (216, 1, 128, 128, 64)
forwarding (patched) patient with shape: (216, 1, 128, 128, 64)
tmp ensembling over rank_ix:4 epoch:LIDC-Retina-TrainTest/fold_0/34_best_params.pth
evaluating patient 0009a for fold 0
forwarding (patched) patient with shape: (180, 1, 128, 128, 64)
forwarding (patched) patient with shape: (180, 1, 128, 128, 64)
forwarding (patched) patient with shape: (180, 1, 128, 128, 64)
forwarding (patched) patient with shape: (180, 1, 128, 128, 64)
evaluating patient 0003a for fold 0
forwarding (patched) patient with shape: (216, 1, 128, 128, 64)
forwarding (patched) patient with shape: (216, 1, 128, 128, 64)
forwarding (patched) patient with shape: (216, 1, 128, 128, 64)
forwarding (patched) patient with shape: (216, 1, 128, 128, 64)
finished predicting test set. starting post-processing of predictions.
applying wcs to test set predictions with iou = 1e-05 and n_ens = 20.
applying 2Dto3D merging to test set predictions with iou = 0.1.
evaluating in mode test
evaluating with match_iou: 0.1
/home/ivan/.virtualenvs/virtual-py3/lib/python3.5/site-packages/numpy/core/fromnumeric.py:2920: RuntimeWarning: Mean of empty slice.
out=out, **kwargs)
/home/ivan/.virtualenvs/virtual-py3/lib/python3.5/site-packages/numpy/core/_methods.py:85: RuntimeWarning: invalid value encountered in double_scalars
ret = ret.dtype.type(ret / rcount)
/home/ivan/.virtualenvs/virtual-py3/lib/python3.5/site-packages/matplotlib/axes/_base.py:3364: UserWarning: Attempting to set identical bottom==top results
in singular transformations; automatically expanding.
bottom=1.0, top=1.0
self.set_ylim(upper, lower, auto=None)
Logging to LIDC-Retina-TrainTest/fold_1/exec.log
performing training in 3D over fold 1 on experiment LIDC-Retina-TrainTest with model retina_net
feature map shapes: [[32 32 64]
 [16 16 32]
 [ 8 8 16]
 [ 4 4 8]]
anchor scales: {'z': [[2, 2.5198420997897464, 3.1748021039363987], [4, 5.039684199579493, 6.3496042078727974], [8, 10.079368399158986, 12.699208415745595], [16, 20.15873679831797, 25.39841683149119]], 'xy': [[8, 10.079368399158986, 12.699208415745595], [16, 20.15873679831797, 25.39841683149119], [32, 40.31747359663594, 50.79683366298238], [64, 80.63494719327188, 101.59366732596476]]}
level 0: built anchors (589824, 6) / expected anchors 589824 ||| total build (589824, 6) / total expected 673920
level 1: built anchors (73728, 6) / expected anchors 73728 ||| total build (663552, 6) / total expected 673920
level 2: built anchors (9216, 6) / expected anchors 9216 ||| total build (672768, 6) / total expected 673920
level 3: built anchors (1152, 6) / expected anchors 1152 ||| total build (673920, 6) / total expected 673920
using default pytorch weight init
loading dataset and initializing batch generators...
data set loaded with: 6 train / 2 val / 2 test patients
starting training epoch 1
tr. batch 1/10 (ep. 1) fw 1.901s / bw 0.557s / total 2.458s || loss: 0.55, class: 0.55, bbox: 0.00
tr. batch 2/10 (ep. 1) fw 2.057s / bw 0.777s / total 2.834s || loss: 0.77, class: 0.69, bbox: 0.08
tr. batch 3/10 (ep. 1) fw 1.838s / bw 0.515s / total 2.353s || loss: 0.77, class: 0.77, bbox: 0.00
tr. batch 4/10 (ep. 1) fw 1.803s / bw 0.741s / total 2.544s || loss: 0.94, class: 0.83, bbox: 0.11
tr. batch 5/10 (ep. 1) fw 1.717s / bw 0.741s / total 2.458s || loss: 0.85, class: 0.76, bbox: 0.09
tr. batch 6/10 (ep. 1) fw 1.654s / bw 0.744s / total 2.398s || loss: 1.07, class: 0.90, bbox: 0.17
tr. batch 7/10 (ep. 1) fw 2.217s / bw 0.742s / total 2.959s || loss: 0.80, class: 0.69, bbox: 0.11
tr. batch 8/10 (ep. 1) fw 1.733s / bw 0.740s / total 2.473s || loss: 0.80, class: 0.69, bbox: 0.12
tr. batch 9/10 (ep. 1) fw 1.709s / bw 0.750s / total 2.459s || loss: 1.07, class: 0.89, bbox: 0.18
tr. batch 10/10 (ep. 1) fw 2.189s / bw 0.743s / total 2.932s || loss: 1.06, class: 0.89, bbox: 0.17
evaluating in mode train
evaluating with match_iou: 0.1
starting validation in mode val_sampling.
It has been stuck at "starting validation" for more than 4 hours now. Please help me. Thank you in advance, Sir.