Comments (18)
Check the last comments of #5. I guess your batch size is too small.
I set the batch size to 4 because of GPU memory limitations.
I agree that the error is caused by data augmentation when the random crop doesn't contain an object.
I found that the function tf.image.sample_distorted_bounding_box is used to generate distorted bounding boxes, and min_object_covered is set to [0.0, 0.1, 0.3, 0.5, 0.7, 0.9] (sample_jaccards).
Is it necessary to include 0.0 in min_object_covered? Would this error be fixed if 0.0 were removed from the sample_jaccards array?
I think crops without objects would still occur, but with lower probability.
for iou in params['sample_jaccards']:
    sample_distorted_bounding_box = tf.image.sample_distorted_bounding_box(
        tf.shape(image),
        bounding_boxes=bboxes,
        min_object_covered=iou,
        aspect_ratio_range=[0.5, 2.0],
        area_range=[0.3, 1.0],
        max_attempts=params['crop_max_tries'],
        use_image_if_no_bounding_boxes=True)
    samplers.append(sample_distorted_bounding_box[:2])
    boxes.append(sample_distorted_bounding_box[2][0][0])
data_augmentation_config = {
    'X_out': 4,
    'brightness_prob': 0.5,
    'brightness_delta': 0.125,
    'contrast_prob': 0.5,
    'contrast_delta': 0.5,
    'hue_prob': 0.5,
    'hue_delta': 0.07,
    'saturation_prob': 0.5,
    'saturation_delta': 0.5,
    'sample_jaccards': [0.0, 0.1, 0.3, 0.5, 0.7, 0.9],
    'flip_prob': 0.5,
    'crop_max_tries': 50,
    'zoomout_color': [x/255.0 for x in reversed(MEAN_COLOR)],
}
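For illustration, the change suggested above would be a one-line edit to this config. This is a hypothetical modification, not something the repository ships:

# Hypothetical edit: drop 0.0 so every sampled crop must cover at least
# 10% of some object. Object-free crops can still occur (e.g. via
# use_image_if_no_bounding_boxes=True), but they become less likely.
data_augmentation_config['sample_jaccards'] = [0.1, 0.3, 0.5, 0.7, 0.9]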
For data augmentation, we followed the strategy of the SSD paper. We didn't evaluate the effect of constraining the sampling in this way; you can find some evaluations in the original paper.
What you can do is skip the forward pass when you have no positives. This would require minimal modifications to the code, as sketched below.
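For concreteness, a minimal sketch of that idea, assuming detection_loss returns a scalar loss and reusing the tensor names from the snippets below; this guard is an illustration, not the repository's actual code:

import tensorflow as tf  # TF 1.x API, as used in the repository

# Execute the detection-loss branch only when the batch contains at least
# one positive anchor; otherwise contribute a constant zero loss.
number_of_positives = tf.reduce_sum(tf.to_int32(pos_mask))
loss = tf.cond(number_of_positives > 0,
               lambda: detection_loss(location, confidence,
                                      refine_ph, classes_ph, pos_mask),
               lambda: tf.constant(0.0))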
I checked the code in training.py. If number_of_positives is zero, then number_of_negatives becomes zero as well, which may cause an error in the tf.nn.top_k function. Would it work to append a line that sets number_of_negatives to at least one? The relevant code is:
def detection_loss(location, confidence, refine_ph, classes_ph, pos_mask):
    neg_mask = tf.logical_not(pos_mask)
    number_of_positives = tf.reduce_sum(tf.to_int32(pos_mask))
    number_of_negatives = tf.minimum(3 * number_of_positives,
                                     tf.shape(pos_mask)[1] - number_of_positives)
    normalizer = tf.to_float(tf.add(number_of_positives, number_of_negatives))
    tf.summary.scalar('batch/size', normalizer)
    num_pos_float = tf.to_float(number_of_positives)

    cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=confidence,
                                                                   labels=classes_ph)

    pos_class_loss = tf.reduce_sum(tf.boolean_mask(cross_entropy, pos_mask))
    tf.summary.scalar('loss/class_pos', pos_class_loss / num_pos_float)

    top_k_worst, top_k_inds = tf.nn.top_k(tf.boolean_mask(cross_entropy, neg_mask),
                                          number_of_negatives)
    neg_class_loss = tf.reduce_sum(top_k_worst)
    class_loss = (neg_class_loss + pos_class_loss) / num_pos_float
    tf.summary.scalar('loss/class_neg', neg_class_loss / tf.to_float(number_of_negatives))
    tf.summary.scalar('loss/class', class_loss)
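To make the 3:1 hard-negative mining above concrete, here is a toy walkthrough with hypothetical numbers, not taken from the repository:

import tensorflow as tf  # TF 1.x API

# Toy example: 1 image, 10 anchors, 2 of them positive.
pos_mask = tf.constant([[True, False, True, False, False,
                         False, False, False, False, False]])
number_of_positives = tf.reduce_sum(tf.to_int32(pos_mask))     # 2
number_of_negatives = tf.minimum(3 * number_of_positives,      # min(3*2, 10-2) = 6
                                 tf.shape(pos_mask)[1] - number_of_positives)
# tf.nn.top_k then keeps the 6 negatives with the highest cross-entropy,
# i.e. the "hardest" ones. With 0 positives, k would be 0 and the loss
# pipeline breaks, which is exactly the failure discussed in this thread.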
Sorry, I didn't get your question.
@Engineering-Course I commented on this issue and was waiting to get my new GPU so I could continue testing with a higher batch size. However, in the meanwhile I tried with learning rate = 0 and batch_size = 1, and this error still comes up. @dvornikita Does this mean the error will appear whenever the batch size is not large enough? Could the code be changed a little to skip this error, as @Engineering-Course mentioned? I understand that batch size = 1 will normally hit this error, but I was thinking that batch size = 4 should at least run without it.
If the batch size is small, it is more likely that the randomly cropped batch doesn't contain any objects, which means there are no positive samples. When that happens, number_of_positives is zero:

number_of_positives = tf.reduce_sum(tf.to_int32(pos_mask))

Then number_of_negatives becomes zero too:

number_of_negatives = tf.minimum(3 * number_of_positives,
                                 tf.shape(pos_mask)[1] - number_of_positives)

This then causes an error in the tf.nn.top_k function:

top_k_worst, top_k_inds = tf.nn.top_k(tf.boolean_mask(cross_entropy, neg_mask),
                                      number_of_negatives)

So I recommend appending a line that sets number_of_negatives to at least one before calling tf.nn.top_k:

number_of_negatives = tf.maximum(1, number_of_negatives)

This code is in the detection_loss function in training.py.
@Engineering-Course Let me know when you test it, and whether you overcome the error. As I mentioned, I cannot do it myself right now because my GPU is not good enough and I cannot even set the batch size to 2.
You can try it even when batch size is 1. It works for me.
@fastlater, The solution of @Engineering-Course should work fine. Just note that in this case you learn from only one negative example and normalize the loss by one, which gives a loss of the same order of magnitude as usual, but the signal is not very desirable. This might bias the training, especially if this situation comes up often. So in addition to that, I would multiply the loss by zero when this occurs, as sketched below.
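A minimal sketch combining both fixes, reusing the names from the detection_loss snippet above; this is an illustration of the suggestion, not the actual commit:

# Clamp k so tf.nn.top_k never receives k = 0.
number_of_negatives = tf.maximum(1, number_of_negatives)
top_k_worst, top_k_inds = tf.nn.top_k(tf.boolean_mask(cross_entropy, neg_mask),
                                      number_of_negatives)
neg_class_loss = tf.reduce_sum(top_k_worst)

# Multiply the loss by zero when there are no positives, so the single
# dummy negative cannot bias the gradients; also guard the normalizer,
# since 0/0 would still produce NaN before the multiplication.
has_positives = tf.to_float(number_of_positives > 0)
class_loss = has_positives * (neg_class_loss + pos_class_loss) \
             / tf.maximum(num_pos_float, 1.0)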
Yes. Agree with you.
@dvornikita @Engineering-Course Thank you for the feedback. So I would have to multiply neg_class_loss by zero? Is that the loss you are talking about, or is it class_loss? Let me know if you will add these lines to the code. I guess it would be good, at least for testing the training.
Pushed that modification. You can test it.
@dvornikita @Engineering-Course I tried it, just for testing, with batch_size=1, and the error now is: Nan in summary histogram for: summarize_grads/ssd/confidence/ssd_back/block_rev2/weights_gradiant. Did this error come up when you tested the code? P.S.: I only modified training.py.
@fastlater I fixed this in the last commit. Apparently, the error was caused by bbox_loss, since it uses the smooth L1 loss, which also breaks with no positives. Now the training doesn't break, but I didn't manage to make it learn anything meaningful with a batch size of 1, which is not so surprising.
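For reference, a standard smooth L1 (Huber-style) box loss as used in SSD-style detectors; this is a generic sketch, not necessarily the repository's exact bbox_loss, and it shows where the NaN comes from:

import tensorflow as tf  # TF 1.x API

def smooth_l1(diff):
    # Standard smooth L1 from Fast R-CNN / SSD: quadratic near zero,
    # linear in the tails.
    abs_diff = tf.abs(diff)
    return tf.where(abs_diff < 1.0,
                    0.5 * tf.square(abs_diff),
                    abs_diff - 0.5)

# Hypothetical usage, assuming [batch, anchors, 4] box tensors and the
# names from detection_loss above: with zero positives the masked sum is 0,
# and dividing by num_pos_float == 0 yields NaN, matching the
# "Nan in summary histogram" error reported above.
masked = smooth_l1(location - refine_ph) * tf.expand_dims(tf.to_float(pos_mask), -1)
bbox_loss = tf.reduce_sum(masked) / num_pos_float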
@dvornikita As we all expected, it won't learn anything meaningful. However, it was good to fix it.
@Engineering-Course Where is training.py? I met the same error too, but I can't find this file in my folder. Can you tell me where it is in detail?
@clxia12 It's in the root folder of the project