zhengyang-wang / deeplab-v2--resnet-101--tensorflow

A (re-)implementation of DeepLab v2 (ResNet-101) in TensorFlow for semantic image segmentation on the PASCAL VOC 2012 dataset.

License: GNU General Public License v3.0

Python 100.00%
deep-learning deeplab-resnet deeplabv2 pascal-voc semantic-segmentation tensorflow

deeplab-v2--resnet-101--tensorflow's People

Contributors

zhengyang-wang

deeplab-v2--resnet-101--tensorflow's Issues

The loss is still about 2 after about 4000 iterations

Hello, thank you for kindly publishing your implementation; it is great work and very helpful to me. When I train the model on the Cityscapes dataset, the loss almost stops decreasing after 100 iterations and stays at about 2. The mIoU increases slowly (about 0.001 per 80 iterations). Is this normal? If not, do you have any advice about what might cause it?
Thanks in advance. ^_^

How to train your project on multiple GPUs?

Hello, I have four 1080 GPUs. Could you tell me how to train your project on my server, since I want to use a bigger batch size? I have tried uncommenting the line that allows GPU memory growth in your main.py, but I think we also need to modify the gradient update (average the gradients). I am using TF 1.2.1. Thanks.
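
The gradient averaging mentioned above is the core of the usual data-parallel ("tower") scheme: each GPU computes gradients on its own slice of the batch, and the per-variable gradients are averaged before a single update. Below is a minimal, framework-free sketch of just that averaging step. Plain lists stand in for tensors, and the function and variable names are hypothetical; none of this is taken from the repository.

```python
def average_gradients(tower_grads):
    """Average gradients variable by variable across GPU towers.

    tower_grads: one entry per tower; each entry is a list of
    (gradient, variable_name) pairs in the same variable order.
    Gradients are plain lists of floats here, standing in for tensors.
    """
    averaged = []
    for pairs in zip(*tower_grads):
        grads = [grad for grad, _ in pairs]
        name = pairs[0][1]  # the variable is shared, so any tower's name works
        mean_grad = [sum(values) / len(values) for values in zip(*grads)]
        averaged.append((mean_grad, name))
    return averaged


# Two hypothetical towers, each holding gradients for the same two variables.
towers = [
    [([1.0, 2.0], "conv1/weights"), ([0.5], "conv1/biases")],
    [([3.0, 4.0], "conv1/weights"), ([1.5], "conv1/biases")],
]
print(average_gradients(towers))
```

In a real TensorFlow 1.x graph the same loop would operate on the (gradient, variable) pairs returned by each tower's optimizer, and the averaged pairs would be applied once.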

Is "is_training" of batch norm always "False"?

Thank you for your great implementation. Can you please explain why "is_training" is always "False" in all batch normalization layers? As a result, the batch normalization parameters are never updated.

test mode cannot print block shape

This is the print info after running '--option=test'.
Did anyone encounter the same problem?

-----------build encoder: deeplab pre-trained-----------
after start block: (1, ?, ?, 64)
after block1: (1, ?, ?, 256)
after block2: (1, ?, ?, 512)
after block3: (1, ?, ?, 1024)
after block4: (1, ?, ?, 2048)
-----------build decoder-----------
after aspp block: (1, ?, ?, 2)

Loss goes to NaN when using res101

I am training your code with res101 on the Cityscapes dataset (with the deeplab pre-trained model it worked well). I set up main.py as follows:

	flags.DEFINE_float('momentum', 0.9, 'momentum')
	flags.DEFINE_string('encoder_name', 'res101', 'name of pre-trained model, res101, res50 or deeplab')
	flags.DEFINE_string('pretrain_file', './reference model/resnet_v1_101.ckpt', 'pre-trained model filename corresponding to encoder_name')
	flags.DEFINE_string('data_list', './dataset_cityscapes/train_fine.txt', 'training data list filename')
...
	flags.DEFINE_integer('input_height', 713, 'input image height')
	flags.DEFINE_integer('input_width', 713, 'input image width')
	flags.DEFINE_integer('num_classes', 19, 'number of classes')

After some iterations, the loss goes to NaN. I am using Python 3 and TensorFlow 1.3. Have you run into the same problem? How can I fix it? Thanks.

This is the loss log:

step 0 	 loss = 6.337, (12.625 sec/step)
step 1 	 loss = 6.133, (1.820 sec/step)
step 2 	 loss = 2.675, (1.625 sec/step)
step 3 	 loss = 6.042, (1.630 sec/step)
step 4 	 loss = 15.278, (1.545 sec/step)
step 5 	 loss = 12.153, (1.532 sec/step)
step 6 	 loss = 52.724, (1.100 sec/step)
step 7 	 loss = 940.443, (1.041 sec/step)
step 8 	 loss = 5393914151199927058379743980628738048.000, (1.052 sec/step)
step 9 	 loss = nan, (1.093 sec/step)
step 10 	 loss = nan, (1.445 sec/step)
step 11 	 loss = nan, (1.025 sec/step)

fc7 and fc8 layers?

Though there are fc7 and fc8 layers in Fig. 7 of the DeepLab v2 paper, I couldn't find them in your implementation.
Did I misunderstand the code, or have they just not been implemented?

[Use my own pre-trained model]

If I want to use my own pre-trained ResNet-50 model, the tensor names in my ckpt file are:
tensor_name group2/block3/conv3/bn/mean/EMA

tensor_name group3/block1/conv2/bn/variance/EMA

tensor_name group3/block0/conv1/bn/beta

tensor_name group3/block2/conv3/bn/beta/Momentum

tensor_name group3/block2/conv1/bn/gamma

tensor_name group2/block4/conv3/W/Momentum

tensor_name group0/block1/conv3/bn/gamma/Momentum

tensor_name group1/block0/conv3/W
........

and in your ckpt file, the tensor names are:
tensor_name resnet_v1_50/block3/unit_2/bottleneck_v1/conv1/BatchNorm/moving_mean

tensor_name resnet_v1_50/block4/unit_1/bottleneck_v1/conv3/BatchNorm/beta

tensor_name resnet_v1_50/block3/unit_2/bottleneck_v1/conv3/BatchNorm/gamma
tensor_name resnet_v1_50/block2/unit_1/bottleneck_v1/conv3/weights

tensor_name resnet_v1_50/block3/unit_1/bottleneck_v1/conv3/BatchNorm/moving_variance
........
I have changed the names in network.py:

	with tf.variable_scope(scope_name) as scope:
		outputs = self._start_block('conv0')
		print("after start block:", outputs.shape)
		with tf.variable_scope('group0') as scope:
			outputs = self._bottleneck_resblock(outputs, 256, 'block0',	identity_connection=False)
			outputs = self._bottleneck_resblock(outputs, 256, 'block1')
			outputs = self._bottleneck_resblock(outputs, 256, 'block2')
			print("after group0 :", outputs.shape)

..........
	def _bottleneck_resblock(self, x, num_o, name, half_size=False, identity_connection=True):
		first_s = 2 if half_size else 1
		assert num_o % 4 == 0, 'Bottleneck number of output ERROR!'
		# branch1
		if not identity_connection:
			o_b1 = self._conv2d(x, 1, num_o, first_s, name='%s/shortcut' % name)
			o_b1 = self._batch_norm(o_b1, name='%s/shortcut' % name, is_training=False, activation_fn=None)
		else:
			o_b1 = x
		# branch2
		o_b2a = self._conv2d(x, 1, num_o / 4, first_s, name='%s/conv1' % name)
		o_b2a = self._batch_norm(o_b2a, name='%s/conv1' % name, is_training=False, activation_fn=tf.nn.relu)

		o_b2b = self._conv2d(o_b2a, 3, num_o / 4, 1, name='%s/conv2' % name)
		o_b2b = self._batch_norm(o_b2b, name='%s/conv2' % name, is_training=False, activation_fn=tf.nn.relu)

		o_b2c = self._conv2d(o_b2b, 1, num_o, 1, name='%s/conv3' % name)
		o_b2c = self._batch_norm(o_b2c, name='%s/conv3' % name, is_training=False, activation_fn=None)
		# add
		outputs = self._add([o_b1,o_b2c], name='%s/add' % name)
		# relu
		outputs = self._relu(outputs, name='%s/relu' % name)
		return outputs

........

What else should I do to run this code with my own pre-trained ckpt file?
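
An alternative to renaming the scopes in network.py is to translate the checkpoint's tensor names into the slim-style names. The sketch below does this with plain string handling for the two naming schemes listed above. The correspondence it encodes (group N to block N+1, block M to unit_M+1, `W` to `weights`, `bn/mean/EMA` to `BatchNorm/moving_mean`, and so on) is an assumption inferred from the name lists, not something confirmed by either checkpoint. The function name `to_slim_name` is hypothetical, and the shortcut and first-convolution layers are not covered.

```python
import re

# Assumed correspondence between the two naming schemes; verify it against
# both checkpoints before relying on it.
SUFFIX_MAP = {
    "W": "weights",
    "bn/beta": "BatchNorm/beta",
    "bn/gamma": "BatchNorm/gamma",
    "bn/mean/EMA": "BatchNorm/moving_mean",
    "bn/variance/EMA": "BatchNorm/moving_variance",
}

PATTERN = re.compile(r"^group(\d+)/block(\d+)/(conv\d+)/(.+)$")


def to_slim_name(name, prefix="resnet_v1_50"):
    """Map e.g. 'group2/block3/conv3/bn/mean/EMA' to a slim-style name.

    Returns None for names without a counterpart, such as optimizer
    slots ending in '/Momentum'.
    """
    match = PATTERN.match(name)
    if match is None:
        return None
    group, block, conv, suffix = match.groups()
    if suffix not in SUFFIX_MAP:
        return None
    return "{}/block{}/unit_{}/bottleneck_v1/{}/{}".format(
        prefix, int(group) + 1, int(block) + 1, conv, SUFFIX_MAP[suffix])


print(to_slim_name("group2/block3/conv3/bn/mean/EMA"))
print(to_slim_name("group3/block2/conv3/bn/beta/Momentum"))
```

A name map like this could be used to build the name-to-variable dictionary that `tf.train.Saver(var_list=...)` accepts. Whether the layer shapes and stride placement of the two models actually match still has to be checked separately.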

Image order: RGB or BGR?

Hello, I am using the official ResNet-101 pre-trained model (from your link). It was trained on ImageNet with RGB channel order, and IMAGE_MEAN is

_R_MEAN = 123.68 / 255
_G_MEAN = 116.78 / 255
_B_MEAN = 103.94 / 255

The official resnet-101 pre-processing L223 is

  channels = tf.split(axis=2, num_or_size_splits=num_channels, value=image)
  for i in range(num_channels):
    channels[i] -= means[i]
  return tf.concat(axis=2, values=channels)

However, your code converts RGB to BGR and uses a different IMAGE_MEAN. I think we should use the same pre-processing as the pre-trained model, i.e., RGB order and the ImageNet image mean. Am I right?
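
Whichever channel order is used, each channel has to be paired with its own mean, and the order has to match what the pre-trained weights expect. The sketch below illustrates only the pairing, on a single pixel in plain Python. The mean values are the ones quoted above, taken on the 0-255 scale (an assumption, since the question writes them divided by 255), and `preprocess_pixel` is a hypothetical helper.

```python
# Per-channel means in RGB order, on a 0-255 scale (assumed scale).
RGB_MEAN = (123.68, 116.78, 103.94)


def preprocess_pixel(rgb, bgr_order):
    """Subtract the per-channel mean from one RGB pixel.

    With bgr_order=True the centered channels are reversed, which is the
    same as reversing both the pixel and the means, so each channel is
    still paired with its own mean.
    """
    centered = [value - mean for value, mean in zip(rgb, RGB_MEAN)]
    return centered[::-1] if bgr_order else centered


pixel = (200.0, 100.0, 50.0)
print(preprocess_pixel(pixel, bgr_order=False))  # R, G, B order
print(preprocess_pixel(pixel, bgr_order=True))   # B, G, R order
```

A mismatch arises only if the channels are reordered but the means are not, or if the network's first convolution was trained with the other order.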

OutOfRangeError Occurred

Hi there. I came across an out-of-range error after running the command: python main.py

Here's some of my configuration:

OS: Windows 10
python: 3.5 ( not Anaconda )
Tensorflow: 1.3 + CPU

Would you please help me with this? Thanks!

===================================================================

2017-12-28 09:50:03.044001: W C:\tf_jenkins\home\workspace\rel-win\M\windows\PY\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-12-28 09:50:03.044142: W C:\tf_jenkins\home\workspace\rel-win\M\windows\PY\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
-----------build encoder: deeplab pre-trained-----------
after start block: (10, 81, 81, 64)
after block1: (10, 81, 81, 256)
after block2: (10, 41, 41, 512)
after block3: (10, 41, 41, 1024)
after block4: (10, 41, 41, 2048)
-----------build decoder-----------
after aspp block: (10, 41, 41, 21)
Restored model parameters from ../reference model/deeplab_resnet_init.ckpt
2017-12-28 09:50:44.847421: W C:\tf_jenkins\home\workspace\rel-win\M\windows\PY\35\tensorflow\core\framework\op_kernel.cc:1192] Not found: NewRandomAccessFile failed to Create/Open: E:\Data\VOC_data\VOC\VOCdevkit\VOC2012/SegmentationClassAug/2008_003519.png : 系统找不到指定的路径。

Traceback (most recent call last):
File "C:\Program Files\Python 3.5\lib\site-packages\tensorflow\python\client\session.py", line 1327, in _do_call
return fn(*args)
File "C:\Program Files\Python 3.5\lib\site-packages\tensorflow\python\client\session.py", line 1306, in _run_fn
status, run_metadata)
File "C:\Program Files\Python 3.5\lib\contextlib.py", line 66, in __exit__
next(self.gen)
File "C:\Program Files\Python 3.5\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 466, in raise_exception_on_not_ok_status
pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.OutOfRangeError: FIFOQueue '_1_create_inputs/batch/fifo_queue' is closed and has insufficient elements (requested 10, current size 0)
[[Node: create_inputs/batch = QueueDequeueManyV2[component_types=[DT_FLOAT, DT_UINT8], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](create_inputs/batch/fifo_queue, create_inputs/batch/n)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "main.py", line 82, in <module>
tf.app.run()
File "C:\Program Files\Python 3.5\lib\site-packages\tensorflow\python\platform\app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "main.py", line 76, in main
getattr(model, args.option)()
File "C:\Users\Shane\Desktop\Deeplab-v2--ResNet-101--Tensorflow\model.py", line 60, in train
feed_dict=feed_dict)
File "C:\Program Files\Python 3.5\lib\site-packages\tensorflow\python\client\session.py", line 895, in run
run_metadata_ptr)
File "C:\Program Files\Python 3.5\lib\site-packages\tensorflow\python\client\session.py", line 1124, in _run
feed_dict_tensor, options, run_metadata)
File "C:\Program Files\Python 3.5\lib\site-packages\tensorflow\python\client\session.py", line 1321, in _do_run
options, run_metadata)
File "C:\Program Files\Python 3.5\lib\site-packages\tensorflow\python\client\session.py", line 1340, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.OutOfRangeError: FIFOQueue '_1_create_inputs/batch/fifo_queue' is closed and has insufficient elements (requested 10, current size 0)
[[Node: create_inputs/batch = QueueDequeueManyV2[component_types=[DT_FLOAT, DT_UINT8], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](create_inputs/batch/fifo_queue, create_inputs/batch/n)]]

Caused by op 'create_inputs/batch', defined at:
File "main.py", line 82, in <module>
tf.app.run()
File "C:\Program Files\Python 3.5\lib\site-packages\tensorflow\python\platform\app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "main.py", line 76, in main
getattr(model, args.option)()
File "C:\Users\Shane\Desktop\Deeplab-v2--ResNet-101--Tensorflow\model.py", line 36, in train
self.train_setup()
File "C:\Users\Shane\Desktop\Deeplab-v2--ResNet-101--Tensorflow\model.py", line 169, in train_setup
self.image_batch, self.label_batch = reader.dequeue(self.conf.batch_size)
File "C:\Users\Shane\Desktop\Deeplab-v2--ResNet-101--Tensorflow\utils\image_reader.py", line 179, in dequeue
num_elements)
File "C:\Program Files\Python 3.5\lib\site-packages\tensorflow\python\training\input.py", line 922, in batch
name=name)
File "C:\Program Files\Python 3.5\lib\site-packages\tensorflow\python\training\input.py", line 716, in _batch
dequeued = queue.dequeue_many(batch_size, name=name)
File "C:\Program Files\Python 3.5\lib\site-packages\tensorflow\python\ops\data_flow_ops.py", line 457, in dequeue_many
self._queue_ref, n=n, component_types=self._dtypes, name=name)
File "C:\Program Files\Python 3.5\lib\site-packages\tensorflow\python\ops\gen_data_flow_ops.py", line 1342, in _queue_dequeue_many_v2
timeout_ms=timeout_ms, name=name)
File "C:\Program Files\Python 3.5\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 767, in apply_op
op_def=op_def)
File "C:\Program Files\Python 3.5\lib\site-packages\tensorflow\python\framework\ops.py", line 2630, in create_op
original_op=self._default_original_op, op_def=op_def)
File "C:\Program Files\Python 3.5\lib\site-packages\tensorflow\python\framework\ops.py", line 1204, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

OutOfRangeError (see above for traceback): FIFOQueue '_1_create_inputs/batch/fifo_queue' is closed and has insufficient elements (requested 10, current size 0)
[[Node: create_inputs/batch = QueueDequeueManyV2[component_types=[DT_FLOAT, DT_UINT8], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](create_inputs/batch/fifo_queue, create_inputs/batch/n)]]
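
The log above reports a label file that could not be opened shortly before the queue error, so the OutOfRangeError may simply be a consequence of missing input files. A quick check, sketched below, is to verify every path in the data list before training. It assumes that each line of the list holds whitespace-separated paths relative to the data directory (for example an image path followed by a label path); that format, and the helper name `find_missing_files`, are assumptions rather than something shown in this thread.

```python
import os


def find_missing_files(data_dir, data_list):
    """Return the paths named in data_list that do not exist under data_dir.

    Assumes each non-empty line holds whitespace-separated relative paths,
    so paths that contain spaces are not supported.
    """
    missing = []
    with open(data_list) as handle:
        for line in handle:
            for relative in line.split():
                # Drop a leading separator so os.path.join keeps data_dir.
                path = os.path.join(data_dir, relative.lstrip("/\\"))
                if not os.path.isfile(path):
                    missing.append(path)
    return missing


# Example call (paths are placeholders):
# for path in find_missing_files("E:/Data/VOC2012", "./dataset/train.txt"):
#     print("missing:", path)
```

If the function reports files under SegmentationClassAug, the augmented label set is most likely absent from the dataset folder.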

Another dataset

Hello,
I want to ask whether it is possible to use your DeepLab implementation on my own dataset for a semantic segmentation task.
I have a dataset with semantic segmentation labels for 12 classes. Is it possible to use your model?

How can the network train batch norm?

This is not an issue. I just want to extend the training process by training the batch norm layers. As you mentioned, training BN may not work well when the batch size is small. However, I am running on a powerful computer, so I think I can train with a batch size of 16. After that, the BN layers will be frozen, and the network will be trained with a small batch size and learning rate.

As you mentioned in the README:

Example: If you have a batch normalization layer in the decoder, you should use
outputs = self._batch_norm(inputs, name='g_bn1', is_training=self.phase, activation_fn=tf.nn.relu, trainable=True)

To train with BN, I will set the is_training flag to True and trainable=True in network.py. Is that all? Do I need to change anything in model.py in the following lines?

restore_var = [v for v in tf.global_variables() if 'fc' not in v.name]
# Trainable Variables
all_trainable = tf.trainable_variables()
# Fine-tune part
encoder_trainable = [v for v in all_trainable if 'fc' not in v.name] # lr * 1.0
# Decoder part
decoder_trainable = [v for v in all_trainable if 'fc' in v.name]
....
decoder_w_trainable = [v for v in decoder_trainable if 'weights' in v.name or 'gamma' in v.name] # lr * 10.0
decoder_b_trainable = [v for v in decoder_trainable if 'biases' in v.name or 'beta' in v.name] # lr * 20.0

This is my complete code for training BN in the decoder:

o=self._conv2d_bn(x, 1, 256, 1, name='fc1', biased=True)
o_bn = self._batch_norm(o, name='fc1_bn', is_training=self.phase, trainable=self.phase,activation_fn=tf.nn.relu)

Thanks so much

compute_IoU_per_class

When I use the compute_IoU_per_class function, I get the following error:

print('class %d: %.3f'%(i,IoU))
TypeError: a float is required
How can I fix it?
Thanks a lot.
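
The snippet does not show where IoU comes from, so the cause can only be guessed. The message suggests that the value passed to `%.3f` was not something Python can format as a number (for instance None or a multi-element array). As a point of comparison, here is a self-contained sketch that computes per-class IoU from a confusion matrix and prints it in the same format. The function name `iou_per_class` and the example matrix are hypothetical and not taken from the repository.

```python
def iou_per_class(confusion):
    """Per-class IoU from a square confusion matrix.

    confusion[i][j] counts pixels with ground-truth class i that were
    predicted as class j. Classes that never occur get float('nan').
    """
    ious = []
    for c in range(len(confusion)):
        true_pos = confusion[c][c]
        false_neg = sum(confusion[c]) - true_pos
        false_pos = sum(row[c] for row in confusion) - true_pos
        denom = true_pos + false_pos + false_neg
        ious.append(float(true_pos) / denom if denom else float("nan"))
    return ious


confusion = [[50, 10],
             [5, 35]]
for i, iou in enumerate(iou_per_class(confusion)):
    # float() makes the conversion explicit for NumPy scalars and the like.
    print('class %d: %.3f' % (i, float(iou)))
```

Printing `type(IoU)` right before the failing line is a quick way to see what is actually being passed.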

tf.flags

Hi zhengyang, I want to know why you set flags.FLAGS.__dict__['__parsed'] = False in configure(). What does it mean?
Thank you!

Redundant condition in model.py

I have checked your code and your note:

'is_training' argument is removed and 'self._batch_norm' changes. Basically, for a small batch size, it is better to keep the statistics of the BN layers (running means and variances) frozen, and to not update the values provided by the pre-trained model by setting 'is_training=False'. Note that is_training=False still updates BN parameters gamma (scale) and beta (offset) if they are present in var_list of the optimiser definition. Set 'trainable=False' in BN functions to remove them from trainable_variables

It means that when we freeze BN, we keep its parameters fixed (including mean, variance, beta, and gamma). In your BN layers in network.py, you have set trainable=False, hence gamma and beta will not appear in the list. So I think there is a redundant condition in these lines of model.py:

decoder_w_trainable = [v for v in decoder_trainable if 'weights' in v.name or 'gamma' in v.name] # lr * 10.0
decoder_b_trainable = [v for v in decoder_trainable if 'biases' in v.name or 'beta' in v.name] # lr * 20.0

Am I right? Or, to be correct, should it be 'gamma' not in v.name instead of 'gamma' in v.name?
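
Judging only from the quoted lines, the 'gamma' and 'beta' conditions match nothing when every BN layer is created with trainable=False, and start to matter once a decoder BN layer is trainable. The sketch below replays the same filters on plain name strings to make that visible. The variable names are hypothetical, and `group_decoder_variables` is not a function from the repository.

```python
def group_decoder_variables(trainable_names):
    """Split trainable variable names the way the quoted filters do."""
    encoder = [n for n in trainable_names if 'fc' not in n]
    decoder = [n for n in trainable_names if 'fc' in n]
    decoder_w = [n for n in decoder if 'weights' in n or 'gamma' in n]
    decoder_b = [n for n in decoder if 'biases' in n or 'beta' in n]
    return encoder, decoder_w, decoder_b


# Hypothetical names. With trainable=False, the BN gamma/beta variables are
# not in the trainable list at all.
frozen_bn = ['conv1/weights', 'fc1/weights', 'fc1/biases']
# With a trainable BN layer in the decoder, gamma and beta show up as well.
trainable_bn = frozen_bn + ['fc1_bn/gamma', 'fc1_bn/beta']

print(group_decoder_variables(frozen_bn))
print(group_decoder_variables(trainable_bn))
```

Under these assumptions the extra conditions are harmless when BN is frozen, and they place gamma and beta in the decoder weight and bias groups when it is not.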

Tensor name "bn4b17_branch2c/moving_mean" not found in checkpoint files

I used the pre-trained ResNet-101 provided by TensorFlow, as you described in the 11/09/2017 update
(http://download.tensorflow.org/models/resnet_v1_101_2016_08_28.tar.gz), but a NotFoundError is raised when loading the model, as shown below:

NotFoundError (see above for traceback): Tensor name "bn4b17_branch2c/moving_mean" not found in checkpoint files G:/DeepLab/reference_model/tensorflow_official/resnet_v1_101.ckpt
[[Node: save_1/RestoreV2_202 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save_1/Const_0_0, save_1/RestoreV2_202/tensor_names, save_1/RestoreV2_202/shape_and_slices)]]
[[Node: save_1/RestoreV2_242/_183 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_1226_save_1/RestoreV2_242", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]]

I checked the downloaded "resnet_v1_101.ckpt" file and found that the tensor names in the ckpt look like

tensor_name: resnet_v1_101/block3/unit_5/bottleneck_v1/conv2/BatchNorm/moving_variance
tensor_name: resnet_v1_101/block3/unit_14/bottleneck_v1/conv3/BatchNorm/gamma
tensor_name: resnet_v1_101/block1/unit_3/bottleneck_v1/conv1/weights
tensor_name: resnet_v1_101/block1/unit_3/bottleneck_v1/conv3/weights
tensor_name: resnet_v1_101/block3/unit_16/bottleneck_v1/conv1/BatchNorm/gamma
tensor_name: resnet_v1_101/block3/unit_7/bottleneck_v1/conv3/BatchNorm/gamma
tensor_name: resnet_v1_101/block3/unit_18/bottleneck_v1/conv2/BatchNorm/gamma
tensor_name: resnet_v1_101/block3/unit_7/bottleneck_v1/conv1/BatchNorm/beta
tensor_name: resnet_v1_101/block3/unit_18/bottleneck_v1/conv2/weights

which are completely different from what is needed.
Is there any processing that needs to be done before restoring the model? Or have I used the wrong model? Please help.

Question about CRFs

Hi Zhengyang, thanks for your code! I have one question: did you use CRFs in your code?

What should be changed if there are only two classes

Hi, I am new to ML/CV, and I am now doing image segmentation for skin lesions. I only need to separate the lesion area from the background (two classes).

When I configure main.py and run training, the loss stays at 1.231 after a few steps. Then, when I run the test, the pixel accuracy stays at 1.00 and the mean IoU stays at 0.5. Did you encounter the same problem?

I found in DrSleep's notes that when loading a checkpoint for a different number of classes (not 21), --not-restore-last should be passed. Have you also implemented this?

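mIoU is very low when using my own dataset

Hello, thank you very much for your code. When I use the PASCAL VOC dataset, the mIoU and the final predicted pixel values are fairly good. But when I use my own dataset, the loss decreases, yet the mIoU is very low, as shown in the figure below.
[image]
My own dataset is fairly small: the training and validation sets each have only about 100 images, and the image resolution is 640x480. The images are all basically alike, as shown below.
[image]

I hope someone can help take a look. Many thanks!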

NewRandomAccessFile failed to Create/Open: E:\Datase\VOC2012 : Access is denied

Hi zhengyang, I ran your code on Windows, and I set
flags.DEFINE_string('data_dir','E:\Dataset\VOC2012','data directory').
Then I get "NewRandomAccessFile failed to Create/Open: E:\Datase\VOC2012 : Access is denied"
and "OutOfRangeError: FIFOQueue '_1_create_inputs/batch/fifo_queue' is closed and has insufficient elements (requested 10, current size 0)", the same as @cclough.
This confuses me a lot. Can you tell me what I did wrong?
Thank you!!!!

What mIoU do you get on val/test with your code?

When I used this code, I reduced 'batch_size' and 'image_size' because my GPU memory is not enough:
batch_size = 7, image_size = 257. Unfortunately, I only got an mIoU of 73.2.
Additionally, I added a new model (model_msc) with multi-scale input for training and testing, which gave mIoU = 74.2.
Both are far from 76.35 (the paper's result).
Could you tell me what the reason might be?
Also, what mIoU did you get in your experiments?
Thank you very much!

Save prediction probabilities.

Hi,
this code is great and so easy to implement, thanks!
I would like to save the probability for each class (as visual probability maps: for each image, I would get 20 probability images, one per class). Just getting a probability matrix would also be nice.
How do I have to modify the code to get this?

Thanks in advance!
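
Per-class probabilities are normally obtained by applying a softmax over the class dimension of the network's logits, before any argmax is taken. The sketch below shows that computation on nested Python lists so that it runs without TensorFlow. The function names and the H x W x C layout are assumptions made for illustration, and which tensor in this repository holds the logits is not shown in this thread.

```python
import math


def softmax(logits):
    """Convert one pixel's class logits into probabilities."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def probability_maps(logit_image):
    """Turn an H x W x C nested list of logits into C maps of size H x W."""
    probs = [[softmax(pixel) for pixel in row] for row in logit_image]
    num_classes = len(logit_image[0][0])
    return [[[pixel[c] for pixel in row] for row in probs]
            for c in range(num_classes)]


# A tiny 1 x 2 "image" with two classes.
maps = probability_maps([[[0.0, 0.0], [1.0, 3.0]]])
print(maps[0])  # probability map for class 0
print(maps[1])  # probability map for class 1
```

In a TensorFlow graph the equivalent step would be `tf.nn.softmax` applied to the upsampled logits. Each resulting map can then be scaled to 0-255 and saved as a grayscale image.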

The accuracy on Cityscapes

Could you tell me the accuracy you get on Cityscapes with your code? I want to check whether it is my fault or just the performance of DeepLab v2. I got a very low mean IoU on Cityscapes. Thank you very much. @zhengyang-wang

reference model

Hello,

I downloaded your project and ran it on my own machine, but it failed with a message that no matching file could be found:

2018-03-31 12:26:55.147942: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for ../reference model/deeplab_resnet_init.ckpt

How can I solve this problem? Thanks!

Results for VOC2012 are not correct

I use the original images from VOC2012 for training.
training set: 1464
test set: 1449
I also use Res101 from slim's checkpoint as the pre-trained model.
The training loss is about 1.8.
When I test the model, the IoU is very low, only 0.1, and some classes are NaN.
What's wrong with my configuration? I have set up the environment following the README.

Apply the code on my own data set

What code do I need to change if I want to use your code to segment my grayscale images? Should I change the number of channels and the input size? I look forward to your reply.

Multi-GPU

How can I run your code on multiple GPUs? Thank you very much.

Could you provide the demo/inference code?

This is not a bug, just a feature request. I hope it can be useful not only to me but also to other people.

Given an image and a trained model, the script segments the image and saves the prediction result to a file. You can use it in your branch. It may help others as a reference. This is the code:

"""Run DeepLab-ResNet on a given image.

This script computes a segmentation mask for a given image.
"""

from __future__ import print_function

import argparse
import os

import numpy as np
import tensorflow as tf
from PIL import Image

from network import *
from utils import decode_labels


IMG_MEAN = np.array((103.939, 116.779, 123.68), dtype=np.float32)

NUM_CLASSES = 19
SAVE_DIR = './output/'


def get_arguments():
    """Parse all the arguments provided from the CLI.

    Returns:
      A list of parsed arguments.
    """
    parser = argparse.ArgumentParser(description="DeepLabLFOV Network Inference.")
    parser.add_argument("--img_path", type=str,default='./input/frankfurt_000000_000294_leftImg8bit.png',
                        help="Path to the RGB image file.")
    parser.add_argument("--model_weights", type=str, default='./model_cityscape/model.ckpt-10000',
                        help="Path to the file with model weights.")
    parser.add_argument("--num-classes", type=int, default=NUM_CLASSES,
                        help="Number of classes to predict (including background).")
    parser.add_argument("--save-dir", type=str, default=SAVE_DIR,
                        help="Where to save predicted mask.")
    return parser.parse_args()


def load(saver, sess, ckpt_path):
    '''Load trained weights.

    Args:
      saver: TensorFlow saver object.
      sess: TensorFlow session.
      ckpt_path: path to checkpoint file with parameters.
    '''
    saver.restore(sess, ckpt_path)
    print("Restored model parameters from {}".format(ckpt_path))


def main():
    """Create the model and start the evaluation process."""
    args = get_arguments()

    # Prepare image.
    img = tf.image.decode_jpeg(tf.read_file(args.img_path), channels=3)
    # Convert RGB to BGR.
    img_r, img_g, img_b = tf.split(axis=2, num_or_size_splits=3, value=img)
    img = tf.cast(tf.concat(axis=2, values=[img_b, img_g, img_r]), dtype=tf.float32)
    # Extract mean.
    img -= IMG_MEAN

    # Create network. Deeplab_v2(self.image_batch, self.conf.num_classes, False)
    net = Deeplab_v2(tf.expand_dims(img, dim=0), args.num_classes, False)

    # Which variables to load.
    restore_var = tf.global_variables()

    # Predictions.
    raw_output = net.outputs
    raw_output_up = tf.image.resize_bilinear(raw_output, tf.shape(img)[0:2, ])
    raw_output_up = tf.argmax(raw_output_up, dimension=3)
    pred = tf.expand_dims(raw_output_up, dim=3)

    # Set up TF session and initialize variables.
    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True
    sess = tf.Session(config=config)
    init = tf.global_variables_initializer()

    sess.run(init)

    # Load weights.
    loader = tf.train.Saver(var_list=restore_var)
    load(loader, sess, args.model_weights)

    # Perform inference.
    preds = sess.run(pred)

    msk = decode_labels(preds, num_classes=args.num_classes)
    im = Image.fromarray(msk[0])
    if not os.path.exists(args.save_dir):
        os.makedirs(args.save_dir)
    im.save(args.save_dir + 'mask.png')

    print('The output file has been saved to {}'.format(args.save_dir + 'mask.png'))


if __name__ == '__main__':
    os.environ['CUDA_VISIBLE_DEVICES'] = '0'
    main()

Note that you must change the label_colours code:

label_colours = [(128, 64, 128), (244, 35, 231), (69, 69, 69)
                # 0 = road, 1 = sidewalk, 2 = building
                ,(102, 102, 156), (190, 153, 153), (153, 153, 153)
                # 3 = wall, 4 = fence, 5 = pole
                ,(250, 170, 29), (219, 219, 0), (106, 142, 35)
                # 6 = traffic light, 7 = traffic sign, 8 = vegetation
                ,(152, 250, 152), (69, 129, 180), (219, 19, 60)
                # 9 = terrain, 10 = sky, 11 = person
                ,(255, 0, 0), (0, 0, 142), (0, 0, 69)
                # 12 = rider, 13 = car, 14 = truck
                ,(0, 60, 100), (0, 79, 100), (0, 0, 230)
                # 15 = bus, 16 = train, 17 = motorcycle
                ,(119, 10, 32), (1, 1, 1)]
                # 18 = bicycle, 19 = void label

Cityscapes training parameters

Thanks for sharing this nice work. I have reproduced the Pascal VOC results as you did. Now I would like to evaluate the performance on Cityscapes. I have created the training, validation, and testing files following your code. I only have a Titan X Pascal 12GB. Could you share your parameter settings for training on the Cityscapes dataset? Below are my current settings, but they do not work in the testing phase. I am using input_height = 512 and input_width = 1024. In addition, do you use both fine and coarse data for training?

IMG_MEAN = np.array((103.939, 116.779, 123.68), dtype=np.float32)

In main.py:

	# training
	flags.DEFINE_integer('num_steps', 20000, 'maximum number of iterations')
	flags.DEFINE_integer('save_interval', 1000, 'number of iterations for saving and visualization')
	flags.DEFINE_integer('random_seed', 1234, 'random seed')
	flags.DEFINE_float('weight_decay', 0.0005, 'weight decay rate')
	flags.DEFINE_float('learning_rate', 2.5e-4, 'learning rate')
	flags.DEFINE_float('power', 0.9, 'hyperparameter for poly learning rate')
	flags.DEFINE_float('momentum', 0.9, 'momentum')
	flags.DEFINE_string('encoder_name', 'deeplab', 'name of pre-trained model, res101, res50 or deeplab')
	flags.DEFINE_string('pretrain_file', './reference model/deeplab_resnet_init.ckpt', 'pre-trained model filename corresponding to encoder_name')
	flags.DEFINE_string('data_list', './dataset_cityscapes/train_fine.txt', 'training data list filename')

	# testing / validation
	flags.DEFINE_integer('valid_step', 2000, 'checkpoint number for testing/validation')
	flags.DEFINE_integer('valid_num_steps', 1449, '= number of testing/validation samples')
	flags.DEFINE_string('valid_data_list', './dataset_cityscapes/val_fine.txt', 'testing/validation data list filename')

	# data
	flags.DEFINE_string('data_dir', './cityscapes/leftImg8bit_trainvaltest', 'data directory')
	flags.DEFINE_integer('batch_size', 2, 'training batch size')
	flags.DEFINE_integer('input_height', 512, 'input image height')
	flags.DEFINE_integer('input_width', 1024, 'input image width')
	flags.DEFINE_integer('num_classes', 19, 'number of classes')
	flags.DEFINE_integer('ignore_label', 255, 'label pixel value that should be ignored')
	flags.DEFINE_boolean('random_scale', True, 'whether to perform random scaling data-augmentation')
	flags.DEFINE_boolean('random_mirror', True, 'whether to perform random left-right flipping data-augmentation')	
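
The 'power' flag in the settings above is described as the hyperparameter of the poly learning rate. In the DeepLab papers the "poly" policy multiplies the base learning rate by (1 - step / max_steps) raised to that power. The sketch below evaluates the formula with the flag values listed above. The function name is hypothetical, and how the repository wires the schedule into the optimizer is not shown here.

```python
def poly_learning_rate(base_lr, step, num_steps, power):
    """'Poly' schedule: base_lr * (1 - step / num_steps) ** power."""
    return base_lr * (1.0 - float(step) / num_steps) ** power


# Values taken from the flags above: learning_rate, num_steps, power.
for step in (0, 5000, 10000, 19999):
    print(step, poly_learning_rate(2.5e-4, step, 20000, 0.9))
```

The rate starts at the base value and decays smoothly to zero at the final step, so changing num_steps also changes how fast the rate falls.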

Summary of all trained models?

Could you summarize a little on the trained models with different configurations and different databases? For example, for pascal voc 2012 validation set, pre-trained models (resnet50/resnet101/deeplab), training with/without msc, evaluate with/without msc.

(By the way, is the only difference between the two pre-trained models, resnet101 and deeplab, that resnet101 is pre-trained on ImageNet while deeplab is pre-trained on ImageNet and COCO?)

There is no directory called SegmentationClassAug. How can I solve it?

-----------build encoder: deeplab pre-trained-----------
after start block: (10, 81, 81, 64)
after block1: (10, 81, 81, 256)
after block2: (10, 41, 41, 512)
after block3: (10, 41, 41, 1024)
after block4: (10, 41, 41, 2048)
-----------build decoder-----------
after aspp block: (10, 41, 41, 21)
INFO:tensorflow:Restoring parameters from ../reference model/deeplab_resnet.ckpt
Restored model parameters from ../reference model/deeplab_resnet.ckpt
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.NotFoundError'>, /home/hp/VOCdevkit/VOC2012/SegmentationClassAug/2008_008300.png; No such file or directory
[[Node: create_inputs/ReadFile_1 = ReadFile_device="/job:localhost/replica:0/task:0/device:CPU:0"]]

How to print the mean IoU during training

Hello, I would like to add one thing to your training code. During training, I want to print the mean IoU (besides step and loss). I added the following to your model.py at line 262:

# mIoU
pred_logits = tf.reshape(self.pred, [-1, ])
gt = tf.reshape(self.label_batch, [-1, ])
# Ignoring all labels greater than or equal to n_classes.
temp = tf.less_equal(gt, self.conf.num_classes - 1)
weights = tf.cast(temp, tf.int32)
# fix for tf 1.3.0
gt = tf.where(temp, gt, tf.cast(temp, tf.uint8))
self.mIoU, self.mIou_update_op = tf.contrib.metrics.streaming_mean_iou(pred_logits, gt, num_classes=self.conf.num_classes, weights=weights)

Line 39

self.sess.run(tf.local_variables_initializer())

Line 53

loss_value, images, labels, preds, summary, _, _ = self.sess.run(
    [self.reduced_loss,
     self.image_batch,
     self.label_batch,
     self.pred,
     self.total_summary,
     self.train_op,
     self.mIou_update_op],
    feed_dict=feed_dict)
m_IoU = self.mIoU.eval(session=self.sess)

And line 68

print('step {:d} \t loss = {:.3f}, ({:.3f} sec/step), Mean IoU: {:.3f}'.format(step, loss_value, duration, m_IoU))

But the reported mIoU does not change between steps. Do you know the reason and how I could fix it? Thanks.

step 0 	 loss = 4.039, (3.927 sec/step), Mean IoU: 0.004
step 1 	 loss = 3.142, (0.893 sec/step), Mean IoU: 0.004
step 2 	 loss = 2.124, (0.660 sec/step), Mean IoU: 0.004
step 3 	 loss = 1.549, (0.655 sec/step), Mean IoU: 0.004
step 4 	 loss = 2.705, (0.677 sec/step), Mean IoU: 0.004
step 5 	 loss = 1.956, (0.700 sec/step), Mean IoU: 0.004
...
step 502 	 loss = 0.769, (0.852 sec/step), Mean IoU: 0.004
step 503 	 loss = 0.525, (0.831 sec/step), Mean IoU: 0.004
step 504 	 loss = 1.202, (0.812 sec/step), Mean IoU: 0.004
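
One way to cross-check a streaming metric like the one above is to accumulate the confusion matrix outside the graph and compare the two numbers. The following is a minimal NumPy sketch, not the repository's code: the class name `StreamingMeanIoU` is hypothetical, and it assumes predictions are class indices in `[0, num_classes)` and that labels greater than or equal to `num_classes` (such as 255) are to be ignored, as in the snippet in the question.

```python
import numpy as np


class StreamingMeanIoU:
    """Accumulates a confusion matrix across batches (hypothetical helper)."""

    def __init__(self, num_classes):
        self.num_classes = num_classes
        # Rows are ground-truth classes, columns are predicted classes.
        self.cm = np.zeros((num_classes, num_classes), dtype=np.int64)

    def update(self, pred, gt):
        pred = np.asarray(pred).reshape(-1).astype(np.int64)
        gt = np.asarray(gt).reshape(-1).astype(np.int64)
        # Drop pixels whose label is outside the valid range (e.g. 255).
        valid = gt < self.num_classes
        # Assumes every prediction is already in [0, num_classes).
        idx = gt[valid] * self.num_classes + pred[valid]
        counts = np.bincount(idx, minlength=self.num_classes ** 2)
        self.cm += counts.reshape(self.num_classes, self.num_classes)

    def result(self):
        tp = np.diag(self.cm).astype(np.float64)
        union = self.cm.sum(axis=0) + self.cm.sum(axis=1) - tp
        present = union > 0  # average only over classes seen so far
        if not present.any():
            return 0.0
        return float((tp[present] / union[present]).mean())
```

Calling `update` with the `preds` and `labels` arrays fetched at each step and printing `result()` should give a value that moves as the confusion matrix grows. If it moves while the in-graph value stays fixed, the problem is likely in how the metric variables are initialized or updated rather than in the predictions.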

TypeError

Hello, what is the format of your data? Could you upload the dataset you use? I always get the error "TypeError: Value passed to parameter 'x' has DataType uint8 not in list of allowed values: float16, float32, float64, int32, int64, complex64, complex128"

how to fine-tune from the original pretrained model?

I tried to fine-tune from the original ImageNet pre-trained ResNet models, but I get the error "NotFoundError (see above for traceback): Key resnet_v1_50/block1/unit_1/bottleneck_v1/conv2/BatchNorm/beta not found in checkpoint".
Could anybody tell me how to solve this problem?

Bug: output folders are not created

This bug appears when someone runs prediction for the first time, because the source tree does not include folders such as ./output/prediction, ./output/visual_prediction, and model. It would be better to provide a short script that checks whether these folders exist and creates them if they do not.
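
A minimal sketch of such a script, assuming the three folder names mentioned above are the ones the code expects. The function name `ensure_dirs` and the `root` parameter are made up for illustration.

```python
import os

# Folder names taken from the report above; adjust them to your configuration.
REQUIRED_DIRS = ["output/prediction", "output/visual_prediction", "model"]


def ensure_dirs(dirs=REQUIRED_DIRS, root="."):
    """Create any missing folders under root and return the ones created."""
    created = []
    for d in dirs:
        path = os.path.join(root, d)
        if not os.path.isdir(path):
            # exist_ok avoids an error if another process creates it first.
            os.makedirs(path, exist_ok=True)
            created.append(path)
    return created
```

Calling `ensure_dirs()` once before running prediction would be enough; calling it again is harmless and returns an empty list.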

In addition, could you tell me what mIoU you achieved with the multi-scale code on PASCAL and Cityscapes? Thanks for your good work.

about dataset

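Hello, why does the dataset I downloaded contain only a little over 2,000 segmentation images, while train.txt in the code lists more than 10,000? Did I download the wrong dataset?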

Optimizer choice: Adam VS SGD

Dear Doctor Wang,

I noticed that DrSleep uses Adam for training, while your implementation and the original paper use standard SGD. Do you have any experience with the performance difference between these two approaches on this problem?

Bests,

Xiong

ResourceExhaustedError: OOM when allocating tensor with shape[5760,4,4,2048]

I successfully deployed the project on a system with Windows 10 + TensorFlow 1.3.0 (CPU only). However, when I deployed it on a system with Ubuntu + TensorFlow 1.4 (GeForce GTX 1080 Ti), I ran into the following problem.

sheldon@amax:~/Projects/Deeplab-v2--ResNet-101--Tensorflow$ python3 main.py
2017-12-28 11:41:44.924418: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2017-12-28 11:41:45.623542: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:8a:00.0
totalMemory: 10.91GiB freeMemory: 10.75GiB
2017-12-28 11:41:45.623612: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:8a:00.0, compute capability: 6.1)
-----------build encoder: deeplab pre-trained-----------
after start block: (10, 81, 81, 64)
after block1: (10, 81, 81, 256)
after block2: (10, 41, 41, 512)
after block3: (10, 41, 41, 1024)
after block4: (10, 41, 41, 2048)
-----------build decoder-----------
after aspp block: (10, 41, 41, 21)
Restored model parameters from /data2/deeplab_resnet_init.ckpt
2017-12-28 11:42:09.929249: W tensorflow/core/common_runtime/bfc_allocator.cc:273] Allocator (GPU_0_bfc) ran out of memory trying to allocate 720.00MiB.  Current allocation summary follows.
2017-12-28 11:42:09.929406: I tensorflow/core/common_runtime/bfc_allocator.cc:627] Bin (256):   Total Chunks: 278, Chunks in use: 216. 69.5KiB allocated for chunks. 54.0KiB in use in bin. 15.6KiB client-requested in use in bin.
2017-12-28 11:42:09.929433: I tensorflow/core/common_runtime/bfc_allocator.cc:627] Bin (512):   Total Chunks: 65, Chunks in use: 64. 32.5KiB allocated for chunks. 32.0KiB in use in bin. 32.0KiB client-requested in use in bin.
2017-12-28 11:42:09.929452: I tensorflow/core/common_runtime/bfc_allocator.cc:627] Bin (1024):  Total Chunks: 401, Chunks in use: 401. 401.2KiB allocated for chunks. 401.2KiB in use in bin. 401.0KiB client-requested in use in bin.
2017-12-28 11:42:09.929471: I tensorflow/core/common_runtime/bfc_allocator.cc:627] Bin (2048):  Total Chunks: 88, Chunks in use: 88. 176.0KiB allocated for chunks. 176.0KiB in use in bin. 176.0KiB client-requested in use in bin.

> (There were too many similar lines of output, so I left most of them out here.)

2017-12-28 11:42:09.950169: I tensorflow/core/common_runtime/bfc_allocator.cc:679] 1 Chunks of size 424673280 totalling 405.00MiB
2017-12-28 11:42:09.950182: I tensorflow/core/common_runtime/bfc_allocator.cc:679] 1 Chunks of size 663552000 totalling 632.81MiB
2017-12-28 11:42:09.950194: I tensorflow/core/common_runtime/bfc_allocator.cc:683] Sum Total of in-use chunks: 9.55GiB
2017-12-28 11:42:09.950211: I tensorflow/core/common_runtime/bfc_allocator.cc:685] Stats:
Limit:                 10968825856
InUse:                 10253331200
MaxInUse:              10265177344
NumAllocs:                    4856
MaxAllocSize:            802160640

2017-12-28 11:42:09.950341: W tensorflow/core/common_runtime/bfc_allocator.cc:277] **********************************************************************************************______
2017-12-28 11:42:09.950375: W tensorflow/core/framework/op_kernel.cc:1192] Resource exhausted: OOM when allocating tensor with shape[5760,4,4,2048]
2017-12-28 11:42:09.973697: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.00GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
2017-12-28 11:42:09.973763: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 4.00MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
2017-12-28 11:42:09.973796: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 928.77MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
2017-12-28 11:42:09.999611: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.58GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
2017-12-28 11:42:09.999662: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 4.00MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
2017-12-28 11:42:09.999689: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 415.06MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
2017-12-28 11:42:10.017460: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.21GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
2017-12-28 11:42:10.017501: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.67GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
Traceback (most recent call last):
  File "/home/sheldon/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1323, in _do_call
    return fn(*args)
  File "/home/sheldon/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1302, in _run_fn
    status, run_metadata)
  File "/home/sheldon/.local/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 473, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[5760,4,4,2048]
         [[Node: fc1_voc12_c3/convolution/SpaceToBatchND = SpaceToBatchND[T=DT_FLOAT, Tblock_shape=DT_INT32, Tpaddings=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](res5c_relu, fc1_voc12_c3/convolution/SpaceToBatchND/block_shape, fc1_voc12_c3/convolution/SpaceToBatchND/paddings)]]
         [[Node: add/_1131 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_5776_add", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "main.py", line 82, in <module>
    tf.app.run()
  File "/home/sheldon/.local/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "main.py", line 76, in main
    getattr(model, args.option)()
  File "/home/sheldon/Projects/Deeplab-v2--ResNet-101--Tensorflow/model.py", line 60, in train
    feed_dict=feed_dict)
  File "/home/sheldon/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 889, in run
    run_metadata_ptr)
  File "/home/sheldon/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1120, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/sheldon/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1317, in _do_run
    options, run_metadata)
  File "/home/sheldon/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1336, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[5760,4,4,2048]
         [[Node: fc1_voc12_c3/convolution/SpaceToBatchND = SpaceToBatchND[T=DT_FLOAT, Tblock_shape=DT_INT32, Tpaddings=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](res5c_relu, fc1_voc12_c3/convolution/SpaceToBatchND/block_shape, fc1_voc12_c3/convolution/SpaceToBatchND/paddings)]]
         [[Node: add/_1131 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_5776_add", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Caused by op 'fc1_voc12_c3/convolution/SpaceToBatchND', defined at:
  File "main.py", line 82, in <module>
    tf.app.run()
  File "/home/sheldon/.local/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "main.py", line 76, in main
    getattr(model, args.option)()
  File "/home/sheldon/Projects/Deeplab-v2--ResNet-101--Tensorflow/model.py", line 36, in train
    self.train_setup()
  File "/home/sheldon/Projects/Deeplab-v2--ResNet-101--Tensorflow/model.py", line 177, in train_setup
    net = Deeplab_v2(self.image_batch, self.conf.num_classes, True)
  File "/home/sheldon/Projects/Deeplab-v2--ResNet-101--Tensorflow/network.py", line 34, in __init__
    self.build_network()
  File "/home/sheldon/Projects/Deeplab-v2--ResNet-101--Tensorflow/network.py", line 38, in build_network
    self.outputs = self.build_decoder(self.encoding)
  File "/home/sheldon/Projects/Deeplab-v2--ResNet-101--Tensorflow/network.py", line 64, in build_decoder
    outputs = self._ASPP(encoding, self.num_classes, [6, 12, 18, 24])
  File "/home/sheldon/Projects/Deeplab-v2--ResNet-101--Tensorflow/network.py", line 125, in _ASPP
    o.append(self._dilated_conv2d(x, 3, num_o, d, name='fc1_voc12_c%d' % i, biased=True))
  File "/home/sheldon/Projects/Deeplab-v2--ResNet-101--Tensorflow/network.py", line 150, in _dilated_conv2d
    o = tf.nn.atrous_conv2d(x, w, dilation_factor, padding='SAME')
  File "/home/sheldon/.local/lib/python3.5/site-packages/tensorflow/python/ops/nn_ops.py", line 1137, in atrous_conv2d
    name=name)
  File "/home/sheldon/.local/lib/python3.5/site-packages/tensorflow/python/ops/nn_ops.py", line 751, in convolution
    return op(input, filter)
  File "/home/sheldon/.local/lib/python3.5/site-packages/tensorflow/python/ops/nn_ops.py", line 835, in __call__
    return self.conv_op(inp, filter)
  File "/home/sheldon/.local/lib/python3.5/site-packages/tensorflow/python/ops/nn_ops.py", line 499, in __call__
    return self.call(inp, filter)
  File "/home/sheldon/.local/lib/python3.5/site-packages/tensorflow/python/ops/nn_ops.py", line 490, in _with_space_to_batch_call
    paddings=paddings)
  File "/home/sheldon/.local/lib/python3.5/site-packages/tensorflow/python/ops/gen_array_ops.py", line 4922, in space_to_batch_nd
    paddings=paddings, name=name)
  File "/home/sheldon/.local/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/sheldon/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2956, in create_op
    op_def=op_def)
  File "/home/sheldon/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1470, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[5760,4,4,2048]
         [[Node: fc1_voc12_c3/convolution/SpaceToBatchND = SpaceToBatchND[T=DT_FLOAT, Tblock_shape=DT_INT32, Tpaddings=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](res5c_relu, fc1_voc12_c3/convolution/SpaceToBatchND/block_shape, fc1_voc12_c3/convolution/SpaceToBatchND/paddings)]]
         [[Node: add/_1131 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_5776_add", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

It seems that the system ran out of resources. What shall I do to fix the problem?
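
The size of the failing allocation can be worked out from the tensor shape in the log. The sketch below assumes float32 elements (the log shows `T=DT_FLOAT`). Reading the leading dimension 5760 as batch size 10 times 24 × 24 is an assumption, based on the batch of 10 in the shape printout and the dilation rate 24 passed to `_ASPP` in the traceback. The helper names are hypothetical.

```python
def tensor_bytes(shape, bytes_per_element=4):
    """Bytes needed for a dense tensor; 4 bytes per element for float32."""
    n = 1
    for dim in shape:
        n *= dim
    return n * bytes_per_element


# Shape reported in the ResourceExhaustedError above.
oom_shape = (5760, 4, 4, 2048)
print(f"{tensor_bytes(oom_shape) / 2**20:.2f} MiB")  # 720.00 MiB


def space_to_batch_bytes(batch_size, rate=24, spatial=4, channels=2048):
    """Assumed layout: the batch dimension is multiplied by rate * rate."""
    return tensor_bytes((batch_size * rate * rate, spatial, spatial, channels))
```

The 720 MiB figure matches the "trying to allocate 720.00MiB" line in the log. Under the assumed layout this buffer scales linearly with batch size, so `space_to_batch_bytes(5)` is half of `space_to_batch_bytes(10)`. Reducing the batch size or the input size is the usual first thing to try for this kind of error.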

[use my own pretrained model]

In my ckpt file the tensor names are:
group3/block1/conv2/bn/mean
group3/block1/conv2/bn/variance
......
After my change, when I run the code, I get errors like:
Not found: key group3/block1/conv2/bn/moving_mean not found in checkpoint
Not found: key group3/block1/conv2/bn/moving_variance not found in checkpoint
......
What do I need to change to make the code read mean and variance instead of moving_mean and moving_variance?
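
One possible approach is to leave the graph unchanged and translate names at restore time. The sketch below only builds the name mapping, in plain Python; the function names are hypothetical, and it assumes the only difference between graph and checkpoint names is the last path component (`moving_mean` vs. `mean`, `moving_variance` vs. `variance`), as in the example above.

```python
def to_checkpoint_name(graph_var_name):
    """Map a graph variable name to the name stored in the checkpoint."""
    renames = {"moving_mean": "mean", "moving_variance": "variance"}
    name = graph_var_name.split(":")[0]  # drop a trailing ":0" if present
    prefix, _, last = name.rpartition("/")
    last = renames.get(last, last)
    return f"{prefix}/{last}" if prefix else last


def build_restore_map(graph_var_names):
    """Return {checkpoint_name: graph_name} for the given graph variables."""
    return {to_checkpoint_name(n): n for n in graph_var_names}
```

In TensorFlow 1.x, `tf.train.Saver` accepts a `var_list` dictionary keyed by checkpoint name. A mapping like this one, with the variable objects as values instead of their names, could therefore be passed to the saver that restores the pre-trained weights.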

IoU is NaN for some classes

I downloaded the augmented dataset and converted the labels from .mat to PNG format, then trained with your train.txt file. Because of my GPU (1080), I reduced input_height and input_width to 270×270. After 20000 training steps the loss is 1.18. I tested the model with Test.txt, but the IoU is very low and the value for some classes is NaN. Every prediction contains only two classes (two colors), and some predictions are wrong. How can I solve this? Thank you very much.

Getting error 'FIFOQueue '_1_create_inputs/batch/fifo_queue' is closed and has insufficient elements (requested 10, current size 0)'

When I run main.py I get:

OutOfRangeError (see above for traceback): FIFOQueue '_1_create_inputs/batch/fifo_queue' is closed and has insufficient elements (requested 10, current size 0) [[Node: create_inputs/batch = QueueDequeueManyV2[component_types=[DT_FLOAT, DT_UINT8], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](create_inputs/batch/fifo_queue, create_inputs/batch/n)]]

I've downloaded VOC and have the path in main.py set to:
flags.DEFINE_string('data_dir', './VOCdevkit/VOC2012', 'data directory')

I've also set the following in main.py:
flags.DEFINE_string('encoder_name', 'res101', 'name of pre-trained model, res101, res50 or deeplab')
flags.DEFINE_string('pretrain_file', './resnet_v1_101.ckpt', 'pre-trained model filename corresponding to encoder_name')

Apologies if I'm doing something obvious wrong.
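
A queue that is closed with "current size 0" often means the reader threads stopped before producing any batch, for example because a listed file does not exist (compare the NotFoundError for a SegmentationClassAug file reported in another issue on this page). A quick check is to verify every path in the data list before training. This is a sketch under assumptions: each line of the list file holds an image path and a label path separated by whitespace, both relative to `data_dir` and possibly starting with `/`. The function name is hypothetical, and paths containing spaces would need different parsing.

```python
import os


def find_missing_files(data_dir, list_path):
    """Return every path named in the list file that is absent under data_dir."""
    missing = []
    with open(list_path) as f:
        for line in f:
            for rel in line.split():
                # Strip a leading "/" so the entry is joined under data_dir.
                path = os.path.join(data_dir, rel.lstrip("/"))
                if not os.path.isfile(path):
                    missing.append(path)
    return missing
```

An empty result means every listed file was found. Otherwise the returned paths show which part of the dataset (for example the augmented labels) is missing or in a different location than `data_dir` expects.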

Problem with pre-trained model

Hey, I want to use ResNet v2 to train DeepLab v2, and I have a question about the pre-trained model.
Your code mentions deeplab_resnet.ckpt and resnet101.ckpt. Can I use just resnet101.ckpt as the initial model to train DeepLab v2? If that works, then I can use ResNet v2 to train DeepLab v2 as well.
