
sercant / mobile-segmentation


Real-time semantic image segmentation on mobile devices

License: Apache License 2.0

Languages: Python 98.19%, Shell 1.81%

Topics: mobilenetv2, semantic-segmentation, image-processing, deeplab-v3-plus, mscoco-dataset, shufflenet-v2, android, real-time, tensorflow-lite, semantic-image-segmentation


mobile-segmentation's Issues

Error with checkpoint on evaluate.py and visualize.py

Hey, @sercant!

When running evaluate.py or visualize.py with your weights (shufflenetv2_dpc_cityscapes_71_3), the following error occurs:

For visualize.py:
NotFoundError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Key aspp0/BatchNorm/beta not found in checkpoint
[[node save/RestoreV2 (defined at visualize.py:276) ]]

For evaluate.py:
NotFoundError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Key aspp0/BatchNorm/beta not found in checkpoint
[[node save/RestoreV2 (defined at evaluate.py:162) ]]
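
A quick way to diagnose this kind of NotFoundError is to list the variables the checkpoint actually contains and compare them with the graph being built; a minimal TF 1.x sketch, with a hypothetical checkpoint prefix:

import tensorflow as tf

# List every variable stored in the checkpoint (the prefix is hypothetical;
# point it at the downloaded shufflenetv2_dpc_cityscapes_71_3 weights).
for name, shape in tf.train.list_variables('shufflenetv2_dpc_cityscapes_71_3/model.ckpt'):
    print(name, shape)

# If 'aspp0/BatchNorm/beta' is missing from the output, the graph flags
# (e.g. a DPC head vs. a plain ASPP head) do not match how the weights
# were trained.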

Errors with visualization

First of all, thank you for the good work.

When visualize.py is started, the following error occurs:

File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Shape mismatch in tuple component 1. Expected [769,769,3], got [1024,2048,3]
[[{{node batch/padding_fifo_queue_enqueue}}]]

I tried to resize, but it did not help. When I change the size to [1024, 2048], the following error appears;
however, the first image is processed successfully.

Traceback (most recent call last):
File "visualize.py", line 335, in
tf.app.run()
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "visualize.py", line 320, in main
train_id_to_eval_id=train_id_to_eval_id)
File "visualize.py", line 183, in _process_batch
colormap_type=FLAGS.colormap_type)
File "/code/ShuffleNetV2/utils/save_annotation.py", line 33, in save_annotation
label, colormap_type)
File "/code/ShuffleNetV2/utils/get_dataset_colormap.py", line 389, in label_to_color_image
raise ValueError('label value too large.')
ValueError: label value too large.
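
For what it's worth, the first error suggests the input pipeline was built for 769x769 crops while Cityscapes frames are the full 1024x2048, and the second usually means the predicted label ids fall outside the colormap after the train-id to eval-id remapping. A small hedged check, with an illustrative stand-in for the prediction array:

import numpy as np

# Illustrative stand-in for the per-pixel prediction that visualize.py
# passes to save_annotation.
semantic_prediction = np.zeros((1024, 2048), dtype=np.int64)

# Ids at or above the colormap length (DeepLab-style colormaps typically
# have 256 entries) trigger 'label value too large.' in label_to_color_image.
print('max predicted label id:', int(np.max(semantic_prediction)))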

All xxx.tfrecord files are empty!

Hello, author!
Thank you very much for your open source code.
I currently want to test my own images with the pre-trained model you provided.
To do this, I first set up TensorFlow 1.15 and converted the Cityscapes dataset (png to tfrecord) following your suggestion.
However, during the conversion I ran your build_cityscapes_data.py script, and the resulting tfrecord files were all empty, as follows:
[screenshot: the generated .tfrecord files are all empty]
I'd like you to help me with this problem. Thank you!
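
A quick way to confirm that a tfrecord shard really is empty is to count its records; a minimal sketch for TF 1.15 (the shard name is hypothetical):

import tensorflow as tf  # TF 1.15

path = 'train-00000-of-00010.tfrecord'  # hypothetical shard name
count = sum(1 for _ in tf.io.tf_record_iterator(path))
print(path, 'contains', count, 'records')

# Zero records usually means the converter's input globs matched no files,
# so double-check the Cityscapes directory layout and the paths passed to
# build_cityscapes_data.py.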

Three questions about the DPC module

Hi author, your proposed network is very interesting and effective!
First question: the DPC encoder head has 5 branches, and each branch outputs 256 channels,
so concatenation should give 1280 channels; but in Table 1
the DPC module's output channels are 512. Why?
Second: the rates in DPC are interesting. Why are they 1x6, 18x15, 6x3, and 6x21? I see that DeepLab v3+'s rates are 6, 12, and 18.
Last: the design of each branch in DPC troubles me; I can't understand the reasoning behind it.

I'm sorry for asking so many questions, but I have one more. Why do you preprocess the input images by normalizing each pixel to [-1, 1], rather than [0, 1] or no normalization at all?
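
For context, the [-1, 1] mapping being asked about is the standard MobileNet/DeepLab-style preprocessing; assuming this repo follows that convention (unverified), it amounts to:

import tensorflow as tf

image = tf.zeros([513, 513, 3], dtype=tf.uint8)  # placeholder input image
image = tf.cast(image, tf.float32)
image = 2.0 * (image / 255.0) - 1.0              # pixels now in [-1, 1]

Zero-centered inputs are a common choice because they match the symmetric input range the pretrained backbone expects.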

Training my own data

Hi Author!

I'm trying to train using my own data.
I created the TFRecord data and tried to execute train.py,
but I don't know what to set for "tf_initial_checkpoint".
Could you please help me out?
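
In DeepLab-style training scripts, "tf_initial_checkpoint" normally points at the prefix of a pretrained checkpoint to fine-tune from, e.g. --tf_initial_checkpoint=checkpoints/model.ckpt (that prefix is hypothetical; use the filenames that ship with the released weights, minus the .index/.data suffixes).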

The model architecture may differ from the paper

Hi, @sercant !

Thank you for sharing this fantastic project!

I found that in core/shufflenet_v2.py, line 69, "layer_stride = 1" causes the parameter "rate" to be 1 instead of 2 in ShuffleNet v2 stage 4.

Could you check this small issue?

Thanks
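
For context, the usual DeepLab-style output-stride bookkeeping this refers to: once the accumulated stride reaches the target output_stride, later stages run with stride 1 and multiply their atrous rate instead. A minimal sketch of that convention (the per-stage strides are illustrative, not this repo's exact configuration):

# Accumulate stride until output_stride is reached, then dilate instead.
current_stride = 1
rate = 1
output_stride = 16

for layer_stride in [2, 2, 2, 2, 2]:
    if current_stride == output_stride:
        effective_stride = 1       # stop downsampling...
        rate *= layer_stride       # ...and grow the atrous rate (the step
                                   # the issue suggests is skipped in stage 4)
    else:
        effective_stride = layer_stride
        current_stride *= layer_stride

print('final rate:', rate)  # 2 once one stage is converted to atrous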

Run code on the mobile

Hi! I'm really interested in your paper. Do you plan to implement the code in PyTorch? How can I run the code on a mobile platform? Thanks!

The tflite file generated by export_tflite.py does not work correctly

Dear Sir,
Thank you for sharing your code; it helps me a lot.
The training and testing parts of the model work very well,
but when I start to generate the pb and tflite files, problems come up.
Right now I am using TensorFlow 2.0 beta, and I only modified the log directory in order to execute the conversion script. Looking deeper into the code, the input node is "input_0" and the output node is "Cast", which converts the data from int64 to int32. I also noticed that outputs_to_scales_to_logits is generated by model.predict_labels, which includes an "ArgMax" operator that is supported by the latest version of TFLite. The conversion did not report any errors, so I tested the generated tflite file with the script below:

import numpy as np
import tensorflow as tf

# Load TFLite model and allocate tensors.
interpreter = tf.lite.Interpreter(model_path="segmentation.tflite")
interpreter.allocate_tensors()

# Get input and output tensors.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Test model on random input data.
input_shape = input_details[0]['shape']
print(input_shape)
input_data = np.array(np.random.random_sample(input_shape), dtype=np.float32)
print(input_data)
interpreter.set_tensor(input_details[0]['index'], input_data)

interpreter.invoke()
output_data = interpreter.get_tensor(output_details[0]['index'])
print(np.unique(output_data))

I just used a random picture for the test, expecting to see the different labels (14 classes in total in my case) reported by the np.unique call, but unfortunately only one value [4] or two values [3, 4] show up. Can you give me some ideas on how to solve this problem? Thank you very much!

I also tested the tflite file with a real test picture, and it did not produce the expected results either.
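
One thing worth ruling out: random noise in [0, 1) is far from the training distribution, so collapsing to one or two classes is plausible even for a working model. A hedged follow-up test with a real image, normalized to [-1, 1] as the model likely expects (the file name is hypothetical; skip the normalization if the exporter bakes preprocessing into the graph):

import numpy as np
import tensorflow as tf
from PIL import Image

interpreter = tf.lite.Interpreter(model_path="segmentation.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Resize a real image to the model's input size and map pixels to [-1, 1],
# assuming the training-time preprocessing convention.
_, height, width, _ = input_details[0]['shape']
img = Image.open('test.png').convert('RGB').resize((width, height))
x = np.asarray(img, dtype=np.float32)
x = 2.0 * (x / 255.0) - 1.0
interpreter.set_tensor(input_details[0]['index'], x[np.newaxis, ...])

interpreter.invoke()
print(np.unique(interpreter.get_tensor(output_details[0]['index'])))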

Something about the DPC module

@sercant
In the DPC module, the input is ShuffleNet v2 stage 4 (so the input has 464 channels), and each of the five branches' convs has 256 output channels (as in the paper "Searching for Efficient Multi-Scale Architectures for Dense Image Prediction").
However, for depthwise_conv2d the number of output channels is in_channels * channel_multiplier. So after the 5 branches are joined with tf.concat, wouldn't the result have a very large number of channels?
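
The confusion here is usually between a pure depthwise conv, which keeps (or multiplies) the input channel count, and a depthwise-separable conv, whose 1x1 pointwise projection lets each DPC branch output exactly 256 channels. A small illustrative sketch with Keras layers (the shapes are made up):

import tensorflow as tf

x = tf.zeros([1, 33, 33, 464])  # stage-4-like feature map

# Pure depthwise conv: output channels = in_channels * depth_multiplier.
dw = tf.keras.layers.DepthwiseConv2D(3, padding='same')(x)
print(dw.shape)   # (1, 33, 33, 464)

# Separable conv: depthwise followed by a 1x1 pointwise projection.
sep = tf.keras.layers.SeparableConv2D(256, 3, padding='same')(x)
print(sep.shape)  # (1, 33, 33, 256); concatenating 5 such branches gives 1280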

Some things about MS COCO 2017

Hi author, I see that you used the COCO 2017 dataset for network pre-training.
I have implemented a semantic segmentation network and downloaded the COCO 2017 dataset. I want to know how to generate the label images (grayscale images) from the annotation JSON files.
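
Not specific to this repo, but the usual route is pycocotools: load the annotation JSON, rasterize each annotation with annToMask, and write the category ids into a single-channel image. A minimal sketch (paths are hypothetical):

import numpy as np
from PIL import Image
from pycocotools.coco import COCO

coco = COCO('annotations/instances_train2017.json')  # hypothetical path
img_id = coco.getImgIds()[0]
info = coco.loadImgs(img_id)[0]

# One grayscale label map per image: background stays 0, each annotated
# pixel gets its COCO category id (the ids fit comfortably in uint8).
label = np.zeros((info['height'], info['width']), dtype=np.uint8)
for ann in coco.loadAnns(coco.getAnnIds(imgIds=img_id)):
    label[coco.annToMask(ann) == 1] = ann['category_id']

Image.fromarray(label).save(info['file_name'].replace('.jpg', '_labels.png'))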

COCO2017

Hi author.
I really want to get images like those in the gtFine folder of the Cityscapes dataset, such as
'cityscapes/gtFine/train/aachen/aachen_000000_000019_gtFine_labelIds.png', corresponding to images such as 'cityscapes/leftImg8bit/train/aachen/aachen_000000_000019_leftImg8bit.png'.
Then I can make txt files for the input data.

Pre-trained weights

Folks, awesome job on achieving real-time segmentation with ShuffleNet v2.

Do you mind sharing the pretrained weights?
