ternaus / ternausnet

UNet model with VGG11 encoder pre-trained on Kaggle Carvana dataset

Home Page: https://arxiv.org/abs/1801.05746

License: MIT License

Language: Python 100.00%
Topics: pytorch, image-segmentation

ternausnet's Introduction

TernausNet: U-Net with VGG11 Encoder Pre-Trained on ImageNet for Image Segmentation

By Vladimir Iglovikov and Alexey Shvets

Introduction

TernausNet is a modification of the celebrated UNet architecture that is widely used for binary image segmentation. For more details, please refer to our arXiv paper.

UNet11

[loss curve figure]

A pre-trained encoder speeds up convergence even on datasets with different semantic features. The curve above shows the validation Jaccard index (IoU) as a function of epochs for an aerial imagery dataset.

This architecture was part of the winning solution (1st out of 735 teams) in the Carvana Image Masking Challenge.

Installation

pip install ternausnet
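
A minimal usage sketch is shown below. The import path ternausnet.models and the UNet11 constructor arguments are assumptions based on the packaged repository layout (older versions expose the same classes in unet_models.py instead):

import torch
from ternausnet.models import UNet11  # assumed import path

model = UNet11(pretrained=True)  # VGG11 encoder initialized with ImageNet weights
model.eval()

with torch.no_grad():
    x = torch.rand(1, 3, 256, 256)  # dummy RGB batch; H and W divisible by 32
    logits = model(x)               # raw logits; apply torch.sigmoid for probabilities

print(logits.shape)  # expected: torch.Size([1, 1, 256, 256])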

Citing TernausNet

Please cite TernausNet in your publications if it helps your research:

@ARTICLE{arXiv:1801.05746,
         author = {V. Iglovikov and A. Shvets},
          title = {TernausNet: U-Net with VGG11 Encoder Pre-Trained on ImageNet for Image Segmentation},
        journal = {ArXiv e-prints},
         eprint = {1801.05746},
           year = 2018
        }

Example of the train and test pipeline

https://github.com/ternaus/robot-surgery-segmentation

ternausnet's People

Contributors

gothos-folly, pokidyshev, shvetsiya, ternaus


ternausnet's Issues

'from unet_models import unet11' reports error

Hello, authors,
Thanks for sharing your code and paper. I read 'Example.ipynb' and then tried to run the related code. When I type 'from unet_models import unet11', I get an error:
'File "unet_models.py", line 16
def init(self, in_: int, out: int):
SyntaxError: invalid syntax'

How can I fix it?
My PyTorch version is 0.1.12.

About the upsampling

Thank you for your effort.
Have you compared the results of bilinear interpolation and deconvolution when used in the decoder? Which one would be better?
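
For reference, the two decoder options being compared look roughly like the sketch below. The channel sizes are placeholders and this is not the repository's exact code:

import torch.nn as nn

# Option 1: deconvolution (transposed convolution) for learnable 2x upsampling
deconv_up = nn.Sequential(
    nn.Conv2d(256, 128, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1),
    nn.ReLU(inplace=True),
)

# Option 2: fixed bilinear upsampling followed by a convolution
bilinear_up = nn.Sequential(
    nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
    nn.Conv2d(256, 64, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
)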

conv5 pooling

Hi!
I was wondering what the motivation was for performing pooling after the already downsampled layer here:

center = self.center(self.pool(conv5))

Did you test whether it improves the score compared to using one more downsampling conv layer, or compared to simply removing this bottom layer and keeping conv5 as the bottleneck?

Change VGG11_2D to VGG11_3D

I followed this code, and everything works for the 2D model.
Now I want to use it for a 3D model. I changed 2D to 3D, for example:
self.pool = nn.MaxPool2d(2,2) => self.pool = nn.MaxPool3d(2,2)
.......
However, at line 49 (see the screenshot), models.vgg11(pretrained=pretrained).features is a 2D model. I want to use VGG11 for 3D. How can I do that? Thank you.
[screenshot]

P.S.: I visualized VGG11; it is a 2D model.
[screenshot]
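
The pretrained weights here are inherently 2D, so they cannot be loaded into 3D layers directly. One workaround sometimes used (not part of this repository, and only a sketch) is to build the 3D blocks yourself and "inflate" each pretrained 2D kernel along the depth axis:

import torch
import torch.nn as nn
from torchvision import models

# Sketch: inflate the first pretrained 2D VGG11 conv kernel into a 3D kernel
# by repeating it along depth and rescaling. This illustrates the idea only;
# it is not functionality provided by ternausnet.
vgg2d = models.vgg11(pretrained=True).features
conv2d = vgg2d[0]                      # first 2D conv: 3 -> 64 channels, 3x3 kernel

depth = 3
conv3d = nn.Conv3d(3, 64, kernel_size=(depth, 3, 3), padding=(1, 1, 1))
with torch.no_grad():
    inflated = conv2d.weight.unsqueeze(2).repeat(1, 1, depth, 1, 1) / depth
    conv3d.weight.copy_(inflated)      # shape (64, 3, depth, 3, 3)
    conv3d.bias.copy_(conv2d.bias)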

training example

Hello,
Thank you for this upload and congrats on winning the Kaggle challenge. Is it possible for you to provide an example of how to train the network? I can't seem to put everything just right for it to run.

sigmoid output?

Your paper indicates a final sigmoid output layer, but the models in model.py do not have such a layer, and the outputs are not in the range [0, 1]. Do I just add a torch.sigmoid in the forward call? On a somewhat unrelated matter, your loss function doesn't seem to be available. I'm new to machine learning/PyTorch/computers and any help would be appreciated. Thank you.
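
One way to get outputs in [0, 1] without modifying the repository code is to apply torch.sigmoid on top of the raw logits, for example with a small wrapper module (a sketch, not part of the repository):

import torch
import torch.nn as nn

class SigmoidWrapper(nn.Module):
    # Wraps any single-logit segmentation model so its output is a probability map.
    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, x):
        return torch.sigmoid(self.model(x))  # probabilities in [0, 1]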

About num_classes

Hello, @ternaus!
Thanks for your code, it's really cool, but meaning of num_classes, for instance, in Unet16, isn't clear.

If I have car and background then I consider that image has 2 num_classes: 0 - background and 1 - car. But according to your code, in my example, you suppose that image has only one class. Why?

Thanks in advance.
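
For binary segmentation the background usually does not get its own output channel: a single logit per pixel with a sigmoid already encodes car vs. background, which is why num_classes=1 covers this case. A small illustrative sketch (the tensors are dummies, not the repository's code):

import torch

logits = torch.randn(2, 1, 64, 64)                 # num_classes=1: one logit per pixel
prob_car = torch.sigmoid(logits)                   # P(car); P(background) = 1 - P(car)

multiclass_logits = torch.randn(2, 3, 64, 64)      # num_classes=3: one channel per class
pred = multiclass_logits.softmax(dim=1).argmax(1)  # per-pixel class index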

Clarification about loss?

As I can see in the paper, you use the joint loss function L = H - log(J). I wonder why the log is used? Is it because H and J have different possible ranges?
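
For context, here is a sketch of that joint loss, where H is binary cross-entropy and J is a soft Jaccard index computed from probabilities; taking -log(J) turns the bounded index in (0, 1] into an unbounded penalty on a scale comparable to cross-entropy. The function name and the jaccard_weight parameter are illustrative, not the repository's exact implementation:

import torch
import torch.nn.functional as F

def bce_log_jaccard_loss(logits, targets, jaccard_weight=1.0, eps=1e-15):
    # targets: float mask of the same shape as logits, values in {0, 1}
    h = F.binary_cross_entropy_with_logits(logits, targets)
    probs = torch.sigmoid(logits)
    intersection = (probs * targets).sum()
    union = probs.sum() + targets.sum() - intersection
    j = (intersection + eps) / (union + eps)   # soft Jaccard index
    return h - jaccard_weight * torch.log(j)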

How to make final prediction and tuning the model?

Thank you very much for developing this model!
I am quite new to image segmentation, so I am still learning. The question I put here might be very silly, and it is definitely not an issue with your code.

I am using your pretrained VGG11 model for the Kaggle Airbus competition. The output is binary. The first problem is that during training the loss continued to decrease, but the Jaccard score did not change at all.

Epoch 1, lr 0.01:   0%|          | 0/57424 [00:00<?, ?it/s]
0.01
Epoch 1, lr 0.01: 100%|█████████▉| 57422/57424 [1:49:41<00:00,  8.33it/s, loss=0.00074]
Epoch 2, lr 0.01:   0%|          | 0/57424 [00:00<?, ?it/s]
Valid loss: 0.00063, jaccard: 0.37004
0.01
Epoch 2, lr 0.01: 100%|█████████▉| 57422/57424 [1:50:11<00:00,  8.14it/s, loss=0.00350]
Epoch 3, lr 0.01:   0%|          | 0/57424 [00:00<?, ?it/s]
Valid loss: 0.63987, jaccard: 0.37004
0.01
Epoch 3, lr 0.01: 100%|█████████▉| 57422/57424 [1:49:52<00:00,  8.04it/s, loss=0.00102]
Epoch 4, lr 0.01:   0%|          | 0/57424 [00:00<?, ?it/s]
Valid loss: 0.00081, jaccard: 0.37004
0.01
Epoch 4, lr 0.01: 100%|█████████▉| 57422/57424 [1:49:59<00:00,  8.02it/s, loss=0.00036]
Epoch 5, lr 0.01:   0%|          | 0/57424 [00:00<?, ?it/s]
Valid loss: 0.00043, jaccard: 0.37004
0.01
Epoch 5, lr 0.01: 100%|█████████▉| 57422/57424 [1:49:53<00:00,  8.01it/s, loss=0.00035]
Epoch 6, lr 0.001:   0%|          | 0/57424 [00:00<?, ?it/s]
Valid loss: 0.00039, jaccard: 0.37004
0.001
Epoch 6, lr 0.001: 100%|█████████▉| 57422/57424 [1:49:33<00:00,  7.97it/s, loss=0.00038]
Epoch 7, lr 0.001:   0%|          | 0/57424 [00:00<?, ?it/s]
Valid loss: 0.00039, jaccard: 0.37004
0.001
Epoch 7, lr 0.001: 100%|█████████▉| 57422/57424 [1:49:32<00:00,  8.26it/s, loss=0.00051]
Epoch 8, lr 0.001:   0%|          | 0/57424 [00:00<?, ?it/s]
Valid loss: 0.00039, jaccard: 0.37004
0.001
Epoch 8, lr 0.001: 100%|█████████▉| 57422/57424 [1:49:33<00:00,  8.15it/s, loss=0.00085]
Epoch 9, lr 0.001:   0%|          | 0/57424 [00:00<?, ?it/s]
Valid loss: 0.00039, jaccard: 0.37004
0.001
Epoch 9, lr 0.001:  92%|█████████▏| 52952/57424 [1:40:36<08:59,  8.29it/s, loss=0.00052]

My next question is how to make the final prediction. I checked your paper: you state that after applying a sigmoid function to the output, you pick a 0.3 threshold. So, for my own problem, I just do the same, correct? I also tried this with my output using different values; below is an example output I got with a 0.509 threshold. It is clearly detecting something, but the predicted ship area is not very continuous, unlike the one in your paper. Do you know why, or how to deal with it? How can I better select a threshold?
[example prediction image]

Any suggestion for my next step?

Thank you!
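
One common way to pick the threshold, sketched below, is to sweep candidate values on a held-out validation set and keep the one that maximizes the mean Jaccard index. The helper names and the candidate grid are illustrative, not part of this repository:

import numpy as np

def jaccard(pred_mask, true_mask, eps=1e-15):
    # Intersection over union for binary masks.
    intersection = np.logical_and(pred_mask, true_mask).sum()
    union = np.logical_or(pred_mask, true_mask).sum()
    return (intersection + eps) / (union + eps)

def best_threshold(probs, true_masks, candidates=np.arange(0.1, 0.9, 0.05)):
    # probs: list of per-image sigmoid probability maps (validation set)
    # true_masks: corresponding ground-truth binary masks
    scores = [np.mean([jaccard(p > t, m) for p, m in zip(probs, true_masks)])
              for t in candidates]
    return candidates[int(np.argmax(scores))]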

train test

Hello, can you provide your train and test files?
