
The code for our newly accepted paper in Pattern Recognition 2020: "U^2-Net: Going Deeper with Nested U-Structure for Salient Object Detection."

License: Apache License 2.0

Python 100.00%
computer-vision deep-learning image-background-removal image-processing image-segmentation u-2-net u2net

u-2-net's Introduction

U2-Net: U Square Net

This is the official repo for our paper U2-Net (U square net) published in Pattern Recognition 2020:

Xuebin Qin, Zichen Zhang, Chenyang Huang, Masood Dehghan, Osmar R. Zaiane and Martin Jagersand

Contact: xuebin[at]ualberta[dot]ca

Updates !!!

(2022-Aug-24) We are glad to announce that our U2-Net published in Pattern Recognition has been awarded the 2020 Pattern Recognition BEST PAPER AWARD!

(2022-Aug-17) Our U2-Net models are now available on PlayTorch, where you can build your own demo and run it on your Android/iOS phone. Try out the PlayTorch Demo and bring your ideas about U2-Net to life in minutes!

(2022-Jul-5) Our new work Highly Accurate Dichotomous Image Segmentation (DIS) (Project Page, GitHub) has been accepted by ECCV 2022. Our code and dataset will be released before July 17th, 2022. Please stay tuned for our updates.

(2022-Jun-3) Thanks to Adir Kol for sharing the iOS app 3D Photo Creator based on our U2-Net.

(2022-Mar-31) Thanks to Hikaru Tsuyumine for implementing the iOS app Portrait Drawing based on our U2-Net portrait generation model.

(2022-Apr-12) Thanks to Kevin Shah for providing us a great iOS app, Lensto (Demo Video), based on U2-Net.

(2022-Mar-31) Our U2-Net model has also been integrated by Hotpot.ai for art design.

(2022-Mar-19) Thanks to Kikedao for providing a fantastic web app, Silueta, based on U2-Net. More details can be found at https://github.com/xuebinqin/U-2-Net/issues/295.

(2022-Mar-17) Thanks to Ezaldeen Sahb for implementing an iOS library for image background removal based on U2-Net, which will greatly facilitate the development of mobile apps.

(2022-Mar-8) Thanks to Levin Dabhi for training the amazing clothes segmentation U2-Net model.

(2022-Mar-3) Thanks to Renato Violin for providing an awesome web app for image background removal and replacement based on our U2-Net.

(2021-Dec-21) This blog clearly describes the way of converting U2-Net to CoreML and running it on an iPhone.

(2021-Nov-28) Interesting Sky Segmentation models developed by xiongzhu using U2-Net.


(2021-Nov-28) The awesome image editing app Pixelmator Pro uses U2-Net as one of its background removal models.


(2021-Aug-24) We played a bit more with fusing the original image and the generated portrait to composite different styles. You can
(1) Download this repo by

git clone https://github.com/NathanUA/U-2-Net.git

(2) Download the u2net_portrait.pth model from GoogleDrive or Baidu Pan (extraction code: chgd) and put it into the directory ./saved_models/u2net_portrait/,
(3) run the code by command

python u2net_portrait_composite.py -s 20 -a 0.5

where -s indicates the sigma of the Gaussian function used to blur the original image and -a denotes the alpha weight of the original image when fusing them.
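
For reference, here is a rough sketch of the kind of blur-and-blend fusion these two flags control. This is an illustration only, not the exact implementation in u2net_portrait_composite.py, and the file names are placeholders:

    # Hedged sketch: fuse a blurred original photo with the generated portrait.
    import cv2
    import numpy as np

    sigma = 20    # corresponds to -s: Gaussian sigma used to blur the original image
    alpha = 0.5   # corresponds to -a: weight of the blurred original when fusing

    original = cv2.imread("original.jpg").astype(np.float32)
    portrait = cv2.imread("portrait.png").astype(np.float32)
    portrait = cv2.resize(portrait, (original.shape[1], original.shape[0]))

    # Blur the original, then alpha-blend it with the portrait.
    blurred = cv2.GaussianBlur(original, (0, 0), sigmaX=sigma, sigmaY=sigma)
    composite = alpha * blurred + (1.0 - alpha) * portrait

    cv2.imwrite("composite.png", np.clip(composite, 0, 255).astype(np.uint8))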


(2021-July-16) A new background removal webapp developed by Изатоп Василий.


(2021-May-26) Thanks to Dang Quoc Quy for his Art Transfer app built upon U2-Net.

(2021-May-5) Thanks to AK391 for sharing his Gradio web demo of U2-Net.


(2021-Apr-29) Thanks to Jonathan Benavides Vallejo for releasing his app LensOCR: Extract Text & Image, which uses U2-Net for extracting the image foreground.


(2021-Apr-18) Thanks to Andrea Scuderi for releasing his app Clipping Camera, a U2-Net-driven real-time camera app which "is able to detect relevant object from the scene and clip them to apply fancy filters".


(2021-Mar-17) Dennis Bappert re-trained the U2-Net model for human portrait matting. The results look very promising, and he also provided the details of the training process and the data generation (and augmentation) strategy, which are inspiring.

(2021-Mar-11) Dr. Tim developed a video version of rembg for removing video backgrounds using U2-Net. The awesome demo results can be found on YouTube.

(2021-Mar-02) We found some other interesting applications of our U2-Net, including MOJO CUT, Real-Time Background Removal on iPhone, Video Background Removal, Another Online Portrait Generation Demo on AWS, and AI Scissor.

(2021-Feb-15) We just released an online demo http://profu.ai for the portrait generation. Please feel free to give it a try and provide any suggestions or comments.

(2021-Feb-06) Recently, some people asked about using U2-Net for human segmentation, so we trained another example model for human segmentation based on the Supervisely Person Dataset.

(1) To run the human segmentation model, please first download the u2net_human_seg.pth model weights into ./saved_models/u2net_human_seg/.
(2) Prepare the to-be-segmented images in the corresponding directory, e.g. ./test_data/test_human_images/.
(3) Run the inference with command python u2net_human_seg_test.py and the results will be output into the corresponding directory, e.g. ./test_data/u2net_test_human_images_results/.
Notes: Due to the labeling accuracy of the Supervisely Person Dataset, the human segmentation model (u2net_human_seg.pth) here won't give you hair-level accuracy. But it should be more robust than u2net trained with the DUTS-TR dataset on general human segmentation tasks. It can be used for human portrait segmentation, human body segmentation, etc.

Human Image Segmentation
Human Video, Human Video Results

(2020-Dec-28) Some interesting applications and useful tools based on U2-Net:
(1) Xiaolong Liu developed several very interesting applications based on U2-Net, including Human Portrait Drawing (as far as I know, Xiaolong is the first one who used U2-Net for portrait generation), image matting and so on.
(2) Vladimir Seregin developed an interesting tool, NN based lineart, for comparing the portrait results of U2-Net with those of another popular model, ArtLine, developed by Vijish Madhavan.
(3) Daniel Gatis built a Python tool, Rembg, for image background removal based on U2-Net. I think this tool will greatly facilitate the application of U2-Net in different fields.

(2020-Nov-21) Recently, we found an interesting application of U2-Net for human portrait drawing. Therefore, we trained another model for this task based on the APDrawingGAN dataset.

Sample Results: Kids

Sample Results: Ladies

Sample Results: Men

Usage for portrait generation

  1. Clone this repo to local:
git clone https://github.com/NathanUA/U-2-Net.git
  2. Download the u2net_portrait.pth model from GoogleDrive or Baidu Pan (extraction code: chgd) and put it into the directory ./saved_models/u2net_portrait/.

  3. Run on the testing set.
    (1) Download the train and test set from APDrawingGAN. These images and their ground truth are stitched side-by-side (512x1024). You need to split each of these images into two 512x512 images and put them into ./test_data/test_portrait_images/portrait_im/. You can also download the split testing set from GoogleDrive.
    (2) Running the inference with command python u2net_portrait_test.py will output the results into ./test_data/test_portrait_images/portrait_results.

  4. Run on your own dataset.
    (1) Prepare your images and put them into ./test_data/test_portrait_images/your_portrait_im/. To obtain enough detail in the portrait, the human head region in the input image should be close to or larger than 512x512. The head background should be relatively clear.
    (2) Running the prediction with command python u2net_portrait_demo.py will output the results to ./test_data/test_portrait_images/your_portrait_results/.
    (3) The difference between python u2net_portrait_demo.py and python u2net_portrait_test.py is that we added a simple face detection step before the portrait generation in u2net_portrait_demo.py. The testing set of APDrawingGAN is normalized and cropped to 512x512 to include only human heads, while your own images may vary in resolution and content. Therefore, python u2net_portrait_demo.py detects the biggest face in the given image and then crops, pads and resizes the ROI to 512x512 before feeding it to the network. The following figure shows how to take your own photos for generating high-quality portraits, and the snippet below sketches the pre-processing step.
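
A rough, hedged sketch of that pre-processing follows, using OpenCV's bundled Haar cascade as an assumed stand-in for whatever detector u2net_portrait_demo.py actually uses (file names are placeholders):

    # Hedged sketch of the face crop/pad/resize pre-processing; not the repo's exact code.
    import cv2

    img = cv2.imread("your_photo.jpg")
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # Assumed detector: OpenCV's bundled frontal-face Haar cascade.
    cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        raise SystemExit("no face detected")

    # Pick the biggest face and expand the box so hair and shoulders are included.
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])
    margin = int(0.6 * max(w, h))
    x0, y0 = max(0, x - margin), max(0, y - margin)
    x1, y1 = min(img.shape[1], x + w + margin), min(img.shape[0], y + h + margin)
    roi = img[y0:y1, x0:x1]

    # Pad to a square so the aspect ratio is preserved, then resize to 512x512.
    side = max(roi.shape[:2])
    pad_y, pad_x = side - roi.shape[0], side - roi.shape[1]
    roi = cv2.copyMakeBorder(roi, pad_y // 2, pad_y - pad_y // 2,
                             pad_x // 2, pad_x - pad_x // 2,
                             cv2.BORDER_CONSTANT, value=(255, 255, 255))
    roi = cv2.resize(roi, (512, 512))
    cv2.imwrite("portrait_input.png", roi)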

(2020-Sep-13) Our U2-Net based model ranked 6th in the MICCAI 2020 Thyroid Nodule Segmentation Challenge.

(2020-May-18) The official paper of our U2-Net (U square net) (PDF on Elsevier (free until July 5, 2020), PDF on arXiv) is now available. If you are not able to access it, please feel free to drop me an email.

(2020-May-16) We fixed the upsampling issue of the network. Now the model should be able to handle arbitrary input sizes. (Tip: this modification is meant to facilitate retraining U2-Net on your own datasets. When using our pre-trained model on SOD datasets, please keep the input size at 320x320 to guarantee the performance.)

(2020-May-16) We highly appreciate Cyril Diagne for building this fantastic AR project, AR Copy and Paste, using our U2-Net (Qin et al, PR 2020) and BASNet (Qin et al, CVPR 2019). The demo video on Twitter has achieved over 5M views, which is phenomenal and shows us more application possibilities of SOD.

U2-Net Results (176.3 MB)


Our previous work: BASNet (CVPR 2019)

Required libraries

Python 3.6
numpy 1.15.2
scikit-image 0.14.0
python-opencv
PIL 5.2.0
PyTorch 0.4.0
torchvision 0.2.1
glob

Usage for salient object detection

  1. Clone this repo:
git clone https://github.com/NathanUA/U-2-Net.git
  2. Download the pre-trained model u2net.pth (176.3 MB) from GoogleDrive or Baidu Pan (extraction code: pf9k), or u2netp.pth (4.7 MB) from GoogleDrive or Baidu Pan (extraction code: 8xsi), and put them into the directories './saved_models/u2net/' and './saved_models/u2netp/' respectively.

  3. Cd to the directory 'U-2-Net' and run the training or inference process with python u2net_train.py or python u2net_test.py respectively. The 'model_name' variable in both files can be changed to 'u2net' or 'u2netp' to use the different models.
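
If you prefer to call the model directly rather than through u2net_test.py, a minimal inference sketch along these lines should work. The 320x320 resize and the normalization constants are assumptions approximating the repo's data loader, so treat u2net_test.py as the reference:

    # Hedged sketch of stand-alone U2-Net inference; u2net_test.py remains the reference.
    import numpy as np
    import torch
    from PIL import Image
    from model import U2NET  # from this repo

    net = U2NET(3, 1)
    net.load_state_dict(torch.load("./saved_models/u2net/u2net.pth", map_location="cpu"))
    net.eval()

    img = Image.open("./test_data/test_images/example.jpg").convert("RGB")
    orig_size = img.size  # (width, height)
    x = np.asarray(img.resize((320, 320)), dtype=np.float32) / 255.0
    x = (x - np.array([0.485, 0.456, 0.406])) / np.array([0.229, 0.224, 0.225])
    x = torch.from_numpy(x.transpose(2, 0, 1)).float().unsqueeze(0)

    with torch.no_grad():
        d0, *rest = net(x)              # d0 is the fused saliency map
    pred = d0[0, 0].cpu().numpy()
    pred = (pred - pred.min()) / (pred.max() - pred.min() + 1e-8)  # normalize to [0, 1]

    Image.fromarray((pred * 255).astype(np.uint8)).resize(orig_size).save("example_mask.png")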

We also provide the predicted saliency maps (u2net results,u2netp results) for datasets SOD, ECSSD, DUT-OMRON, PASCAL-S, HKU-IS and DUTS-TE.

U2-Net Architecture


Quantitative Comparison


Qualitative Comparison


Citation

@article{Qin_2020_PR,
title = {U2-Net: Going Deeper with Nested U-Structure for Salient Object Detection},
author = {Qin, Xuebin and Zhang, Zichen and Huang, Chenyang and Dehghan, Masood and Zaiane, Osmar and Jagersand, Martin},
journal = {Pattern Recognition},
volume = {106},
pages = {107404},
year = {2020}
}

u-2-net's People

Contributors

adakoda, chenyangh, dependabot[bot], jasmcaus, pdillis, pmgautam, ppprior, seekingdeep, szerintedmi, vincentzhang, xuebinqin


u-2-net's Issues

evaluation code

Hi, thanks for your great work.
could you please share your evaluation code?

U-2-Net model differs from your paper description

https://github.com/NathanUA/U-2-Net/blob/master/model/u2net.py

    d1 = self.side1(hx1d)
    
    d2 = self.side2(hx2d)
    d2 = _upsample_like(d2,d1)

    d3 = self.side3(hx3d)
    d3 = _upsample_like(d3,d1)

    d4 = self.side4(hx4d)
    d4 = _upsample_like(d4,d1)

    d5 = self.side5(hx5d)
    d5 = _upsample_like(d5,d1)

    d6 = self.side6(hx6)
    d6 = _upsample_like(d6,d1)

    d0 = self.outconv(torch.cat((d1,d2,d3,d4,d5,d6),1))

    return F.sigmoid(d0), F.sigmoid(d1), F.sigmoid(d2), F.sigmoid(d3), F.sigmoid(d4), F.sigmoid(d5), F.sigmoid(d6)

The code generates the six side output saliency probability maps from stages En6, De5, De4, De3, De2 and De1 with a 3x3 convolution layer, without a sigmoid function, before the concatenation.
But in your paper description, they are generated by a 3x3 convolution layer and a sigmoid function.

Why the difference?

RuntimeError: expected dtype Half but got dtype Long

I am trying to use this model for binary segmentation.

When I pass the mask as a tensor to muti_bce_loss_fusion I get this error:

    547 def muti_bce_loss_fusion(d0, d1, d2, d3, d4, d5, d6, labels_v):
    548     print(d0.shape)
--> 549     loss0 = bce_loss(d0, labels_v)
    550     loss1 = bce_loss(d1, labels_v)
    551     loss2 = bce_loss(d2, labels_v)

~/anaconda3/envs/seg/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    556             result = self._slow_forward(*input, **kwargs)
    557         else:
--> 558             result = self.forward(*input, **kwargs)
    559         for hook in self._forward_hooks.values():
    560             hook_result = hook(self, input, result)

~/anaconda3/envs/seg/lib/python3.7/site-packages/torch/nn/modules/loss.py in forward(self, input, target)
    518 
    519     def forward(self, input, target):
--> 520         return F.binary_cross_entropy(input, target, weight=self.weight, reduction=self.reduction)
    521 
    522 

~/anaconda3/envs/seg/lib/python3.7/site-packages/torch/nn/functional.py in binary_cross_entropy(input, target, weight, size_average, reduce, reduction)
   2415 
   2416     return torch._C._nn.binary_cross_entropy(
-> 2417         input, target, weight, reduction_enum)
   2418 
   2419 

RuntimeError: expected dtype Half but got dtype Long

What is the format of the output of the model? What is the expected format for the label?

How could I use this model for binary segmentation?
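
Not an official answer, but this error typically means the target tensor passed to nn.BCELoss is integer-typed (Long) while the prediction is floating point: BCE expects float targets in [0, 1] with the same dtype as the prediction. A hedged sketch of the conversion, reusing the variable names from the traceback above:

    # Hedged sketch: make the mask a float tensor in [0, 1] matching the prediction dtype.
    labels_v = labels_v.float()                   # Long -> Float32
    if labels_v.max() > 1:                        # e.g. masks stored as 0/255
        labels_v = labels_v / 255.0
    loss0 = bce_loss(d0, labels_v.to(d0.dtype))   # .to(d0.dtype) also covers fp16 (Half)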

Using with torch.jit.trace and C++?

I started a discussion here https://discuss.pytorch.org/t/debugging-runtime-error-module-forward-inputs-libtorch-1-4/82415

I modified u2net_test.py and used torch.jit.trace to save a module

traced_script_module = torch.jit.trace(net, inputs_test)
traced_script_module.save("traced_model.pt")
print(inputs_test.size()) # shows (1, 3, 320, 320)

Then in C++:

auto module = torch::jit::load("traced_model.pt");
torchinputs.clear();
torchinputs.push_back(torch::ones({1, 3, 320, 320 }, torch::kCUDA).to(at::kFloat)); // because python was torch.FloatTensor
module.forward(torchinputs); // error

The error:

 Unhandled exception at 0x00007FFFD8FFA799 in TouchDesigner.exe: Microsoft C++ exception: std::runtime_error at memory location 0x000000EA677F1B30. occurred

stacktrace

The error is at https://github.com/pytorch/pytorch/blob/4c0bf93a0e61c32fd0432d8e9b6deb302ca90f1e/torch/csrc/jit/api/module.h#L112 It says inputs has size 0. I don't know if that's the cause of the exception or a result.

Do you have advice about running U-2-Net in C++? Thank you.

IndexError: invalid index of a 0-dim tensor

python3 u2net_train.py

Traceback (most recent call last):
File "u2net_train.py", line 140, in
loss2, loss = muti_bce_loss_fusion(d0, d1, d2, d3, d4, d5, d6, labels_v)
File "u2net_train.py", line 40, in muti_bce_loss_fusion
print("l0: %3f, l1: %3f, l2: %3f, l3: %3f, l4: %3f, l5: %3f, l6: %3f\n"%(loss0.data[0],loss1.data[0],loss2.data[0],loss3.data[0],loss4.data[0],loss5.data[0],loss6.data[0]))
IndexError: invalid index of a 0-dim tensor. Use tensor.item() in Python or tensor.item<T>() in C++ to convert a 0-dim tensor to a number
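
As the error message itself suggests, the fix is to read 0-dim loss tensors with .item() instead of the old .data[0] indexing, which was removed in newer PyTorch. A hedged sketch of the corrected print statement in muti_bce_loss_fusion:

    # Hedged sketch: .item() extracts a Python number from a 0-dim tensor.
    print("l0: %3f, l1: %3f, l2: %3f, l3: %3f, l4: %3f, l5: %3f, l6: %3f\n" % (
        loss0.item(), loss1.item(), loss2.item(), loss3.item(),
        loss4.item(), loss5.item(), loss6.item()))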

RuntimeWarning: invalid value encountered in true_divide

Your work is so great, thank you for sharing your code!

I tried to inference some images using your model and your code.
Almost everything is good, but with some images, I receive warning:
data_loader.py:197: RuntimeWarning: invalid value encountered in true_divide
image = image/np.max(image)
Like this image: https://drive.google.com/file/d/1iFTb29lu3cWQzrMMdMB3y03Fcoqd7Gkg/view?usp=sharing

I do not know why this happens. What in the data_loader.py file gives this warning? Could the warning affect the quality of the result?

training data?

I'm interested in retraining the model from scratch. It looks like the code expects a train_data folder, which doesn't exist. The README mentions a bunch of datasets that you've trained on, but it seems like the code expects a different format than the one available e.g. here.

Can you please confirm that the expectation is jpg images as input, and a png encoded in uint8 where the entire object takes the value 255 and all other pixels are 0?

Best way to fine-tune the model

Hi everyone!

I'd be very grateful if somebody shared their fine-tuning experience.
How many pictures in the dataset are enough?
Which hyperparameters fit best?
Which layers are better to freeze? Are there any pros to freezing them?
How long does it take to train such a model?

Currently, I am trying to improve the segmentation of flowers (the stem of the flower is often ignored). I collected & labeled 200 images and applied all augmentation techniques possible,
but I haven't found the framework to train the model successfully.

I would appreciate hearing any thoughts on improving it, if you have any :)

How to capture more fine details

First of all, great work! The model is able to capture some fine details such as human hair. However, it is not capable of capturing the majority of the finer details, such as little holes and hair. I tried adding the IoU loss function and the model becomes very confident at predicting the object edges. However, the little holes/gaps in human hair are not captured, and a lumped area of solid foreground is predicted instead.

Do you have any suggestions for modifying the model to capture more fine details of the objects?

Unexpected item included in the final mask

Congratulations on your amazing work with u2net!
Recently, I tried your net on a portrait segmentation task. I also trained it from scratch on a portrait segmentation dataset without other objects,
and it got great performance. However, I find that sometimes objects such as chairs and street nameplates are also included. I am confused about this.
Since I trained it from scratch on the given dataset, it can be seen as a semantic segmentation task, right? Why are the other objects included?
Thank you!

About the network output

Hello, thank you for open-sourcing your research project. I have run into a question: given the binary map produced at prediction time, how can I cut this object out of the original image? Does the network output contour coordinates?
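
The network outputs a per-pixel saliency probability map rather than contour coordinates. A hedged sketch of cutting the object out of the original image by thresholding the predicted map and using it as an alpha channel (file names and the threshold value are assumptions):

    # Hedged sketch: use the predicted saliency map as an alpha channel to cut out the object.
    import cv2

    image = cv2.imread("original.jpg")
    mask = cv2.imread("prediction.png", cv2.IMREAD_GRAYSCALE)
    mask = cv2.resize(mask, (image.shape[1], image.shape[0]))

    # Binarize the probability map; contour coordinates can be recovered from it if needed.
    _, binary = cv2.threshold(mask, 127, 255, cv2.THRESH_BINARY)
    contours = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]

    # Keep the object and make the background transparent.
    cutout = cv2.cvtColor(image, cv2.COLOR_BGR2BGRA)
    cutout[:, :, 3] = binary
    cv2.imwrite("cutout.png", cutout)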

Reproducing results

I am trying to reproduce your results but am seeing some uninspiring signs at the start. I start with your model with all the settings as stated in the paper, except that I use 15% of the training data as validation every epoch and my batch size is 8. The validation loss stops decreasing after 50 epochs or so, and a noticeable gap emerges between train and validation. I trained for 40 more epochs but the validation loss did not fall any lower; it is almost twice the train loss.

The model seems to be overfitting to me. A lower batch size than yours should cause more regularisation, so that should not be the issue.

Can you please give me some advice on how to interpret this and whether I should keep going? I know I am not using 100% of the data like you, but 85% should be suboptimal yet similar. Can you share your training curves or anything like that?

input size and crop

Thanks a lot for your awesomely performing model!
I'm wondering about scaling and random crop: for training you first scale and then crop to 288x288, so the tensor has this size (288). What role does scaling play here, and why do you talk about 320x320 as the input size instead of 288x288?

RescaleT(320), 
RandomCrop(288),

With your latest model update, upsampling supports different ratios, as it looks to me. Or is only square-ish input supported, or e.g. 640x480 as well?

Problems related to the training set

First, I appreciate your excellent work! I need a little guidance regarding the training set.
In BASNet, you detailed the composition of the training set. In this work, did you use the same training set during training?

CRF post processing

Hi,

First of all, I want to say thanks a lot for your work. It's really one of the best models for image segmentation I have come across so far. The output may be a little worse than a few others, but there is no comparison in speed for this model. It is super fast.

I am trying to train this model on human images and am waiting for the output. However, I am curious about CRF post-processing.

When using the pre-trained model, the output edges are not that sharp, so I am thinking post-processing may help with that.

Can you please suggest how to do that?

Is there a docker image?

Hi

It would be great to have a Docker image for this so people less experienced with Python (me 😛) can try it out.

Or maybe just point out in the README some already-made Docker image that would work.

How do I need to change the model to get it to learn depth information?

Hi,
I wanted to know how I should change the model configuration for it to learn depth information too. I have read that adding atrous convolutions may help the model learn depth information. How can I do that?

For the below RSU of length 7, how can I add more dilated convolutions?

class RSU6(nn.Module):#UNet06DRES(nn.Module):

    def __init__(self, in_ch=3, mid_ch=12, out_ch=3):
        super(RSU6,self).__init__()

        self.rebnconvin = REBNCONV(in_ch,out_ch,dirate=1)

        self.rebnconv1 = REBNCONV(out_ch,mid_ch,dirate=1)
        self.pool1 = nn.MaxPool2d(2,stride=2,ceil_mode=True)

        self.rebnconv2 = REBNCONV(mid_ch,mid_ch,dirate=1)
        self.pool2 = nn.MaxPool2d(2,stride=2,ceil_mode=True)

        self.rebnconv3 = REBNCONV(mid_ch,mid_ch,dirate=1)
        self.pool3 = nn.MaxPool2d(2,stride=2,ceil_mode=True)

        self.rebnconv4 = REBNCONV(mid_ch,mid_ch,dirate=1)
        self.pool4 = nn.MaxPool2d(2,stride=2,ceil_mode=True)

        self.rebnconv5 = REBNCONV(mid_ch,mid_ch,dirate=1)

        self.rebnconv6 = REBNCONV(mid_ch,mid_ch,dirate=2)

        self.rebnconv5d = REBNCONV(mid_ch*2,mid_ch,dirate=1)
        self.rebnconv4d = REBNCONV(mid_ch*2,mid_ch,dirate=1)
        self.rebnconv3d = REBNCONV(mid_ch*2,mid_ch,dirate=1)
        self.rebnconv2d = REBNCONV(mid_ch*2,mid_ch,dirate=1)
        self.rebnconv1d = REBNCONV(mid_ch*2,out_ch,dirate=1)
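
Not an authoritative answer, but in plain PyTorch an atrous (dilated) convolution is just the dilation argument of nn.Conv2d; the REBNCONV blocks above appear to expose the same idea through their dirate parameter. A minimal, self-contained sketch (the channel sizes are arbitrary assumptions):

    # Hedged sketch: a plain 3x3 convolution vs. an atrous (dilated) one in PyTorch.
    import torch
    import torch.nn as nn

    x = torch.randn(1, 12, 64, 64)
    conv_plain = nn.Conv2d(12, 12, kernel_size=3, padding=1, dilation=1)
    conv_atrous = nn.Conv2d(12, 12, kernel_size=3, padding=4, dilation=4)  # wider receptive field

    # Matching padding keeps the spatial size unchanged in both cases.
    print(conv_plain(x).shape, conv_atrous(x).shape)  # both torch.Size([1, 12, 64, 64])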

Fine tune the existing Model

Hi @Nathanua

I want to fine-tune the existing model (u2net). I understand that we can resume the training simply, as discussed in issue #33:

if(model_name=='u2net'):
    net = U2NET(3, 1)
elif(model_name=='u2netp'):
    net = U2NETP(3,1)
net.load_state_dict(torch.load(saved_model_dir))

if torch.cuda.is_available():
    net.cuda()

In addition, to resume the training from exactly where it was, one usually needs to save and load the optimizer state as well (especially for Adam).

So my question is: in addition to the weights, how can I load the optimizer state as well?
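
Not the author, but the usual PyTorch pattern is to checkpoint the optimizer's state_dict next to the model weights and restore both when resuming. A hedged sketch (file names are assumptions; the Adam settings follow what u2net_train.py appears to use):

    # Hedged sketch: save and restore the Adam state together with the model weights.
    import torch
    import torch.optim as optim
    from model import U2NET  # from this repo

    net = U2NET(3, 1)
    optimizer = optim.Adam(net.parameters(), lr=0.001, betas=(0.9, 0.999),
                           eps=1e-08, weight_decay=0)

    # During training, save both state dicts in one checkpoint:
    torch.save({"model": net.state_dict(),
                "optimizer": optimizer.state_dict()}, "u2net_checkpoint.pth")

    # When resuming, restore both so Adam's moment estimates are kept:
    ckpt = torch.load("u2net_checkpoint.pth", map_location="cpu")
    net.load_state_dict(ckpt["model"])
    optimizer.load_state_dict(ckpt["optimizer"])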

Technical Issue On Resuming Training

Hello,

I have trained my model on DUTS for 300 iterations and saved my model state together with Adam's state. Now I removed the DUTS images to add new images so that the model can continue training on these new images; that is, the model will have trained on DUTS and also on these new images.

But unfortunately, it seems the model overrides what it learned from the previously trained images and starts training on the new images from scratch, so when I load any DUTS image the results are not good at all.

Is this the intended behavior? Or am I supposed to mix all images, old and new, in one folder and continue training?

How to increase model capacity for training on a larger dataset?

First of all, thanks for the amazing work on U-2-Net. Now I am trying to train the model from scratch on my own dataset of 60k images, which is larger than your dataset. I would like to know how I can increase the model capacity to be able to train on such a dataset.

I have considered replacing the standard REBNCONV blocks with residual blocks, as suggested in another issue. What other options could I try? I understand that I need to make the architecture deeper; does this mean that I should make RSU-8 or RSU-9 blocks by adding more convolution layers?

Inconsistency between paper and code

Hi,

First, thank you so much for the great work! I have found one issue regarding a mismatch between your paper and the released code while I was trying to integrate the code into my own project:

It is in the part "3.2. Architecture of U2-Net" in the paper where you wrote:
"our U2-Net first generates six side output saliency probability maps S(6), S(5), S(4), S(3), S(2), S(1) from stages En 6, De 5, De 4, De 3, De 2 and De 1 by a 3 x 3 convolution layer and a sigmoid function. Then, it upsamples these saliency maps to the input image size and fuses them with a concatenation operation followed by a 1 x 1 convolution layer and a sigmoid function to generate the final saliency probability map S(fuse)."

But in your code, it seems that you concatenate the upscaled side saliency maps before passing them into the sigmoid function:

    d0 = self.outconv(torch.cat((d1,d2,d3,d4,d5,d6),1))

    return F.sigmoid(d0), F.sigmoid(d1), F.sigmoid(d2), F.sigmoid(d3), F.sigmoid(d4), F.sigmoid(d5), F.sigmoid(d6)

Could you please help clarify what is the correct order? Thank you.

U-2-Net for binary segmentation

Hey @Nathanua,

just to share my experience here, using the small U-2-Net architecture:

  • I'm comparing the results to a baseline model (DeepLabV3, ResNet101, ~450 MB), which achieved about 82 mIOU after 500 epochs on my rather small benchmarking dataset (license plate binary segmentation).
  • The small U-2-Net model (~5 MB) achieved about 78 mIOU after roughly 600 epochs.

I did the following modifications to the model in order to fine-tune my results:

  • I introduced a distillation loss between each nested U-Net and used temperature annealing on the sigmoid. Based on the assumption that the more deeply nested U-Nets have more computational power, we can define a second loss specifying the BCE between adjacent layers. The model converges much faster with this approach; however, the 3 new hyperparameters ("temperature", alpha and a scaling factor for the loss) seem to be quite "sensitive", and in the end it was not working.
  • I changed to SGD with a poly learning rate schedule starting at 0.01.
  • I'm currently exploring options to further prune the network or use an early exit during inference to reduce inference time.

I just wanted to share my experience here, maybe it can be helpful for this repository.

Best,
Dennis

Epoch size

Hi Nathan,
Your results are fantastic and thank you for sharing the code, but it would be really appreciated if you could kindly answer a few queries that I have. I would like to know the following:

  1. What was the max number of epochs you trained the model for with the DUTS dataset, and how did you get to the tar loss of 0.01?

index error

I tried to train on my own image set but got the following error.
My folder setup: train_images has 2 folders, 'images' and 'mask'.
When I ran the script it showed the correct number of images, but then I got the following error:

Traceback (most recent call last):
File "u2net_train.py", line 143, in
loss2, loss = muti_bce_loss_fusion(d0, d1, d2, d3, d4, d5, d6, labels_v)
File "u2net_train.py", line 42, in muti_bce_loss_fusion
print("l0: %3f, l1: %3f, l2: %3f, l3: %3f, l4: %3f, l5: %3f, l6: %3f\n"%(loss0.data[0],loss1.data[0],loss2.data[0],loss3.data[0],loss4.data[0],loss5.data[0],loss6.data[0]))
IndexError: invalid index of a 0-dim tensor. Use tensor.item() in Python or tensor.item<T>() in C++ to convert a 0-dim tensor to a number

prediction confidence score?

Hi Nathan,

Such a great job, much better results than other nets I played with (including my experiments training a blank U-Net). Thank you for making it available.

How would you calculate some kind of "prediction confidence" score?
It could be used during inference to flag predictions which require human review.

Thinking on my feet, I was considering experimenting with these naive approaches:

  1. Calculate it based on how much the predicted pixels deviate from 0 and 1 (see the sketch after this list). Does a prediction closer to 0.5 for a pixel indicate less confidence in whether the pixel is fg or bg?
  2. Calculate losses on a bunch of input images with ground truth. Then, based on that, train a network to predict the loss on any arbitrary image.
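
A hedged sketch of the first idea, treating the mean distance of the predicted probabilities from 0.5 as a crude confidence score (purely illustrative, not a validated metric):

    # Hedged sketch: crude confidence = how far predictions are from 0.5 on average.
    import torch

    def confidence_score(pred):
        """pred: saliency map with values in [0, 1]; returns a scalar in [0, 1]."""
        return (2.0 * (pred - 0.5).abs()).mean().item()

    pred = torch.rand(1, 1, 320, 320)    # stand-in for a U2-Net output
    print(confidence_score(pred))        # ~0.5 for a random map, near 1.0 for a confident mask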

Input Size

Hi Team,
Great work!!!
I have a doubt: is the input size fixed at 320x320?

Architecture Insight

I find it very interesting how the model is able to pick up tiny gaps in salient objects as well as segment out delicate hairs like the ones shown below.

I am surprised at how it can do this at such a low 320x320 input resolution, and while the paper does motivate the architecture by saying it allows the model to repeatedly get a global view, it doesn't quite explain how this level of detail can be reached at this resolution. I don't quite understand how it is able to detect minor gaps within the object of interest and trace them so well. It is able to provide a finer mask for human hair than even networks trained on humans, which also fail to pick up other minute details. I would appreciate it if the author could provide his thoughts and some more insight into the workings of the model, because I have never seen any segmentation network able to segment such details. Any comments would be much appreciated.

RuntimeError - CPU only

Can't seem to get it to work due to:
"RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location='cpu' to map your storages to the CPU."

However, I do not see where I should use torch.load with map_location='cpu'.

full error:
Traceback (most recent call last):
File "u2net_test.py", line 116, in
main()
File "u2net_test.py", line 86, in main
net.load_state_dict(torch.load(model_dir))
File "./U2NET/venv/lib/python3.6/site-packages/torch/serialization.py", line 529, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "./U2NET/venv/lib/python3.6/site-packages/torch/serialization.py", line 702, in _legacy_load
result = unpickler.load()
File "./U2NET/venv/lib/python3.6/site-packages/torch/serialization.py", line 665, in persistent_load
deserialized_objects[root_key] = restore_location(obj, location)
File "./U2NET/venv/lib/python3.6/site-packages/torch/serialization.py", line 156, in default_restore_location
result = fn(storage, location)
File "./U2NET/venv/lib/python3.6/site-packages/torch/serialization.py", line 132, in _cuda_deserialize
device = validate_cuda_device(location)
File "./U2NET/venv/lib/python3.6/site-packages/torch/serialization.py", line 116, in validate_cuda_device
raise RuntimeError('Attempting to deserialize object on a CUDA '
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
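
For anyone hitting the same error: the traceback points at the torch.load call in u2net_test.py (the net.load_state_dict line), which is where map_location goes. A hedged one-line sketch of the change, reusing the variable names from the traceback:

    # Hedged sketch: map CUDA-trained weights to the CPU when no GPU is available.
    net.load_state_dict(torch.load(model_dir, map_location=torch.device('cpu')))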

How to change Input/Output image dimension from 320x320 to 640x640

Dear Nathan,
I hope you are doing well. Your results are really stunning; thank you for sharing the project. It would be deeply appreciated if you could kindly answer the following question for me.

I want to change the model input image and output prediction size from 320x320 to 640x640. Can you please guide me on how to get this done?

Thanks a lot.

Kind Regards,
Kamal Kanta Maity

About inference speed

Thanks for your great work!
From the paper I know that U2-Net runs at a speed of 30 FPS (with an input size of 320×320×3) on a 1080Ti and U2-Net+ (4.7 MB) runs at 40 FPS, but on which GPU was U2-Net running? A 1080Ti?

Design Questions

Thanks for your amazing work. I learnt a lot from your 2 papers. I had a few questions:

  1. Why have you only used the cross-entropy loss and not also SSIM and IoU as in BASNet? How are the advantages of those losses, which were outlined with detailed analysis in BASNet, made up for with only CE here?

  2. Why did you set all the deep-supervision weights to 1? It is normal to set them to values between 0.2-0.8 so the model focuses most on the final output.

  3. There does not seem to be an LR scheduler, which is a default today. May I know why you did not use one?

  4. How much of a difference did taking all the predictions and concatenating them to predict the final map make, vs. just taking the top-most prediction? As a 1x1 conv is used, we are just taking a linear combination of the previous predictions, and it is apparent that the latest predictions will be the most accurate. Did you look at the weights learnt by the 1x1 conv to see what was going on?

Higher resolutions

First of all, thank you for the excellent work.

I tried to infer some images with higher resolution (around 2000 x 1500) and the generated mask seems blurry on the edges.
Do you think it's because of the resolution of the images used for training?
Do you think that by training the U-Net with higher-resolution images, I would get better results?
Do you know if a dataset similar to DUTS exists with higher-resolution images?

Thank you!

Can I specify object to be segmented?

Hello, thank you for this work.

The issue is illustrated by the following images:


Is there a way to choose the object to be segmented? Or how do I keep the guitar throughout the sequence of images?

Thank you.

Loss & accuracy

I'm trying to retrain your model for our specific use case. I'm training with images augmented from a 30k set. I also added accuracy calculations and validation.

The loss and accuracy seem to stall no matter how I change the learning rate.
What would you recommend? Should I just train longer? Should I try to "freeze" or lower the LR on part of the layers (which layers? all encoders?)? Or is this as far as it can get?
Have you experimented with different LR algorithms (cyclic, etc.)?

I ran these with 120k training images (50 epochs, 200 iterations each, batch size 12). Validation size: 600 images after each epoch.

Training from scratch

Training on your pre-trained model (173.6 MB), LR=0.001 (as yours)

LR reduced to 0.0001 (on pre-trained model)

cannot download weights

I cannot download the file from Google Drive.
Can you provide a Baidu download link?
Thanks a lot!

Stalled in reading data

Hi all

I found that it always stalls while reading data: the program gets stuck at for i_test, data_test in enumerate(test_salobj_dataloader). When I press Ctrl+C, it prints the information below.

Traceback (most recent call last):
  File "u2net_test.py", line 119, in <module>
    main()
  File "u2net_test.py", line 93, in main
    for i_test, data_test in enumerate(test_salobj_dataloader):
  File "/local/mnt/workspace/ruodcui/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 363, in __next__
    data = self._next_data()
  File "/local/mnt/workspace/ruodcui/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 974, in _next_data
    idx, data = self._get_data()
  File "/local/mnt/workspace/ruodcui/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 941, in _get_data
    success, data = self._try_get_data()
  File "/local/mnt/workspace/ruodcui/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 779, in _try_get_data
    data = self._data_queue.get(timeout=timeout)
  File "/local/mnt/workspace/ruodcui/anaconda3/lib/python3.7/multiprocessing/queues.py", line 104, in get
    if not self._poll(timeout):
  File "/local/mnt/workspace/ruodcui/anaconda3/lib/python3.7/multiprocessing/connection.py", line 257, in poll
    return self._poll(timeout)
  File "/local/mnt/workspace/ruodcui/anaconda3/lib/python3.7/multiprocessing/connection.py", line 414, in _poll
    r = wait([self], timeout)
  File "/local/mnt/workspace/ruodcui/anaconda3/lib/python3.7/multiprocessing/connection.py", line 920, in wait
    ready = selector.select(timeout)
  File "/local/mnt/workspace/ruodcui/anaconda3/lib/python3.7/selectors.py", line 415, in select
    fd_event_list = self._selector.poll(timeout)
KeyboardInterrupt

Any ideas?

Wrong results with simple test

Hello,

While doing tests, I created a model with only 1 image and 10 epochs, and I tested the model with this same image, only to get bad results in the end. Do you have any explanation?

Thank you

Loss looks strange on training

During training, the validation loss bounces around, which looks very strange.
Here I attach a plot with the training and validation loss curves:

People segmentation quality, pretrained model vs other datasets, and comparison

Congratulations on your amazing work with u2net!

Quick question: I'm trying to apply u2net to segment people. I tried your pre-trained model, the one available at https://github.com/NathanUA/U-2-Net, and it's pretty good, but not as good at segmenting people as DeepLabV3, for example. However, I love u2net because it's faster and uses less memory. So now I'm trying to train u2net with the 64k images of the COCO dataset. The questions are:

  • Your pre-trained model, on what was it pre-trained? Because if you tell me it was already pre-trained on COCO, then there is no point in continuing down that route. But if it was pre-trained in a different way, then maybe it's worth trying to train it with the 64k images of people in the COCO dataset.

  • Do you think that with the right dataset the u2net model could achieve IoU segmentation quality on people similar to the level reached by DeepLabV3?

Thank you so very much, and congrats again.

About RescaleT

Hello Xuebin Qin,
Thank you for sharing; this is really great work. I have a small question: why do you resize the images directly to (320, 320) during training and testing instead of keeping the aspect ratio? Is it related to the training data, or is it that, if the aspect ratio were kept, RandomCrop would affect the completeness of the salient objects in the image? Or is there some other reason? I want to train on my own data; my application scenario is 1920*1080-resolution video, and I have some training data at that resolution. How do you think the train and test sizes should be adjusted? Thank you for your guidance!
