
Official PyTorch implementation of Revisiting Image Pyramid Structure for High Resolution Salient Object Detection (ACCV 2022)

License: MIT License

deep-learning high-resolution pytorch salient-object-detection accv2022 computer-vision background-removal image-matting image-segmantation dichotomous-image-segmentation

inspyrenet's Introduction

Revisiting Image Pyramid Structure for High Resolution Salient Object Detection (InSPyReNet)


Official PyTorch implementation of Revisiting Image Pyramid Structure for High Resolution Salient Object Detection (InSPyReNet)

To appear in the 16th Asian Conference on Computer Vision (ACCV2022)

Taehun Kim, Kunhee Kim, Joonyeong Lee, Dongmin Cha, Jiho Lee, Daijin Kim

Abstract: Salient object detection (SOD) has been in the spotlight recently, yet has been studied less for high-resolution (HR) images. Unfortunately, HR images and their pixel-level annotations are certainly more labor-intensive and time-consuming to produce compared to low-resolution (LR) images. Therefore, we propose an image pyramid-based SOD framework, Inverse Saliency Pyramid Reconstruction Network (InSPyReNet), for HR prediction without any HR datasets. We design InSPyReNet to produce a strict image pyramid structure of the saliency map, which enables ensembling multiple results with pyramid-based image blending. For HR prediction, we design a pyramid blending method which synthesizes two different image pyramids from a pair of LR and HR scales of the same image to overcome the effective receptive field (ERF) discrepancy. Our extensive evaluation on public LR and HR SOD benchmarks demonstrates that InSPyReNet surpasses State-of-the-Art (SotA) methods on various SOD metrics and boundary accuracy.

Contents

  1. News
  2. Demo
  3. Applications
  4. Easy Download
  5. Getting Started
  6. Model Zoo
  7. Results
  8. Citation
  9. Acknowledgement
  10. References

News 📰

[2022.10.04] TasksWithCode mentioned our work in their blog and reproduced our work on Colab. Thank you for your attention!

[2022.10.20] We trained our model on the Dichotomous Image Segmentation dataset (DIS5K) and showed competitive results! The trained checkpoint and pre-computed segmentation masks are available in the Model Zoo. Also, you can check our qualitative and quantitative results in the Results section.

[2022.10.28] Multi-GPU training for the latest PyTorch is now available.

[2022.10.31] TasksWithCode provided an amazing web demo with Hugging Face. Visit the WebApp and try it with your image!

[2022.11.09] 🚗 Lane segmentation for driving scenes, built on InSPyReNet, is available in the LaneSOD repository.

[2022.11.18] I am speaking at The 16th Asian Conference on Computer Vision (ACCV2022). Please check out my talk if you're attending the event! #ACCV2022 #Macau - via #Whova event app

[2022.11.23] We made our work available as a PyPI package. Please visit transparent-background to download our tool and try it on your machine. It works as a command-line tool and a Python API.

[2023.01.18] rsreetech shared a tutorial for our PyPI package transparent-background using Colab. 📺 [YouTube]

Demo 🚀

Image Sample Video Sample

Applications 🎮

Here are some applications/extensions of our work.

Web Application

TasksWithCode provided a WebApp on Hugging Face to generate your own results!

Web Demo

Command-line Tool / Python API 📟

Try using our model as a command-line tool or a Python API. More details on usage are available at transparent-background.

pip install transparent-background
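
A minimal Python usage sketch (hedged: check the transparent-background README for the exact class names and options):

from PIL import Image
from transparent_background import Remover

remover = Remover()                            # loads the InSPyReNet checkpoint on first use
img = Image.open('input.jpg').convert('RGB')   # 'input.jpg' is a placeholder path
out = remover.process(img, type='rgba')        # remove the background, keep an alpha channel
out.save('output.png')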

Lane Segmentation 🚗

We extend our model to detect lane markers in a driving scene in LaneSOD.

Lane Segmentation

Easy Download 🍰

How to use easy download

Downloading each dataset and checkpoint individually is quite tedious, even for me 💤. Instead, you can download all the data we provide, including ImageNet pre-trained backbone checkpoints, training datasets, testing datasets for benchmarks, pre-trained model checkpoints, and pre-computed saliency maps, with the single command below.

python utils/download.py --extra --dest [DEST]
  • --extra, -e: Without this argument, only the datasets, checkpoints, and results from our main paper will be downloaded. With it, all data will be downloaded, including results from the supplementary material and the DIS5K results.
  • --dest [DEST], -d [DEST]: Use this argument to specify the destination. It will automatically create symbolic links to the destination folders inside data and snapshots, so use it if you want to store the data on another physical disk. Otherwise, everything is downloaded inside this repository folder.
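
For example, to download everything onto a separate disk (the destination path is a placeholder):

python utils/download.py --extra --dest /mnt/storage/inspyrenet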

If you want to download a certain checkpoint or pre-computed map, please refer to Getting Started and Model Zoo.

Getting Started 🛫

Please refer to getting_started.md for training, testing, and evaluating on benchmarks, and for running inference on your own images.

Model Zoo 🦒

Please refer to model_zoo.md for downloading pre-trained models and pre-computed saliency maps.

Results 💯

Quantitative Results

LR Benchmark HR Benchmark HR Benchmark (Trained with extra DB) DIS

Qualitative Results

DAVIS-S & HRSOD UHRSD UHRSD (w/ HR scale) DIS

Citation

@inproceedings{kim2022revisiting,
  title={Revisiting Image Pyramid Structure for High Resolution Salient Object Detection},
  author={Kim, Taehun and Kim, Kunhee and Lee, Joonyeong and Cha, Dongmin and Lee, Jiho and Kim, Daijin},
  booktitle={Proceedings of the Asian Conference on Computer Vision},
  pages={108--124},
  year={2022}
}

Acknowledgement

This work was supported by Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No.2017-0-00897, Development of Object Detection and Recognition for Intelligent Vehicles) and (No.B0101-15-0266, Development of High Performance Visual BigData Discovery Platform for Large-Scale Realtime Data Analysis)

Special Thanks to 🎉

  • TasksWithCode team for sharing our work and making the most amazing web app demo.

References

Related Works

  • Towards High-Resolution Salient Object Detection (paper | github)
  • Disentangled High Quality Salient Object Detection (paper | github)
  • Pyramid Grafting Network for One-Stage High Resolution Saliency Detection (paper | github)

Resources


inspyrenet's Issues

train custom dataset

Hello! In my tests, its background removal shows much better results compared to some other models. I would like to try training my own model.

This is a new field for me, and I have a few questions. Thank you!

  1. Can I continue from latest.pth and fine-tune it? If so, when adding additional datasets in which only the clothes are kept, will people still be preserved when removing the background?

  2. When fine-tuning, should I include the training config YAML and the datasets mentioned in it?

  3. How should I prepare the dataset? I have looked at the folder structure of DIS5K, which seems to be as follows. Then I will put the data in and run train.py.

  4. How many image and mask pairs should I prepare?

Resolution is about 1028 × 1828.

DIS-TR
  im
  gt

DIS-VD
  im
  gt

configs/extra_dataset/Plus_Ultra.yaml


Train with another shape

Thank you for your great work. I want to train this on my custom dataset, but my images are much wider than they are tall, e.g. 1300*48. Is it okay to use the original config on my dataset, given that the original config is for square images?

Overflow NaN can happen in training

Thanks for the great work!

However, I found that overflow NaN/Inf can happen when computing the attention score. The problem is that the query and key in the attention layers are not normalized, and the bmm can produce inf during training and inference, especially with mixed precision (float16). I suggest the authors, or anyone who wants to train on their own dataset, edit the code at the following locations (a sketch of the idea follows the links):

https://github.com/plemeri/InSPyReNet/blob/main/lib/modules/layers.py#L167
https://github.com/plemeri/InSPyReNet/blob/main/lib/modules/attention_module.py#L82
https://github.com/plemeri/InSPyReNet/blob/main/lib/modules/attention_module.py#L90
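
A sketch of the suggested fix (my reading of this report, not the authors' code): L2-normalize query and key along the channel dimension before the bmm, so each attention logit is a dot product of unit vectors, bounded in [-1, 1], and cannot overflow float16:

import torch
import torch.nn.functional as F

def stable_attention_scores(query, key):
    # query, key: (B, N, C) tensors, as typically fed to torch.bmm
    q = F.normalize(query, dim=-1)   # unit norm along channels
    k = F.normalize(key, dim=-1)
    # dot products of unit vectors stay within [-1, 1], safe for float16
    return torch.bmm(q, k.transpose(1, 2))

If the reduced dynamic range hurts accuracy, a learnable temperature can be multiplied onto the scores afterwards.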

Best,

Model accuracy drops after a few epochs when training with a custom dataset!

Hi @plemeri

Thank you for your work.

I tried training the SwinB model from scratch with my custom dataset, which has 420,000 images of humans, products, cars, etc. (210,000 × 2 with horizontal flip). After a few epochs of training it starts giving weird output, the accuracy drops, and the output from the model becomes very poor.

I changed the batch size to 8 in order to train the model a little quicker, but the results started to get very bad after a few epochs. Can you please tell me why this is occurring?

Please see the attached input and output images for a better understanding of the problem.
image-compare.pdf

Thanks :)

InSPyReNet for mask refinement?

Hi there!

Thank you for your great work on this project.
I was wondering if InSPyReNet could take in a pre-computed mask instead of generating one on its own.
InSPyReNet excels at fine details, but sometimes does not include the objects I would like in the image.
I have a list of images along with their trimaps, and would like for InSPyReNet to refine the edges for a smooth background removal.

Is this possible, and if so, what would be a good way to take on this challenge?

Thank you!

Error when loading pre-trained models

Great work! The results look very nice. I was hoping to try out the pretrained models but I am getting errors when trying to load the checkpoints.

RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory

I was wondering if this error is only on my end, or if there is an issue with the checkpoint files somehow?
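
For reference, this error usually indicates a truncated or corrupted download: PyTorch (>= 1.6) checkpoints are zip archives, and "failed finding central directory" is the zip reader failing on an incomplete file. A quick integrity check (the path is a placeholder):

import zipfile

# A complete checkpoint is a valid zip archive; False suggests a broken download.
print(zipfile.is_zipfile('snapshots/InSPyReNet_SwinB/latest.pth'))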

Unable to convert the model to onnx

I'm unable to convert the model to ONNX; it seems the L1 loss is not supported in ONNX opset 15. Have you faced any such issue? Also, may I know how you converted the model to TorchScript (.pt) from the PyTorch checkpoint (.pth)?
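
Regarding the TorchScript part of the question: run/Inference.py traces the model when --jit is passed (visible in the traceback of the next issue). A minimal sketch of that kind of conversion, assuming model is an instantiated InSPyReNet and the checkpoint path and input size are placeholders:

import torch

model.load_state_dict(torch.load('latest.pth', map_location='cpu'))
model.eval()
example = torch.randn(1, 3, 384, 384)       # tracing records ops for this input shape
traced = torch.jit.trace(model, example)
traced.save('InSPyReNet_SwinB.pt')

Note that tracing bakes shape-dependent values into the graph, which is consistent with the shape-mismatch error in the next issue when a traced model receives an input of a different size.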

unable to load jit model

RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
File "code/torch/utils/misc.py", line 10, in forward
x: Tensor) -> Tensor:
model = self.model
return (model).forward(x, )
~~~~~~~~~~~~~~ <--- HERE
File "code/torch/lib/InSPyReNet.py", line 33, in forward
_2 = ops.prim.NumToTensor(torch.size(x, 3))
_3 = int(_2)
_4, _5, _6, _7, _8, = (backbone).forward(x, )
~~~~~~~~~~~~~~~~~ <--- HERE
_9 = (context1).forward(_4, )
_10 = (context2).forward(_5, )
File "code/torch/lib/backbones/SwinTransformer.py", line 42, in forward
input = torch.contiguous(_4)
input0 = torch.transpose(torch.flatten(_4, 2), 1, 2)
_13 = (_0).forward(H, W, (pos_drop).forward(input0, ), _8, _12, _7, _11, _6, _10, )
~~~~~~~~~~~ <--- HERE
_14, _15, _16, _17, _18, _19, _20, _21, _22, _23, _24, _25, = _13
_26 = torch.view((norm0).forward(_14, ), [-1, _5, _9, 128])
File "code/torch/lib/backbones/SwinTransformer.py", line 157, in forward
_135 = torch.masked_fill(attn_mask, torch.ne(attn_mask, 0), -100.)
mask = torch.masked_fill(_135, torch.eq(attn_mask, 0), 0.)
_136 = (_0).forward(argument_3, H, W, argument_4, argument_5, )
~~~~~~~~~~~ <--- HERE
_137 = (_1).forward(_136, H, W, argument_6, argument_7, mask, )
_138 = (downsample).forward(_137, H, W, argument_8, argument_9, )
File "code/torch/lib/backbones/SwinTransformer.py", line 235, in forward
_173 = torch.permute(x4, [0, 1, 3, 2, 4, 5])
x5 = torch.view(torch.contiguous(_173), [1, _158, _159, -1])
x6 = torch.view(x5, [_147, int(torch.mul(H, W)), _149])
~~~~~~~~~~ <--- HERE
_174 = (drop_path).forward()
input = torch.add(argument_1, x6)

Traceback of TorchScript, original code (most recent call last):
F:\CODE\SODandDIS\InSPyReNet-main\lib\backbones\SwinTransformer.py(244): forward
D:\ProgramData\miniconda3\envs\py39pt1121\lib\site-packages\torch\nn\modules\module.py(1118): _slow_forward
D:\ProgramData\miniconda3\envs\py39pt1121\lib\site-packages\torch\nn\modules\module.py(1130): _call_impl
F:\CODE\SODandDIS\InSPyReNet-main\utils\misc.py(25): forward
D:\ProgramData\miniconda3\envs\py39pt1121\lib\site-packages\torch\nn\modules\module.py(1118): _slow_forward
D:\ProgramData\miniconda3\envs\py39pt1121\lib\site-packages\torch\nn\modules\module.py(1130): _call_impl
D:\ProgramData\miniconda3\envs\py39pt1121\lib\site-packages\torch\jit_trace.py(967): trace_module
D:\ProgramData\miniconda3\envs\py39pt1121\lib\site-packages\torch\jit_trace.py(750): trace
F:\CODE\SODandDIS\InSPyReNet-main\run\Inference.py(67): inference
F:\CODE\SODandDIS\InSPyReNet-main\run\Inference.py(177):
RuntimeError: shape '[1, 51520, 128]' is invalid for input of size 7077888

Not getting the same quality as the Hugging Face web demo model

Hi @plemeri

This is to inform you that when I test the Hugging Face model and compare its quality against this model, which is trained with the LR dataset only (DUTS-TR, 384 × 384; InSPyReNet_SwinB, https://github.com/plemeri/InSPyReNet/blob/main/configs/InSPyReNet_SwinB.yaml), I get two different outputs: the quality differs and an outline forms around the object, while for the same image the outline doesn't exist in the web demo output.

Can you please tell me why this is occurring? From one of the old threads I see you say that the model used in the web demo is InSPyReNet_SwinB trained with the LR dataset only (DUTS-TR, 384 × 384), so I don't understand why there should be any difference in quality.

Please see the attached original image, the output from the Hugging Face model, and the output from inference with InSPyReNet_SwinB trained with the LR dataset only (DUTS-TR, 384 × 384).

Also, can you please let me know the versions you are using for torch, torchvision, and opencv-python? Do you think this could occur due to a difference in versions?

[Attached images: the original image, the output from the Hugging Face model, and the output from inference with InSPyReNet_SwinB trained with the LR dataset only (DUTS-TR, 384 × 384).]

P.S. Download and zoom into the image boundaries to see the differences between the two outputs.

Also, can you please tell me if you are doing any post-processing in the Hugging Face demo?

Looking forward to your reply.

Fine tuning

Hey, I have been fine-tuning using a pre-trained checkpoint; however, I'm having issues getting it to converge. I have a set of 20k masks and images for background removal. I have run 30 epochs and the results are worse than the pretrained model.

Any advice on how many epochs or what learning rate to use?

Compressed Model

Have you tried compressing the model for mobile devices? The model size is large. Could you also give suggestions for deploying the solution on mobile?

Training InSPyReNet_SwinB.yaml on large dataset

Hi @plemeri

I have a custom dataset of 200K+ real-world images of cars, products, humans, animals, etc. I wanted to know how many epochs I should train for. I am using InSPyReNet_SwinB.yaml.

I would like to mention that my model is still training and has currently reached 80 epochs. The train loss decreased gradually for the first 4-5 epochs, but from epoch 5 to 80 it fluctuates between 0.2 and 0.5 on each iteration.

I also notice that some images get better after a few epochs and then, a few epochs later, get worse again. I am unable to see any consistent improvement during training.

Can you please guide me ?

Looking forward to your reply.

Error when resuming training

I trained the model with my own 62k human dataset, similar to DUTS, for 60 epochs. The result was not as desired, so I decided to resume training with more epochs, but I got the following error when resuming:

Traceback (most recent call last):
  File "run/Train.py", line 178, in <module>
    train(opt, args)
  File "run/Train.py", line 140, in train
    optimizer.step()
  File "/root/inspyrenet/venv/lib/python3.6/site-packages/torch/optim/lr_scheduler.py", line 65, in wrapper
    return wrapped(*args, **kwargs)
  File "/root/inspyrenet/venv/lib/python3.6/site-packages/torch/optim/optimizer.py", line 88, in wrapper
    return func(*args, **kwargs)
  File "/root/inspyrenet/venv/lib/python3.6/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/root/inspyrenet/venv/lib/python3.6/site-packages/torch/optim/adam.py", line 144, in step
    eps=group['eps'])
  File "/root/inspyrenet/venv/lib/python3.6/site-packages/torch/optim/functional.py", line 98, in adam
    param.addcdiv_(exp_avg, denom, value=-step_size)
RuntimeError: value cannot be converted to type float without overflow: (-9.99425e-08,-1.86689e-11)


Wrong links on Model Zoo page

Hello, I've noticed that there's a problem with the links on the Model Zoo page for "Trained with massive SOD datasets". I am speaking only about the HR SwinB row, not the LR one. So:

  1. The Gdrive and OneDrive links are swapped.
  2. The links lead to two different files with different hashes and different performance. The one from OneDrive (in the Gdrive column) gives poor results compared to the other one.

HR CustomDataset Finetuning

Hi, I have a question so I'm leaving an issue.

I currently have around 9,000 HR custom images and I am training by adding their path to the Plus_Ultra.yaml config file. (We are also training on the datasets specified in Plus_Ultra.yaml.)

However, the more I train the model, the lower its performance becomes (after almost 10 epochs).

May I ask a few questions, as I want to find the cause of this problem?

  1. Is it right to include LR datasets (ECSSD, FSS-1000, …) when creating models with 1024x1024 inputs?
  2. Is there a reason to use dynamic_resize instead of static_resize for the test part of Plus_Ultra.yaml config?

If you have any other solution, I would appreciate it if you could let me know.

No Validation Set?

Hi, I am very impressed by the idea of your work!

I have a question though.
Usually, when training a supervised model, we tend to validate at the end of each epoch.

Can you describe why a validation step has not been included in this work?

Thanks!

Unexpected keys in state_dict when trying inference

Hello,

I would simply like to run inference with the model.
I downloaded the checkpoint swin_base_patch_4_window12_384_22kto1k.pth, placed it in the snapshots folder, and renamed it to latest.pth.

Running
python run/Inference.py --config configs/InSPyReNet_SwinB.yaml --source [SOURCE] --dest [DEST] --type [TYPE] --gpu --jit --verbose
gives a torch key error when trying to load the checkpoint. It seems that the expected architecture differs from the one in the checkpoint file.

Thank you

train problem

Thank you for your nice work 👍. I tried to run this code, but I got the following error:
RuntimeError: NCCL communicator was aborted on rank 0. Original reason for failure was: [Rank 0] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=719709, OpType=ALLREDUCE, Timeout(ms)=1800000) ran for 1807565 milliseconds before timing out. ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1530483) of binary: /home/cenchaojun/.conda/envs/sod/bin/python3.8
What could be the reason for this?
Best wishes to you. ❤️

Unable to reproduce models and Increasing validation loss

Hello and thank you for the great work.

While working with this project I came across a few problems and I hope you could give me some suggestions.

1. Unable to reproduce models

Firstly I tried reproducing one of the LR+HR trainings, InSPyReNet_SwinB_HU (HRSOD-TR and UHRSD-TR), but I do not obtain the same results. I gathered the results in the following table:

Dataset   Model  Sm      mae      adpEm   maxEm   avgEm   adpFm   maxFm   avgFm   wFm     mBA
DUTS-TE   yours  0.939   0.0221   0.931   0.9657  0.951   0.865   0.936   0.908   0.901   0.735
DUTS-TE   mine   0.882   0.0396   0.897   0.909   0.889   0.799   0.847   0.8185  0.799   0.6437
HRSOD-TE  yours  0.9565  0.0173   0.9527  0.9746  0.9641  0.9090  0.9564  0.933   0.9234  0.7714
HRSOD-TE  mine   0.9136  0.0322   0.9023  0.9370  0.9199  0.815   0.8934  0.8579  0.8304  0.6412
UHRSD-TE  yours  0.9528  0.02038  0.9223  0.9708  0.9617  0.9029  0.9576  0.9431  0.9331  0.7897
UHRSD-TE  mine   0.9202  0.0332   0.9133  0.9477  0.9316  0.8615  0.9179  0.8967  0.8713  0.6621

Although the metrics are quite close, the quality of the predictions from the model I trained is far inferior to that of the provided model. I also tried training the PlusUltraHR model and I am experiencing the same thing. Why could this happen? Why can I not reproduce the model?

2. Increasing loss during validation

Additionally, I added validation to the training script in order to monitor the model's performance during training:


for epoch in epoch_iter:


        if args.local_rank <= 0 and args.verbose is True:
            step_iter = tqdm.tqdm(enumerate(train_loader, start=1), desc='Iter', total=len(
                train_loader), position=1, leave=False, bar_format='{desc:<5.5}{percentage:3.0f}%|{bar:40}{r_bar}')
            if args.device_num > 1 and train_sampler is not None:
                train_sampler.set_epoch(epoch)
        else:
            step_iter = enumerate(train_loader, start=1)

        train_loss = []

        for i, sample in step_iter:
            optimizer.zero_grad()
            if opt.Train.Optimizer.mixed_precision is True and scaler is not None:
                with autocast():
                    sample = to_cuda(sample)
                    out = model(sample)

                scaler.scale(out['loss']).backward()
                scaler.step(optimizer)
                scaler.update()
                scheduler.step()
            else:
                sample = to_cuda(sample)
                out = model(sample)
                out['loss'].backward()
                optimizer.step()
                scheduler.step()

            if args.local_rank <= 0 and args.verbose is True:
                step_iter.set_postfix({'loss': out['loss'].item()})

            train_loss.append(out['loss'].item())

        average_loss = np.mean(train_loss)
        

        step_iter_test = enumerate(test_loader, start=1)

        # model.eval()
        df = df.append({'epoch': epoch, 'scope': 'train', 'set': 'all', 'metric': 'loss', 'value': average_loss}, ignore_index=True)
        writer.add_scalar('Train/loss', average_loss, epoch)

        mse_sum = {}
        loss_sum = {}
        count = {}

        with torch.no_grad():
            for i, sample in step_iter_test:
                sample = to_cuda(sample) #ads 50 MB to GPU memory
                set_name = sample['set'][0]
                out = model(sample)
                loss = out['loss'].detach().cpu().numpy()
                pred = to_numpy(out['pred'], sample['shape'])
                gt = to_numpy(out['gt'], sample['shape'])
                mse = compute_mse(predict=pred, alpha=gt)

                if set_name not in mse_sum:
                    mse_sum[set_name] = 0.0
                    loss_sum[set_name] = 0.0
                    count[set_name] = 0
                mse_sum[set_name] += mse
                loss_sum[set_name] += loss
                count[set_name] += 1

            for set_name in mse_sum:
                mean_mse = mse_sum[set_name] / count[set_name]
                mean_loss = loss_sum[set_name] / count[set_name]
                df = df.append({'epoch': epoch, 'scope': 'valid', 'set': set_name, 'metric': 'mse', 'value': mean_mse}, ignore_index=True)
                df = df.append({'epoch': epoch, 'scope': 'valid', 'set': set_name, 'metric': 'loss', 'value': mean_loss}, ignore_index=True)
                writer.add_scalar('Valid/' + set_name + '/mse', mean_mse, epoch)
                writer.add_scalar('Valid/' + set_name + '/loss', mean_loss, epoch)

        df_path = os.path.join(opt.Train.Checkpoint.checkpoint_dir, f'{log_id}.json')
        df.to_json(df_path, orient='records')

        model.train()

InSPyReNet_SwinB_HU training & validation

For the InSPyReNet_SwinB_HU training, the validation set I used is DUTS-TE. The training loss constantly decreases, but the validation loss starts increasing after some epochs:

[Attached: training and validation loss curves.]

My assumptions were the following: the model is overfitting or the data distribution between train sets and test set is too different.

Overfitting Check

To check whether overfitting is the problem, I trained an LR model (using the Plus_Ultra_LR config) on 43K samples ('MSRA-10K','HRSOD-TR','HRSOD-TE','ECSSD','HKU-IS','PASCAL-S','DAVIS','UHRSD-TR','UHRSD-TE','FSS-1000','DIS5K') and validated the model after each epoch on 300 images from DUTS-TE. I chose an LR model and only a subset of DUTS-TE for faster training. The loss during validation is still increasing:

[Attached: validation loss curve.]

I know that overfitting occurs when the training set has a small number of samples or the model is complex. After this experiment with 43K images I doubt that overfitting is responsible for the increase in validation loss.

Difference in data distribution Check

I was also thinking that the difference in data distribution between the training sets might be too big, so that the model struggles to find an optimum accommodating all cases, making it hard to generalize. To test this, I decided to train an LR model on UHRSD2K-TR only for 150 epochs and validate it on several test sets:

[Attached: validation loss curves for UHRSD2K-TE, HRSOD-TE, and PASCAL-S.]

I was expecting the loss to decrease for UHRSD2K-TE and increase for HRSOD-TE and PASCAL-S, but the validation loss increases for all test sets. Across the mentioned experiments, I have trained InSPyReNet with different configurations and datasets, and for each one the loss during validation increases. What can be the problem? Why is the validation loss always increasing?

Image mask alignment problem

Hello author, for a recent test I wrote an inference script myself, but the inference results with threshold 512 versus None seem to be offset by 1-2 pixels.

`base_size` and `stage` parameters are not used for encoder and decoder

Hi, thank you for the great work!

I was digging into the network architecture and found out that the base_size and stage parameters are not actually used inside the PAA_e and PAA_d modules. They are only used to initialize the stage_size parameter, which is then passed to the SelfAttention module, and SelfAttention does not actually use this parameter either.

I am just wondering: is this some sort of bug (and maybe the whole model could be further improved), or should these parameters simply not be used in these modules?

Some confusion about dynamic_resize

When should dynamic_resize be used, and when static_resize, during inference?
Can I understand that dynamic_resize is for compatibility with both LR and HR images during inference? But in that case, a large image may exhaust GPU memory.

Model difference between Res2Net50 Backbone and Res2Net50 [DUTS-TR] trained checkpoint

Thank you for your amazing paper; I was really fascinated by your work.

But when I run the inference code with Res2Net50, it fails with an error about the state_dict.
SwinB works well; the problem occurs only with Res2Net50.
I attached my run command and the error.

run command
python run/Inference.py --config configs/InSPyReNet_Res2Net50.yaml --source myfolder --type rgba --gpu --verbose

error
RuntimeError: Error(s) in loading state_dict for InSPyReNet:
Unexpected key(s) in state_dict: "backbone.fc.weight", "backbone.fc.bias".

I downloaded the backbone checkpoint from here
(https://github.com/plemeri/InSPyReNet/blob/main/docs/getting_started.md)
and the trained checkpoint from here
(https://github.com/plemeri/InSPyReNet/blob/main/docs/model_zoo.md).

Thanks.
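
A possible workaround (an untested sketch, not from this thread): the unexpected keys are the ImageNet classifier head of the Res2Net backbone, which the SOD model never uses, so they can be dropped before loading:

import torch

state = torch.load('latest.pth', map_location='cpu')   # placeholder path
state.pop('backbone.fc.weight', None)   # classifier weights unused by InSPyReNet
state.pop('backbone.fc.bias', None)
model.load_state_dict(state)   # assumes `model` is an InSPyReNet with a Res2Net50 backbone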

autocast for inference

Hi, I've been using this repo and https://github.com/plemeri/transparent-background. Excellent work!

Now I'm trying to use autocast to lower the inference time a bit, but I couldn't get it to work correctly.

What I did was replace this part of the code https://github.com/plemeri/transparent-background/blob/main/transparent_background/Remover.py#L111 with this:

with torch.no_grad():
    with autocast():
        pred = self.model(x)

The model apparently runs without errors, but the result of pred is a tensor of NaNs, like this:


tensor([[[[nan, nan, nan,  ..., nan, nan, nan],
          [nan, nan, nan,  ..., nan, nan, nan],
          [nan, nan, nan,  ..., nan, nan, nan],
          ...,
          [nan, nan, nan,  ..., nan, nan, nan],
          [nan, nan, nan,  ..., nan, nan, nan],
          [nan, nan, nan,  ..., nan, nan, nan]]]], device='cuda:0',
       dtype=torch.float16)

Do you have any suggestion?
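
One option worth trying (a hedged sketch; see also the attention-overflow issue reported earlier, where the float16 bmm produces inf): run autocast in bfloat16, which keeps float32's exponent range at reduced precision, so the attention scores stay finite. This requires a GPU with bfloat16 support (e.g. Ampere or newer):

import torch

with torch.no_grad():
    # bfloat16 has the same dynamic range as float32, unlike float16
    with torch.autocast(device_type='cuda', dtype=torch.bfloat16):
        pred = self.model(x)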

Edges are not high quality when trained with Swin Tiny version

While training with SwinB (384 × 384) and keeping dynamic resize at 1280 during inference, I see high-quality edges without any loss of global object saliency. But when I train a Tiny version of the Swin Transformer (224 × 224), the global object saliency is lacking and the edges are not high quality. May I know if I have to change anything for the Tiny version, or do you have any suggestions?

Multiple masks area

Can I split the clothing mask into tops and bottoms?
Or should I train multiple models for such a task?

Training Swin Transformer?

I see that there are changes in comparison with the official implementation of the Swin Transformer. Also, is the Swin Transformer itself trained while training InSPyReNet?

Pretrained model on ECSSD dataset

I see that the code was updated for Res2Net but the backbone checkpoint wasn't, due to which the keys are mismatched while loading. Can I get the backbone model for the updated network?

Can you recommend which layers to freeze?

Hi, plemeri

I'm trying transfer learning from ckpt_base (Plus_Ultra) with my custom dataset.

My dataset includes about 100 samples so far, planned to grow to 10,000.

  1. To avoid overfitting, can you recommend which layers to freeze? (A sketch follows below.)
  2. It would be nice to have some additional advice on hyper-parameters too, but I think it's better to ask about that after training.
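
A sketch of one common recipe (an editor's suggestion, not an official recommendation): freeze the Swin backbone and fine-tune only the decoder/pyramid modules, which is usually the safer choice with very small datasets:

import torch

# `model` is assumed to be an InSPyReNet instance with the Plus_Ultra weights loaded.
for name, param in model.named_parameters():
    if name.startswith('backbone'):
        param.requires_grad = False   # keep the pretrained Swin features fixed

trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-5)   # lr is a guess; tune on a validation split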
