plemeri / InSPyReNet
Official PyTorch implementation of Revisiting Image Pyramid Structure for High Resolution Salient Object Detection (ACCV 2022)
License: MIT License
I trained the model on my own 62k-image human dataset (similar to DUTS) for 60 epochs. The results were not as desired, so I decided to resume training with more epochs, but I got the following error when resuming:
Traceback (most recent call last):
File "run/Train.py", line 178, in <module>
train(opt, args)
File "run/Train.py", line 140, in train
optimizer.step()
File "/root/inspyrenet/venv/lib/python3.6/site-packages/torch/optim/lr_scheduler.py", line 65, in wrapper
return wrapped(*args, **kwargs)
File "/root/inspyrenet/venv/lib/python3.6/site-packages/torch/optim/optimizer.py", line 88, in wrapper
return func(*args, **kwargs)
File "/root/inspyrenet/venv/lib/python3.6/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
return func(*args, **kwargs)
File "/root/inspyrenet/venv/lib/python3.6/site-packages/torch/optim/adam.py", line 144, in step
eps=group['eps'])
File "/root/inspyrenet/venv/lib/python3.6/site-packages/torch/optim/functional.py", line 98, in adam
param.addcdiv(exp_avg, denom, value=-step_size)
RuntimeError: value cannot be converted to type float without overflow: (-9.99425e-08,-1.86689e-11)
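A hedged guess at the cause, with illustrative numbers (not taken from the repo): a polynomial LR schedule computes lr = base_lr * (1 - iter / max_iter) ** power, and resuming for more epochs than originally scheduled makes the base negative. In Python 3, a negative float raised to a fractional power is a complex number, which matches the complex value printed in the error above and which Adam cannot convert to a float.

# Minimal sketch of the suspected failure mode (hypothetical values)
base_lr, power = 1e-5, 0.9
max_iter, curr_iter = 60_000, 61_000            # resumed past the old schedule
lr = base_lr * (1 - curr_iter / max_iter) ** power
print(type(lr))                                 # <class 'complex'>
# Regenerating the scheduler with the new, larger epoch count keeps the base
# non-negative and avoids the overflow error.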
Thank you for your nice work. 👍 I tried to run this code, but I got the following error:
RuntimeError: NCCL communicator was aborted on rank 0. Original reason for failure was: [Rank 0] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=719709, OpType=ALLREDUCE, Timeout(ms)=1800000) ran for 1807565 milliseconds before timing out. ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1530483) of binary: /home/cenchaojun/.conda/envs/sod/bin/python3.8
What could be the reason for this?
Best wishes to you. ❤️
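A hedged workaround sketch, not a diagnosis: the watchdog kills the job once a collective exceeds the default 30-minute timeout shown above, which usually means one rank stalled (slow I/O, uneven data, a hung dataloader). Raising the timeout buys time while you look for the stall; the argument below is the standard torch.distributed one.

# Sketch: raise the NCCL collective timeout from the default 30 minutes
import datetime
import torch.distributed as dist

dist.init_process_group(backend='nccl',
                        timeout=datetime.timedelta(hours=2))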
Great work! The results look very nice. I was hoping to try out the pretrained models but I am getting errors when trying to load the checkpoints.
RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory
I was wondering if this error is only on my end, or if there is an issue with the checkpoint files somehow?
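In case it helps others hitting this: .pth checkpoints are zip archives, and an incomplete or corrupted download fails with exactly this message. A quick sanity check (the path is hypothetical):

import os, zipfile

path = 'snapshots/InSPyReNet_SwinB/latest.pth'
print(os.path.getsize(path), zipfile.is_zipfile(path))
# If is_zipfile prints False, or the size is far below the published one,
# re-download the checkpoint.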
While training with SwinB (384×384) and keeping the dynamic resize at 1280 during inference, I see high-quality edges without any loss of global object saliency. But when I train a Tiny version of Swin Transformer (224×224), the global object saliency is lacking and I do not get the high-quality edges. May I know if I have to change anything for the Tiny version, or do you have any suggestions here?
I see that the code has been updated for Res2Net but the backbone checkpoint hasn't, due to which the keys are mismatched while loading. Can I get a backbone checkpoint matching the updated network?
Hi @plemeri
I have a custom dataset of 200K+ images containing real-world images of cars, products, humans, animals, etc. I wanted to know how many epochs I should train for. I am using InSPyReNet_SwinB.yaml.
I would like to inform you that my model is still training and has currently reached 80 epochs. The train loss decreased gradually for the first 4-5 epochs, but from epoch 5 to 80 it has been fluctuating between 0.2 and 0.5 on each iteration.
I also notice that a few images get better after some epochs and then, a few epochs later, those same images get worse. I am unable to see any consistent improvement in training.
Can you please guide me?
Looking forward to your reply.
Have you tried compressing the model for mobile devices, given that the model size is large? Or could you give suggestions for deploying the solution on mobile?
Hi, plemeri
I'm trying transfer learning from ckpt_base (Plus_Ultra) with my custom dataset.
My dataset includes about 100 samples so far and is planned to grow to 10,000.
Can I train your model with distributed training?
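For reference, a hedged sketch of a multi-GPU launch: run/Train.py and its --config/--verbose flags appear elsewhere in this repo and the DDP flags are standard PyTorch, but check the README for the exact invocation (older torch versions use python -m torch.distributed.launch instead of torchrun):

torchrun --standalone --nproc_per_node=4 run/Train.py --config configs/InSPyReNet_SwinB.yaml --verbose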
I see that there are changes compared with the official implementation of Swin Transformer. Also, is the Swin Transformer being trained while training InSPyReNet?
I'm unable to convert the model to ONNX; it seems the L1 loss is not supported in ONNX opset 15. Have you faced any such issue? Also, may I know how you converted the model from a PyTorch checkpoint (.pth) to TorchScript (.pt)?
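For the TorchScript part, a hedged sketch of a .pth-to-.pt export via tracing, roughly what run/Inference.py appears to do with --jit (the wrapper below is illustrative; the repo uses its own wrapper in utils/misc.py). Tracing only the inference path may also sidestep the L1-loss operator, since the loss branch is never captured:

import torch

class InferenceWrapper(torch.nn.Module):        # hypothetical helper
    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, x):
        return self.model(x)

model.eval()                                    # 'model' is the loaded InSPyReNet
traced = torch.jit.trace(InferenceWrapper(model), torch.rand(1, 3, 384, 384))
traced.save('latest.pt')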
Hi, thank you for the great work!
I was digging into the NN architecture used and found out that the base_size and stage parameters are not actually used inside the PAA_e and PAA_d modules. They are used to initialize the stage_size parameter, which is then passed to the SelfAttention module. And the thing is that SelfAttention does not actually use this parameter.
I am just wondering: is this some sort of bug (and maybe the whole model could be further improved), or should these parameters simply not be used in these modules?
Hi, I have a question so I'm leaving an issue.
I currently have around 9,000 HR custom images, and I am training by adding their path to the Plus_Ultra.yaml config file. (We are also training on the datasets specified in Plus_Ultra.yaml.)
However, the more I train the model, the lower its performance gets (after almost 10 epochs).
May I ask you a few questions, as I want to find the cause of a problem like this?
If you have any other solution, I would appreciate it if you could let me know.
Thanks for the great work!
However, I found that overflow to NaN/Inf can happen when computing the attention scores. The problem is that the query and key in the attention layers are not normalized, and the bmm can produce inf during training and inference, especially with mixed precision (float16). I suggest the authors, or anyone who wants to train on their own dataset, edit the code at:
https://github.com/plemeri/InSPyReNet/blob/main/lib/modules/layers.py#L167
https://github.com/plemeri/InSPyReNet/blob/main/lib/modules/attention_module.py#L82
https://github.com/plemeri/InSPyReNet/blob/main/lib/modules/attention_module.py#L90
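For illustration, a hedged sketch of the kind of edit suggested (the names are illustrative, not the exact code at the links above): scale the query and compute the bmm in fp32 so the fp16 attention scores cannot reach inf.

import torch

def attention_scores(query, key, key_channels):
    # query: (B, N, C), key: (B, C, N)
    scale = key_channels ** -0.5
    with torch.cuda.amp.autocast(enabled=False):    # keep this matmul in fp32
        sim = torch.bmm(query.float() * scale, key.float())
    return torch.softmax(sim, dim=-1)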
Best,
Hello! After testing, background removal with this model has shown much better results compared to some other models. I would like to try training my own model.
This is a new field for me, and I have a few questions. Thank you!
Can I continue from latest.pth and fine-tune it? If so, when adding additional datasets, should I keep only the clothes in the masks, and will the person still be preserved when removing the background?
When fine-tuning, should I include the training config YAML and the datasets mentioned in it?
How should I prepare the dataset? I have looked at the folder structure of DIS5K, which seems to be as follows; I would then put my data in and run train.py.
How many image and mask pairs should I prepare?
The resolution is about 1028 × 1828.
DIS-TR
  im
  gt
DIS-VD
  im
  gt
configs/extra_dataset/Plus_Ultra.yaml
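For what it's worth, a hedged sketch of how a custom set might be registered in the config; the keys mirror the public configs but should be verified against Plus_Ultra.yaml itself, and the repo's loader expects its own image/mask subfolder names (see docs/getting_started.md), which may differ from DIS5K's im/gt:

Train:
  Dataset:
    type: "RGB_Dataset"
    root: "data/Train_Dataset"
    sets: ['MyClothes-TR']   # hypothetical folder under root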
Hi, can I know the reason why the mBA result is NaN only on the PASCAL-S dataset?
Hi @plemeri
This is to inform you that when I test the Hugging Face model and compare its quality against this model trained with the LR dataset only (DUTS-TR, 384 × 384, InSPyReNet_SwinB, https://github.com/plemeri/InSPyReNet/blob/main/configs/InSPyReNet_SwinB.yaml), I get two different outputs and the quality differs: an outline forms around the subject, while the outline doesn't exist when we run inference on the same image in the web demo.
Can you please tell me why this is occurring? I see from one of the old threads that you said the model used in the web demo is InSPyReNet_SwinB trained with the LR dataset only (DUTS-TR, 384 × 384), so I don't understand why there should be any difference in quality.
Please see the attached original image, the output from the Hugging Face model, and the output from inference with InSPyReNet_SwinB trained with the LR dataset only (DUTS-TR, 384 × 384).
Also, can you please let me know the versions you are using for torch, torchvision, and opencv-python? Do you think this could occur because of a change in versions?
Original image:
Output from the Hugging Face model:
Output from inference with InSPyReNet_SwinB trained with the LR dataset only (DUTS-TR, 384 × 384):
P.S. Download and zoom into the image boundaries to see the differences between the two outputs.
Also, can you please tell me if you are doing any post-processing in the Hugging Face demo?
Looking forward to your reply.
Hello and thank you for the great work.
While working with this project, I came across a few problems, and I hope you can give me some suggestions.
Firstly, I tried reproducing one of the LR+HR trainings, InSPyReNet_SwinB_HU (HRSOD-TR and UHRSD-TR), but I do not obtain the same results. I gathered them in the following table:
Dataset | Model | Sm | mae | adpEm | maxEm | avgEm | adpFm | maxFm | avgFm | wFm | mBA |
---|---|---|---|---|---|---|---|---|---|---|---|
DUTS-TE | yours | 0.939 | 0.0221 | 0.931 | 0.9657 | 0.951 | 0.865 | 0.936 | 0.908 | 0.901 | 0.735 |
DUTS-TE | mine | 0.882 | 0.0396 | 0.897 | 0.909 | 0.889 | 0.799 | 0.847 | 0.8185 | 0.799 | 0.6437 |
HRSOD-TE | yours | 0.9565 | 0.0173 | 0.9527 | 0.9746 | 0.9641 | 0.9090 | 0.9564 | 0.933 | 0.9234 | 0.7714 |
HRSOD-TE | mine | 0.9136 | 0.0322 | 0.9023 | 0.9370 | 0.9199 | 0.815 | 0.8934 | 0.8579 | 0.8304 | 0.6412 |
UHRSD-TE | yours | 0.9528 | 0.02038 | 0.9223 | 0.9708 | 0.9617 | 0.9029 | 0.9576 | 0.9431 | 0.9331 | 0.7897 |
UHRSD-TE | mine | 0.9202 | 0.0332 | 0.9133 | 0.9477 | 0.9316 | 0.8615 | 0.9179 | 0.8967 | 0.8713 | 0.6621 |
Although the metrics are quite close, the quality of the predictions from the model I trained is far inferior to that of the provided model. I also tried training the PlusUltraHR model and I am experiencing the same thing. Why could this happen? Why can I not reproduce the model?
Additionally, I added validation to the training script in order to monitor the model's performance during training:
for epoch in epoch_iter:
    if args.device_num > 1 and train_sampler is not None:
        train_sampler.set_epoch(epoch)
    if args.local_rank <= 0 and args.verbose is True:
        step_iter = tqdm.tqdm(enumerate(train_loader, start=1), desc='Iter', total=len(train_loader),
                              position=1, leave=False, bar_format='{desc:<5.5}{percentage:3.0f}%|{bar:40}{r_bar}')
    else:
        step_iter = enumerate(train_loader, start=1)

    train_loss = []
    for i, sample in step_iter:
        optimizer.zero_grad()
        if opt.Train.Optimizer.mixed_precision is True and scaler is not None:
            with autocast():
                sample = to_cuda(sample)
                out = model(sample)
            scaler.scale(out['loss']).backward()
            scaler.step(optimizer)
            scaler.update()
            scheduler.step()
        else:
            sample = to_cuda(sample)
            out = model(sample)
            out['loss'].backward()
            optimizer.step()
            scheduler.step()
        if args.local_rank <= 0 and args.verbose is True:
            step_iter.set_postfix({'loss': out['loss'].item()})
        train_loss.append(out['loss'].item())

    average_loss = np.mean(train_loss)
    step_iter_test = enumerate(test_loader, start=1)
    # model.eval()
    df = df.append({'epoch': epoch, 'scope': 'train', 'set': 'all', 'metric': 'loss', 'value': average_loss}, ignore_index=True)
    writer.add_scalar('Train/loss', average_loss, epoch)

    mse_sum = {}
    loss_sum = {}
    count = {}
    with torch.no_grad():
        for i, sample in step_iter_test:
            sample = to_cuda(sample)  # adds ~50 MB to GPU memory
            set_name = sample['set'][0]
            out = model(sample)
            loss = out['loss'].detach().cpu().numpy()
            pred = to_numpy(out['pred'], sample['shape'])
            gt = to_numpy(out['gt'], sample['shape'])
            mse = compute_mse(predict=pred, alpha=gt)
            if set_name not in mse_sum:
                mse_sum[set_name] = 0.0
                loss_sum[set_name] = 0.0
                count[set_name] = 0
            mse_sum[set_name] += mse
            loss_sum[set_name] += loss
            count[set_name] += 1

    for set_name in mse_sum:
        mean_mse = mse_sum[set_name] / count[set_name]
        mean_loss = loss_sum[set_name] / count[set_name]
        df = df.append({'epoch': epoch, 'scope': 'valid', 'set': set_name, 'metric': 'mse', 'value': mean_mse}, ignore_index=True)
        df = df.append({'epoch': epoch, 'scope': 'valid', 'set': set_name, 'metric': 'loss', 'value': mean_loss}, ignore_index=True)
        writer.add_scalar('Valid/' + set_name + '/mse', mean_mse, epoch)
        writer.add_scalar('Valid/' + set_name + '/loss', mean_loss, epoch)

    df_path = os.path.join(opt.Train.Checkpoint.checkpoint_dir, f'{log_id}.json')
    df.to_json(df_path, orient='records')
    model.train()
For InSPyReNet_SwinB_HU training, the validation set I used is DUTS-TE. The training loss is constantly decreasing but the validation loss starts increasing after some epochs:
My assumptions were the following: the model is overfitting or the data distribution between train sets and test set is too different.
To check whether overfitting is the problem, I trained an LR model (using the Plus_Ultra_LR config) on 43K samples ('MSRA-10K', 'HRSOD-TR', 'HRSOD-TE', 'ECSSD', 'HKU-IS', 'PASCAL-S', 'DAVIS', 'UHRSD-TR', 'UHRSD-TE', 'FSS-1000', 'DIS5K') and validated it after each epoch on 300 images from DUTS-TE. I chose an LR model and only a subset of DUTS-TE for faster training. The validation loss still increases:
I know that overfitting occurs when the training set has a small number of samples or the model is too complex. After this experiment with 43K images, I doubt that overfitting is responsible for the increase in validation loss.
I was also thinking that the difference in data distribution between the training sets might be too big, so the model struggles to find an optimum that accommodates all cases, making it hard to generalize. To test this, I trained an LR model on UHRSD2K-TR only for 150 epochs and validated it on several test sets:
I was expecting the loss to decrease on UHRSD2K-TE and increase on HRSOD-TE and PASCAL-S, but the validation loss increases for all test sets. Along with the mentioned experiments, I have trained InSPyReNet with different configurations and datasets, and for every one of them the validation loss increases. What can be the problem? Why is the validation loss always increasing?
Hello author, for a recent test I wrote an inference script myself, but the inference results with threshold value 512 and None seem to differ by 1-2 pixels.
Thank you for your great work. I want to train this on my custom dataset, but my images are much wider than they are tall, e.g. 1300×48. Is it okay to use the original config on my dataset, given that the original config is for square images?
Hi, I've been using this repo and https://github.com/plemeri/transparent-background. Excellent work!
Now I'm trying to use autocast to lower the inference time a bit, but I couldn't get it to work correctly.
What I did was replace this part of the code https://github.com/plemeri/transparent-background/blob/main/transparent_background/Remover.py#L111 with this:
with torch.no_grad():
    with autocast():
        pred = self.model(x)
The model apparently runs without errors, but pred comes out as a tensor of NaNs, like this:
tensor([[[[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]]]], device='cuda:0',
dtype=torch.float16)
Do you have any suggestion?
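In case it's useful, a hedged fallback sketch (assuming the NaNs come from fp16 overflow in the attention layers, as reported in another issue here): keep autocast for the common case and retry in full precision when the half-precision pass produces NaNs.

with torch.no_grad():
    with autocast():
        pred = self.model(x)
    if torch.isnan(pred).any():
        pred = self.model(x)    # full-precision retry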
Hello, where can I get the latest training file? (I mean the latest.pth file)
@plemeri First of all, thank you for the work. It's really awesome.
I am trying to use this, but I am getting these errors and I cannot figure out where they come from. Can you please help me out? This is in the backbones/SwinTransformer file.
Hey, I have been fine-tuning using the pre-trained checkpoint; however, I am having trouble getting it to converge. I have 20k image-and-mask pairs for background removal. I have run 30 epochs and the results are worse than the pretrained model.
Any advice on the number of epochs or the learning rate?
Can you tell me which checkpoint the web demo uses? I tried to run inference with these checkpoints,
but I can't reproduce the web demo's results.
Looking forward to your response!
Can I split the cloth masking into tops and bottoms?
Or should I train multiple models for such a task?
Hi, I see that the model zoo doesn't have Res2Net50 pretrained checkpoints except for the DUTS dataset. Could you provide them,
or the relevant training configurations?
Hello, I've noticed that there's a problem with the links on the Model Zoo page to the "Trained with massive SOD datasets" checkpoints. I am speaking only about the HR SwinB line, not the LR one.
I found out that you did not inherit from nn.Module in ImagePyramid.
Was there a specific reason for this?
Thanks!
Hello,
I would simply like to run inference with the model.
I downloaded the checkpoint swin_base_patch_4_window12_384_22kto1k.pth, placed it in the snapshots folder, and renamed it to latest.pth.
Running
python run/Inference.py --config configs/InSPyReNet_SwinB.yaml --source [SOURCE] --dest [DEST] --type [TYPE] --gpu --jit --verbose
gives a torch key error when trying to load the checkpoint. It seems that the expected architecture differs from the one in the checkpoint file.
Thank you
I'm trying to use it on some HR photography where the hair is super important.
Any suggestions on model adjustments or additional training, or would subsampling work?
When should I use dynamic_resize and when static_resize during inference?
Can I understand dynamic_resize as being there for compatibility with both LR and HR images during inference? But in that case, a large image may exhaust GPU memory.
Thank you for your great work.
When using mixed_precision: True (InSPyReNet/configs/InSPyReNet_SwinB.yaml, line 39 in bfe0819), do you have an idea how to fix this issue?
Thanks
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
File "code/torch/utils/misc.py", line 10, in forward
x: Tensor) -> Tensor:
model = self.model
return (model).forward(x, )
~~~~~~~~~~~~~~ <--- HERE
File "code/torch/lib/InSPyReNet.py", line 33, in forward
_2 = ops.prim.NumToTensor(torch.size(x, 3))
_3 = int(_2)
_4, _5, _6, _7, _8, = (backbone).forward(x, )
~~~~~~~~~~~~~~~~~ <--- HERE
_9 = (context1).forward(_4, )
_10 = (context2).forward(_5, )
File "code/torch/lib/backbones/SwinTransformer.py", line 42, in forward
input = torch.contiguous(_4)
input0 = torch.transpose(torch.flatten(_4, 2), 1, 2)
_13 = (_0).forward(H, W, (pos_drop).forward(input0, ), _8, _12, _7, _11, _6, _10, )
~~~~~~~~~~~ <--- HERE
_14, _15, _16, _17, _18, _19, _20, _21, _22, _23, _24, _25, = _13
_26 = torch.view((norm0).forward(_14, ), [-1, _5, _9, 128])
File "code/torch/lib/backbones/SwinTransformer.py", line 157, in forward
_135 = torch.masked_fill(attn_mask, torch.ne(attn_mask, 0), -100.)
mask = torch.masked_fill(_135, torch.eq(attn_mask, 0), 0.)
_136 = (_0).forward(argument_3, H, W, argument_4, argument_5, )
~~~~~~~~~~~ <--- HERE
_137 = (_1).forward(_136, H, W, argument_6, argument_7, mask, )
_138 = (downsample).forward(_137, H, W, argument_8, argument_9, )
File "code/torch/lib/backbones/SwinTransformer.py", line 235, in forward
_173 = torch.permute(x4, [0, 1, 3, 2, 4, 5])
x5 = torch.view(torch.contiguous(_173), [1, _158, _159, -1])
x6 = torch.view(x5, [_147, int(torch.mul(H, W)), _149])
~~~~~~~~~~ <--- HERE
_174 = (drop_path).forward()
input = torch.add(argument_1, x6)
Traceback of TorchScript, original code (most recent call last):
F:\CODE\SODandDIS\InSPyReNet-main\lib\backbones\SwinTransformer.py(244): forward
D:\ProgramData\miniconda3\envs\py39pt1121\lib\site-packages\torch\nn\modules\module.py(1118): _slow_forward
D:\ProgramData\miniconda3\envs\py39pt1121\lib\site-packages\torch\nn\modules\module.py(1130): _call_impl
F:\CODE\SODandDIS\InSPyReNet-main\utils\misc.py(25): forward
D:\ProgramData\miniconda3\envs\py39pt1121\lib\site-packages\torch\nn\modules\module.py(1118): _slow_forward
D:\ProgramData\miniconda3\envs\py39pt1121\lib\site-packages\torch\nn\modules\module.py(1130): _call_impl
D:\ProgramData\miniconda3\envs\py39pt1121\lib\site-packages\torch\jit\_trace.py(967): trace_module
D:\ProgramData\miniconda3\envs\py39pt1121\lib\site-packages\torch\jit\_trace.py(750): trace
F:\CODE\SODandDIS\InSPyReNet-main\run\Inference.py(67): inference
F:\CODE\SODandDIS\InSPyReNet-main\run\Inference.py(177):
RuntimeError: shape '[1, 51520, 128]' is invalid for input of size 7077888
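A hedged reading of the final error, in case it helps: torch.jit.trace specializes the graph on the example input's spatial size, so a model traced at one resolution fails at any other (here the traced graph expects 51,520 tokens while the runtime tensor holds 7,077,888 / 128 = 55,296). Two common workarounds: run Inference.py without --jit, or re-trace with an example of the exact resolution you will infer at, e.g.

# Hypothetical sketch: 'wrapped' stands for the module traced in run/Inference.py
traced = torch.jit.trace(wrapped, torch.rand(1, 3, 1024, 1024))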
Hi @plemeri
Thank you for your work.
I tried training the SwinB model from scratch with my custom dataset, which has 420,000 images of humans, products, cars, etc. (210,000 × 2 with horizontal flip). After a few epochs of training it starts giving weird outputs, the accuracy drops, and the output from the model becomes very poor.
I had changed the batch size to 8 in order to train the model a little quicker, but the results started to get very bad after a few epochs. Can you please tell me why this is occurring?
Please see the attached input and output images for a better understanding of the problem.
image-compare.pdf
Thanks :)
InSPyReNet beats BiRefNet, MVANet, U-Net, ... in my use cases, only being inferior in very few cases. Please keep improving and creating better models! Thank you so much!
I'm currently using transparent-background with "Plus_Ultra" and base-nightly.
Hi there!
Thank you for your great work on this project.
I was wondering if InSPyReNet could take in a pre-computed mask instead of generating one on its own.
InSPyReNet excels at fine details, but sometimes does not include the objects I would like in the image.
I have a list of images along with their trimaps, and would like for InSPyReNet to refine the edges for a smooth background removal.
Is this possible, and if so, what would be a good way to take on this challenge?
Thank you!
Hi, I am very impressed by the idea of your work!
I have a question, though.
Usually, when training a supervised model, we tend to validate at the end of each epoch.
Can you describe why a validation step has not been included in this work?
Thanks!
Thank you for your amazing paper; I was really fascinated by your work.
But when I run the inference code with Res2Net50, it gets stuck on an error about the state_dict.
SwinB works well; the problem occurs only with Res2Net50.
I attached my run command and error.
run command
python run/Inference.py --config configs/InSPyReNet_Res2Net50.yaml --source myfolder --type rgba --gpu --verbose
error
RuntimeError: Error(s) in loading state_dict for InSPyReNet:
Unexpected key(s) in state_dict: "backbone.fc.weight", "backbone.fc.bias".
I downloaded the backbone checkpoint from here
(https://github.com/plemeri/InSPyReNet/blob/main/docs/getting_started.md)
and the model checkpoint from here
(https://github.com/plemeri/InSPyReNet/blob/main/docs/model_zoo.md).
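If useful, a hedged workaround sketch using the key names from the error above: drop the classifier weights that the SOD model does not define before loading (the path and variable names are illustrative).

import torch

state = torch.load('latest.pth', map_location='cpu')
for k in ('backbone.fc.weight', 'backbone.fc.bias'):
    state.pop(k, None)
model.load_state_dict(state, strict=False)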
Thanks.