tsing90 / pytorch_semantic_human_matting

This is an unofficial implementation of the paper "Semantic Human Matting".

Home Page: https://arxiv.org/pdf/1809.01354.pdf



pytorch_semantic_human_matting's Issues

How big is your dataset?

Hi,
Your demo result is great, and I want to reproduce results like yours by creating a new dataset.
So I'd like to know how many images (or how many high-quality alpha mattes) you used to train your model.
From what I know, the DIM dataset has 202 foreground humans, the SHM dataset has 34,311, and the dataset in the paper "A Late Fusion CNN for Digital Matting" has 228.
I can also relate to the fact that the T-Net is pretty hard to train. So another question: the GT trimaps you use, are they annotated manually or dilated from the alpha as in these papers?
Thanks for the great work!
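For reference, GT trimaps are often generated automatically by eroding/dilating the binarized alpha rather than annotating by hand; a minimal sketch with OpenCV (the kernel size and thresholds are illustrative assumptions, not the repo's values):

import cv2
import numpy as np

def alpha_to_trimap(alpha, kernel_size=10):
    # alpha: uint8 matte in [0, 255]
    fg = (alpha > 250).astype(np.uint8)        # confident foreground
    unknown = (alpha > 5).astype(np.uint8)     # anything not pure background
    kernel = np.ones((kernel_size, kernel_size), np.uint8)
    fg = cv2.erode(fg, kernel)                 # shrink foreground
    unknown = cv2.dilate(unknown, kernel)      # grow the uncertain band
    trimap = np.full(alpha.shape, 128, dtype=np.uint8)
    trimap[unknown == 0] = 0                   # background
    trimap[fg == 1] = 255                      # foreground
    return trimap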

Background composition

Hi, I wonder: when generating the training data, you composite the foreground image with a bunch of background images. Is that for training purposes (e.g. the composition loss) or just to enlarge the training dataset?

The paper does not explain how the composition loss works, and the earlier implementation by @lizhengwei1992 just uses the alpha to mask the original image; no background image is used.
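For context, the composition loss compares the image re-composited from the predicted alpha against the ground-truth composite, which is why a background image is needed; a minimal sketch of that idea (the tensor names and the Charbonnier-style penalty are assumptions):

import torch

def composition_loss(alpha_p, fg, bg, img_gt, eps=1e-6):
    # alpha_p: (N,1,H,W) predicted alpha; fg/bg/img_gt: (N,3,H,W)
    comp = alpha_p * fg + (1.0 - alpha_p) * bg   # re-composited image
    return torch.sqrt((comp - img_gt) ** 2 + eps ** 2).mean()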

How did you construct bg image set

Hi tsing90, I noticed bg_list.txt. There are 18,000 image paths, but the last image name is 000000581042.jpg, so I guess your full collection has at least 600k images and you sampled 18k here to use as the bg set. Where did you collect so many images? How did you construct the set? Thank you.

Dataset

Can I run your code without the Adobe Dataset?

For my images I only have the original image and the PNG mask. Does that work?

about m_net output

Hi, thanks for your great work.
I have a question: when I train m_net, why do I get output like this?

[M-Net output image]

Here are my alpha label and input trimap:

[alpha label image]

[input trimap image]

Error running train.

I made the changes to remove the Adobe dataset and COCO. I'm only using a dataset with images and masks.

When running the code I'm getting this error.

python train.py --patch_size=400 --nEpochs=500 --save_epoch=5 --train_batch=8 --train_phase=pre_train_t_net
Namespace(continue_train=False, dataDir='./data/', debug=False, lr=0.0001, lrDecay=100, lrdecayType='keep', nEpochs=500, nThreads=4, patch_size=400, print_iter=1000, saveDir='./ckpt', save_epoch=5, trainData='human_matting_data', train_batch=8, train_phase='pre_train_t_net', without_gpu=False)
============> Building model ...
Dataset : file number 6
============> Loading datasets ...
============> Set optimizer ...
============> Start Train ! ...
epoch end, shuffle datasets again ...
epoch end, shuffle datasets again ...
epoch end, shuffle datasets again ...
epoch end, shuffle datasets again ...
epoch end, shuffle datasets again ...
Traceback (most recent call last):
File "train.py", line 379, in
main()
File "train.py", line 347, in main
loss_ = loss_ / (i+1)
UnboundLocalError: local variable 'i' referenced before assignment
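The dataset reports only 6 files while --train_batch=8, so the DataLoader (which appears to drop the last incomplete batch) yields no batches, the loop body never runs, and i is never assigned. A minimal guard you could add around the training loop, sketched with the variable names from the traceback (assumptions, not the repo's exact code):

loss_ = 0.0
num_batches = 0
for i, sample_batched in enumerate(train_loader):
    # ... forward / backward exactly as before ...
    num_batches = i + 1

if num_batches == 0:
    raise RuntimeError("DataLoader yielded no batches: reduce --train_batch or add more training images")
loss_ = loss_ / num_batches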

Pre-trained model

Where can I download the pre-trained model? Or could someone kindly share one?
Thanks a lot!

Intermediate results when training M-Net

[intermediate result image]
I am training on roughly 4,000 images. From the start of training to the end, the results stay noisy like the one on the right. What could be the cause?

compositing on the fly

Thank you for the great work. I am trying to understand how this project works:
If I understand correctly, the model is trained using two lists, one with foregrounds and one with backgrounds, which are then composited on the fly during training.

I can't seem to find where in the train.py code the compositing happens.
Thanks for your help.
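For what it's worth, on-the-fly compositing inside a PyTorch Dataset usually looks roughly like the sketch below; this is an illustration with assumed names, not the repo's dataset.py:

import random
import cv2
import numpy as np
from torch.utils.data import Dataset

class CompositeDataset(Dataset):
    def __init__(self, fg_paths, alpha_paths, bg_paths):
        self.fg_paths, self.alpha_paths, self.bg_paths = fg_paths, alpha_paths, bg_paths

    def __len__(self):
        return len(self.fg_paths)

    def __getitem__(self, idx):
        fg = cv2.imread(self.fg_paths[idx]).astype(np.float32)
        alpha = cv2.imread(self.alpha_paths[idx], 0).astype(np.float32) / 255.0
        bg = cv2.imread(random.choice(self.bg_paths)).astype(np.float32)
        bg = cv2.resize(bg, (fg.shape[1], fg.shape[0]))
        a = alpha[..., None]
        img = a * fg + (1.0 - a) * bg    # composite a fresh training image each time
        return img, alpha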

test M net

Hi tsing, I am trying to use the test.py file. There is no problem with T_net, but when I try M-Net I get an error: "indices and input shapes do not match: indices [1 x 64 x 225 x 400], input [1 x 64 x 224 x 400]" at the last pooling layer of the decoder. Do you know why I am getting this size difference?
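A mismatch like this usually means the encoder-decoder needs the input height and width to be multiples of its total downsampling stride. A hedged workaround (the stride of 16 is an assumption) is to pad the input before inference and crop the output afterwards:

import torch.nn.functional as F

def pad_to_multiple(x, stride=16):
    # x: (N, C, H, W); pad on the right/bottom so H and W are divisible by stride
    h, w = x.shape[2], x.shape[3]
    pad_h = (stride - h % stride) % stride
    pad_w = (stride - w % stride) % stride
    return F.pad(x, (0, pad_w, 0, pad_h)), (h, w)

# usage: padded, (h, w) = pad_to_multiple(tensor_img); alpha = net(padded)[..., :h, :w]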

test m_net

When testing m_net with test.py, the code fails to run, for two reasons:
1. trimap = np.eye(3)[trimap.reshape(-1)].reshape(list(trimap.shape) + [3])
This line is presumably meant to turn the single-channel trimap into three channels (one-hot), but np.eye(3)[trimap.reshape(-1)] goes out of bounds: trimap values such as 128 and 255 are used as row indices into a 3x3 identity matrix.

2. frame_seg = seg_process(args, (tensor_img, tensor_tri), net, trimap=trimap_src)
The last argument should be removed; seg_process does not have a trimap parameter.
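A hedged fix for the first point is to map the trimap values {0, 128, 255} to class indices {0, 1, 2} before the one-hot indexing, for example:

import numpy as np

trimap_idx = np.zeros_like(trimap, dtype=np.int64)
trimap_idx[trimap == 128] = 1    # unknown region
trimap_idx[trimap == 255] = 2    # foreground
trimap = np.eye(3)[trimap_idx.reshape(-1)].reshape(list(trimap_idx.shape) + [3])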

Predict alpha result

After training, I got alpha results like this:
[predicted alpha image]
Is this a correct alpha result? And how can I get the final result like yours?
[final result image]

A question about data preparation

Hi, I'd like to ask about the relationship between the two files ['DIM.txt', 'SHM.txt']. You explain it to some extent, but I'm still a bit confused. My understanding is that the data in these two files have no particular relationship: both are PNG foreground images, and they only differ in quantity. The two files could even be merged into one, with all images placed in a single folder and listed in one txt file, as long as the --dataRatio argument is adjusted. Is that understanding correct? If so, what is the purpose or advantage of splitting them into two txt files?
--fgLists: a list, contains list files in which all images share the same fg-bg ratio, e.g. ['DIM.txt','SHM.txt']

A question about testing t_net in test.py

@tsing90
I have finished training both the T model and the M model, but when testing m_net with test.py I run into a problem. Running python test.py --train_phase=pre_train_m_net gives the error below:
use GPU
Loading model from ./ckpt/pre_train_m_net/model/model_obj.pth...
torch.Size([1, 3, 635, 408]) torch.Size([1, 3, 640, 408])
Traceback (most recent call last):
File "test.py", line 159, in
main(args)
File "test.py", line 155, in main
test(args, myModel)
File "test.py", line 138, in test
frame_seg = seg_process(args, (tensor_img, tensor_tri), net)
File "test.py", line 67, in seg_process
alpha = net(inputs[0], inputs[1])
File "/home/gpower/anaconda3/envs/pytorch0.4/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in call
result = self.forward(*input, **kwargs)
File "/home/gpower/ztj/cv/Matting/pytorch_semantic_human_matting/model/network.py", line 39, in forward
m_net_input = torch.cat((input, trimap), 1)
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 1. Got 640 and 635 in dimension 2 at /opt/conda/conda-bld/pytorch_1535491974311/work/aten/src/THC/generic/THCTensorMath.cu:87

The trimap predicted by t_net changed size relative to the original image: the original image is (635, 408) while the trimap is (640, 408). I looked through the source code carefully but could not find the reason. What is causing this? Thanks!
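One hedged workaround is to bring the predicted trimap back to the image size before the concatenation in network.py (assuming both are 4-D tensors):

import torch
import torch.nn.functional as F

# trimap: (1, 3, 640, 408) from T-Net; img: (1, 3, 635, 408)
trimap = F.interpolate(trimap, size=img.shape[2:], mode='bilinear', align_corners=False)
m_net_input = torch.cat((img, trimap), dim=1)   # heights and widths now match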

One question

I really appreciate your work. Have you ever tried replacing T-Net's loss function with BCELoss, feeding alpha_pre and alpha_gt into it?
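For reference, swapping in BCE on the alpha would look roughly like the sketch below; this is purely an illustration (alpha_pre and alpha_gt are assumed to be same-shaped tensors in [0, 1]), not something the repo does:

import torch.nn as nn

bce = nn.BCELoss()
loss = bce(alpha_pre.clamp(0, 1), alpha_gt)   # both tensors must lie in [0, 1]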

generate trimap question

@tsing90 Hi, I use T-Net to generate the trimap, but the pixel values of the trimap are not only 0, 128 and 255; there are other values as well. I don't know why that is. Can you give me some advice? Thank you.
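If the T-Net output you save is the raw 3-channel score map, intermediate values are expected; a hedged way to obtain a discrete trimap is to take the argmax over the channels and map the class indices back to {0, 128, 255} (trimap_logits is an assumed name for the network output):

import torch

cls = torch.argmax(trimap_logits, dim=1)                 # (N, H, W) class indices 0/1/2
lut = torch.tensor([0, 128, 255], device=cls.device)
trimap_img = lut[cls].to(torch.uint8)                    # discrete trimap with values 0/128/255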

Qualitative results

Were you able to reproduce, or get close to, the results claimed in the paper?
Could you show some sample results?

Thank you for sharing the implementation.

May be a small bug...

I tried several times to train the M_net, but it always failed to generate a correct alpha. I found a possible mistake on line 36 of pytorch_semantic_human_matting/model/network.py. It is written as bg, fg, unsure = torch.split(trimap, 1, dim=1). However, after generating some images to visualize bg, fg and unsure, I found that fg and unsure seemed to be swapped. I changed this line to bg, unsure, fg = torch.split(trimap, 1, dim=1) and that fixed my problem.

M net output

raw_alpha = self.conv_0(x1d)

and

alpha_r = self.m_net(m_net_input)
# fusion module
# paper : alpha_p = fs + us * alpha_r
alpha_p = fg + unsure * alpha_r

alpha_p may not be in [0, 1]. Is that a problem?
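For what it's worth, if fg and unsure are raw, unnormalised channels then fg + unsure * alpha_r can indeed leave [0, 1]; a small, hedged safeguard using the variables from the snippet above:

import torch

# if fg and unsure come from a softmax (so fg + unsure <= 1) and alpha_r is in [0, 1],
# alpha_p already stays in [0, 1]; otherwise clamp as a safeguard
alpha_p = torch.clamp(fg + unsure * alpha_r, 0.0, 1.0)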

predict abnormal results of T-net

  • I used data augmentation to generate 3,000+ pairs of images and mask images to train T-Net. The losses are as follows:

average loss: 0.75172
saving model ....
epoch end, shuffle datasets again ...
[1 / 300] loss: 0.69093 time: 1548
average loss: 0.55106
saving model ....
epoch end, shuffle datasets again ...
[2 / 300] loss: 0.53391 time: 1538
average loss: 0.47728
saving model ....
epoch end, shuffle datasets again ...
[3 / 300] loss: 0.46429 time: 1537
average loss: 0.42489
saving model ....
epoch end, shuffle datasets again ...
[4 / 300] loss: 0.41367 time: 1537
average loss: 0.38102
saving model ....
epoch end, shuffle datasets again ...
... (omitted) ...
[46 / 300] loss: 0.07379 time: 1542
average loss: 0.07126
saving model ....
epoch end, shuffle datasets again ...

  • When the loss dropped to 0.07, I expected a good segmentation result, but the T-Net (PSPNet50) predictions do not look right.
    [T-Net prediction image]
    ====================================================

  • After changing the training data-loading code to:
    [code screenshot]

  • In theory, T-Net (PSPNet) should produce normal segmentation results, but my results don't look right, and I'm not sure where the problem is.

Train time and Train loss

@tsing90 thanks!
I prepared the data according to the description in your README file: 180 DIM images and 880 images I collected myself, processed into four-channel (RGBA) images. I'm currently training and have a few questions about model training:

  1. How long did you train T-Net, and how low do you think the loss needs to fall to get a robust T-Net model?
  2. How long did it take to train M-Net, and how long did you train in end-to-end mode?

I feel it is slow when training.

When I train the T-Net, I get frustrated because training is too slow.
I use 'aisegmentcom-matting-human-datasets' to train this model (about 34,000 images), with batch_size=8, which is nearly 4,400 batches per epoch.
300 epochs would take about 15 days in total.
Is that normal? Or are 300 epochs not really necessary?

How are the results?

Hi, how good are the results at test time? We are considering whether this network is worth trying; we have training data on the order of millions of images.

Poor results when training the T-Net

Hi, thank you very much for your SHM implementation. When training the T-Net, the intermediate outputs look good (at [32 / 300] the loss is 0.05447), but the test results are very poor.
Trimap produced during training:
[training trimap image]
Test result:
[test result image]

I barely changed the source code; I only modified dataset.py. Do you know the reason? Thanks!

loading data for e2e training

Hey,

Great work! I found there is a bug when loading data for end-to-end training.

In train.py, it requires data in the following format.

img, trimap_gt, alpha_gt, bg, fg = sample_batched['image'], sample_batched['trimap'], sample_batched['alpha'], sample_batched['bg'], sample_batched['fg']

However, in dataset.py, you return the data as follows.

return (img_m, trimap_m), (img, trimap, a, bg, fg)

I think we should return the second part. Is that right? Thanks.
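One hedged way to make the two ends agree is to return a dict from __getitem__ in the end-to-end phase, using the keys train.py already indexes (a sketch, not the repo's exact code):

# inside dataset.human_matting_data.__getitem__, end-to-end phase (sketch)
return {'image': img, 'trimap': trimap, 'alpha': a, 'bg': bg, 'fg': fg}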

Why is the test result very different from the train and validation results?

Problem description: I am training the trimap-generation (T-Net) part. To monitor training better, I added a validation set, splitting the training data 10:1 into training and validation sets. Validation runs after every training epoch. Below is the validation result at epoch 50; the middle image is the ground truth and the right one is the validation output.
[validation result image]
I then saved the model from epoch 50, reloaded it in test.py, and ran it on the same image. Below is the test result.
[test result image]
As you can see, the test result is much worse than the validation result. What could be the reason?
Here are the code snippets for saving and loading the model:
Saving the model:
lastest_out_path = "{}/ckpt_{}.pth".format(self.save_dir_model, epoch)
torch.save(model, lastest_out_path)
Loading the model:
myModel = torch.load(model_path)
myModel.eval()
myModel.to(device)
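For reference, a common pattern that avoids pickling the whole model object is to save and load the state_dict, then make sure eval mode and the same preprocessing are used at test time; a hedged sketch (Net() is a stand-in for rebuilding the same architecture):

import torch

# save
torch.save(model.state_dict(), lastest_out_path)

# load
model = Net()                                            # hypothetical: construct the same architecture
model.load_state_dict(torch.load(model_path, map_location=device))
model.to(device).eval()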

M-Net loss too small

Hi, when I run the pre_train_m_net stage, my losses are extremely small. Is that wrong?
[loss curve image]

Error on the new Train file.

I'm running the new train file and I'm getting this error message.

C:\pytorch_semantic_human_matting-master>python train.py --patch_size=400 --train_phase=pre_train_t_net --fgList /data/super_img.txt --bg_list /data/bg_list.txt --dataRatio [100,1]
============> Building model ...
Traceback (most recent call last):
File "train.py", line 433, in
main()
File "train.py", line 277, in main
train_data = dataset.human_matting_data(args)
File "C:\pytorch_semantic_human_matting-master\data\dataset.py", line 255, in init
assert os.path.isfile(fg_path), "missing file at {}".format(fg_path)
AssertionError: missing file at /

My directory structure of the project:

pytorch_semantic_human_matting-master
│   README.md
│   test.py
│   train.py
│
└───model
│       extractors.py
│       M_Net.py
│       network.py
│       T_Net_psp.py
│
└───data
    │   data_util.py
    │   dataset.py
    │   gen_trimap.py
    │   super_img.txt
    │   super_msk.txt
    │   bg_list.txt
    └───super_img
    └───super_msk
    └───background

My list files look like this:
/pytorch_semantic_human_matting-master/data/super_img/adult-black-body-costume-41667.png
/pytorch_semantic_human_matting-master/data/super_img/ache-adult-depression-expression-41253.jpg
/pytorch_semantic_human_matting-master/data/super_img/active-activity-ball-exercise-41213.jpg
/pytorch_semantic_human_matting-master/data/super_img/active-athletic-exercise-female-40974.jpg
/pytorch_semantic_human_matting-master/data/super_img/active-cold-female-girl-41371.jpg
/pytorch_semantic_human_matting-master/data/super_img/adorable-baby-beautiful-boy-41000.png
/pytorch_semantic_human_matting-master/data/super_img/adult-attractive-full-body-41215.png
/pytorch_semantic_human_matting-master/data/super_img/adult-baby-background-bump-41286.png
/pytorch_semantic_human_matting-master/data/super_img/adult-background-business-computer-53508.png

I have one path per line, but it looks like your list files have several files on the same line.

What does alpha_r look like?

According to the network structure in the original paper, the output of M-Net (alpha_r) looks somewhat like the final alpha, but the alpha loss only seems to cover the unknown regions. Did you get M-Net output similar to the paper's, or was it more like the unsure layer of the trimap? My training result looks more like the unsure layer, with most of the foreground and background regions dark.

question about `network.py`

For the code in network.py:

class net_T(nn.Module):
    # Train T_net
    def __init__(self):

        super(net_T, self).__init__()

        self.t_net = PSPNet()

    def forward(self, input):
        # trimap
        trimap = self.t_net(input)
        return trimap

The values of trimap are not constrained to 0/1/2; should I apply torch.argmax() to it?

If torch.argmax() is applied to trimap, the code for net_M becomes understandable:

# why bg/fg/unsure value is not constrained  between 0/1?
bg, fg, unsure = torch.split(trimap, 1, dim=1)
# why trimap value is not constrained between 0/1/2?
m_net_input = torch.cat((input, trimap), 1)
alpha_r = self.m_net(m_net_input)
alpha_p = fg + unsure * alpha_r

Is my understanding correct?
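For reference, the original SHM paper keeps the trimap as softmax probabilities (rather than applying argmax) so that gradients can flow from M-Net back into T-Net; a hedged sketch of that variant (m_net is a stand-in for the M-Net module, and the channel order follows the repo's snippet above):

import torch
import torch.nn.functional as F

trimap_prob = F.softmax(trimap, dim=1)                   # each pixel's 3 channels now sum to 1
bg, fg, unsure = torch.split(trimap_prob, 1, dim=1)
m_net_input = torch.cat((input, trimap_prob), 1)
alpha_r = m_net(m_net_input)
alpha_p = fg + unsure * alpha_r                          # in [0, 1] if alpha_r is, since fg + unsure <= 1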
