tsing90 / pytorch_semantic_human_matting
This is an unofficial implementation of the paper "Semantic human matting":
Home Page: https://arxiv.org/pdf/1809.01354.pdf
Hi,
Your demo results are great, and I would like to reproduce them by creating a new dataset.
How many images (or high-quality alpha mattes) did you use to train your model?
As far as I know, the DIM dataset has 202 foreground humans, the SHM dataset has 34311, and the dataset in the paper "A Late Fusion CNN for Digital Matting" has 228.
Also, I can confirm that T-Net is pretty hard to train. So my other question is: are the GT trimaps you use annotated manually, or generated by dilation as in these papers?
Thanks for the great work!
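For readers unsure what a "dilated" trimap means in practice, here is a minimal sketch, assuming the ground-truth alpha is a uint8 array with values in [0, 255]. The names `trimap_from_alpha` and `radius` and the 0/128/255 encoding are illustrative assumptions, not the repository's gen_trimap.py.

```python
import numpy as np

def _dilate(mask: np.ndarray, radius: int) -> np.ndarray:
    """Binary dilation with a (2*radius+1) square window, NumPy only."""
    h, w = mask.shape
    padded = np.pad(mask, radius, mode="constant", constant_values=False)
    out = np.zeros_like(mask)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out

def trimap_from_alpha(alpha: np.ndarray, radius: int = 5) -> np.ndarray:
    """Build a trimap: 0 = background, 128 = unknown, 255 = foreground."""
    solid_fg = alpha == 255
    any_fg = alpha > 0
    grown = _dilate(any_fg, radius)        # everything near the subject
    shrunk = ~_dilate(~solid_fg, radius)   # erosion of the solid foreground
    trimap = np.zeros(alpha.shape, dtype=np.uint8)
    trimap[grown] = 128                    # unknown band (and interior, for now)
    trimap[shrunk] = 255                   # confident foreground overrides
    return trimap
```

A larger `radius` widens the unknown band, which gives the matting stage more pixels to solve for.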
Hi, when generating training data you composite each foreground image with a number of background images. Is this needed for training itself, or only to enlarge the training dataset?
The paper does not explain how the composition loss works, and the earlier implementation by @lizhengwei1992 just uses the alpha to mask the original image; no background image is used.
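As a rough illustration of why the background matters here: a composition loss in the style of the Deep Image Matting paper composites the known foreground and background with the predicted alpha and compares the result to the composite made with the ground-truth alpha. The sketch below assumes float arrays in [0, 1]; the function names and `eps` value are assumptions, not code from this repository.

```python
import numpy as np

def composite(alpha, fg, bg):
    """I = alpha * F + (1 - alpha) * B, with alpha (HxW) broadcast over RGB."""
    a = alpha[..., None]
    return a * fg + (1.0 - a) * bg

def composition_loss(alpha_pred, alpha_gt, fg, bg, eps=1e-6):
    """Mean smoothed absolute difference between the two composites."""
    c_pred = composite(alpha_pred, fg, bg)
    c_gt = composite(alpha_gt, fg, bg)
    return float(np.mean(np.sqrt((c_pred - c_gt) ** 2 + eps ** 2)))
```

If only the alpha-masked original image is used, the background term drops out, so the loss no longer penalizes background colour leaking into the composite.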
My email is [email protected].
I have sent an email to [email protected] and have had no reply.
Looking forward to your reply. Best regards!
Hi tsing90, I noticed that bg_list.txt contains 18,000 image paths. But the last image name is 000000581042.jpg, so I guess your full set has at least 600,000 images, from which you sampled 18,000 to use as the background set. Where did you collect so many images, and how did you construct the set? Thank you.
As the title... is it normal?
Hi, could you upload the pretrained model?
I saw that lizhengwei1992 used MobileNet as the T-Net. PSPNet is much larger and very slow. What do you think about using MobileNet?
Can I run your code without the Adobe Dataset?
I only have the original images and PNG masks. Does that work?
I removed the Adobe dataset and COCO, and I'm only using a dataset of images and masks.
When running the code I get this error:
python train.py --patch_size=400 --nEpochs=500 --save_epoch=5 --train_batch=8 --train_phase=pre_train_t_net
Namespace(continue_train=False, dataDir='./data/', debug=False, lr=0.0001, lrDecay=100, lrdecayType='keep', nEpochs=500, nThreads=4, patch_size=400, print_iter=1000, saveDir='./ckpt', save_epoch=5, trainData='human_matting_data', train_batch=8, train_phase='pre_train_t_net', without_gpu=False)
============> Building model ...
Dataset : file number 6
============> Loading datasets ...
============> Set optimizer ...
============> Start Train ! ...
epoch end, shuffle datasets again ...
epoch end, shuffle datasets again ...
epoch end, shuffle datasets again ...
epoch end, shuffle datasets again ...
epoch end, shuffle datasets again ...
Traceback (most recent call last):
  File "train.py", line 379, in <module>
    main()
  File "train.py", line 347, in main
    loss_ = loss_ / (i+1)
UnboundLocalError: local variable 'i' referenced before assignment
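One possible reading of this log: the dataset has 6 files and the batch size is 8, so if the loader drops incomplete batches (an assumption, the loader settings are not shown here), the training loop body never runs and `i` is never assigned. A minimal sketch of the arithmetic and of an averaging helper that does not depend on the loop variable; both function names are hypothetical.

```python
def num_batches(n_samples: int, batch_size: int, drop_last: bool = True) -> int:
    """Number of batches a loader yields for the given settings."""
    full, rest = divmod(n_samples, batch_size)
    return full if drop_last or rest == 0 else full + 1

def average_loss(batch_losses):
    """Average per-batch losses, failing loudly when there were no batches."""
    total, count = 0.0, 0
    for loss in batch_losses:
        total += loss
        count += 1
    if count == 0:
        raise ValueError("no batches produced; lower the batch size or add data")
    return total / count
```

With 6 samples, `num_batches(6, 8)` is 0, while a batch size of 4 gives one full batch per epoch.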
When I train the end-to-end model, I cannot load the pretrained models of t_net and m_net!
t_path and m_path are not defined. After I defined them, another error occurred.
Where can I download the pretrained model? Or would someone kindly share one?
Thank you very much!
Thank you for the great work. I am trying to understand how this project works:
If I understand correctly, the model is trained using 2 lists: one with foregrounds and one with backgrounds that are then composited on the fly during the training.
I can't seem to find where in the train.py code the compositing happens.
Thanks for your help
Hi tsing, I am trying to use the test.py file. There is no problem with T-Net, but when I try M-Net I get an error: "indices and input shapes do not match: indices [1 x 64 x 225 x 400], input [1 x 64 x 224 x 400]" at the last unpooling layer of the decoder. Do you know why I am getting this size difference?
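A likely explanation for the 225 vs 224 mismatch, sketched under the assumption that the encoder uses 2x2 max-pooling with stride 2 and the decoder unpools by doubling: pooling floors odd sizes, so doubling on the way back cannot recover them. The number of levels (4) and the helper names are assumptions for illustration.

```python
def pool_unpool_mismatch(size: int, n_levels: int = 4):
    """Trace one spatial dimension through pooling and unpooling.

    Returns (level, expected, got) for the first mismatch, or None.
    """
    sizes = [size]
    for _ in range(n_levels):
        sizes.append(sizes[-1] // 2)      # 2x2 max-pool, stride 2, floors
    cur = sizes[-1]
    for level in range(n_levels, 0, -1):
        cur *= 2                          # 2x2 max-unpool doubles
        if cur != sizes[level - 1]:
            return level, sizes[level - 1], cur
    return None

def round_up(size: int, multiple: int) -> int:
    """Smallest multiple of `multiple` that is >= size."""
    return -(-size // multiple) * multiple
```

For a height of 450 this reports the same 225 vs 224 pair as the error; resizing or padding the input to `round_up(450, 16)` avoids it. PyTorch's `MaxUnpool2d` also accepts an `output_size` argument, which may be another way to resolve the ambiguity.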
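When testing m_net in test.py, I found that the code does not run, for the following reasons:
1. trimap = np.eye(3)[trimap.reshape(-1)].reshape(list(trimap.shape) + [3])
This is presumably meant to turn the one-channel trimap into three channels, but np.eye(3)[trimap.reshape(-1)] goes out of bounds: the index exceeds the size of the 3*3 identity matrix.
2. frame_seg = seg_process(args, (tensor_img, tensor_tri), net, trimap=trimap_src)
The last argument should be removed; the seg_process method has no such parameter.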
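The out-of-bounds index in point 1 is what one would expect if the trimap still holds grey levels such as 0/128/255 rather than class indices 0/1/2. A minimal sketch of mapping to class indices first; the encoding and the channel order (background, unknown, foreground) are assumptions and should be checked against the model.

```python
import numpy as np

def trimap_to_onehot(trimap: np.ndarray) -> np.ndarray:
    """Convert an HxW trimap with values {0, 128, 255} to an HxWx3 one-hot array."""
    index = np.zeros(trimap.shape, dtype=np.int64)  # default: class 0 (background)
    index[trimap == 128] = 1                        # unknown
    index[trimap == 255] = 2                        # foreground
    return np.eye(3, dtype=np.float32)[index]
```

Note that any value other than 128 or 255 falls into class 0 here, so a trimap with intermediate grey levels should be quantized before this step.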
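Hello, I would like to ask about the relationship between the two files ['DIM.txt','SHM.txt']. You gave some explanation in your description, but I am still a bit confused. My understanding is that the data in these two files are unrelated: both simply list foreground images in PNG format, and they only need to satisfy a certain condition on quantity. The two files could even be merged: put all images in one folder, generate a single txt file, and just adjust the --dataRatio parameter. Is this understanding correct? If so, what is the purpose or advantage of splitting them into two txt files?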
--fgLists: a list, contains list files in which all images share the same fg-bg ratio, e.g. ['DIM.txt','SHM.txt']
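One way to read the line above, as a hedged sketch only: each list file gets its own foreground-to-background ratio, so every foreground in list i is paired with ratio[i] backgrounds. Under that reading, separate files let a small dataset be reused with many backgrounds while a large one is used once. `build_pairs` and its parameters are hypothetical, not the repository's code.

```python
import random

def build_pairs(fg_lists, ratios, backgrounds, seed=0):
    """Pair every foreground in list i with ratios[i] distinct backgrounds."""
    if len(fg_lists) != len(ratios):
        raise ValueError("need exactly one ratio per foreground list")
    rng = random.Random(seed)
    pairs = []
    for fgs, ratio in zip(fg_lists, ratios):
        if ratio > len(backgrounds):
            raise ValueError("ratio exceeds the number of backgrounds")
        for fg in fgs:
            for bg in rng.sample(backgrounds, ratio):
                pairs.append((fg, bg))
    return pairs
```

For example, two small-dataset foregrounds at ratio 4 and three large-dataset foregrounds at ratio 1 give 2*4 + 3*1 = 11 training pairs.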
@tsing90
I have finished training the T and M models, but I hit a problem when testing m_net in test.py. Running the command python test.py --train_phase=pre_train_m_net gives the following error:
use GPU
Loading model from ./ckpt/pre_train_m_net/model/model_obj.pth...
torch.Size([1, 3, 635, 408]) torch.Size([1, 3, 640, 408])
Traceback (most recent call last):
  File "test.py", line 159, in <module>
    main(args)
  File "test.py", line 155, in main
    test(args, myModel)
  File "test.py", line 138, in test
    frame_seg = seg_process(args, (tensor_img, tensor_tri), net)
  File "test.py", line 67, in seg_process
    alpha = net(inputs[0], inputs[1])
  File "/home/gpower/anaconda3/envs/pytorch0.4/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/gpower/ztj/cv/Matting/pytorch_semantic_human_matting/model/network.py", line 39, in forward
    m_net_input = torch.cat((input, trimap), 1)
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 1. Got 640 and 635 in dimension 2 at /opt/conda/conda-bld/pytorch_1535491974311/work/aten/src/THC/generic/THCTensorMath.cu:87
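I found that the trimap predicted by t_net has a different size from the input: the original image is (635, 408), while the trimap is (640, 408). I read the source code carefully but could not find the cause. What causes this? Thanks!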
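The sizes in this log (635 becomes 640 while 408 stays 408) are consistent with the T-Net output being rounded up to a multiple of 8, although that is a guess from the numbers, not something confirmed from the source. One workaround is to pad the input up front and crop every output back. The sketch below uses HxWxC NumPy arrays rather than the NCHW tensors in the log, and both helper names are hypothetical.

```python
import numpy as np

def pad_to_multiple(img: np.ndarray, multiple: int = 8):
    """Pad bottom/right by edge replication; also return the original (h, w)."""
    h, w = img.shape[:2]
    pad_h = (-h) % multiple
    pad_w = (-w) % multiple
    pad = [(0, pad_h), (0, pad_w)] + [(0, 0)] * (img.ndim - 2)
    return np.pad(img, pad, mode="edge"), (h, w)

def crop_to(arr: np.ndarray, size):
    """Crop a padded output back to the original height and width."""
    h, w = size
    return arr[:h, :w]
```

Running both networks on the padded image and cropping the final alpha keeps the image and trimap shapes identical at the concatenation step.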
I really appreciate your work. Have you ever tried replacing T-Net's loss function with BCELoss, feeding alpha_pre and alpha_gt into it?
Where can I apply to download the DIM dataset?
@tsing90 Hi, I use T-Net to generate trimaps, but the trimap pixels are not only 0, 128, and 255; other values appear too. I don't know why. Can you give me some advice? Thank you.
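Intermediate grey levels can appear if per-class scores or probabilities are written out directly, or if the trimap is resized with interpolation. One way to get a strict three-level trimap, sketched under the assumption that T-Net yields a 3xHxW score array in the order background, unknown, foreground:

```python
import numpy as np

# Assumed class order: background, unknown, foreground.
LEVELS = np.array([0, 128, 255], dtype=np.uint8)

def scores_to_trimap(scores: np.ndarray) -> np.ndarray:
    """Pick the highest-scoring class per pixel and map it to a grey level."""
    classes = np.argmax(scores, axis=0)   # HxW array of 0, 1, or 2
    return LEVELS[classes]
```

If the trimap must be resized afterwards, nearest-neighbour interpolation keeps the values within the three levels.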
Were you able to reproduce or get the results close to the results claimed in the paper?
Could you show some sample results?
Thank you for sharing the implementation.
I tried several times to train M_net, but it always failed to generate a correct alpha. I found a possible mistake on line 36 of pytorch_semantic_human_matting/model/network.py, which reads bg, fg, unsure = torch.split(trimap, 1, dim=1). However, after generating some images to visualize bg, fg, and unsure, I found that fg and unsure seemed to be swapped. I changed this line to bg, unsure, fg = torch.split(trimap, 1, dim=1), which fixed my problem.
raw_alpha = self.conv_0(x1d)
and
alpha_r = self.m_net(m_net_input)
# fusion module
# paper : alpha_p = fs + us * alpha_r
alpha_p = fg + unsure * alpha_r
alpha_p may not be in [0,1].
Is that a problem?
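Whether alpha_p stays in [0, 1] depends on what feeds the fusion. If the three trimap channels are softmax probabilities (non-negative, summing to 1) and alpha_r is limited to [0, 1], then fg <= fg + unsure * alpha_r <= fg + unsure <= 1. The sketch below checks this numerically; the softmax, the clipping, and the channel order are assumptions and may not match what network.py does.

```python
import numpy as np

def softmax(logits: np.ndarray, axis: int = 0) -> np.ndarray:
    """Numerically stable softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

def fuse(trimap_logits: np.ndarray, alpha_r: np.ndarray) -> np.ndarray:
    """alpha_p = fg + unsure * alpha_r, with probabilities from a softmax."""
    bg, unsure, fg = softmax(trimap_logits, axis=0)  # assumed channel order
    alpha_r = np.clip(alpha_r, 0.0, 1.0)             # assumed range limit
    return fg + unsure * alpha_r
```

Without the softmax or the range limit on alpha_r, nothing bounds the sum, so clamping alpha_p before the loss may be worth trying.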
average loss: 0.75172
saving model ....
epoch end, shuffle datasets again ...
[1 / 300] loss: 0.69093 time: 1548
average loss: 0.55106
saving model ....
epoch end, shuffle datasets again ...
[2 / 300] loss: 0.53391 time: 1538
average loss: 0.47728
saving model ....
epoch end, shuffle datasets again ...
[3 / 300] loss: 0.46429 time: 1537
average loss: 0.42489
saving model ....
epoch end, shuffle datasets again ...
[4 / 300] loss: 0.41367 time: 1537
average loss: 0.38102
saving model ....
epoch end, shuffle datasets again ...
--- (part omitted) ---
[46 / 300] loss: 0.07379 time: 1542
average loss: 0.07126
saving model ....
epoch end, shuffle datasets again ...
When the loss dropped to 0.07, I expected good segmentation results, so I ran prediction, but T-Net (PSPNet50) did not seem to work properly.
====================================================
In theory, T-Net (PSPNet) should produce normal segmentation results, but my results don't look right, and I'm not sure where the problem is.
@tsing90 thanks!
I prepared the data as described in your README: 180 DIM images and 880 images I collected myself were processed into four-channel (RGBA) images. Training is in progress, and I have a few questions about model training:
When I train T-Net, I feel frustrated because training is too slow.
I use 'aisegmentcom-matting-human-datasets' to train this model (about 34000 images), with batch_size=8, which is nearly 4400 batches per epoch.
300 epochs in total would take about 15 days.
Is this normal? Or are 300 epochs actually not necessary?
Hi, how are the testing results? We are considering whether this network is worth trying; we have training data on the scale of millions of images.
Hey,
Great work! I found there is a bug when loading data for end-to-end training.
In train.py, it requires data in the following format.
img, trimap_gt, alpha_gt, bg, fg = sample_batched['image'], sample_batched['trimap'], sample_batched['alpha'], sample_batched['bg'], sample_batched['fg']
However, in dataset.py, you return the data as follows.
return (img_m, trimap_m), (img, trimap, a, bg, fg)
I think we should return the second part. Is it right? Thanks,
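One way to make the two sides agree, as a sketch only: have the dataset return a dict whose keys match what train.py indexes. The helper names below are hypothetical; the key names are taken from the train.py line quoted above.

```python
KEYS = ("image", "trimap", "alpha", "bg", "fg")

def to_sample(img, trimap, alpha, bg, fg):
    """Package one training item under the keys the training loop expects."""
    return dict(zip(KEYS, (img, trimap, alpha, bg, fg)))

def unpack_end_to_end(sample_batched):
    """Read the five fields back, reporting any key that is missing."""
    missing = [k for k in KEYS if k not in sample_batched]
    if missing:
        raise KeyError("sample is missing keys: %s" % ", ".join(missing))
    return tuple(sample_batched[k] for k in KEYS)
```

Returning a plain tuple instead would also work, provided train.py unpacks it positionally rather than by key.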
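Problem description: I am training the trimap generation network. To train it better, I added a validation set during training, splitting the data 10:1 into training and validation sets. Validation runs after every training epoch. The figure below shows the validation result at epoch 50; the middle is the ground truth and the right is the validation result.
I then saved the model from epoch 50, reloaded it with test.py, and tested the same image. The test result is below.
The test result is clearly much worse than the validation result. What could be the reason?
Below are the code snippets for saving and loading the model:
Saving the model: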
lastest_out_path = "{}/ckpt_{}.pth".format(self.save_dir_model, epoch)
torch.save(model, lastest_out_path)
Loading the model:
myModel = torch.load(model_path)
myModel.eval()
myModel.to(device)
I'm running the new train file and I'm getting this error message.
C:\pytorch_semantic_human_matting-master>python train.py --patch_size=400 --train_phase=pre_train_t_net --fgList /data/super_img.txt --bg_list /data/bg_list.txt --dataRatio [100,1]
============> Building model ...
Traceback (most recent call last):
  File "train.py", line 433, in <module>
    main()
  File "train.py", line 277, in main
    train_data = dataset.human_matting_data(args)
  File "C:\pytorch_semantic_human_matting-master\data\dataset.py", line 255, in __init__
    assert os.path.isfile(fg_path), "missing file at {}".format(fg_path)
AssertionError: missing file at /
My directory structure of the project:
pytorch_semantic_human_matting-master
│   README.md
│   test.py
│   train.py
├───model
│       extractors.py
│       M_Net.py
│       network.py
│       T_Net_psp.py
└───data
    │   data_util.py
    │   dataset.py
    │   gen_trimap.py
    │   super_img.txt
    │   super_msk.txt
    │   bg_list.txt
    ├───super_img
    ├───super_msk
    └───background
My list files look like this:
/pytorch_semantic_human_matting-master/data/super_img/adult-black-body-costume-41667.png
/pytorch_semantic_human_matting-master/data/super_img/ache-adult-depression-expression-41253.jpg
/pytorch_semantic_human_matting-master/data/super_img/active-activity-ball-exercise-41213.jpg
/pytorch_semantic_human_matting-master/data/super_img/active-athletic-exercise-female-40974.jpg
/pytorch_semantic_human_matting-master/data/super_img/active-cold-female-girl-41371.jpg
/pytorch_semantic_human_matting-master/data/super_img/adorable-baby-beautiful-boy-41000.png
/pytorch_semantic_human_matting-master/data/super_img/adult-attractive-full-body-41215.png
/pytorch_semantic_human_matting-master/data/super_img/adult-baby-background-bump-41286.png
/pytorch_semantic_human_matting-master/data/super_img/adult-background-business-computer-53508.png
I have one path per line, but it looks like your list files have several files on the same line.
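A guess at the "missing file at /" message: if the foreground-list option arrives as a single string and the code loops over it as though it were a list, the first "path" it sees is the first character, "/". The sketch below reproduces that and shows an argparse setup that always yields a list. The parser here is hypothetical and only mirrors the option names from this thread; the real train.py parser may differ.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser: list options are collected with nargs='+'."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--fgLists", nargs="+", default=[])
    parser.add_argument("--dataRatio", nargs="+", type=int, default=[])
    return parser

def first_path_seen(fg_lists):
    """Return what a `for path in fg_lists` loop would see first."""
    for path in fg_lists:
        return path
    return None
```

With this parser the values are passed space-separated, for example `--fgLists /data/super_img.txt --dataRatio 100`, rather than as a bracketed string such as `[100,1]`.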
As in the title: why pre-train M-Net as well?
According to the network structure in the original paper, the M-Net output alpha_r looks roughly like the final alpha, but the alpha loss seems to relate only to the unknown regions. Did you get M-Net output similar to the paper's, or was it closer to the unsure layer of the trimap? My training result looks more like the unsure layer, with most of the foreground and background regions dark.
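If the alpha loss is weighted by the unknown mask, as the question suggests, then errors in confident foreground and background regions contribute nothing, so the network is free to output anything there. A minimal sketch of such a masked loss, in the style of the alpha prediction loss from Deep Image Matting; the function name and `eps` are assumptions, not the repository's code.

```python
import numpy as np

def alpha_loss_unknown(alpha_pred, alpha_gt, unknown, eps=1e-6):
    """Smoothed absolute error, averaged over unknown pixels only."""
    diff = np.sqrt((alpha_pred - alpha_gt) ** 2 + eps ** 2)
    weight = unknown.astype(np.float64)
    return float((diff * weight).sum() / (weight.sum() + eps))
```

Under this loss, a dark alpha_r outside the unknown band is not penalized, which would match the behaviour described; the fusion step then supplies the foreground from the trimap.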
For the code in network.py:
class net_T(nn.Module):
    # Train T_net
    def __init__(self):
        super(net_T, self).__init__()
        self.t_net = PSPNet()
    def forward(self, input):
        # trimap
        trimap = self.t_net(input)
        return trimap
The value of trimap is not constrained to 0/1/2. Should I apply torch.argmax() to it?
If I apply torch.argmax() to trimap, the code for net_M becomes explainable:
# why bg/fg/unsure value is not constrained between 0/1?
bg, fg, unsure = torch.split(trimap, 1, dim=1)
# why trimap value is not constrained between 0/1/2?
m_net_input = torch.cat((input, trimap), 1)
alpha_r = self.m_net(m_net_input)
alpha_p = fg + unsure * alpha_r
Is my understanding correct?