csailvision / sceneparsing

Development kit for MIT Scene Parsing Benchmark
Home Page: http://sceneparsing.csail.mit.edu
License: BSD 3-Clause "New" or "Revised" License
I couldn't access http://sceneparsing.csail.mit.edu/eval/ to evaluate the test results. Is the server down?
I cannot download the dataset from the server. Is the server under maintenance?
The images in annotation/ and prediction/ in sampleData/ seem to be the same. Is this a mistake?
Is there a license for the ADE20K dataset?
The license for this repository covers the code, but not the dataset.
Thank you
I have tried several images with the online segmentation demo and, surprisingly, it works quite well! I would like to ask which method it uses: FCN, SegNet, DilatedNet, or ensembles with XXX?
Hi,
I am trying to reproduce DilatedNet.
However, my training results are:
pixel acc: 72.4%
mean acc: 38.6%
mean IoU: 28.7%
Further training does not show improvement.
I am using a pre-trained net and multiple GPUs with a mini-batch size of 8. I did not use augmentations, as the paper does not explain which augmentations were used. I expect augmentation only affects the results by a small amount; otherwise you would probably have described it in the paper.
(1) Could you explain which augmentations were used and how much they improve the results?
(2) Could you provide training and validation log files?
Thank you so much.
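For context, a common augmentation recipe for semantic segmentation is random horizontal flipping plus random rescaling, applied identically to the image and its label map (nearest-neighbor for labels). This is only an illustration of what "augmentation" typically means here, not the paper's actual recipe, which is exactly what the question asks about:

import random
from PIL import Image

def augment(image, label, scales=(0.75, 1.0, 1.25)):
    # Apply the same random flip/scale to an image and its label map.
    if random.random() < 0.5:
        image = image.transpose(Image.FLIP_LEFT_RIGHT)
        label = label.transpose(Image.FLIP_LEFT_RIGHT)
    s = random.choice(scales)
    w, h = image.size
    new_size = (int(w * s), int(h * s))
    image = image.resize(new_size, Image.BILINEAR)  # smooth interpolation for pixels
    label = label.resize(new_size, Image.NEAREST)   # keep class indices discrete
    return image, label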
In the file "convertFromADE/mapFromADE.txt", the characters between the first column and the second column vary at different rows. At some lines they are tabs and at some lines they are spaces. Thus, it would meet problems when users want to use split string function.
I think you'd better replace this file with consistent split characters. It wont be too much work.
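Until the file is normalized, a practical workaround is to split on arbitrary whitespace rather than a fixed delimiter. A minimal Python sketch (the column layout here, one class index mapping to another, is an assumption; check the actual file):

# str.split() with no argument splits on any run of whitespace,
# so mixed tabs and spaces both work.
mapping = {}
with open('convertFromADE/mapFromADE.txt') as f:
    for line in f:
        fields = line.split()
        if len(fields) >= 2 and fields[0].isdigit() and fields[1].isdigit():
            # Assumed layout: first column maps to second column.
            mapping[int(fields[0])] = int(fields[1])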
Using the pre-trained model, I can't reproduce the pixel-wise accuracy posted here. I wonder whether resizing the annotation images may hurt the result.
Hi! I didn't find the test set download link on the official website. Where can I download the test set? Looking forward to your reply. Thanks!
Hi,
Recently I tried to use the Caffe training code to train my own FCN model, but an odd problem stopped me.
When I call binary caffe command like:
caffe train -gpu 0 -solver solver_FCN.prototxt
And then, I get the following error:
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:537] Reading dangerously large protocol message. If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons. To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:78] The total number of bytes read was 566256237
I0620 14:34:02.241405 19481 net.cpp:744] Ignoring source layer fc6
I0620 14:34:02.241425 19481 net.cpp:744] Ignoring source layer fc7
I0620 14:34:03.245385 19481 solver.cpp:218] Iteration 0 (6.64999e-08 iter/s, 0.997409s/40 iters), loss = 1.27303e+06
I0620 14:34:03.245477 19481 solver.cpp:237] Train net output #0: loss = 1.27303e+06 (* 1 = 1.27303e+06 loss)
I0620 14:34:03.245496 19481 sgd_solver.cpp:105] Iteration 0, lr = 1e-10
I0620 14:34:20.222455 19481 solver.cpp:218] Iteration 40 (2.35623 iter/s, 16.9763s/40 iters), loss = 1.06719e+06
I0620 14:34:20.222501 19481 solver.cpp:237] Train net output #0: loss = 1.30907e+06 (* 1 = 1.30907e+06 loss)
I0620 14:34:20.222533 19481 sgd_solver.cpp:105] Iteration 40, lr = 1e-10
F0620 14:34:24.133394 19481 syncedmem.cpp:71] Check failed: error == cudaSuccess (2 vs. 0) out of memory
*** Check failure stack trace: ***
Aborted (core dumped)
I have checked my GPU and found no problem.
Has anyone else run into the same problem?
Regards,
When I execute this code, I get an error:
[libprotobuf ERROR google/protobuf/text_format.cc:274] Error parsing text-format caffe.NetParameter: 5: 15 Message type "caffe.LayerParameter" has no field named "input_param".
I know this looks like a Caffe error, but when I change the deploy prototxt as follows, the error goes away:
input: "data"
input_dim: 1
input_dim: 3
input_dim: 513
input_dim: 513
But then I have another question. If I want to use the prototxt with CRF, how should I set up the prototxt? There is a special parameter "data_dim":
layer {
name: "data"
type: "ImageSegData"
top: "data"
top: "label"
top: "data_dim"
......................
I hope you can help. Thank you!
The model is currently not downloadable at http://sceneparsing.csail.mit.edu/model/Dilated_iter_120000.caffemodel (it throws a 404 Not Found error); please check the server.
Thanks.
As we all know, SegNet is publicly available. However, Cascade-SegNet and Cascade-DilatedNet are both reported as state-of-the-art. Can someone please explain the difference between SegNet and Cascade-SegNet?
Hi
I wish to relabel the indoor images of the ADE20K dataset from its 150 class labels into a smaller number of categories, e.g. floor, wall, furniture, person, stairs, etc. I am having a hard time finding a file that provides the class-to-color correspondence. I would be very grateful if anyone could help me with that so I can proceed with the relabelling. Thanks.
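For the single-channel benchmark annotations (pixel values 1-150, 0 = unlabeled), relabelling is just an index lookup; the index-to-name correspondence is in objectInfo150.txt and the visualization colors are in color150.mat. A minimal sketch with a purely hypothetical coarse grouping and placeholder paths (confirm the indices against objectInfo150.txt):

import numpy as np
from PIL import Image

# Hypothetical grouping: benchmark class index -> new coarse id.
# E.g. 1 = wall -> 1, 4 = floor -> 2 (verify in objectInfo150.txt).
coarse = {1: 1, 4: 2}

ann = np.array(Image.open('annotations/validation/ADE_val_00000001.png'))
out = np.zeros_like(ann)
for old, new in coarse.items():
    out[ann == old] = new  # unmapped classes stay 0 (unlabeled)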
Hi,
Are you going to release the pre-trained weights for one of the supported models in pytorch?
Thanks!
How exactly are the 150 object classes being encoded using the Red and Green channels?
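If this refers to the raw ADE20K *_seg.png files, the dataset's Matlab loading code recovers the object class index as floor(R/10)*256 + G; the benchmark's own annotation files are single-channel index images with values 0-150 and need no channel decoding. A minimal Python equivalent of the R/G decoding:

import numpy as np
from PIL import Image

seg = np.array(Image.open('ADE_train_00000001_seg.png'), dtype=np.uint16)
R, G = seg[:, :, 0], seg[:, :, 1]
class_idx = (R // 10) * 256 + G  # same formula as the Matlab loader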
Using the provided DilatedNet_iter_120000.caffemodel model and the demoSegmentation.m script, I am unable to reproduce the qualitative test results posted at the bottom of the README file (see https://github.com/CSAILVision/sceneparsing#pre-trained-models-on-going). Here are two examples I got using the DilatedNet Caffe model.
Are the released models exactly the same as the ones you are using?
Hi! The data URLs return a 404 Not Found error. Is the server down?
After I ran the code, I printed the array "imPred" after line 66 in "demoSegmentation.m",
% imPred = net.forward({im_inp});
The imPred{1} array is 384 * 384 * 151, so I expected to get the probability that each pixel belongs to each class, for instance 0.8, 0.53, 0.01, etc., i.e. values between 0 and 1.
However, the numbers I got from imPred{1} were like -1.2331, 3.0104, -0.7758, 10.1961, etc., so I was wondering whether these numbers can be converted to probabilities, and how to convert them.
Thank you.
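Those values are raw class scores (logits) from the last layer, before any normalization; a softmax over the 151 class channels turns them into probabilities that sum to 1 per pixel. The demo is Matlab, but the operation is the same in any language; a minimal numpy sketch:

import numpy as np

def softmax(scores):
    # scores: H x W x C array of per-pixel class logits.
    e = np.exp(scores - scores.max(axis=2, keepdims=True))  # subtract max for stability
    return e / e.sum(axis=2, keepdims=True)

# probs = softmax(imPred)  # each probs[y, x, :] now sums to 1 over the 151 classes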
Can you please tell me where I can get the required caffemodels?
Hi,
I'd like to know whether you have the scene names for the test data, so that I can evaluate my trained model on them. As far as I know, the original dataset (released here) has scene names for each training/validation image, but the test data that can be downloaded from the above site does not contain scene names.
I think the original dataset is novel because it has a wide variety of segmentation labels and each image also belongs to a scene. Therefore, it would be great if you could provide the scene names as classification labels for each test image. Of course, being able to evaluate classification performance against the test data would be enough (i.e. you don't need to make the scene names of the test data public).
If you could kindly consider it, I would be grateful. Also, if this post is not appropriate here, please close it.
Thanks,
Hello!
Great work, and thanks for releasing the dataset! I was curious: what is the license for the dataset?
Thanks!
The model files which were referenced by the project (previously hosted at http://sceneparsing.csail.mit.edu/ and at http://sceneparsing.csail.mit.edu/model/pytorch/) are gone. Is there another place to find these files?
So I have a question about the ADE20k itself.
I read all the seg masks from the training set (~15k files) and counted the number of unique class values. I got 2231 unique values, with the highest value being 3144. This makes no sense, as the number of classes is supposed to be 150.
I'm using this code to load the *_seg.png files in Python (adapted from the Matlab code on the dataset site):
import numpy as np
from PIL import Image

mask = np.array(Image.open(mask_path), dtype=np.uint16)  # uint16 so the formula below doesn't overflow
R, G, B = mask[:,:,0], mask[:,:,1], mask[:,:,2]
class_mask = R // 10 * 256 + G
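That formula matches the raw ADE20K release, which annotates thousands of object classes, so 2231 unique values with a maximum above 3000 is expected; the 150 benchmark classes are a subset, and the convertFromADE code (driven by mapFromADE.txt) collapses the raw indices down to them. A minimal sketch of such a remapping, assuming a raw-to-benchmark dict like the one parsed from mapFromADE.txt in an earlier comment above:

import numpy as np

def to_benchmark(class_mask, mapping):
    # Map raw ADE20K indices to the 150 benchmark classes; anything
    # outside the mapping stays 0 (unlabeled/other).
    out = np.zeros_like(class_mask)
    for raw_idx, bench_idx in mapping.items():
        out[class_mask == raw_idx] = bench_idx
    return out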
For starters, great work!
I want to find out which classes are present in an image. But I am new to both neural networks and Matlab code, and I have a hard time understanding how to use sceneparsing/visualizationCode/.
How can I do that?
Thank you
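If you only need the list of classes present in a benchmark annotation image, the Matlab visualization code isn't required; the annotation pixels are the class indices themselves. A minimal Python sketch, assuming objectInfo150.txt is tab-separated with the index in the first column and the name in the last (paths are placeholders):

import numpy as np
from PIL import Image

names = {}
with open('objectInfo150.txt') as f:
    next(f)  # skip the header row (Idx, Ratio, Train, Val, Name)
    for line in f:
        fields = line.rstrip('\n').split('\t')
        names[int(fields[0])] = fields[-1]

ann = np.array(Image.open('ADE_val_00000001.png'))
for idx in np.unique(ann):
    if idx > 0:  # 0 means unlabeled
        print(idx, names.get(int(idx), '?'))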
Hi
I have been trying to create more training data for the objects that are not accurately detected, in order to improve the segmentation, but I can't figure out which color encoding to use to get a single-channel mask like yours. I tried passing annotations colored with the encoding given in the color150.mat file, but the color encoding used in the original ADE20K dataset annotations is different.
In the images below, the floor color encodings differ: the green-floor one gives the required single-channel input, but the brown-floor one gives only a black mask. Can anyone tell me which color encoding to use so that https://github.com/CSAILVision/sceneparsing/blob/master/convertFromADE/convertFromADE.m produces the annotations?
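One way to get from an RGB color-coded mask back to the single-channel index mask is to invert the color table. A hedged sketch, assuming color150.mat stores the palette as a 150x3 array under the key 'colors' (the way the visualization code appears to use it) and that your mask uses those exact colors:

import numpy as np
import scipy.io
from PIL import Image

colors = scipy.io.loadmat('visualizationCode/color150.mat')['colors']  # assumed 150 x 3
rgb = np.array(Image.open('my_colored_annotation.png'))[:, :, :3]  # placeholder path
index_mask = np.zeros(rgb.shape[:2], dtype=np.uint8)
for i, c in enumerate(colors):
    # class index i+1 is drawn with the i-th palette color; exact match required
    index_mask[np.all(rgb == c, axis=2)] = i + 1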
I tried to reproduce the result with the provided data layer, training network, and solver parameters, but I fail to get anywhere close to the provided DilatedNet model [http://sceneparsing.csail.mit.edu/model/DilatedNet_iter_120000.caffemodel].
A test run on the validation images with the model at 120,000 iterations gives me the following stats:
However, the reported baseline performance is (73.6, 44.6, 32.3, 60.1).
I wonder what is going wrong and what I should do to get a matching result?
[The training images are resized to 384x384 and mirrored to match the authors' setting.]
Is there a pre-processing script for the Torch training code that you can make publicly available? I.e. for generating the h5 file and the json files.
I am trying to understand where 'Ratio', 'Train', and 'Val' come from in the objectInfo150.txt file.
Presumably 'Ratio' is the pixel ratio of each category over all the images. I tried to reproduce the number for the 'wall' category by 1) counting the number of pixels labelled '1' in each image, dividing by the total number of pixels in that image, then averaging over the number of images in the training/validation set separately/altogether; and 2) similar to 1) but dividing the summed class pixels by the summed total pixels over all images. Neither approach reproduces the number (both are around 0.1 off from 0.1576).
I guess the numbers under 'Train' and 'Val' are the instance counts for each category? For this I simply counted whether the category 'wall' is present in each image in the training and validation sets. Since 'wall' is a stuff category, I guess it is sufficient to just check existence. But those numbers also don't match (11588 vs. 11664, 1167 vs. 1172).
I want to ask where my understanding goes wrong? Thanks a lot!
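To make the two readings of 'Ratio' concrete, here is a sketch that computes both variants for one class over a set of annotations (paths are placeholders; neither variant is confirmed as the one used to build objectInfo150.txt):

import glob
import numpy as np
from PIL import Image

def ratios(ann_paths, class_idx):
    # Variant 1: mean of per-image ratios. Variant 2: pooled pixel ratio.
    per_image, class_px, total_px = [], 0, 0
    for p in ann_paths:
        ann = np.array(Image.open(p))
        hit = int((ann == class_idx).sum())
        per_image.append(hit / ann.size)
        class_px += hit
        total_px += ann.size
    return np.mean(per_image), class_px / total_px

# e.g. ratios(glob.glob('annotations/training/*.png'), 1)  # 1 = wall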
I've been having a bit more trouble than I bargained for with these models that are intended to work out of the box, specifically with the AdeSegDataLayer. I think I almost have it, but I am getting the following error:
I0623 00:19:42.193922 27604 layer_factory.hpp:77] Creating layer data
I0623 00:19:42.639711 27604 net.cpp:100] Creating Layer data
I0623 00:19:42.639730 27604 net.cpp:408] data -> data
I0623 00:19:42.639760 27604 net.cpp:408] data -> label
I0623 00:19:43.050173 27604 net.cpp:150] Setting up data
I0623 00:19:43.050217 27604 net.cpp:157] Top shape: 1 3 1944 2592 (15116544)
I0623 00:19:43.050227 27604 net.cpp:157] Top shape: 1 1 1944 2592 3 (15116544)
I0623 00:19:43.050235 27604 net.cpp:165] Memory required for data: 120932352
I0623 00:19:43.050258 27604 layer_factory.hpp:77] Creating layer data_data_0_split
I0623 00:19:43.050281 27604 net.cpp:100] Creating Layer data_data_0_split
I0623 00:19:43.050292 27604 net.cpp:434] data_data_0_split <- data
I0623 00:19:43.050312 27604 net.cpp:408] data_data_0_split -> data_data_0_split_0
I0623 00:19:43.050330 27604 net.cpp:408] data_data_0_split -> data_data_0_split_1
I0623 00:19:43.050793 27604 net.cpp:150] Setting up data_data_0_split
I0623 00:19:43.050809 27604 net.cpp:157] Top shape: 1 3 1944 2592 (15116544)
I0623 00:19:43.050817 27604 net.cpp:157] Top shape: 1 3 1944 2592 (15116544)
I0623 00:19:43.050822 27604 net.cpp:165] Memory required for data: 241864704
I0623 00:19:43.050829 27604 layer_factory.hpp:77] Creating layer conv1_1
I0623 00:19:43.050853 27604 net.cpp:100] Creating Layer conv1_1
I0623 00:19:43.050859 27604 net.cpp:434] conv1_1 <- data_data_0_split_0
I0623 00:19:43.050871 27604 net.cpp:408] conv1_1 -> conv1_1
I0623 00:19:43.700464 27604 net.cpp:150] Setting up conv1_1
I0623 00:19:43.700536 27604 net.cpp:157] Top shape: 1 64 2142 2790 (382475520)
I0623 00:19:43.700549 27604 net.cpp:165] Memory required for data: 1771766784
I0623 00:19:43.700593 27604 layer_factory.hpp:77] Creating layer relu1_1
I0623 00:19:43.700616 27604 net.cpp:100] Creating Layer relu1_1
I0623 00:19:43.700634 27604 net.cpp:434] relu1_1 <- conv1_1
I0623 00:19:43.700644 27604 net.cpp:395] relu1_1 -> conv1_1 (in-place)
I0623 00:19:43.701800 27604 net.cpp:150] Setting up relu1_1
I0623 00:19:43.701817 27604 net.cpp:157] Top shape: 1 64 2142 2790 (382475520)
I0623 00:19:43.701825 27604 net.cpp:165] Memory required for data: 3301668864
I0623 00:19:43.701958 27604 layer_factory.hpp:77] Creating layer conv1_2
I0623 00:19:43.701982 27604 net.cpp:100] Creating Layer conv1_2
I0623 00:19:43.701988 27604 net.cpp:434] conv1_2 <- conv1_1
I0623 00:19:43.702000 27604 net.cpp:408] conv1_2 -> conv1_2
F0623 00:19:43.704733 27604 blob.cpp:34] Check failed: shape[i] <= 2147483647 / count_ (2790 vs. 1740) blob size exceeds INT_MAX
*** Check failure stack trace: ***
@ 0x7f0da310bdaa (unknown)
@ 0x7f0da310bce4 (unknown)
@ 0x7f0da310b6e6 (unknown)
@ 0x7f0da310e687 (unknown)
@ 0x7f0da3794b5e caffe::Blob<>::Reshape()
@ 0x7f0da37e81d6 caffe::BaseConvolutionLayer<>::Reshape()
@ 0x7f0da37b618f caffe::CuDNNConvolutionLayer<>::Reshape()
@ 0x7f0da375ec7c caffe::Net<>::Init()
@ 0x7f0da375faf5 caffe::Net<>::Net()
@ 0x7f0da379bb9a caffe::Solver<>::InitTrainNet()
@ 0x7f0da379cc9c caffe::Solver<>::Init()
@ 0x7f0da379cfca caffe::Solver<>::Solver()
@ 0x7f0da377d2b3 caffe::Creator_AdamSolver<>()
@ 0x40f4ae caffe::SolverRegistry<>::CreateSolver()
@ 0x408504 train()
@ 0x405e6c main
@ 0x7f0da1966f45 (unknown)
@ 0x406773 (unknown)
@ (nil) (unknown)
Aborted (core dumped)
I noticed that the AdeSegDataLayer doesn't appear to resize the data anywhere, yet everywhere on the project page and in the evaluation scripts it looks as though the data is supposed to be 384x384. Could that be the cause? If so, why isn't the resize in the data layer? More importantly, can you suggest a change to my data layer [attached] to do that resize properly? (I could shrink the smaller height dimension to 384 and then crop the width, or shrink the width to 384 and pad the height; which is what you did?)
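For what it's worth, a minimal resize step that could go into a Python data layer, scaling both image and label to 384x384 (bilinear for the image, nearest for the label so class indices are never interpolated); this is an assumption about the intended preprocessing, not the authors' confirmed recipe:

import numpy as np
from PIL import Image

def load_resized(image_path, label_path, size=(384, 384)):
    # Load an image/label pair resized for a 384x384 network input.
    image = Image.open(image_path).convert('RGB').resize(size, Image.BILINEAR)
    label = Image.open(label_path).resize(size, Image.NEAREST)
    return np.array(image, dtype=np.float32), np.array(label, dtype=np.uint8)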
Hi, I trained the model using the given code twice: once with images re-scaled to 384x384 (bicubic for images, nearest for annotations) and once without scaling. I trained for around 150,000 iterations. But in both cases, when I run inference on the validation images with the trained snapshot weights, I get blank images. Also, during training the cross-entropy loss is very high the whole time (~600000) and doesn't seem to come down at all. So, did you use the same settings given in solver_FCN, specifically base_lr: 1e-10? Are there any other tricks needed to train the model? Right now the predictions are completely blank, and with such a high cross entropy it's obvious the model is not learning anything.
Note: with the pre-trained weights you have provided I can reproduce your results, getting 71.95% pixel accuracy with the FCN model; just the training part does not seem to work. I also tried initialising all the layers before fc6 with the pre-trained VGG-16 weights. Any pointers are highly appreciated.
Thanks!
Hi,
This isn't exactly related to the code, but I cannot download the train/val data from the website. I'm getting interruptions and corrupted zip files. Is there a mirror for the data?
Thanks