csailvision / sceneparsing

Development kit for MIT Scene Parsing Benchmark

Home Page: http://sceneparsing.csail.mit.edu

License: BSD 3-Clause "New" or "Revised" License

Languages: MATLAB 38.26%, Python 15.64%, Lua 46.10%

sceneparsing's People

Contributors: hangzhaomit, xavierpuigf, zhoubolei

sceneparsing's Issues

Failed to reproduce DilatedNet performance

Hi,

I am trying to reproduce DilatedNet.

However, my training results show:
pixel acc: 72.4%
mean acc: 38.6%
mean IoU: 28.7%

Further training does not show improvement.

I am using a pre-trained net and multiple GPUs with a mini-batch size of 8. I did not use augmentations, since the paper does not explain which augmentations are used. I expect augmentation affects the results only by a small amount; otherwise you would probably have presented the augmentations in the paper.

(1) Could you explain which augmentations are used and how much they improve the results?

(2) Could you provide training and validation log files?

Thank you so much.
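
Not an official answer, but for anyone else stuck here: a minimal sketch of the kind of augmentation commonly used for scene parsing (random horizontal flip plus random rescaling), assuming that is what the paper means; the actual policy is undocumented, and the scale choices below are hypothetical.

import random
from PIL import Image

def augment(img, label, scales=(0.75, 1.0, 1.25)):
    # Random horizontal flip, applied identically to image and label.
    if random.random() < 0.5:
        img = img.transpose(Image.FLIP_LEFT_RIGHT)
        label = label.transpose(Image.FLIP_LEFT_RIGHT)
    # Random rescale: smooth interpolation for the image,
    # nearest-neighbour for the label so class ids stay intact.
    s = random.choice(scales)
    w, h = img.size
    new_size = (int(w * s), int(h * s))
    img = img.resize(new_size, Image.BILINEAR)
    label = label.resize(new_size, Image.NEAREST)
    return img, label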

Inconsistent space in the file "convertFromADE/mapFromADE.txt"

In the file "convertFromADE/mapFromADE.txt", the characters between the first and second columns vary between rows: some lines use tabs and some use spaces. This causes problems when users try to split the strings.
It would be better to regenerate the file with a consistent delimiter; it wouldn't be much work. (A whitespace-tolerant parse is sketched below.)
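
As a workaround in the meantime, Python's str.split() with no separator argument splits on any run of whitespace, so the mixed tabs and spaces don't matter. A minimal sketch, assuming (as the issue suggests) that the first two columns are integer indices:

# Parse convertFromADE/mapFromADE.txt despite mixed tabs and spaces:
# str.split() with no separator collapses any whitespace run.
mapping = {}
with open("convertFromADE/mapFromADE.txt") as f:
    for line in f:
        parts = line.split()
        if len(parts) >= 2:
            mapping[int(parts[0])] = int(parts[1])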

ADE20K test set

Hi! I didn't find the test set download link on the official website. Where can I download the test set? Looking forward to your reply. Thanks!

'Out of Memory' When Training Own Model

Hi,

Recently I tried to use the Caffe training code to train my own FCN model, but an odd problem stopped me.

When I invoke the caffe binary like:

caffe train -gpu 0 -solver solver_FCN.prototxt

And then, I get the following error:

[libprotobuf WARNING google/protobuf/io/coded_stream.cc:537] Reading dangerously large protocol message. If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons. To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.

[libprotobuf WARNING google/protobuf/io/coded_stream.cc:78] The total number of bytes read was 566256237

I0620 14:34:02.241405 19481 net.cpp:744] Ignoring source layer fc6

I0620 14:34:02.241425 19481 net.cpp:744] Ignoring source layer fc7

I0620 14:34:03.245385 19481 solver.cpp:218] Iteration 0 (6.64999e-08 iter/s, 0.997409s/40 iters), loss = 1.27303e+06

I0620 14:34:03.245477 19481 solver.cpp:237] Train net output #0: loss = 1.27303e+06 (* 1 = 1.27303e+06 loss)

I0620 14:34:03.245496 19481 sgd_solver.cpp:105] Iteration 0, lr = 1e-10

I0620 14:34:20.222455 19481 solver.cpp:218] Iteration 40 (2.35623 iter/s, 16.9763s/40 iters), loss = 1.06719e+06

I0620 14:34:20.222501 19481 solver.cpp:237] Train net output #0: loss = 1.30907e+06 (* 1 = 1.30907e+06 loss)

I0620 14:34:20.222533 19481 sgd_solver.cpp:105] Iteration 40, lr = 1e-10

F0620 14:34:24.133394 19481 syncedmem.cpp:71] Check failed: error == cudaSuccess (2 vs. 0) out of memory

*** Check failure stack trace: ***

Aborted (core dumped)

I have checked my GPU and found no problem there.

Has anyone else run into the same problem?

Regards,

An error in demoSegmentation.m

When I execute this code, I get an error:

[libprotobuf ERROR google/protobuf/text_format.cc:274] Error parsing text-format caffe.NetParameter: 5: 15 Message type "caffe.LayerParameter" has no field named "input_param".

I know this looks like a Caffe error, but when I change the deploy prototxt as follows, the error goes away:
input: "data"
input_dim: 1
input_dim: 3
input_dim: 513
input_dim: 513

But I have another question. If I want to use the prototxt with CRF, how should I set up the prototxt? There is a special parameter "data_dim":
layer {
name: "data"
type: "ImageSegData"
top: "data"
top: "label"
top: "data_dim"
......................

I hope you can help, thank you!

Reference for Cascade-SegNet?

As we all know, SegNet is publicly available. However, Cascade-SegNet and Cascade-DilatedNet are both reported as state of the art. Can someone please explain the difference between SegNet and Cascade-SegNet?

Class to Color Correspondence

Hi,
I wish to relabel the indoor images of the ADE20K dataset from its 150 class labels into a smaller number of categories, e.g. floor, wall, furniture, person, stairs. I am having a hard time finding a file that provides the class-to-color correspondence. I would be very grateful if anyone could help me with that so I can proceed with the relabelling. Thanks.
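
Not an official answer, but the repo's visualizationCode directory ships a color150.mat palette. A minimal sketch of reading it from Python, assuming (not guaranteed) that it holds a 150x3 uint8 array named 'colors' whose row i is the RGB color of class i+1:

import scipy.io

# Assumed layout: a 150x3 uint8 array named 'colors' in color150.mat.
colors = scipy.io.loadmat("visualizationCode/color150.mat")["colors"]
for cls, rgb in enumerate(colors, start=1):
    print(cls, tuple(rgb))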

How to calculate the probability that each pixel belongs to each class?

After I ran the code, I printed the array "imPred" after line 66 in "demoSegmentation.m",
% imPred = net.forward({im_inp});
The imPred{1} array is 384 x 384 x 151, so I expected to get the probability of each pixel belonging to each class, for instance 0.8, 0.53, 0.01, etc., i.e. values between 0 and 1.

However, the numbers I got from imPred{1} were like -1.2331, 3.0104, -0.7758, 10.1961, etc., so I was wondering whether these numbers can be converted to probabilities, and if so, how to convert them.

Thank you.
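
For what it's worth, values like -1.2331 and 10.1961 look like pre-softmax scores (logits). If the deploy network indeed ends at the last convolution, a softmax over the 151 channels turns them into per-pixel probabilities. A minimal numpy sketch, assuming the scores have been exported as an H x W x 151 array:

import numpy as np

def pixelwise_softmax(scores):
    # Subtract the per-pixel max for numerical stability, then normalize
    # over the class axis so each pixel's 151 values sum to 1.
    e = np.exp(scores - scores.max(axis=2, keepdims=True))
    return e / e.sum(axis=2, keepdims=True)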

Scene names as classification labels

Hi,

I'd like to know whether you have the scene names for the test data, so that I can evaluate my trained model with them. As far as I know, the original dataset (released here) has scene names for each training/validation image, but the test data that can be downloaded from the above site doesn't contain scene names.

I think the original dataset is novel because it has a wide variety of segmentation labels and each image also belongs to a scene. Therefore, it would be great if you could provide the scene names as classification labels for each test image. Of course, it would be enough if we could evaluate classification performance against the test data; that way you wouldn't need to make the scene names of the test data public.

If you could kindly consider it, I would be grateful. Also, if this post is not appropriate here, please close it.
Thanks,

Licensing for ADE20K Dataset

Hello!
Great work, and thanks for releasing the dataset! I was curious what the license for the dataset is.
Thanks!

ADE20k classes

So I have a question about the ADE20k itself.

I read all the seg-masks from the training set (~15k files) and counted the number of unique class values. I got 2231 unique values, where the highest value is 3144. This makes no sense as the number of classes is supposed to be 150.

I'm using this code to load the *_seg.png files in Python (adapted from the Matlab code on the dataset site):

import numpy as np
from PIL import Image

mask = np.array(Image.open(mask_path), dtype=np.uint16)  # uint16 to avoid overflow below
R, G, B = mask[:, :, 0], mask[:, :, 1], mask[:, :, 2]
class_mask = R // 10 * 256 + G  # raw ADE20K class index from R and G
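
That decoding matches the dataset's MATLAB loader, so the ~2231 distinct values are expected: they are raw ADE20K class indices, which span far more than 150 categories. The 150-class benchmark annotations come from mapping those raw indices through convertFromADE/mapFromADE.txt (see convertFromADE.m). A hedged sketch of applying such a mapping, assuming unmapped indices fall back to 0 (unlabeled):

import numpy as np

def to_benchmark_classes(class_mask, mapping):
    # mapping: dict raw ADE20K index -> benchmark index (1..150);
    # indices absent from the mapping are assumed to become 0.
    lut = np.zeros(int(class_mask.max()) + 1, dtype=np.uint16)
    for ade_idx, bench_idx in mapping.items():
        if ade_idx < lut.size:
            lut[ade_idx] = bench_idx
    return lut[class_mask]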

Class that corresponds to each color

For starters, great work!
I want to find out which classes are present in an image. But I am new to both neural networks and MATLAB code, and I have a hard time understanding how to use sceneparsing/visualizationCode/.
How can I do that?
Thank you

How to create more training data with some extra classes

Hi
I have been trying to create more training data for objects that are not accurately detected, to improve the segmentation, but I can't figure out which color encoding will give me a single-channel mask like yours. I have tried passing same-colored annotations with the encoding given in the color150.mat file, but the color encoding used in the original ADE20K dataset annotations is different.
In the images below, the floor color encodings are different: the green floor gives the single-channel input as required, but the brown floor gives only a black mask. Can anyone tell me how to get the correct color encoding to pass through https://github.com/CSAILVision/sceneparsing/blob/master/convertFromADE/convertFromADE.m to get annotations? (A sketch of the encoding appears after the images.)

[Attached example annotations: 6, ADE_train_00000196_seg]
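
Not an authoritative answer, but the decoding used elsewhere in these issues (class = R // 10 * 256 + G) implies the inverse: to paint a pixel with raw ADE20K class index c, set R = (c // 256) * 10 and G = c % 256. A minimal sketch of writing such an annotation; the B channel, which ADE20K appears to use for instance information, is left at zero here:

import numpy as np
from PIL import Image

def encode_ade_seg(class_mask):
    # Inverse of class = R // 10 * 256 + G; B is left 0 (no instance info).
    class_mask = class_mask.astype(np.uint16)
    rgb = np.zeros(class_mask.shape + (3,), dtype=np.uint8)
    rgb[..., 0] = (class_mask // 256) * 10  # R channel
    rgb[..., 1] = class_mask % 256          # G channel
    return Image.fromarray(rgb)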

Can not reproduce DilatedNet result using provided solver, data_layer and training network

I tried to reproduce the result with the provided data layer, training network, and solver parameters, but failed to produce a result even close to the provided DilatedNet model [http://sceneparsing.csail.mit.edu/model/DilatedNet_iter_120000.caffemodel].

A test run on the validation images with the model at 120000 iterations gives me the following stats:

  • iteration 120000 overall accuracy 0.71458910259
  • iteration 120000 mean accuracy 0.321233999994
  • iteration 120000 mean IU 0.243954227299
  • iteration 120000 fwavacc 0.567641521075

However, the reported baseline performance is (73.6, 44.6, 32.3, 60.1).

I wonder what is going wrong and what I should do to get a matching result?
[The training images are resized to 384x384 and mirrored to match the authors' setting.]
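
For anyone comparing numbers: the four statistics quoted above correspond to the standard FCN evaluation metrics computed from a confusion matrix. A minimal numpy sketch of those definitions (not the benchmark's official MATLAB evaluation code):

import numpy as np

def eval_metrics(hist):
    # hist: C x C confusion matrix; hist[i, j] = pixels of
    # ground-truth class i predicted as class j.
    overall_acc = np.diag(hist).sum() / hist.sum()
    per_class_acc = np.diag(hist) / hist.sum(axis=1)
    iu = np.diag(hist) / (hist.sum(axis=1) + hist.sum(axis=0) - np.diag(hist))
    freq = hist.sum(axis=1) / hist.sum()
    fwavacc = (freq[freq > 0] * iu[freq > 0]).sum()
    return overall_acc, np.nanmean(per_class_acc), np.nanmean(iu), fwavacc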

Preprocessing script for torch

Is there a pre-processing script for the Torch training code that you can make publicly available, i.e. for generating the h5 file and the json files?

How were the numbers under 'Ratio', 'Train', and 'Val' calculated in objectInfo150.txt?

I am trying to understand where 'Ratio', 'Train', and 'Val' come from in the objectInfo150.txt file.

Presumably 'Ratio' is the pixel ratio of each category over all the images. I tried to reproduce the number for the 'wall' category by (1) counting the number of pixels labelled '1' in each image, dividing by the total number of pixels in that image, and then averaging over the training/validation sets separately and together; and (2) similar to (1), but averaging over the sum of the total number of pixels in all images (interpretation (1) is sketched below). Neither approach reproduces the number (off by around 0.1 from the listed 0.1576).

I guess the numbers under 'Train' and 'Val' are the instance counts for each category? For this I simply counted whether the category 'wall' is present in each image in the training and validation sets; since 'wall' is a stuff category, I assume it is sufficient to just check for existence. But these numbers also don't match (11588 vs. 11664, 1167 vs. 1172).

Where does my understanding go wrong? Thanks a lot!
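
For concreteness, a sketch of interpretation (1) above: the per-image pixel fraction of a category, averaged over images. This is one plausible reading of 'Ratio', not a confirmed reconstruction of how objectInfo150.txt was generated:

import numpy as np
from PIL import Image

def category_ratio(annotation_paths, category_idx):
    # Mean over images of (pixels labelled category_idx) / (pixels in image).
    ratios = [(np.array(Image.open(p)) == category_idx).mean()
              for p in annotation_paths]
    return float(np.mean(ratios))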

Failing to train models

I've been having a bit more trouble than I bargained for with these models that are intended to work out of the box, specifically with the AdeSegDataLayer. I think I almost have it, but I am getting the following error:

I0623 00:19:42.193922 27604 layer_factory.hpp:77] Creating layer data
I0623 00:19:42.639711 27604 net.cpp:100] Creating Layer data
I0623 00:19:42.639730 27604 net.cpp:408] data -> data
I0623 00:19:42.639760 27604 net.cpp:408] data -> label
I0623 00:19:43.050173 27604 net.cpp:150] Setting up data
I0623 00:19:43.050217 27604 net.cpp:157] Top shape: 1 3 1944 2592 (15116544)
I0623 00:19:43.050227 27604 net.cpp:157] Top shape: 1 1 1944 2592 3 (15116544)
I0623 00:19:43.050235 27604 net.cpp:165] Memory required for data: 120932352
I0623 00:19:43.050258 27604 layer_factory.hpp:77] Creating layer data_data_0_split
I0623 00:19:43.050281 27604 net.cpp:100] Creating Layer data_data_0_split
I0623 00:19:43.050292 27604 net.cpp:434] data_data_0_split <- data
I0623 00:19:43.050312 27604 net.cpp:408] data_data_0_split -> data_data_0_split_0
I0623 00:19:43.050330 27604 net.cpp:408] data_data_0_split -> data_data_0_split_1
I0623 00:19:43.050793 27604 net.cpp:150] Setting up data_data_0_split
I0623 00:19:43.050809 27604 net.cpp:157] Top shape: 1 3 1944 2592 (15116544)
I0623 00:19:43.050817 27604 net.cpp:157] Top shape: 1 3 1944 2592 (15116544)
I0623 00:19:43.050822 27604 net.cpp:165] Memory required for data: 241864704
I0623 00:19:43.050829 27604 layer_factory.hpp:77] Creating layer conv1_1
I0623 00:19:43.050853 27604 net.cpp:100] Creating Layer conv1_1
I0623 00:19:43.050859 27604 net.cpp:434] conv1_1 <- data_data_0_split_0
I0623 00:19:43.050871 27604 net.cpp:408] conv1_1 -> conv1_1
I0623 00:19:43.700464 27604 net.cpp:150] Setting up conv1_1
I0623 00:19:43.700536 27604 net.cpp:157] Top shape: 1 64 2142 2790 (382475520)
I0623 00:19:43.700549 27604 net.cpp:165] Memory required for data: 1771766784
I0623 00:19:43.700593 27604 layer_factory.hpp:77] Creating layer relu1_1
I0623 00:19:43.700616 27604 net.cpp:100] Creating Layer relu1_1
I0623 00:19:43.700634 27604 net.cpp:434] relu1_1 <- conv1_1
I0623 00:19:43.700644 27604 net.cpp:395] relu1_1 -> conv1_1 (in-place)
I0623 00:19:43.701800 27604 net.cpp:150] Setting up relu1_1
I0623 00:19:43.701817 27604 net.cpp:157] Top shape: 1 64 2142 2790 (382475520)
I0623 00:19:43.701825 27604 net.cpp:165] Memory required for data: 3301668864
I0623 00:19:43.701958 27604 layer_factory.hpp:77] Creating layer conv1_2
I0623 00:19:43.701982 27604 net.cpp:100] Creating Layer conv1_2
I0623 00:19:43.701988 27604 net.cpp:434] conv1_2 <- conv1_1
I0623 00:19:43.702000 27604 net.cpp:408] conv1_2 -> conv1_2
F0623 00:19:43.704733 27604 blob.cpp:34] Check failed: shape[i] <= 2147483647 / count_ (2790 vs. 1740) blob size exceeds INT_MAX
*** Check failure stack trace: ***
    @     0x7f0da310bdaa  (unknown)
    @     0x7f0da310bce4  (unknown)
    @     0x7f0da310b6e6  (unknown)
    @     0x7f0da310e687  (unknown)
    @     0x7f0da3794b5e  caffe::Blob<>::Reshape()
    @     0x7f0da37e81d6  caffe::BaseConvolutionLayer<>::Reshape()
    @     0x7f0da37b618f  caffe::CuDNNConvolutionLayer<>::Reshape()
    @     0x7f0da375ec7c  caffe::Net<>::Init()
    @     0x7f0da375faf5  caffe::Net<>::Net()
    @     0x7f0da379bb9a  caffe::Solver<>::InitTrainNet()
    @     0x7f0da379cc9c  caffe::Solver<>::Init()
    @     0x7f0da379cfca  caffe::Solver<>::Solver()
    @     0x7f0da377d2b3  caffe::Creator_AdamSolver<>()
    @           0x40f4ae  caffe::SolverRegistry<>::CreateSolver()
    @           0x408504  train()
    @           0x405e6c  main
    @     0x7f0da1966f45  (unknown)
    @           0x406773  (unknown)
    @              (nil)  (unknown)
Aborted (core dumped)


I noticed that the AdeSegDataLayer doesn't appear to resize the data anywhere, yet everywhere on the project page and in the evaluation scripts the data is supposed to be 384x384. Could that be the cause? If so, why isn't the resize in the DataLayer? More importantly, can you suggest a change to my data layer [attached] to do that resize properly? (I could just shrink the smaller height dimension to 384 then crop the width, or shrink the width to 384 and pad the height... which is what you all did?) A resize sketch follows the attachment below.

ade_layers.py.zip
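
In case it helps others hitting the same blob-size check: a minimal sketch of the resize described above, downscaling the image bicubically and the label with nearest-neighbour so integer class ids survive. Where exactly to hook it into ade_layers.py depends on the attached file, which isn't reproduced here:

import numpy as np
from PIL import Image

TARGET = 384  # the 384x384 size mentioned on the project page

def load_resized(img_path, label_path, size=(TARGET, TARGET)):
    # Bicubic for the image, nearest for the label (keeps class ids intact).
    img = Image.open(img_path).convert("RGB").resize(size, Image.BICUBIC)
    label = Image.open(label_path).resize(size, Image.NEAREST)
    return np.array(img), np.array(label)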

Model does not learn during training (very high CE)

Hi, I trained the model using the given code twice: once with images rescaled to 384x384 (bicubic for images, nearest for annotations) and once without scaling. I trained for around 150,000 iterations. In both cases, when I run inference on the validation images with the trained snapshot weights, I get blank images. Also, during training the cross entropy stays very high (~600000) the whole time and doesn't come down at all. Did you use the same settings given in solver_FCN, specifically base_lr: 1e-10? Are there any other tricks needed to train the model? With completely blank predictions and a CE this high, the model is obviously not learning anything.

Note: with the pre-trained weights you have provided I can reproduce your results, getting 71.95% pixel accuracy with the FCN model; only the training part does not seem to work. I also tried initialising all the layers before fc6 with the pretrained VGG-16 weights. Any pointers are highly appreciated.
Thanks!

Mirrored Data File Download?

Hi,

This isn't exactly related to the code, but I cannot download the train/val data from the website; I keep getting interruptions and corrupted zip files. Is there a mirror for the data?

Thanks
