cvjena / cn24

Convolutional (Patch) Networks for Semantic Segmentation

License: BSD 3-Clause "New" or "Revised" License

CMake 3.78% Shell 0.67% C 6.88% C++ 88.27% Makefile 0.03% Lex 0.08% Yacc 0.30%
convolutional-networks opencl deep-learning segmentation

cn24's Introduction

Build status:

master (production branch): Build Status
develop (development branch): Build Status

Welcome to the CN24 GitHub repository!

CN24 is a complete semantic segmentation framework using fully convolutional networks. It supports a wide variety of platforms (Linux, Mac OS X and Windows) and libraries (OpenCL, Intel MKL, AMD ACML...) while providing dependency-free reference implementations. The software is developed in the Computer Vision Group at the University of Jena.

Why should I use CN24?

  1. Designed for pixel-wise labeling and semantic segmentation (train and test your own networks!)
  2. Suited for various applications in driver assistance systems, scene understanding, remote sensing, biomedical image processing and many more
  3. OpenCL support, not limited to NVIDIA GPUs
  4. High-performance implementation with minimal dependencies on other libraries

Getting started

To get started, clone this repository and visit the wiki! Installation is just two command lines away. For an even faster introduction, check out one of these examples:

The repository contains pre-trained networks for these two applications, which are ready to use.

Licensing

CN24 is available under a 3-clause BSD license. See LICENSE for details. If you use CN24 for research, please cite our paper Clemens-Alexander Brust, Sven Sickert, Marcel Simon, Erik Rodner, Joachim Denzler. "Convolutional Patch Networks with Spatial Prior for Road Detection and Urban Scene Understanding". VISAPP 2015.

Remark: The paper does not discuss the fully convolutional network adaptations integrated in CN24.

Questions?

If you have questions or feedback, or experience problems, let us know by writing an e-mail to Clemens-Alexander Brust, Sven Sickert, Marcel Simon, and Erik Rodner.

cn24's People

Contributors

cabrust, christophermanthey, erodner, zygmuntz


cn24's Issues

how to set GPU

I can't run the training process on the GPU. How do I set the training process to run on the GPU?

Remove ALL CMake defines from public headers.

Projects using CN24 have no way of knowing which defines were used when building CN24. Some headers use defines to implement member functions. Remove these instances and move them to the cpp files.

Make OpenCL device configurable

Currently, the OpenCL platform and device numbers are hardcoded. Figure out the best way to
configure them and implement it.
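One possible approach (the key names below are hypothetical, not actual CN24 configuration options): read the platform and device numbers from the existing config file instead of hardcoding them, falling back to 0 when the keys are absent:

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Sketch only: parse assumed "opencl_platform=" and "opencl_device=" keys
// from a config stream. Unrecognized lines are ignored; missing keys keep
// the default selection (platform 0, device 0).
struct CLSelection { int platform = 0; int device = 0; };

CLSelection ParseCLSelection(std::istream& config) {
  CLSelection sel;
  std::string line;
  while (std::getline(config, line)) {
    if (line.rfind("opencl_platform=", 0) == 0)
      sel.platform = std::stoi(line.substr(16));
    else if (line.rfind("opencl_device=", 0) == 0)
      sel.device = std::stoi(line.substr(14));
  }
  return sel;
}
```

The parsed numbers would then be passed to clGetPlatformIDs/clGetDeviceIDs in CLHelper::Init instead of the hardcoded indices.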

Multidimensional Labels

Hello,
I have been reading through the docs for CN24. Is it possible to set up a .set file such that the output has 3 channels (red, green, blue), where the number of actual "classifications" is 2^24?

Essentially I want to train a network to output an RGB image based on a different RGB image.

Please let me know if this is possible.

questions: batch size in hybrid mode; training and testing tensors

Since an epoch is <iterations> batches, I have a question about batches. I guess that in normal mode batch size is measured in images processed; how about hybrid (patch) mode? In other words, how do I set iterations so that one epoch corresponds to one pass over the training set?

Also, can one safely change the training and testing tensors set in config.set after creating the tensors? For example, substituting another tensor for training. It seems to be the case, but I'd like to make sure.
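For reference, the arithmetic behind the first question can be sketched as follows. This is a hypothetical helper, not part of CN24; it assumes each iteration processes one batch of `batch_size` samples, where a sample is a whole image in normal mode and would be an extracted patch in patch/hybrid mode:

```cpp
#include <cassert>

// Hypothetical helper (not CN24 code): number of iterations so that one
// epoch (= <iterations> batches) covers the whole training set once.
// num_samples counts images in normal mode; in patch/hybrid mode it would
// have to count extracted patches instead.
long IterationsPerEpoch(long num_samples, long batch_size) {
  return (num_samples + batch_size - 1) / batch_size;  // ceiling division
}
```

For example, 25916 training images with a batch size of 16 would need 1620 iterations per epoch.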

ERR [ ErrorLayer::CreateOutputs(48) ] Inputs need the same number of elements!

I get the following error when attempting to run trainNetwork.

ERR [ ErrorLayer::CreateOutputs(48) ] Inputs need the same number of elements!
ERR [ NetGraph::InitializeNode(262) ] FATAL: Layer will not create outputs: Square Loss Layer (Weight: 1), input0: (4s@1536x1536x2m)

I tried varying a number of things, for example the architecture, but in the end it always comes down to this. Here's a fuller printout:

    (...)
DBG [ ErrorLayer::ErrorLayer(18) ] Instance created.
DBG [ NetGraph::IsComplete(119) ] Node is okay: Dataset Input Layer
DBG [ NetGraph::IsComplete(119) ] Node is okay: Resize Layer (11x11)
DBG [ NetGraph::IsComplete(119) ] Node is okay: Convolutional Layer (8 kernels @ 12x12)
DBG [ NetGraph::IsComplete(119) ] Node is okay: ReLU Layer
DBG [ NetGraph::IsComplete(119) ] Node is okay: Convolutional Layer (2 kernels @ 1x1)
DBG [ NetGraph::IsComplete(119) ] Node is okay: Sigmoid Layer
DBG [ NetGraph::IsComplete(119) ] Node is okay: Upscale Layer (4x4)
DBG [ NetGraph::IsComplete(119) ] Node is okay: Square Loss Layer (Weight: 1)
DBG [ NetGraph::IsComplete(137) ] Graph check complete.
DBG [ main(154) ] Graph complete: 1
DBG [ ConfusionMatrixLayer::ConfusionMatrixLayer(17) ] Instance created, 2 classes.
DBG [ ConvolutionLayer::Connect(135) ] Local learning rate is now 1
DBG [ ConvolutionLayer::Connect(135) ] Local learning rate is now 1
ERR [ ErrorLayer::CreateOutputs(48) ] Inputs need the same number of elements!
ERR [ NetGraph::InitializeNode(262) ] FATAL: Layer will not create outputs: Square Loss Layer (Weight: 1), input0: (4s@1536x1536x2m)
terminate called after throwing an instance of 'std::runtime_error'
  what():  See log for details.
Aborted (core dumped)

By the way, where's the log to see?

CN24 reports incorrect metrics when using test images different in size (WxH) from train images

For example, test F1 etc. are way too high:

INF [ Trainer::Epoch(241) ] Training (Epoch 59, node 0) Square Loss Layer (Weight: 1) lps: 0.0526363
INF [ Trainer::Epoch(243) ] Training (Epoch 59) aggregate lps: 0.0526363
RESULT --- Training  - Epoch 59 - F1 : 56.1468% (t=-0.5)
RESULT --- Training  - Epoch 59 - ACC: 95.9312%
RESULT --- Training  - Epoch 59 - PRE: 52.9711%
RESULT --- Training  - Epoch 59 - REC: 59.7277%
RESULT --- Training  - Epoch 59 - FPR: 2.41795%
RESULT --- Training  - Epoch 59 - FNR: 40.2723%
INF [ NetGraph&, Conv::NetGraph&, Conv::Trainer&, Conv::Trainer&, bool, std::string&)(296) ] Training complete.

 > test

INF [ Trainer::Test(117) ] Testing (Epoch 60, node 0) Square Loss Layer (Weight: 1) lps: 0.826714
INF [ Trainer::Test(119) ] Testing (Epoch 60) aggregate lps: 0.826714
RESULT --- Testing  - Epoch 60 - F1 : 85.0582% (t=-1)
RESULT --- Testing  - Epoch 60 - ACC: 80.2098%
RESULT --- Testing  - Epoch 60 - PRE: 99.9539%
RESULT --- Testing  - Epoch 60 - REC: 74.0263%
RESULT --- Testing  - Epoch 60 - FPR: 0.108757%
RESULT --- Testing  - Epoch 60 - FNR: 25.9737%

Trained in hybrid mode.

Error moving to GPU

Hi,

I am testing your great package. After trying yesterday to do one epoch on my CPU with my own dataset (it took about 16 hours), I figured the package worked, but I needed some GPU acceleration. :-)

I installed OpenCL for my GeForce GT 330M (please don't laugh, I am still testing the fundamentals before scaling up). Then I added a config file to the build folder specifying which device and platform I wanted to use. I recompiled the package with the OpenCL flag on in ccmake. When I trained the network again, I get the error:

ERR [ Tensor::MoveToGPU(407) ] FATAL: Error moving to GPU: -4
terminate called after throwing an instance of 'std::runtime_error'

The full output:

ruud@computer:~/CN24/cn24/build$ ./trainNetwork application/config.set application/test.net

INF [ System::Init(68) ] CN24 version 06206bf refs/heads/stable
INF [ System::Init(69) ] Copyright (C) 2015 Clemens-Alexander Brust
INF [ System::Init(70) ] For licensing information, see the LICENSE file included with this project.
DBG [ System::Init(75) ] Executable path: /home/ruud/CN24/cn24/build/
INF [ System::Init(89) ] Loading config file: /home/ruud/CN24/cn24/build/config
INF [ CLHelper::Init(200) ] Using OpenCL device: GeForce GT 330M
INF [ CLHelper::Init(201) ] Image support: Yes
INF [ CLHelper::Init(202) ] Max work group size: 3752810960
INF [ CLHelper::Init(213) ] Creating OpenCL context...
INF [ CLHelper::Init(224) ] Creating OpenCL command queue...
DBG [ CLHelper::CreateProgram(350) ] Compiling kernels/crossCorrelation.cl
DBG [ CLHelper::CreateProgram(350) ] Compiling kernels/biasedConvolution.cl
DBG [ CLHelper::CreateProgram(350) ] Compiling kernels/fullConvolution.cl
DBG [ CLHelper::CreateProgram(350) ] Compiling kernels/foldWeights.cl
DBG [ CLHelper::CreateProgram(350) ] Compiling kernels/biasedMatrixVector.cl
DBG [ CLHelper::CreateProgram(350) ] Compiling kernels/biasGradient.cl
DBG [ CLHelper::CreateProgram(350) ] Compiling kernels/matrixMatrix.cl
DBG [ CLHelper::CreateProgram(350) ] Compiling kernels/maximumPooling.cl
DBG [ CLHelper::CreateProgram(350) ] Compiling kernels/nonLinearFunctions.cl
DBG [ TensorViewer::TensorViewer(57) ] Instance created.
DBG [ ConfigurableFactory::ConfigurableFactory(57) ] Adding convolutional layer to receptive field (7,7)
DBG [ ConfigurableFactory::ConfigurableFactory(64) ] Convolutional layer
DBG [ ConfigurableFactory::ConfigurableFactory(66) ] Adding maxpooling layer to receptive field (2,2)
DBG [ ConfigurableFactory::ConfigurableFactory(57) ] Adding convolutional layer to receptive field (5,5)
DBG [ ConfigurableFactory::ConfigurableFactory(57) ] Adding convolutional layer to receptive field (5,5)
DBG [ main(78) ] Optimal settings: LR: 0.0001, GM: 0.003, EX: 0.75, SB: 10, PB: 2, L1: 0.001, L2: 0.0005, MM: 0.9
INF [ main(84) ] Using fully convolutional training
DBG [ TensorStreamDataset* Conv::TensorStreamDataset::CreateFromConfiguration(328) ] Loading dataset with 6 classes
DBG [ TensorStreamDataset* Conv::TensorStreamDataset::CreateFromConfiguration(329) ] Training tensor: /home/ruud/CN24/cn24/build/pepper/pepper_train.Tensor
DBG [ TensorStreamDataset* Conv::TensorStreamDataset::CreateFromConfiguration(330) ] Testing tensor: /home/ruud/CN24/cn24/build/pepper/pepper_test.Tensor
DBG [ TensorStreamDataset::TensorStreamDataset(32) ] Instance created.
DBG [ TensorStreamDataset::TensorStreamDataset(54) ] 1 training tensors
DBG [ TensorStreamDataset::TensorStreamDataset(73) ] 1 testing tensors
DBG [ DatasetInputLayer::DatasetInputLayer(31) ] Instance created.
DBG [ DatasetInputLayer::DatasetInputLayer(41) ] Using loss sampling probability: 0.25
DBG [ DatasetInputLayer::DatasetInputLayer(47) ] Total samples: 2
DBG [ Net::AddLayer(59) ] Layer 0 output 0: (2s@1000x752x3m)
DBG [ Net::AddLayer(59) ] Layer 0 output 1: (2s@1000x752x6m)
DBG [ Net::AddLayer(59) ] Layer 0 output 2: (2s@1000x752x2m)
DBG [ Net::AddLayer(59) ] Layer 0 output 3: (2s@1000x752x1m)
DBG [ Net::AddLayer(73) ] Layer 0 added.
DBG [ Net::AddLayer(77) ] Layer 0 is OpenCL aware
DBG [ Net::AddLayer(87) ] Layer 0 added as training layer.
DBG [ ResizeLayer::ResizeLayer(23) ] Instance created, border size: (22, 22)
DBG [ Net::AddLayer(37) ] Layer 1 input: layer 0, output 0
DBG [ Net::AddLayer(59) ] Layer 1 output 0: (2s@1022x774x3m)
DBG [ Net::AddLayer(73) ] Layer 1 added.
DBG [ Net::AddLayer(77) ] Layer 1 is OpenCL aware
DBG [ ConfigurableFactory::AddLayers(147) ] Parsing layer: convolutional kernels=16 size=7x7
DBG [ ConfigurableFactory::AddLayers(157) ] Parsed dropout fraction: 0
DBG [ ConvolutionLayer::ConvolutionLayer(48) ] Instance created. 16 output maps with 7x7 kernels.
DBG [ ConvolutionLayer::ConvolutionLayer(50) ] Dropout fraction: 0
DBG [ ConfigurableFactory::AddLayers(162) ] LLR factor: 1, RFX: 24
DBG [ Layer::SetLocalLearningRate(76) ] Setting local learning rate to 1
DBG [ Net::AddLayer(37) ] Layer 2 input: layer 1, output 0
DBG [ Net::AddLayer(59) ] Layer 2 output 0: (2s@1016x768x16m)
DBG [ ConvolutionLayer::Connect(113) ] Local learning rate is now 1
DBG [ Net::AddLayer(73) ] Layer 2 added.
DBG [ Net::AddLayer(77) ] Layer 2 is OpenCL aware
DBG [ ConfigurableFactory::AddLayers(147) ] Parsing layer: maxpooling size=2x2
DBG [ MaxPoolingLayer::MaxPoolingLayer(22) ] Instance created: 2x2 pooling.
DBG [ Net::AddLayer(37) ] Layer 3 input: layer 2, output 0
DBG [ Net::AddLayer(59) ] Layer 3 output 0: (2s@508x384x16m)
DBG [ Net::AddLayer(73) ] Layer 3 added.
DBG [ Net::AddLayer(77) ] Layer 3 is OpenCL aware
DBG [ ConfigurableFactory::AddLayers(147) ] Parsing layer: tanh
DBG [ TanhLayer::TanhLayer(54) ] Instance created, nl: Tanh
DBG [ Net::AddLayer(37) ] Layer 4 input: layer 3, output 0
DBG [ Net::AddLayer(59) ] Layer 4 output 0: (2s@508x384x16m)
DBG [ Net::AddLayer(73) ] Layer 4 added.
DBG [ Net::AddLayer(77) ] Layer 4 is OpenCL aware
DBG [ ConfigurableFactory::AddLayers(147) ] Parsing layer: convolutional size=5x5 kernels=12
DBG [ ConfigurableFactory::AddLayers(157) ] Parsed dropout fraction: 0
DBG [ ConvolutionLayer::ConvolutionLayer(48) ] Instance created. 12 output maps with 5x5 kernels.
DBG [ ConvolutionLayer::ConvolutionLayer(50) ] Dropout fraction: 0
DBG [ ConfigurableFactory::AddLayers(162) ] LLR factor: 1, RFX: 24
DBG [ Layer::SetLocalLearningRate(76) ] Setting local learning rate to 1
DBG [ Net::AddLayer(37) ] Layer 5 input: layer 4, output 0
DBG [ Net::AddLayer(59) ] Layer 5 output 0: (2s@504x380x12m)
DBG [ ConvolutionLayer::Connect(113) ] Local learning rate is now 1
DBG [ Net::AddLayer(73) ] Layer 5 added.
DBG [ Net::AddLayer(77) ] Layer 5 is OpenCL aware
DBG [ ConfigurableFactory::AddLayers(147) ] Parsing layer: tanh
DBG [ TanhLayer::TanhLayer(54) ] Instance created, nl: Tanh
DBG [ Net::AddLayer(37) ] Layer 6 input: layer 5, output 0
DBG [ Net::AddLayer(59) ] Layer 6 output 0: (2s@504x380x12m)
DBG [ Net::AddLayer(73) ] Layer 6 added.
DBG [ Net::AddLayer(77) ] Layer 6 is OpenCL aware
DBG [ ConfigurableFactory::AddLayers(147) ] Parsing layer: convolutional size=5x5 kernels=64
DBG [ ConfigurableFactory::AddLayers(157) ] Parsed dropout fraction: 0
DBG [ ConvolutionLayer::ConvolutionLayer(48) ] Instance created. 64 output maps with 5x5 kernels.
DBG [ ConvolutionLayer::ConvolutionLayer(50) ] Dropout fraction: 0
DBG [ ConfigurableFactory::AddLayers(162) ] LLR factor: 1, RFX: 24
DBG [ Layer::SetLocalLearningRate(76) ] Setting local learning rate to 1
DBG [ Net::AddLayer(37) ] Layer 7 input: layer 6, output 0
DBG [ Net::AddLayer(59) ] Layer 7 output 0: (2s@500x376x64m)
DBG [ ConvolutionLayer::Connect(113) ] Local learning rate is now 1
DBG [ Net::AddLayer(73) ] Layer 7 added.
DBG [ Net::AddLayer(77) ] Layer 7 is OpenCL aware
DBG [ ConfigurableFactory::AddLayers(147) ] Parsing layer: tanh
DBG [ TanhLayer::TanhLayer(54) ] Instance created, nl: Tanh
DBG [ Net::AddLayer(37) ] Layer 8 input: layer 7, output 0
DBG [ Net::AddLayer(59) ] Layer 8 output 0: (2s@500x376x64m)
DBG [ Net::AddLayer(73) ] Layer 8 added.
DBG [ Net::AddLayer(77) ] Layer 8 is OpenCL aware
DBG [ ConfigurableFactory::AddLayers(147) ] Parsing layer: convolutional size=1x1 kernels=192
DBG [ ConfigurableFactory::AddLayers(157) ] Parsed dropout fraction: 0
DBG [ ConvolutionLayer::ConvolutionLayer(48) ] Instance created. 192 output maps with 1x1 kernels.
DBG [ ConvolutionLayer::ConvolutionLayer(50) ] Dropout fraction: 0
DBG [ ConfigurableFactory::AddLayers(162) ] LLR factor: 1, RFX: 24
DBG [ Layer::SetLocalLearningRate(76) ] Setting local learning rate to 1
DBG [ Net::AddLayer(37) ] Layer 9 input: layer 8, output 0
DBG [ Net::AddLayer(59) ] Layer 9 output 0: (2s@500x376x192m)
DBG [ ConvolutionLayer::Connect(113) ] Local learning rate is now 1
DBG [ Net::AddLayer(73) ] Layer 9 added.
DBG [ Net::AddLayer(77) ] Layer 9 is OpenCL aware
DBG [ ConfigurableFactory::AddLayers(147) ] Parsing layer: tanh
DBG [ TanhLayer::TanhLayer(54) ] Instance created, nl: Tanh
DBG [ Net::AddLayer(37) ] Layer 10 input: layer 9, output 0
DBG [ Net::AddLayer(59) ] Layer 10 output 0: (2s@500x376x192m)
DBG [ Net::AddLayer(73) ] Layer 10 added.
DBG [ Net::AddLayer(77) ] Layer 10 is OpenCL aware
DBG [ ConfigurableFactory::AddLayers(147) ] Parsing layer: convolutional size=1x1 kernels=6
DBG [ ConfigurableFactory::AddLayers(157) ] Parsed dropout fraction: 0
DBG [ ConvolutionLayer::ConvolutionLayer(48) ] Instance created. 6 output maps with 1x1 kernels.
DBG [ ConvolutionLayer::ConvolutionLayer(50) ] Dropout fraction: 0
DBG [ ConfigurableFactory::AddLayers(162) ] LLR factor: 1, RFX: 24
DBG [ Layer::SetLocalLearningRate(76) ] Setting local learning rate to 1
DBG [ Net::AddLayer(37) ] Layer 11 input: layer 10, output 0
DBG [ Net::AddLayer(59) ] Layer 11 output 0: (2s@500x376x6m)
DBG [ ConvolutionLayer::Connect(113) ] Local learning rate is now 1
DBG [ Net::AddLayer(73) ] Layer 11 added.
DBG [ Net::AddLayer(77) ] Layer 11 is OpenCL aware
DBG [ ConfigurableFactory::AddLayers(147) ] Parsing layer: sigm
DBG [ SigmoidLayer::SigmoidLayer(55) ] Instance created, nl: Sigmoid
DBG [ Net::AddLayer(37) ] Layer 12 input: layer 11, output 0
DBG [ Net::AddLayer(59) ] Layer 12 output 0: (2s@500x376x6m)
DBG [ Net::AddLayer(73) ] Layer 12 added.
DBG [ Net::AddLayer(77) ] Layer 12 is OpenCL aware
DBG [ UpscaleLayer::UpscaleLayer(18) ] Instance created: 2x2 upscaling.
DBG [ Net::AddLayer(37) ] Layer 13 input: layer 12, output 0
DBG [ Net::AddLayer(59) ] Layer 13 output 0: (2s@1000x752x6m)
DBG [ Net::AddLayer(73) ] Layer 13 added.
WRN [ Net::AddLayer(79) ] Layer 13 is NOT OpenCL aware
DBG [ ConfigurableFactory::AddLayers(236) ] Added upscaling layer for FCN
DBG [ main(127) ] Output layer id: 13
DBG [ ErrorLayer::ErrorLayer(17) ] Instance created.
DBG [ Net::AddLayer(37) ] Layer 14 input: layer 13, output 0
DBG [ Net::AddLayer(37) ] Layer 14 input: layer 0, output 1
DBG [ Net::AddLayer(37) ] Layer 14 input: layer 0, output 3
DBG [ Net::AddLayer(73) ] Layer 14 added.
WRN [ Net::AddLayer(79) ] Layer 14 is NOT OpenCL aware
DBG [ Net::AddLayer(123) ] Layer 14 added as loss function layer.
DBG [ ConfusionMatrixLayer::ConfusionMatrixLayer(17) ] Instance created, 6 classes.
DBG [ Net::AddLayer(37) ] Layer 15 input: layer 13, output 0
DBG [ Net::AddLayer(37) ] Layer 15 input: layer 0, output 1
DBG [ Net::AddLayer(37) ] Layer 15 input: layer 0, output 3
DBG [ Net::AddLayer(73) ] Layer 15 added.
WRN [ Net::AddLayer(79) ] Layer 15 is NOT OpenCL aware
DBG [ Net::AddLayer(111) ] Layer 15 added as confusion matrix layer.
DBG [ ConvolutionLayer::OnLayerConnect(695) ] Updating weights: 192 -> 0
DBG [ ConvolutionLayer::OnLayerConnect(695) ] Updating weights: 64 -> 192
DBG [ ConvolutionLayer::OnLayerConnect(695) ] Updating weights: 300 -> 64
DBG [ ConvolutionLayer::OnLayerConnect(695) ] Updating weights: 400 -> 300
DBG [ ConvolutionLayer::OnLayerConnect(695) ] Updating weights: 147 -> 100
DBG [ Trainer::Trainer(22) ] Instance created
DBG [ Trainer::Trainer(37) ] Optimizing 10 sets of parameters.
DBG [ Trainer::Trainer(57) ] Weights: 40082
INF [ Trainer::Trainer(60) ] Training settings: LR: 0.0001, GM: 0.003, EX: 0.75, SB: 10, PB: 2, L1: 0.001, L2: 0.0005, MM: 0.9
INF [ main(247) ] Enter "help" for information on how to use this program

train epochs=1

DBG [ Net::SetTestOnlyStatDisabled(174) ] Confusion matrix layer disabled: 0
DBG [ Trainer::Epoch(162) ] Epoch: 0, it: 100, bsize: 20, lr0: 0.0001

ERR [ Tensor::MoveToGPU(407) ] FATAL: Error moving to GPU: -4
terminate called after throwing an instance of 'std::runtime_error'
what(): See log for details.
Aborted (core dumped)

High CPU, no GPU utilization in training

Hi,

I started training, the log shows the GPU:

INF [ CLHelper::Init(216) ] Using OpenCL device: GRID K520
DBG [ CLHelper::Init(217) ] Image support: Yes
DBG [ CLHelper::Init(228) ] Creating OpenCL context...
DBG [ CLHelper::Init(239) ] Creating OpenCL command queue...

And then:

 > train

INF [ Trainer::Epoch(156) ] Epoch: 0, it: 500, bsize: 16, current lr: 0.0001

[nothing happens]
[when setting method=patch, progress appears promptly]

I'm seeing all 8 CPUs at 100%, but nvidia-smi indicates little GPU usage. What does this mean?

FB Memory Usage
    Total                       : 4095 MiB
    Used                        : 64 MiB
    Free                        : 4031 MiB
BAR1 Memory Usage
    Total                       : 128 MiB
    Used                        : 2 MiB
    Free                        : 126 MiB
Compute Mode                    : Default
Utilization
    Gpu                         : 0 %
    Memory                      : 0 %
    Encoder                     : 0 %
    Decoder                     : 0 %
Temperature
    GPU Current Temp            : 41 C
    GPU Shutdown Temp           : 97 C
    GPU Slowdown Temp           : 92 C
Power Readings
    Power Management            : Supported
    Power Draw                  : 42.09 W
    Power Limit                 : 125.00 W
    Default Power Limit         : 125.00 W
    Enforced Power Limit        : 125.00 W
    Min Power Limit             : 85.00 W
    Max Power Limit             : 130.00 W
Clocks
    Graphics                    : 797 MHz
    SM                          : 797 MHz
    Memory                      : 2500 MHz
Compute Processes
    Process ID                  : 20643
        Name                    : /home/ubuntu/cn24/build/trainNetwork
        Used GPU Memory         : 52 MiB

Training with more than four channels

I have an image with three different shapes and another image with 4 classes, 3 marking the shapes and one class for marking the background. I created a .CTensor from these and wrote a .set file which associates the colors to classes using

makeCompressedTensorStream testclass.set images . labels . testclass.CTensor false

When I try to train a network with the four classes, CN24 errors out with:

ERR [ ErrorLayer::CreateOutputs(49) ] Inputs need the same number of elements!

The same also happens when I modify the toy example to use four classes.

Am I missing something or is that a bug?

EDIT: this is on master, not develop

Add IU evaluation method

Some datasets recommend using the intersection-over-union metric. Add a layer to calculate this metric.
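A minimal sketch of the per-class metric such a layer would compute, from confusion-matrix counts (rows = true class, columns = predicted class). This is illustrative only, not CN24 code:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Per-class intersection-over-union from a confusion matrix:
// IU(c) = TP / (TP + FP + FN). Rows are true classes, columns predictions.
double ClassIU(const std::vector<std::vector<long>>& cm, std::size_t c) {
  long tp = cm[c][c], fp = 0, fn = 0;
  for (std::size_t i = 0; i < cm.size(); ++i) {
    if (i == c) continue;
    fp += cm[i][c];  // predicted c, actually class i
    fn += cm[c][i];  // actually c, predicted as class i
  }
  const long denom = tp + fp + fn;
  return denom == 0 ? 0.0 : static_cast<double>(tp) / denom;
}
```

The dataset-level score is then the mean of ClassIU over all classes.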

Add missing code paths for multi-class problems

The following files are missing multi-class handling at this time:

  • tools/makeTensorStream.cpp: when importing label RGB images.
  • src/util/Colorize.cpp: when colorizing classification outputs.

ERR [ Tensor::Deserialize(303) ] Memory map failed: 12 / bad alloc

Hello,

I have two tensors:

  1. the first is 1.6G, contains 25916 image/map pairs
  2. the second (testing) is 464M, 7406 image/map pairs

When I run trainNetwork, I get the following errors and the program crashes. Stack trace at the end.

I didn't get these errors when using similarly sized tensors with fewer images. Currently there are more because I cut the original images into patches, otherwise the contents are the same. Why the errors?

(...)
ERR [ Tensor::Deserialize(303) ] Memory map failed: 12
ERR [ Tensor::Deserialize(303) ] Memory map failed: 12.
ERR [ Tensor::Deserialize(303) ] Memory map failed: 12
ERR [ Tensor::Deserialize(303) ] Memory map failed: 12.
ERR [ Tensor::Deserialize(303) ] Memory map failed: 12
ERR [ Tensor::Deserialize(303) ] Memory map failed: 12.
ERR [ Tensor::Deserialize(303) ] Memory map failed: 12
ERR [ Tensor::Deserialize(303) ] Memory map failed: 12.
ERR [ Tensor::Deserialize(303) ] Memory map failed: 12
ERR [ Tensor::Deserialize(303) ] Memory map failed: 12.
ERR [ Tensor::Deserialize(303) ] Memory map failed: 12
ERR [ Tensor::Deserialize(303) ] Memory map failed: 12.
ERR [ Tensor::Deserialize(303) ] Memory map failed: 12
ERR [ Tensor::Deserialize(303) ] Memory map failed: 12.
DBG [ TensorViewer::TensorViewer(57) ] Instance created.
DBG [ DatasetInputLayer::DatasetInputLayer(32) ] Instance created.
DBG [ DatasetInputLayer::DatasetInputLayer(42) ] Using loss sampling probability: 1
DBG [ DatasetInputLayer::DatasetInputLayer(48) ] Total samples: 7406
DBG [ DatasetInputLayer::DatasetInputLayer(52) ] Generating random permutation...
DBG [ ConfigurableFactory::ConfigurableFactory(88) ] Adding convolutional layer to receptive field (12,12s4,4p0,0)
DBG [ ResizeLayer::ResizeLayer(23) ] Instance created, border size: (11, 11)
DBG [ ConfigurableFactory::AddLayers(484) ] Parsing layer: convolutional kernels=8 size=12x12 stride=4x4
DBG [ ConfigurableFactory::AddLayers(497) ] Parsed dropout fraction: 0
DBG [ ConvolutionLayer::ConvolutionLayer(65) ] Instance created. 8 output maps with 12x12 kernels, stride: 4x4, padding: 0x0, group: 1.
DBG [ ConvolutionLayer::ConvolutionLayer(68) ] Dropout fraction: 0
DBG [ Layer::SetLocalLearningRate(78) ] Setting local learning rate to 1
DBG [ ConfigurableFactory::AddLayers(484) ] Parsing layer: relu
DBG [ ReLULayer::ReLULayer(59) ] Instance created, nl: ReLU
DBG [ ConfigurableFactory::AddLayers(484) ] Parsing layer: convolutional size=1x1 kernels=1
DBG [ ConfigurableFactory::AddLayers(497) ] Parsed dropout fraction: 0
DBG [ ConvolutionLayer::ConvolutionLayer(65) ] Instance created. 1 output maps with 1x1 kernels, stride: 1x1, padding: 0x0, group: 1.
DBG [ ConvolutionLayer::ConvolutionLayer(68) ] Dropout fraction: 0
DBG [ Layer::SetLocalLearningRate(78) ] Setting local learning rate to 1
DBG [ ConfigurableFactory::AddLayers(484) ] Parsing layer: tanh
DBG [ TanhLayer::TanhLayer(57) ] Instance created, nl: Tanh
DBG [ UpscaleLayer::UpscaleLayer(18) ] Instance created: 4x4 upscaling.
DBG [ ConfigurableFactory::AddLayers(647) ] Added upscaling layer for FCN
DBG [ ErrorLayer::ErrorLayer(18) ] Instance created.
DBG [ NetGraph::IsComplete(119) ] Node is okay: Dataset Input Layer
DBG [ NetGraph::IsComplete(119) ] Node is okay: Resize Layer (11x11)
DBG [ NetGraph::IsComplete(119) ] Node is okay: Convolutional Layer (8 kernels @ 12x12)
DBG [ NetGraph::IsComplete(119) ] Node is okay: ReLU Layer
DBG [ NetGraph::IsComplete(119) ] Node is okay: Convolutional Layer (1 kernels @ 1x1)
DBG [ NetGraph::IsComplete(119) ] Node is okay: Tanh Layer
DBG [ NetGraph::IsComplete(119) ] Node is okay: Upscale Layer (4x4)
DBG [ NetGraph::IsComplete(119) ] Node is okay: Square Loss Layer (Weight: 1)
DBG [ NetGraph::IsComplete(137) ] Graph check complete.
DBG [ main(194) ] Testing graph complete: 1
DBG [ BinaryStatLayer::BinaryStatLayer(21) ] Instance created. Using 13 thresholds from -1 to 1
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
Program received signal SIGABRT, Aborted.
0x00007ffff7281cc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56      ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.

(gdb) bt
#0  0x00007ffff7281cc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007ffff72850d8 in __GI_abort () at abort.c:89
#2  0x00007ffff78866b5 in __gnu_cxx::__verbose_terminate_handler() ()
   from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x00007ffff7884836 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00007ffff7884863 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x00007ffff7884aa2 in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00007ffff7884f8d in operator new(unsigned long) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#7  0x00007ffff7885029 in operator new[](unsigned long) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#8  0x00007ffff7b7ad9e in Conv::Tensor::Resize(unsigned long, unsigned long, unsigned long, unsigned long, float*, bool) () from /home/ubuntu/cn24/libcn24.so
#9  0x00007ffff7b7aaf5 in Conv::Tensor::Tensor(unsigned long, unsigned long, unsigned long, unsigned long)
    () from /home/ubuntu/cn24/libcn24.so
#10 0x00007ffff7b96a0c in Conv::CombinedTensor::CombinedTensor(unsigned long, unsigned long, unsigned long, unsigned long) () from /home/ubuntu/cn24/libcn24.so
#11 0x00007ffff7ba8538 in Conv::DatasetInputLayer::CreateOutputs(std::vector<Conv::CombinedTensor*, std::allocator<Conv::CombinedTensor*> > const&, std::vector<Conv::CombinedTensor*, std::allocator<Conv::CombinedTensor*> >&) () from /home/ubuntu/cn24/libcn24.so
#12 0x00007ffff7b8b485 in Conv::NetGraph::InitializeNode(Conv::NetGraphNode*) ()
   from /home/ubuntu/cn24/libcn24.so
#13 0x00007ffff7b8b2ad in Conv::NetGraph::Initialize() () from /home/ubuntu/cn24/libcn24.so
#14 0x000000000040b54e in main ()

Move Dropout functionality to ConvolutionLayer

We don't need other parameterized layers anymore; ConvolutionLayer is the only layer that needs to support Dropout. Remove DropoutLayer and add Dropout functionality to ConvolutionLayer (all implementations).
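As a sketch of what the merged functionality might look like (illustrative only, not CN24's actual implementation), the forward pass could apply an inverted-dropout mask to the convolution outputs, so that no rescaling is needed at test time. It assumes 0 <= fraction < 1:

```cpp
#include <cassert>
#include <random>
#include <vector>

// Sketch: zero each activation with probability `fraction` and scale the
// survivors by 1/(1-fraction) (inverted dropout). During testing this
// function would simply not be called.
void ApplyDropout(std::vector<float>& activations, float fraction,
                  std::mt19937& rng) {
  if (fraction <= 0.0f) return;  // dropout disabled
  std::bernoulli_distribution drop(fraction);
  const float scale = 1.0f / (1.0f - fraction);
  for (float& a : activations)
    a = drop(rng) ? 0.0f : a * scale;
}
```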

Network from the cn24 paper

Dear all,

I am trying to replicate the network provided in "Convolutional Patch Networks with Spatial Prior for Road Detection and Urban Scene Understanding" using the cn24 library.

I am getting this error: WRN [ NetGraph::IsComplete(141) ] Net has no outputs!
The network is provided below:

Network configuration

?convolutional kernels=12 size=7x7
?maxpooling size=2x2
?relu

?convolutional size=5x5 kernels=6
?relu

?fullyconnected neurons=48
?relu

?fullyconnected neurons=192
?spatialprior
?relu

?fullyconnected neurons=1
?tanh
?output

Learning settings

l1=0.001
l2=0.0005
lr=0.0001
gamma=0.003
momentum=0.9
exponent=0.75
iterations=100
sbatchsize=10
pbatchsize=2

Thank you in advance.

Compilation issue on windows 10

The error says that CN24 needs a 64-bit system; however, I am working on a 64-bit system.
The error log is below:
-- Building for: Visual Studio 14 2015
-- The C compiler identification is MSVC 19.0.24215.1
-- The CXX compiler identification is MSVC 19.0.24215.1
-- Check for working C compiler: C:/Program Files (x86)/Microsoft Visual Studio 14.0/VC/bin/cl.exe
-- Check for working C compiler: C:/Program Files (x86)/Microsoft Visual Studio 14.0/VC/bin/cl.exe -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working CXX compiler: C:/Program Files (x86)/Microsoft Visual Studio 14.0/VC/bin/cl.exe
-- Check for working CXX compiler: C:/Program Files (x86)/Microsoft Visual Studio 14.0/VC/bin/cl.exe -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
CMake Error at CMakeLists.txt:31 (message):
CN24 needs a 64-bit system!

-- Configuring incomplete, errors occurred!
See also "C:/KD/Semantic Segmentation/build/CMakeFiles/CMakeOutput.log".

Path to OpenCL kernels invalid if working directory != build directory

classifyImage terminates with

ERR [ CLHelper::CreateProgram(267) ] FATAL: Cannot open kernel: kernels/crossCorrelation.cl

if classifyImage is called from a directory that is not the build directory.

How to reproduce:

git clone https://github.com/cvjena/cn24.git
mkdir build && cd build
ccmake .. (enable OpenCL support)
make
cd ../examples 
../build/classifyImage kitti_um_road.set kitti.net kitti_pretrained.Tensor sample1.jpg out1.jpg

The path to the kernel is probably resolved relative to the working directory instead of relative to the classifyImage executable.
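A sketch of one possible fix (a hypothetical helper, not existing CN24 code): resolve relative kernel names against the executable's directory, which System::Init already determines and logs as "Executable path":

```cpp
#include <cassert>
#include <string>

// Sketch: join the executable's directory with a relative kernel name,
// leaving absolute paths untouched, so CLHelper::CreateProgram can open
// kernels regardless of the current working directory.
std::string ResolveKernelPath(const std::string& exe_dir,
                              const std::string& kernel) {
  if (!kernel.empty() && kernel[0] == '/') return kernel;  // already absolute
  if (exe_dir.empty() || exe_dir.back() == '/') return exe_dir + kernel;
  return exe_dir + "/" + kernel;
}
```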

Problem creating a .CTensor file

Hi, I was trying to create the .CTensor file using the command provided in the wiki, but for some reason my files cannot be read properly. Attached are a screenshot of the error and screenshots of the text files with the image and label names. Could you please help me with it? What might the potential errors be? Thanks!

Add loss sampling for FCN operation

Always using complete images for training reduces the number of possible batches and can lead to problems with optimization. Add spatial loss sampling by zeroing out the localized error with a certain probability. This should be configurable. The batch size needs to be increased to account for the smaller gradients.
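The proposed sampling could look like this sketch (names are hypothetical, not CN24's API): each per-pixel error is kept with a configurable probability and rescaled by its inverse, so the expected gradient magnitude stays unchanged.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Spatial loss sampling sketch: zero out each localized error with
// probability (1 - keep_probability); rescale kept errors so the
// gradient is unbiased in expectation.
std::vector<float> SampleLoss(const std::vector<float>& error,
                              float keep_probability, unsigned seed) {
  std::mt19937 rng(seed);
  std::bernoulli_distribution keep(keep_probability);
  std::vector<float> sampled(error.size(), 0.0f);
  for (std::size_t i = 0; i < error.size(); ++i) {
    if (keep(rng)) {
      sampled[i] = error[i] / keep_probability;  // inverse-probability scaling
    }
  }
  return sampled;
}
```

With keep_probability well below 1, most per-pixel errors are zeroed, which is why the batch size has to grow to keep enough gradient signal per update.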

Bus error when testing

When I type test, the CPU usage goes up and nothing happens for about a minute; then a message appears about a bus error. Could this have to do with the image size (1500x1500)?

INF [ Trainer::Epoch(156) ] Epoch: 99, it: 500, bsize: 16, current lr: 2.33894e-06
.....10%.....20%.....30%.....40%.....50%.....60%.....70%.....80%.....90%....
DBG [ Trainer::Epoch(221) ] Training, sps: 8628.59
DBG [ Trainer::Epoch(226) ] Training, tps: 115.894 us
DBG [ Trainer::Epoch(232) ] Training, GB/s   up: 0.000128576
DBG [ Trainer::Epoch(233) ] Training, GB/s down: 0.000128576
INF [ Trainer::Epoch(241) ] Training (Epoch 99, node 0) Square Loss Layer (Weight: 1) lps: 0.138364
INF [ Trainer::Epoch(243) ] Training (Epoch 99) aggregate lps: 0.138364
RESULT --- Training  - Epoch 99 - F1 : 12.6683% (t=-1)
RESULT --- Training  - Epoch 99 - ACC: 6.7625%
RESULT --- Training  - Epoch 99 - PRE: 6.7625%
RESULT --- Training  - Epoch 99 - REC: 100%
RESULT --- Training  - Epoch 99 - FPR: 100%
RESULT --- Training  - Epoch 99 - FNR: 0%
DBG [ Trainer::Reset(80) ] Resetting Trainer state
INF [ NetGraph&, Conv::NetGraph&, Conv::Trainer&, Conv::Trainer&, bool, std::string&)(296) ] Training complete.
 > test

DBG [ Trainer::Reset(80) ] Resetting Trainer state
DBG [ DatasetInputLayer::SetTestingMode(242) ] Enabled testing mode.
DBG [ Trainer::Test(90) ] ./trainnet.sh: line 3: 20806 Bus error               (core dumped) /home/ubuntu/cn24/build/trainNetwork -v /home/ubuntu/data/config.set /home/ubuntu/data/arch.net

Licensing issue with libreadline

Readline is licensed under the GNU GPL, and linking against it makes the whole library fall under that license. I see two possible solutions here:

  1. Add a CMake flag to disable the shell and skip linking to readline. This is what I'll do for our copy of the lib. I'm using CN24 as a library, so that's OK for me.
  2. Integrate a readline alternative, e.g. editline or linenoise.
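Solution 1 could look like the following CMake sketch (the option name and the FindReadline module are assumptions, not CN24's actual build files):

```cmake
# Make the GPL readline dependency opt-out for library-only users.
option(CN24_BUILD_SHELL "Build the interactive shell (links against GPL readline)" ON)
if(CN24_BUILD_SHELL)
  find_package(Readline REQUIRED)  # assumes a FindReadline.cmake module
  target_link_libraries(cn24 ${READLINE_LIBRARIES})
  target_compile_definitions(cn24 PRIVATE CN24_HAVE_READLINE)
endif()
```

Guarding the shell code behind a compile definition keeps the default build unchanged while letting library consumers produce a readline-free, BSD-only binary.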

Problem with bad_alloc error

Hello,

Before training on my own datasets, I first wanted to check the training on the toy example, and I am running into difficulties.

I first created the TensorStream from the data with the command:
sudo ./makeCompressedTensorStream mypath/cn24/example/toy/toy.set mypath/cn24/example/toy/images mypath/cn24/example/toy mypath/cn24/example/toy/labels mypath/cn24/example/toy DATASET_TRAIN.CTensor false

the contents of my dataset_config.set is:
training=DATASET_TRAIN.CTensor

when I next run the command:
sudo ./trainNetwork dataset_config.set toy01.net

I get the following error:
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc

Note that I get a similar error if I try to inspect the tensor with tensorTool, i.e. if I run "sudo ./tensorTool DATASET_TRAIN.CTensor":
" terminate called after throwing an instance of 'std::bad_alloc'"

I have also noticed that I do not get the errors above if I build the TensorStream with the makeTensorStream command instead of the makeCompressedTensorStream command. If I run tensorTool on such a TensorStream, I get the following output:
INF [ main(56) ] Tensor: (1s@14x7x1m)
INF [ main(57) ] Enter "help" for information on how to use this program

However, if I run the trainNetwork command above on the TensorStream created with makeTensorStream, I get a different error:
INF [ main(104) ] Using hybrid patchwise training
INF [ main(111) ] Loading dataset, this can take a long time depending on the size!
INF [ TensorStreamPatchDataset::TensorStreamPatchDataset(119) ] Deserializing 1 Tensors...
.
ERR [ SimpleLayer::Connect(37) ] Nodes failed validation and did not connect!
ERR [ NetGraph::InitializeNode(285) ] FATAL: Layer will not connect: Convolutional Layer (0 kernels @ 1x1)
terminate called after throwing an instance of 'std::runtime_error'
what(): See log for details.

Could you please help me with the problem? What could be the reasons?

Thanks!
