caffe-jacinto-models's Issues

Check failed when testing quantize in JDetNet

Hello, I ran train_image_object_detection.sh and hit a problem in the test_quantize phase.
Here is the log.
I0910 14:36:28.498530 13312 common.cpp:475] GPU 0 'TITAN Xp' has compute capability 6.1
I0910 14:36:29.036025 13312 caffe.cpp:902] This is NVCaffe 0.17.0 started at Mon Sep 10 14:36:28 2018
I0910 14:36:29.036056 13312 caffe.cpp:904] CuDNN version: 7104
I0910 14:36:29.036072 13312 caffe.cpp:905] CuBLAS version: 9000
I0910 14:36:29.036077 13312 caffe.cpp:906] CUDA version: 9000
I0910 14:36:29.036082 13312 caffe.cpp:907] CUDA driver version: 9010
I0910 14:36:29.036089 13312 caffe.cpp:908] Arguments:
[0]: /home/junxiang/caffe-jacinto/build/tools/caffe.bin
[1]: test_detection
[2]: --model=training/voc0712/JDetNet/20180828_14-57_ds_PSP_dsFac_32_hdDS8_1/test_quantize/test.prototxt
[3]: --iterations=496
[4]: --weights=training/voc0712/JDetNet/20180828_14-57_ds_PSP_dsFac_32_hdDS8_1/sparse/voc0712_ssdJacintoNetV2_iter_120000.caffemodel
…………………………………………

I0910 14:36:48.912307 13312 net.cpp:2195] Enabling quantization at output of: Concat mbox_loc
I0910 14:36:48.912477 13312 net.cpp:2195] Enabling quantization at output of: Concat mbox_conf
I0910 14:36:48.912649 13312 net.cpp:2195] Enabling quantization at output of: Concat mbox_priorbox
I0910 14:36:48.917215 13350 common.cpp:192] New stream 0x7fa3ac006960, device 0, thread 13350
F0910 14:36:48.941680 13312 permute_layer.cu:70] Check failed: error == cudaSuccess (7 vs. 0) too many resources requested for launch
*** Check failure stack trace: ***
@ 0x7fa4660295cd google::LogMessage::Fail()
@ 0x7fa46602b433 google::LogMessage::SendToLog()
@ 0x7fa46602915b google::LogMessage::Flush()
@ 0x7fa46602be1e google::LogMessageFatal::~LogMessageFatal()
@ 0x7fa467a7ce48 caffe::PermuteLayer<>::Forward_gpu()
@ 0x7fa4673e8a7f caffe::Layer<>::Forward()
@ 0x7fa4672561fe caffe::Net::ForwardFromTo()
@ 0x7fa46725633d caffe::Net::Forward()
@ 0x44cc4b test_detection()
@ 0x4521f2 main
@ 0x7fa4647ab830 __libc_start_main
@ 0x449699 _start
@ (nil) (unknown)

Can you tell me how to solve this problem?

JDetNet failed while training

I am using v0.17.

I0810 14:59:47.258464 25579 solver.cpp:352] Iteration 7800 (1.27604 iter/s, 78.3675s/100 iter), 15.1/452.4ep, loss = 5.78276
I0810 14:59:47.258651 25579 solver.cpp:376] Train net output #0: mbox_loss = 6.02961 (* 1 = 6.02961 loss)
I0810 14:59:47.258673 25579 sgd_solver.cpp:172] Iteration 7800, lr = 0.01, m = 0.9, wd = 0.0001, gs = 1
I0810 15:01:05.190842 25579 solver.cpp:352] Iteration 7900 (1.28319 iter/s, 77.9311s/100 iter), 15.3/452.4ep, loss = 5.22862
I0810 15:01:05.191012 25579 solver.cpp:376] Train net output #0: mbox_loss = 5.81463 (* 1 = 5.81463 loss)
I0810 15:01:05.191031 25579 sgd_solver.cpp:172] Iteration 7900, lr = 0.01, m = 0.9, wd = 0.0001, gs = 1
I0810 15:02:23.070077 25579 solver.cpp:905] Snapshotting to binary proto file training/voc0712/JDetNet/_ds_PSP_dsFac_32_hdDS8_1/initial/voc0712_ssdJacintoNetV2_iter_8000.caffemodel
I0810 15:02:23.083901 25580 net.cpp:1071] Ignoring source layer mbox_loss
I0810 15:02:23.112282 25579 sgd_solver.cpp:398] Snapshotting solver state to binary proto file training/voc0712/JDetNet/_ds_PSP_dsFac_32_hdDS8_1/initial/voc0712_ssdJacintoNetV2_iter_8000.solverstate
I0810 15:02:23.131805 25579 solver.cpp:635] Iteration 8000, Testing net (#0)
F0810 15:02:23.132831 25579 net.cpp:1081] Check failed: target_blobs[j]->shape() == source_blob->shape() Cannot share param 0 weights from layer 'conv1a/bn'; shape mismatch. Source param shape is 1 32 1 1 (32); target param shape is 32 (32)
*** Check failure stack trace: ***
@ 0x7f43d58d85cd google::LogMessage::Fail()
@ 0x7f43d58da433 google::LogMessage::SendToLog()
@ 0x7f43d58d815b google::LogMessage::Flush()
@ 0x7f43d58dae1e google::LogMessageFatal::~LogMessageFatal()
@ 0x7f43d6879b3e caffe::Net::ShareTrainedLayersWith()
@ 0x7f43d68edc6c caffe::Solver::TestDetection()
@ 0x7f43d68f0a87 caffe::Solver::TestAll()
@ 0x7f43d68f1347 caffe::Solver::Step()
@ 0x7f43d68f2d62 caffe::Solver::Solve()
@ 0x7f43d68e3c0a caffe::P2PSync::InternalThreadEntry()
@ 0x7f43d6472b2c caffe::InternalThread::entry()
@ 0x7f43d6474ddb boost::detail::thread_data<>::run()
@ 0x7f43d3dcf5d5 (unknown)
@ 0x7f43d38a06ba start_thread
@ 0x7f43d40eb41d clone
@ (nil) (unknown)
Aborted (core dumped)

Regarding performance comparison

Is there any performance comparison for object detection between the following two models:
jdetnet21 vs. mobilenet?
I see jdetnet21 is loosely based on mobilenet. Are there any particular benefits of using one over the other?

Hi, I have a problem with the quantization in your paper.

import math
import torch

class Quantized:
    def __init__(self):
        pass

    @staticmethod
    def check_valid(key):
        # Skip batch-norm statistics; only weights and biases are quantized.
        invalid = ['running_mean', 'running_var', 'num_batches_tracked']
        for iv in invalid:
            if iv in key:
                return False
        return True

    def get_range(self, metric):
        # Min and max over the whole tensor.
        minr = torch.min(metric.cpu()).item()
        maxr = torch.max(metric.cpu()).item()
        return minr, maxr

    @staticmethod
    def clip(metric, mq):
        # Scale to the 8-bit grid, round, clamp to the signed int8 range, rescale.
        metric = torch.clamp(torch.round(metric * mq), -128, 127)
        return metric / mq

    def __call__(self, state_dict):
        for key in state_dict:
            if not Quantized.check_valid(key):
                continue
            metric = state_dict[key]
            minr, maxr = self.get_range(metric)
            maxabs = max(abs(minr), abs(maxr))
            if maxabs == 0:
                continue  # all-zero tensor (e.g. a zero-initialized bias); nothing to quantize
            # Bits needed for the integer part; the remaining bits of the 8 hold the fraction.
            int_len = math.log2(maxabs) + 1
            fac_len = 8 - int_len
            mq = math.pow(2, fac_len)
            state_dict[key] = Quantized.clip(metric, mq)
            print("minr: {} maxr: {}".format(minr, maxr))
        print("quantization complete ^_^")
        return state_dict
I wrote it in PyTorch, but I get poor results. I need your help.
1. How can I get the range [Rmin, Rmax]?
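
For reference, a minimal sketch of how the class above would be applied to a model's state_dict (the toy model below is a hypothetical stand-in, not part of the original issue). As a worked example of the scale computation: for maxr = 2.5 the code computes int_len = log2(2.5) + 1 ≈ 2.32, fac_len ≈ 5.68, and mq = 2^5.68 ≈ 51.3, so a weight of 0.1 becomes round(0.1 * 51.3) / 51.3 = 5 / 51.3 ≈ 0.0975.

import torch.nn as nn

# Hypothetical toy model; any nn.Module's state_dict can be passed in.
model = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1),
    nn.BatchNorm2d(32),
    nn.ReLU(),
)

quantizer = Quantized()
quantized = quantizer(model.state_dict())  # prints per-tensor min/max
model.load_state_dict(quantized)           # reload the quantized weights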

In some scripts named train_xx_.sh, the weights_src URLs need to be modified from caffe-0.15 to caffe-0.17

How to build it using CPU_ONLY?

Dear all,
I tried to build with:

USE_CUDNN := 0
CPU_ONLY := 1

But I always get this error:

CXX src/caffe/blob.cpp
In file included from ./include/caffe/common.hpp:48:0,
from ./include/caffe/blob.hpp:11,
from src/caffe/blob.cpp:6:
./include/caffe/util/device_alternate.hpp:3:23: fatal error: cublas_v2.h: No such file or directory
#include <cublas_v2.h>
^
compilation terminated.
make: *** [.build_release/src/caffe/blob.o] Error 1

How can I build it without a GPU?

Thanks and best regards
He Wei

Training error in ./scripts/train_image_object_detection.sh

I0611 00:04:31.663493 8740 net.cpp:403] Top memory (TEST) required for data: 1703675240 diff: 1703675240
I0611 00:04:31.663501 8740 net.cpp:406] Bottom memory (TEST) required for data: 1703674816 diff: 1703674816
I0611 00:04:31.663507 8740 net.cpp:409] Shared (in-place) memory (TEST) by data: 695552000 diff: 695552000
I0611 00:04:31.663511 8740 net.cpp:412] Parameters memory (TEST) required for data: 19132568 diff: 19132568
I0611 00:04:31.663516 8740 net.cpp:415] Parameters shared memory (TEST) by data: 0 diff: 0
I0611 00:04:31.663519 8740 net.cpp:421] Network initialization done.
F0611 00:04:31.663878 8740 io.cpp:55] Check failed: fd != -1 (-1 vs. -1) File not found: training/voc0712/JDetNet/20190611_00-04_ds_PSP_dsFac_32_hdDS8_1/sparse/voc0712_ssdJacintoNetV2_iter_120000.caffemodel
*** Check failure stack trace: ***
@ 0x7f6b599f75cd google::LogMessage::Fail()
@ 0x7f6b599f9433 google::LogMessage::SendToLog()
@ 0x7f6b599f715b google::LogMessage::Flush()
@ 0x7f6b599f9e1e google::LogMessageFatal::~LogMessageFatal()
@ 0x7f6b5a5d84dc caffe::ReadProtoFromBinaryFile()
@ 0x7f6b5a653de6 caffe::ReadNetParamsFromBinaryFileOrDie()
@ 0x7f6b5aa99dea caffe::Net::CopyTrainedLayersFromBinaryProto()
@ 0x7f6b5aa99e8e caffe::Net::CopyTrainedLayersFrom()
@ 0x41202c test_detection()
@ 0x40d1d0 main
@ 0x7f6b58179830 __libc_start_main
@ 0x40de69 _start
@ (nil) (unknown)

Error occurs while training

Hi,
While trying to run the initial training, I am getting the error below.

I0302 02:22:22.283617 5159 common.cpp:528] NVML initialized, thread 5159
I0302 02:22:22.285171 5132 net.cpp:1071] Ignoring source layer mbox_loss
F0302 02:22:22.321794 5132 solver.cpp:668] Check failed: result[j]->width() == 5 (3 vs. 5)
*** Check failure stack trace: ***
@ 0x7f65e6a015cd google::LogMessage::Fail()
@ 0x7f65e6a03433 google::LogMessage::SendToLog()
@ 0x7f65e6a0115b google::LogMessage::Flush()
@ 0x7f65e6a03e1e google::LogMessageFatal::~LogMessageFatal()
@ 0x7f65e7a09a38 caffe::Solver::TestDetection()
@ 0x7f65e7a0a857 caffe::Solver::TestAll()
@ 0x7f65e7a0b3bc caffe::Solver::Step()
@ 0x7f65e7a0d512 caffe::Solver::Solve()
@ 0x410732 train()
@ 0x40d310 main
@ 0x7f65e5034830 __libc_start_main
@ 0x40dfa9 _start
@ (nil) (unknown)

Kindly share your comments.

terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::lock_error> >' what(): boost: mutex lock failed in pthread_mutex_lock: Invalid argument

Hi,
When I try to train my models, a problem occurs at the end of every stage, i.e. 'initial', 'l1reg', 'sparse', and so on. The main work of each stage seems to be done, but no result charts are saved, unlike the examples stored in the './trained' folder. The problem seems to be caused by multi-threading. The run log is as follows.

I0908 09:01:46.720330 7901 caffe.cpp:268] Solver performance on device 0: 1.667 * 32 = 53.33 img/sec (6 itr in 2.4 sec)
I0908 09:01:46.720353 7901 caffe.cpp:271] Optimization Done in 16s
terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::lock_error> >'
what(): boost: mutex lock failed in pthread_mutex_lock: Invalid argument
*** Aborted at 1567904507 (unix time) try "date -d @1567904507" if you are using GNU date ***
PC: @ 0x7fe0f0f8ae97 gsignal
*** SIGABRT (@0x3e800001edd) received by PID 7901 (TID 0x7fe079fff700) from PID 7901; stack trace: ***
@ 0x7fe0f0f8af20 (unknown)
@ 0x7fe0f0f8ae97 gsignal
@ 0x7fe0f0f8c801 abort
@ 0x7fe0f1d9e957 (unknown)
@ 0x7fe0f1da4ab6 (unknown)
@ 0x7fe0f1da4af1 std::terminate()
@ 0x7fe0f1da4d24 __cxa_throw
@ 0x7fe0f3389734 boost::throw_exception<>()
@ 0x7fe0f33898c7 boost::unique_lock<>::lock()
@ 0x7fe0f37c51af caffe::BlockingQueue<>::push()
@ 0x7fe0f3484e76 caffe::AnnotatedDataLayer<>::load_batch()
@ 0x7fe0f3748656 caffe::BasePrefetchingDataLayer<>::InternalThreadEntryN()
@ 0x7fe0f33688c6 caffe::InternalThread::entry()
@ 0x7fe0f336b03b boost::detail::thread_data<>::run()
@ 0x7fe0e6f92bcd (unknown)
@ 0x7fe0d0ade6db start_thread
@ 0x7fe0f106d88f clone

Problem with mobiledetnetv2

Hi,
When I tried with mobiledetnetv2, I got the following problem:
solver_param: {'type': 'SGD', 'max_iter': 120000, 'stepvalue': [60000, 90000, 300000], 'base_lr': 0.01, 'lr_policy': 'multistep', 'power': 1.0, 'weight_decay': 0.0001}
config_param: {'model_name': 'mobiledetnetv2-0.5', 'config_name': '/home/user/projects/caffe-jdetnet/trained_models/rovit_traffic_dataset/mobiledetnetv2-0.5/2019_01_24/ssd_256x256_ds_PSP_dsFac_32_hdDS8_1/initial', 'gpus': '0', 'threads': 8, 'pretrain_model': '/home/user/projects/caffe-jdetnet/pretrained_models/imagenet_mobilenet-0.5_iter_320000.caffemodel', 'dataset': 'rovit_traffic_dataset', 'train_data': '/media/user/DATA/data/rovit_traffic_dataset/lmdb/rovit_traffic_dataset_trainval_lmdb', 'test_data': '/media/user/DATA/data/rovit_traffic_dataset/lmdb/rovit_traffic_dataset_test_lmdb', 'name_size_file': '/media/user/DATA/data/rovit_traffic_dataset/test_name_size.txt', 'label_map_file': '/media/user/DATA/data/rovit_traffic_dataset/labelmap_rovit_traffic_dataset.prototxt', 'num_test_image': 21375, 'num_classes': 8, 'ssd_size': '256x256', 'use_batchnorm_mbox': 1, 'small_object': 1, 'mean_value': 128, 'use_batchnorm': False, 'use_scale': True, 'lr_mult': 1, 'kernel_mbox_loc_conf': 1, 'chop_num_heads': 0, 'num_intermediate': 512, 'rhead_name_non_linear': 0, 'first_hd_same_op_ch': 1, 'reg_head_at_ds8': 1, 'aspect_ratio_type': 1, 'concat_reg_head': 0, 'base_nw_3_head': 0, 'use_difficult_gt': 1, 'evaluate_difficult_gt': 0, 'ignore_difficult_gt': False, 'fully_conv_at_end': 0, 'force_color': 0, 'shuffle': 1, 'use_image_list': 1, 'log_space_steps': 0, 'min_ratio': 5, 'max_ratio': 85, 'batch_size': 16, 'accum_batch_size': 16, 'test_batch_size': 8, 'feature_stride': 32, 'num_feature': 32, 'ds_type': 'PSP', 'ds_fac': 32, 'min_dim': 256, 'resize_width': 256, 'resize_height': 256, 'crop_width': 256, 'crop_height': 256, 'run_soon': True, 'resume_training': True, 'remove_old_models': False, 'stride_list': None, 'dilation_list': None, 'freeze_layers': [], 'flip': True, 'clip': False, 'share_location': True, 'background_label_id': 0, 'normalization_mode': 1, 'code_type': 2, 'ignore_cross_boundary_bbox': False, 'mining_type': 1, 'neg_pos_ratio': 3.0, 'loc_weight': 1.0}
config_param.ds_fac: 32
config_param.stride_list: [2, 2, 2, 2, 2]
Traceback (most recent call last):
File "/home/user/projects/caffe-jdetnet/train_jdetnet.py", line 221, in
train(config_param, solver_param, caffe_cmd)
File "/home/user/projects/caffe-jdetnet/models/train_jdetnet_model.py", line 768, in train
net, out_layer, out_layer_names = CoreNetwork(config_param, net, out_layer)
File "/home/user/projects/caffe-jdetnet/models/train_jdetnet_model.py", line 338, in CoreNetwork
num_intermediate=config_param['num_intermediate'])
File "/home/user/projects/caffe-jdetnet/models/mobilenetv2.py", line 230, in mobiledetnetv2
num_input = num_channels[from_layer]
KeyError: 'relu5_5/sep'
Is this a problem with the model architecture? How can I solve it?
Thank you.

caffe-0.17 mobilenet object detection TIDL import configuration

I was able to train mobilenet; however, I believe the EVE cores are loaded too heavily, so there is no image.

  1. I might be missing a step in configuring it for the TIDL import tool.
    So far I have:
# Default - 0
randParams         = 0

# 0: Caffe, 1: TensorFlow, Default - 0
modelType          = 0

# 0: Fixed quantization by training framework, 1: Dynamic quantization by TIDL, Default - 1
quantizationStyle  = 1

# quantRoundAdd/100 will be added while rounding to integer, Default - 50
quantRoundAdd      = 25

numParamBits       = 8
# 0 : 8bit Unsigned, 1 : 8bit Signed Default - 1
inElementType      = 0

inputNetFile       = "deploy.prototxt"
inputParamsFile       = "voc0712_mobiledetnet-0.5_iter_120000.caffemodel"
outputNetFile      = "NET_OD_mobilenet.bin"
outputParamsFile   = "PRM_OD_mobilenet.bin"

rawSampleInData = 1
preProcType   = 4
sampleInData = "trace_dump_0_768x320.y"
tidlStatsTool = "eve_test_dl_algo.out.exe"

Has anyone verified using mobilenet with the TIDL OD use case? If so, could you please share the import configuration?

  2. Additionally, I am using deploy.prototxt and voc0712_mobiledetnet-0.5_iter_120000.caffemodel from the scripts/training/../initial folder, as the other folders do not have a caffemodel. Is this correct?

Thanks in advance.

Error when using 3 or 4 GPUs to train the model

I installed NCCL to use more GPUs to train the model.
Install steps:

  1. git clone https://github.com/NVIDIA/nccl.git
  2. cd nccl
  3. sudo make install -j8
  4. Uncomment USE_NCCL in Makefile.config

When I train the model I use the command below:
$CAFFE_ROOT/build/tools/caffe train --solver="models/ssd/${PROJECT}/initial/solver.prototxt" --weights="models/ssd/${PROJECT}/initial/${PRETRAINED}" -gpu 0,1,2

I get this error:
[error screenshots; not reproduced here]

But I can train with 2 GPUs.
Did I miss a step in the instructions?

make: *** [.build_release/lib/libcaffe-nv.so.0.17.0] Error 1

I have gone through all the steps and this is what came out.

I tried the symbolic-link solution for libturbojpeg, but it did not change anything.

CXX .build_release/src/caffe/proto/caffe.pb.cc
./3rdparty/half_float/half.hpp(1659): warning: calling a host function("half_float::detail::round_half<( ::std::float_round_style)1> ") from a host device function("half_float::detail::functions::rint") is not allowed

(the warning above appears four times in total)

AR -o .build_release/lib/libcaffe-nv.a
LD -o .build_release/lib/libcaffe-nv.so.0.17.0
/usr/bin/ld: cannot find -lopenblas
collect2: error: ld returned 1 exit status
Makefile:600: recipe for target '.build_release/lib/libcaffe-nv.so.0.17.0' failed
make: *** [.build_release/lib/libcaffe-nv.so.0.17.0] Error 1

My system is Ubuntu 16.04, GTX 1080Ti, CUDA 8.0, cuDNN v6.
