ljanyst / ssd-tensorflow

A Single Shot MultiBox Detector in TensorFlow

Home Page: http://jany.st/post/2017-11-05-single-shot-detector-ssd-from-scratch-in-tensorflow.html

License: GNU General Public License v3.0

Python 99.71% Shell 0.29%

tensorflow vgg16 fully-convolutional-networks object-detection

ssd-tensorflow's Introduction

SSD-TensorFlow

Overview

The programs in this repository train and use a Single Shot MultiBox Detector to take an image and draw bounding boxes around objects of certain classes contained in this image. The network is based on the VGG-16 model and uses the approach described in this paper by Wei Liu et al. The software is generic and easily extendable to any dataset, although I only tried it with Pascal VOC so far. All you need to do to introduce a new dataset is to create a new source_xxxxxx.py file defining it.

Go here for more info.

Pascal VOC Results

Images and numbers speak louder than a thousand words, so here they are:

Model	Training data	mAP Train	mAP VOC12 test	Reference
vgg300	VOC07+12 trainval and VOC07 Test	79.5%	72.3%	72.4%
vgg512	VOC07+12 trainval and VOC07 Test	82.3%	75.0%	74.9%

Usage

To train the model on the Pascal VOC data, go to the pascal-voc directory and download the dataset:

cd pascal-voc
./download-data.sh
cd ..

You then need to preprocess the dataset before you can train the model on it. It's OK to use the default settings, but if you want something more, it's always good to try the --help parameter.

./process_dataset.py

You can then train the whole thing. It will take around 150 to 200 epochs to get good results. Again, you can try --help if you want to do something custom.

./train.py

You can annotate images, dump raw predictions, print the AP stats, or export the results in the Pascal VOC compatible format using the inference script.

./infer.py --help

To export the model to an inference-optimized graph, run the following (result/result is the name of the output tensor):

./export_model.py --output-tensors result/result

If you want to run detection based on the inference model, check out:

./detect.py

Have Fun!

ssd-tensorflow's Issues

train.py error

Hello, please help me. When I run ./train.py, I always get:
[i] Creating the model...
36%|██████████████████████████████████████████▊
ConnectionResetError: [Errno 104] Connection reset by peer
The download always fails.

Is the coordinate position of the detection box upper left corner, lower right corner or upper left corner with width and height?

data augmentation details

Hi ljanyst, thanks for sharing your wonderful work. For the results you posted, did you add the image expansion data augmentation?

how low the anchor size

how to use detect.py

Hi Lukas, could you please tell me how to use detect.py?

I exported a model to a pb file, and tried to use detect.py.
I used images with 1280*1024 pixels, but I faced cv2 error.

Please help

2018-04-11 17:03:33.415491: I tensorflow/core/common_runtime/gpu/gpu_device.cc:993] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9886 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:05:00.0, compute capability: 6.1)
0%| | 0/1 [00:00<?, ?it/s]OpenCV Error: Assertion failed (ssize.width > 0 && ssize.height > 0) in resize, file /home/kimken/opencv-3.4.0/modules/imgproc/src/resize.cpp, line 4044

Traceback (most recent call last):
File "detect.py", line 124, in <module>
main()
File "detect.py", line 99, in main
img = cv2.resize(img, (300, 300))
cv2.error: /home/kimken/opencv-3.4.0/modules/imgproc/src/resize.cpp:4044: error: (-215) ssize.width > 0 && ssize.height > 0 in function resize

./detect.py

Traceback (most recent call last):
File "./detect.py", line 125, in <module>
main()
File "./detect.py", line 100, in main
img = cv2.resize(img, (300, 300))
cv2.error: OpenCV(4.0.0-pre) /home/opencv/opencv/modules/imgproc/src/resize.cpp:3784: error: (-215:Assertion failed) !ssize.empty() in function 'resize'
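
This assertion is typically what cv2.resize reports when it is handed an empty image, and cv2.imread returns None rather than raising when a file is missing or cannot be decoded. A minimal sketch of a guard, assuming the image path is the problem; load_image_for_detection is a hypothetical helper, not a function from detect.py, and the (300, 300) size is taken from the traceback:

```python
import os


def load_image_for_detection(path, size=(300, 300)):
    """Read an image and resize it, failing early with a readable message."""
    if not os.path.isfile(path):
        raise FileNotFoundError('Not a readable file: ' + path)

    # OpenCV is imported here so that the path check above works without it.
    import cv2

    img = cv2.imread(path)
    if img is None:
        # cv2.imread signals failure by returning None instead of raising.
        raise ValueError('OpenCV could not decode: ' + path)
    return cv2.resize(img, size)
```

With a guard like this, a wrong path should surface as a FileNotFoundError that names the file instead of an assertion inside resize.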

infer.py error

ubuntu@ubuntu-IdeaPad-U430p:/git/ssd-tensorflow-master$ ./infer.py pascal-voc/test/VOCdevkit/VOC2007/JPEGImages
[i] Project name: test
[i] Training data: pascal-voc/training-data.pkl
[i] Batch size: 4
[i] Data source: None
[i] Data directory: pascal-voc
[i] Output directory: test-output
[i] Annotate: False
[i] Dump predictions: False
[i] Sample: test
[i] Threshold: 0.01
[i] Pascal summary: False
[i] Configuring the data source...
[!] Unable to load data source: No module named 'source_None'
ubuntu@ubuntu-IdeaPad-U430p:/git/ssd-tensorflow-master$ ./infer.py pascal-voc/test/VOCdevkit/VOC2007/JPEGImages
[i] Project name: test
[i] Training data: pascal-voc/training-data.pkl
[i] Batch size: 4
[i] Data source: None
[i] Data directory: pascal-voc
[i] Output directory: test-output
[i] Annotate: False
[i] Dump predictions: False
[i] Sample: test
[i] Threshold: 0.01
[i] Pascal summary: False
[i] Compute stats: False
[i] Network checkpoint: test/final.ckpt
[i] Metagraph file: test/final.ckpt.meta
[i] Image size: Size(w=300, h=300)
[i] Number of files: 1
2018-04-07 21:04:04.761128: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2018-04-07 21:04:04.761153: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2018-04-07 21:04:04.761171: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2018-04-07 21:04:04.761180: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2018-04-07 21:04:04.761190: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2018-04-07 21:04:04.910995: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:893] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-04-07 21:04:04.911423: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 0 with properties:
name: GeForce GT 730M
major: 3 minor: 5 memoryClockRate (GHz) 0.758
pciBusID 0000:09:00.0
Total memory: 1.96GiB
Free memory: 1.71GiB
2018-04-07 21:04:04.911458: I tensorflow/core/common_runtime/gpu/gpu_device.cc:976] DMA: 0
2018-04-07 21:04:04.911468: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 0: Y
2018-04-07 21:04:04.911486: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GT 730M, pci bus id: 0000:09:00.0)
[i] Creating the model...
[i] Processing samples: 0%| | 0/1 [00:00<?, ?batches/s]OpenCV Error: Assertion failed (ssize.width > 0 && ssize.height > 0) in resize, file /io/opencv/modules/imgproc/src/resize.cpp, line 4044
Traceback (most recent call last):
File "./infer.py", line 281, in <module>
sys.exit(main())
File "./infer.py", line 220, in main
desc=description, unit='batches'):
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tqdm/_tqdm.py", line 897, in __iter__
for obj in iterable:
File "./infer.py", line 47, in sample_generator
image = cv2.resize(cv2.imread(image_file), image_size)
cv2.error: /io/opencv/modules/imgproc/src/resize.cpp:4044: error: (-215) ssize.width > 0 && ssize.height > 0 in function resize

Can someone help me? Thank you very much!
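
In the log above the argument is a directory (JPEGImages) and the script reports "Number of files: 1", so the directory itself may be what reaches cv2.imread, which would return None for it. A sketch under that assumption: expand_image_paths is a hypothetical helper (not part of the repository) that replaces each directory with the image files inside it before anything is read.

```python
import os

IMAGE_EXTENSIONS = ('.jpg', '.jpeg', '.png', '.bmp')


def expand_image_paths(paths):
    """Replace every directory in `paths` with the image files it contains."""
    files = []
    for path in paths:
        if os.path.isdir(path):
            for name in sorted(os.listdir(path)):
                full = os.path.join(path, name)
                if os.path.isfile(full) and name.lower().endswith(IMAGE_EXTENSIONS):
                    files.append(full)
        elif os.path.isfile(path):
            files.append(path)
        else:
            raise FileNotFoundError('No such file or directory: ' + path)
    return files
```

Letting the shell expand a pattern such as pascal-voc/test/VOCdevkit/VOC2007/JPEGImages/*.jpg may have the same effect, assuming the script accepts several file arguments.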

Issues while using pdb

I am having trouble while introducing a breakpoint using:
"import pdb;pdb.set_trace()"
Is this code not expected to run with breakpoints? Is there some other library that I need in order to run the code with breakpoints?

Below is the error I am getting:

-> for s in samples:
(Pdb)
Process Process-1:
Traceback (most recent call last):
File "/home/apargaon/anaconda2/envs/test/lib/python3.6/multiprocessing/process.py", line 249, in _bootstrap
self.run()
File "/home/apargaon/anaconda2/envs/test/lib/python3.6/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/home/apargaon/ssd/ssd-tensorflow/training_data.py", line 118, in batch_producer
images, labels, gt_boxes = process_samples(samples)
File "/home/apargaon/ssd/ssd-tensorflow/training_data.py", line 96, in process_samples
for s in samples:
File "/home/apargaon/ssd/ssd-tensorflow/training_data.py", line 96, in process_samples
for s in samples:
File "/home/apargaon/anaconda2/envs/test/lib/python3.6/bdb.py", line 48, in trace_dispatch
return self.dispatch_line(frame)
File "/home/apargaon/anaconda2/envs/test/lib/python3.6/bdb.py", line 67, in dispatch_line
if self.quitting: raise BdbQuit
bdb.BdbQuit

/home/apargaon/ssd/ssd-tensorflow/training_data.py(96)process_samples()
-> for s in samples:
(Pdb)
Process Process-2:
Traceback (most recent call last):
File "/home/apargaon/anaconda2/envs/test/lib/python3.6/multiprocessing/process.py", line 249, in _bootstrap
self.run()
File "/home/apargaon/anaconda2/envs/test/lib/python3.6/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/home/apargaon/ssd/ssd-tensorflow/training_data.py", line 118, in batch_producer
images, labels, gt_boxes = process_samples(samples)
File "/home/apargaon/ssd/ssd-tensorflow/training_data.py", line 96, in process_samples
for s in samples:
File "/home/apargaon/ssd/ssd-tensorflow/training_data.py", line 96, in process_samples
for s in samples:
File "/home/apargaon/anaconda2/envs/test/lib/python3.6/bdb.py", line 48, in trace_dispatch
return self.dispatch_line(frame)
File "/home/apargaon/anaconda2/envs/test/lib/python3.6/bdb.py", line 67, in dispatch_line
if self.quitting: raise BdbQuit
bdb.BdbQuit
/home/apargaon/ssd/ssd-tensorflow/training_data.py(96)process_samples()
-> for s in samples:
(Pdb)
Process Process-3:
Traceback (most recent call last):
File "/home/apargaon/anaconda2/envs/test/lib/python3.6/multiprocessing/process.py", line 249, in _bootstrap
self.run()
File "/home/apargaon/anaconda2/envs/test/lib/python3.6/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/home/apargaon/ssd/ssd-tensorflow/training_data.py", line 118, in batch_producer
images, labels, gt_boxes = process_samples(samples)
File "/home/apargaon/ssd/ssd-tensorflow/training_data.py", line 96, in process_samples
for s in samples:
File "/home/apargaon/ssd/ssd-tensorflow/training_data.py", line 96, in process_samples
for s in samples:
File "/home/apargaon/anaconda2/envs/test/lib/python3.6/bdb.py", line 48, in trace_dispatch
return self.dispatch_line(frame)
File "/home/apargaon/anaconda2/envs/test/lib/python3.6/bdb.py", line 67, in dispatch_line
if self.quitting: raise BdbQuit
bdb.BdbQuit
/home/apargaon/ssd/ssd-tensorflow/training_data.py(96)process_samples()
-> for s in samples:
(Pdb)
Process Process-4:
Traceback (most recent call last):
File "/home/apargaon/anaconda2/envs/test/lib/python3.6/multiprocessing/process.py", line 249, in _bootstrap
self.run()
File "/home/apargaon/anaconda2/envs/test/lib/python3.6/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/home/apargaon/ssd/ssd-tensorflow/training_data.py", line 118, in batch_producer
images, labels, gt_boxes = process_samples(samples)
File "/home/apargaon/ssd/ssd-tensorflow/training_data.py", line 96, in process_samples
for s in samples:
File "/home/apargaon/ssd/ssd-tensorflow/training_data.py", line 96, in process_samples
for s in samples:
File "/home/apargaon/anaconda2/envs/test/lib/python3.6/bdb.py", line 48, in trace_dispatch
return self.dispatch_line(frame)
File "/home/apargaon/anaconda2/envs/test/lib/python3.6/bdb.py", line 67, in dispatch_line
if self.quitting: raise BdbQuit
bdb.BdbQuit
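
The tracebacks come from multiprocessing worker processes (Process-1 to Process-4). A child started by multiprocessing has its standard input redirected away from the terminal, so pdb reads end-of-file at the prompt and exits with BdbQuit. A commonly used workaround is a Pdb subclass that temporarily reattaches standard input. This is a sketch, not something shipped with this repository; the class name ForkedPdb is an arbitrary choice, and /dev/stdin is assumed to exist (Linux or macOS).

```python
import pdb
import sys


class ForkedPdb(pdb.Pdb):
    """A Pdb variant that can be used inside a multiprocessing child."""

    def interaction(self, *args, **kwargs):
        saved_stdin = sys.stdin
        try:
            # Reattach the terminal for the duration of the prompt.
            with open('/dev/stdin') as terminal:
                sys.stdin = terminal
                super().interaction(*args, **kwargs)
        finally:
            # Restore whatever the child process had before.
            sys.stdin = saved_stdin
```

Inside the worker code, ForkedPdb().set_trace() would then take the place of pdb.set_trace(). With several workers running, every one of them stops at the breakpoint and competes for the same terminal, so debugging with a single worker is likely to be easier.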

why did you use 1000x1000 image size?

I guess it is just to normalize the anchor boxes and the sample images to the same dimensions. Am I right?

What is your hardware setup?

Thank you for sharing your implementation.

Could you share a little info about your hardware? When I run your code with GeForce GTX 1060 6GB, it always says "out of memory".

Output nodes

Hi, I am trying to use this code on my dataset and I am quite new to tensorflow. I have this doubt.
What are output nodes that are to be passed to graph_util.convert_variables_to_constants() ?

vgg512

EOFError: Ran out of input

Hello,

I'm getting an error as you can see in the log below. Can you tell me what is wrong?

[i] Epoch 1/10: 0%| | 0/3302 [00:00<?, ?batches/s]Traceback (most recent call last):
File "train.py", line 327, in <module>
sys.exit(main())
File "train.py", line 241, in main
desc=description, unit='batches'):
File "C:\Users\vlad.tamas\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\tqdm\_tqdm.py", line 955, in __iter__
for obj in iterable:
File "D:\ssd-tensorflow-master\training_data.py", line 166, in gen_batch
w.start()
File "C:\Users\vlad.tamas\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\multiprocessing\process.py", line 105, in start
self._popen = self._Popen(self)
File "C:\Users\vlad.tamas\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\multiprocessing\context.py", line 212, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "C:\Users\vlad.tamas\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\multiprocessing\context.py", line 313, in _Popen
return Popen(process_obj)
File "C:\Users\vlad.tamas\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\multiprocessing\popen_spawn_win32.py", line 66, in __init__
reduction.dump(process_obj, to_child)
File "C:\Users\vlad.tamas\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\multiprocessing\reduction.py", line 59, in dump
ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'TrainingData.__batch_generator.<locals>.batch_producer'

Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Users\vlad.tamas\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\multiprocessing\spawn.py", line 106, in spawn_main
exitcode = _main(fd)
File "C:\Users\vlad.tamas\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\multiprocessing\spawn.py", line 116, in _main
self = pickle.load(from_parent)
EOFError: Ran out of input
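
On Windows, multiprocessing starts workers with the "spawn" method, which pickles the Process object together with its target function. A function defined inside another function cannot be pickled, which matches the "Can't pickle local object" message; the EOFError is then raised in the child, which never receives the pickled data. The sketch below only demonstrates the difference between a nested and a module-level function. All names in it are hypothetical and the function bodies are placeholders, not the repository's code.

```python
import pickle


def batch_producer_top_level(samples):
    """Module-level function: pickle can refer to it by its qualified name."""
    return [s * 2 for s in samples]


def make_local_producer():
    """Return a nested function, similar in shape to the failing case."""
    def batch_producer(samples):
        return [s * 2 for s in samples]
    return batch_producer


def is_picklable(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (AttributeError, pickle.PicklingError):
        return False


if __name__ == '__main__':
    print('module-level function:', is_picklable(batch_producer_top_level))
    print('nested function:', is_picklable(make_local_producer()))
```

A fix along these lines would move the worker function to module level and pass the state it needs through the args of multiprocessing.Process. Whether that is practical for training_data.py depends on what the nested function captures.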

Possible issue with Momentum Optimizer: Training is ~18x slower.

I was using the version with the Adam Optimizer for the longest time and finally tried out the latest version with Momentum. The batch times were usually around 1 second with the vgg300 preset and about 1.9 with vgg512 with the Adam version. I'm not sure what's causing this but training takes insanely long now (18.61s/batches).

I find it odd that it says "ran out of memory trying to allocate 3.92GiB" when it also says I have 6.60GiB free on my GPU. I tried updating tensorflow-gpu to the latest version, but that didn't fix the problem.

I'm on Windows 10 with CUDA 9 installed, using these parameters:
python train.py --name "model" --vgg-dir "path\to\dataset" --num-workers 0

Am I missing something obvious?

These are some of the warning messages I get while training:

2018-06-16 10:39:03.483635: I T:\src\github\tensorflow\tensorflow\core\platform\cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2

2018-06-16 10:39:03.826152: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1356] Found device 0 with properties:
name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.898
pciBusID: 0000:01:00.0
totalMemory: 8.00GiB freeMemory: 6.60GiB

2018-06-16 10:39:03.830427: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1435] Adding visible gpu devices: 0

2018-06-16 10:39:05.106763: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:

2018-06-16 10:39:05.108598: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:929] 0

2018-06-16 10:39:05.109745: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:942] 0: N

2018-06-16 10:39:05.111337: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6379 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0, compute capability: 6.1)

[i] Creating the model...

2018-06-16 10:39:13.629301: W T:\src\github\tensorflow\tensorflow\core\graph\graph_constructor.cc:1244] Importing a graph with a lower producer version 21 into an existing graph with producer version 26. Shape inference will have run different parts of the graph with different producer versions.
[i] Training...

[i] Train 1/200: 0%| | 0/263 [00:00<?, ?batches/s]

2018-06-16 10:40:01.832411: W T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.92GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.

[i] Train 1/200: 2%|8 | 4/263 [01:13<1:20:20, 18.61s/batches]

./export_model.py error

Please help me, when I run ./export_model.py
usage: export_model.py [-h] [--metagraph-file METAGRAPH_FILE]
[--checkpoint-file CHECKPOINT_FILE]
[--output-file OUTPUT_FILE] --output-tensors
OUTPUT_TENSORS [OUTPUT_TENSORS ...]
export_model.py: error: the following arguments are required: --output-tensors
Also, when I run ./infer.py, nothing appears in test-output.
This is the output of ./infer.py:
[i] Creating the model...
[i] Processing samples: 100%|██████████████████████████████████████████████████████████| 155/155 [00:54<00:00, 2.82batches/s]
[i] All done.
Can anyone guide me through the specific training and detection process, and explain how to prepare my own dataset? I will be very grateful.

Error using train.py

@ljanyst Hi, when I train on the Pascal dataset I get this error after 45% of the training:
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host
[Finished in 10534.3s with exit code 1]
[shell_cmd: python -u "E:\sSD_tF\ssd-tensorflow-master\train.py"]
[dir: E:\sSD_tF\ssd-tensorflow-master]

I have not logged off or disconnected from my network. Please help me out.
Thanks in advance.

How to evaluate and calculate mAP on VOC test 2007 by infer.py

Thanks for your great code to implement SSD, here are my questions:

how can I evaluate the model on VOC test 2007 and calculate mAP?

I can't compute the mAP as

ssd-tensorflow/infer.py

Line 147 in e9c1ee5

compute_stats = False

and in

ssd-tensorflow/infer.py

Line 149 in e9c1ee5

if args.data_source:

, you use args.data_source; however, there is no data_source argument in the parser.

Even if I add args.data_source to the parser, it shows:
Traceback (most recent call last):
File "infer.py", line 286, in <module>
sys.exit(main())
File "infer.py", line 155, in main
source.load_test_data(args.data_dir)
File "/cephfs/group/teg-qboss-teg-qboss-ocr-shixi/jamiecai/workspace/detection/ssd-tensorflow-master_ljanyst/source_pascal_voc.py", line 196, in load_test_data
annot = self.__build_annotation_list(root, 'test')
File "/cephfs/group/teg-qboss-teg-qboss-ocr-shixi/jamiecai/workspace/detection/ssd-tensorflow-master_ljanyst/source_pascal_voc.py", line 81, in __build_annotation_list
with open(root + '/ImageSets/Main/' + dataset_type + '.txt') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'pascal-voc/test/VOCdevkit/VOC2012/ImageSets/Main/test.txt'

Would you please update infer.py?
Thank you very much :D
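
For readers who want to check the numbers independently of infer.py, here is a minimal sketch of average precision per class and its mean over classes. It assumes the all-point interpolation used by the VOC challenge from 2010 onward (VOC2007 originally used 11-point interpolation), and it assumes the detections have already been matched against the ground truth. The function names are hypothetical and this is not necessarily how the repository computes its statistics.

```python
def average_precision(is_true_positive, num_ground_truth):
    """AP for one class.

    is_true_positive: one bool per detection, sorted by descending confidence.
    num_ground_truth: number of annotated objects of that class.
    """
    if num_ground_truth == 0:
        return 0.0

    recalls, precisions = [], []
    tp = fp = 0
    for hit in is_true_positive:
        if hit:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_ground_truth)
        precisions.append(tp / (tp + fp))

    # Sentinels so that the curve spans recall 0 to 1.
    rec = [0.0] + recalls + [1.0]
    pre = [0.0] + precisions + [0.0]

    # Make precision monotonically non-increasing from left to right.
    for i in range(len(pre) - 2, -1, -1):
        pre[i] = max(pre[i], pre[i + 1])

    # Sum the rectangle areas wherever recall changes.
    return sum((rec[i + 1] - rec[i]) * pre[i + 1] for i in range(len(rec) - 1))


def mean_average_precision(per_class):
    """per_class maps a class name to (is_true_positive, num_ground_truth)."""
    aps = [average_precision(flags, n) for flags, n in per_class.values()]
    return sum(aps) / len(aps) if aps else 0.0
```

For example, three detections flagged [True, False, True] against two ground-truth objects give an AP of 5/6.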

Why does the scale not change when computing the anchors?

First, thanks for your project, it helped me a lot.
When I read your code that computes the anchors, I found 's = map_params.scale', but when the anchors are computed, s stays at 0.9 and never changes. I think the code should add 's=preset.maps[k].scale'.

Could you please tell me why? Thank you!

#---------------------------------------------------------------------------
# Compute the actual boxes for every scale and feature map
#---------------------------------------------------------------------------
anchors = []
for k in range(len(preset.maps)):
    fk = preset.maps[k].size[0]
    # proposed addition: s = preset.maps[k].scale
    for size in box_sizes[k]:
        for j in range(fk):
            y = (j+0.5)/float(fk)
            for i in range(fk):
                x = (i+0.5)/float(fk)
                box = Anchor(Point(x, y), Size(size[0], size[1]),
                             i, j, s, k)
                anchors.append(box)
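
To make the geometry of the quoted loop easier to follow, here is a small standalone sketch of the center computation only. It assumes coordinates normalized to the range 0 to 1, as the expressions (i+0.5)/float(fk) suggest. anchor_centers is a hypothetical helper, and the Point tuple is redefined here so that the example runs on its own; it says nothing about how the scale s should be handled.

```python
from collections import namedtuple

Point = namedtuple('Point', ['x', 'y'])


def anchor_centers(fk):
    """Return the cell centers of an fk x fk feature map, row by row."""
    centers = []
    for j in range(fk):
        y = (j + 0.5) / float(fk)
        for i in range(fk):
            x = (i + 0.5) / float(fk)
            centers.append(Point(x, y))
    return centers
```

For a 2 x 2 feature map the centers fall at 0.25 and 0.75 along each axis, that is, in the middle of each cell.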

Training on custom dataset

@ljanyst Hi, I am new to TensorFlow and SSD. Thanks for the code. I have a few questions:
When training on a custom dataset, should I download the vgg_graph file?
What is the purpose of the training_data.py file?
Thanks in advance.

Question regarding layers

I was going through the ssdvgg.py file and I'm a little confused as to where the initialized VGG layers are defined. I just see layers listed (layers = ['conv1_1', ..) and then the L2 loss for them, but I don't see them defined anywhere. Thanks in advance!

terminate called after throwing an instance of 'std::bad_alloc'

Hello! Thank you for your code. I currently don't have access to a GPU, so I'm trying your code on my laptop. I encountered this problem. Do you have any idea what the reason is?

Thank you so much!

$./train.py
[i] Project name: test
[i] Data directory: pascal-voc
[i] VGG directory: vgg_graph
[i] # epochs: 10
[i] Batch size: 8
[i] Tensorboard directory: tb
[i] Checkpoint interval: 5
[i] Learning rate: 0.001
[i] Learning rate decay: 0.97
[i] Optimizer epsilon: 0.1
[i] Weight decay: 0.0005
[i] Continue: False
[i] Number of workers: 4
[i] Creating directory test...
[i] Starting at epoch: 1
[i] Configuring the training data...
[i] # training samples: 26411
[i] # validation samples: 677
[i] # classes: 20
[i] Image size: Size(w=300, h=300)
[i] Creating the model...
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Aborted

pos

error in source_pascal_voc.py

I am trying to output test statistics for the entire VOC test data.

In the function load_test_data in the file source_pascal_voc.py
there is a line:

root = data_dir + '/test/VOCdevkit/VOC2012'

However, VOC 2012 has no test data, while VOC 2007 does. Could this be an error, and did you intend it to be

root = data_dir + '/test/VOCdevkit/VOC2007'

changing the line works for me
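
Rather than hard-coding one year, the root could be chosen by checking which directory actually contains a test image list. A minimal sketch, assuming the directory layout that appears in the paths quoted on this page (test/VOCdevkit/VOC<year>/ImageSets/Main/test.txt); find_voc_test_root is a hypothetical helper and not part of source_pascal_voc.py.

```python
import os


def find_voc_test_root(data_dir, years=('2012', '2007')):
    """Return the first VOC root under data_dir that has a test image list."""
    for year in years:
        root = os.path.join(data_dir, 'test', 'VOCdevkit', 'VOC' + year)
        if os.path.isfile(os.path.join(root, 'ImageSets', 'Main', 'test.txt')):
            return root
    raise FileNotFoundError('No ImageSets/Main/test.txt found under ' + data_dir)
```

The order of the years tuple decides which dataset wins when both are present.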

"Confidence loss is NaN" message is printed

Hello,
I saw on your blog that you recently modified the source code,
so I tried the updated source.
However, the following problem appears during training.

[i] Training...
[i] Train 1/200: 3%|▌ | 28/853 [00:17<08:22, 1.64batches/s][!] Confidence loss is NaN.
[i] Train 1/200: 3%|▋ | 29/853 [00:17<08:11, 1.68batches/s][!] Confidence loss is NaN.
[i] Train 1/200: 4%|▋ | 30/853 [00:17<08:00, 1.71batches/s][!] Confidence loss is NaN.
[i] Train 1/200: 4%|▋ | 31/853 [00:17<07:48, 1.75batches/s][!] Confidence loss is NaN.
[i] Train 1/200: 4%|▋ | 32/853 [00:17<07:39, 1.79batches/s][!] Confidence loss is NaN.
[i] Train 1/200: 4%|▋ | 33/853 [00:18<07:30, 1.82batches/s][!] Confidence loss is NaN.
[i] Train 1/200: 4%|▊ | 34/853 [00:18<07:20, 1.86batches/s][!] Confidence loss is NaN.
[i] Train 1/200: 4%|▊ | 35/853 [00:18<07:11, 1.90batches/s][!] Confidence loss is NaN.
[i] Train 1/200: 4%|▊ | 36/853 [00:18<07:03, 1.93batches/s][!] Confidence loss is NaN.
[i] Train 1/200: 4%|▊ | 37/853 [00:18<06:55, 1.97batches/s][!] Confidence loss is NaN.
[i] Train 1/200: 4%|▊ | 38/853 [00:18<06:47, 2.00batches/s][!] Confidence loss is NaN.
[i] Train 1/200: 5%|▊ | 39/853 [00:19<06:40, 2.03batches/s][!] Confidence loss is NaN.
[i] Train 1/200: 5%|▉ | 40/853 [00:19<06:33, 2.07batches/s][!] Confidence loss is NaN.
[i] Train 1/200: 5%|▉ | 41/853 [00:19<06:26, 2.10batches/s][!] Confidence loss is NaN.
[i] Train 1/200: 5%|▉ | 42/853 [00:19<06:20, 2.13batches/s][!] Confidence loss is NaN.

I am using tensorflow 1.8, cuda9-0, cudnn7.

Do you know how I can fix it?

Thank you

[Q] How to use the model to test some images?

As stated in the title, could you provide me steps to test the model after training is finished? So the output is a window with box(s) telling the predicted class. I am completely new to this area.

Custom dataset

What changes are needed in order to try it on the KITTI object detection dataset?

How to train without learned model?

Hi, I am reading your SSD code. It is really helpful for me. Thank you.

I would like to train the SSD without a pretrained model to watch the losses decrease.
However, there are no parameters or boolean variables for switching the training mode.

Is there any simple way to train without a pretrained model?

Thank you.

testing a trained network

@ljanyst Hi, I have trained the model on Pascal VOC. I am also trying to test it on a single image with infer.py, but I am not able to see the image with bounding boxes.
This is what I get:

[i] Project name: test
[i] Training data: pascal-voc/training-data.pkl
[i] Batch size: 32
[i] Data source: None
[i] Data directory: pascal-voc
[i] Output directory: test-output
[i] Annotate: False
[i] Dump predictions: False
[i] Sample: test
[i] Threshold: 0.5
[i] Pascal summary: False
[!] No files specified

I am new to TensorFlow. Can you tell me what I am doing wrong or how to test it?

I cannot download vgg.zip. Can you or someone else provide the file? Thank you very much.

mAP on VOC2007 test.

Hi,
Thanks for sharing your codes.
I modified source_pascal_voc.py to train on VOC2007 + VOC2012 trainval.
After training, I evaluated on the VOC2007 test set, but the result is far below the original paper's.

Do you have any ideas?

AttributeError: 'list' object has no attribute 'label'

First of all, thank you for making this available on Github.
I'm fairly new to Machine Learning, and after a month or so of self study and work, I'm beat on this issue.
When I run training.py, it gives me this error,

Traceback (most recent call last):
File "training.py", line 464, in <module>
sys.exit(main())
File "training.py", line 433, in main
APs = training_ap_calc.compute_aps()
File "average_precision.py", line 78, in compute_aps
counts[box.label] += 1
AttributeError: 'list' object has no attribute 'label'

Thanks!

What is 'topological sort failed'?

Thanks for your wonderful code implementing SSD. Here are my questions:

When I run train.py, it outputs this warning:

2019-04-17 10:05:26.145105: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:666] Iteration = 0, topological sort failed with message: The graph couldn't be sorted in topological order.

When I check my GPU via the nvidia-smi command, my GPU doesn't seem to be working very hard. The GPU-Util value changes a lot (0% to 90%). I think that if there were no problem, GPU-Util should stay high (90% to 100%).

$ nvidia-smi
Wed Apr 17 10:22:21 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.56       Driver Version: 418.56       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  TITAN Xp            Off  | 00000000:01:00.0  On |                  N/A |
| 38%   61C    P2    78W / 250W |  11910MiB / 12192MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1037      G   /usr/lib/xorg/Xorg                           477MiB |
|    0      1915      G   compiz                                       248MiB |
|    0      2235      G   ...-token=144ABE95F58132602BEFD0938D570E4B    45MiB |
|    0     21058      G   ...quest-channel-token=8328652029201461942    63MiB |
|    0     21701      G   /proc/self/exe                                44MiB |
|    0     25361      G   ...-token=5B2A04BC2FBD002902D012155741B9F2    62MiB |
|    0     26180      C   python3                                    10961MiB |
+-----------------------------------------------------------------------------+

However, my RAM is almost full and my Ubuntu desktop is very slow (even though I added swap memory).

$ free -h
               total        used        free      shared  buff/cache   available
Mem:            15G         13G        218M        1.1G        2.3G        1.0G
Swap:           19G         72M         19G

Although these problems are still not resolved, I spent 40 hours on 180 epochs and got:
training mAP 0.57
validation mAP 0.49

What is likely to cause this problem (topological sort failed), and how can I avoid it?

Thank you very much!

This is a sample of my terminal output.

[i] Project name:          test
[i] Data directory:        pascal-voc
[i] VGG directory:         vgg_graph
[i] # epochs:              200
[i] Batch size:            8
[i] Tensorboard directory: tb
[i] Checkpoint interval:   5
[i] Learning rate values:  0.00075;0.0001;0.00001
[i] Learning rate boundaries:  320000;400000
[i] Momentum:              0.9
[i] Weight decay:          0.0005
[i] Continue:              True
[i] Number of workers:     12
[i] Starting at epoch:     181
[i] Configuring the training data...
[i] # training samples:    21503
[i] # validation samples:  5585
[i] # classes:             20
[i] Image size:            Size(w=300, h=300)
2019-04-17 10:04:42.046026: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-04-17 10:04:42.349121: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-04-17 10:04:42.349865: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties: 
name: TITAN Xp major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:01:00.0
totalMemory: 11.91GiB freeMemory: 10.95GiB
2019-04-17 10:04:42.349887: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-04-17 10:04:48.780566: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-04-17 10:04:48.780596: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0 
2019-04-17 10:04:48.780603: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N 
2019-04-17 10:04:48.789158: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10589 MB memory) -> physical GPU (device: 0, name: TITAN Xp, pci bus id: 0000:01:00.0, compute capability: 6.1)
[i] Creating the model...
[i] Training...
[i] Train 181/200:   0%|                                                                   | 0/2688 [00:00<?, ?batches/s]2019-04-17 10:05:26.145105: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:666] Iteration = 0, topological sort failed with message: The graph couldn't be sorted in topological order.
2019-04-17 10:05:26.167850: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:666] Iteration = 1, topological sort failed with message: The graph couldn't be sorted in topological order.
2019-04-17 10:05:26.291892: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:666] Iteration = 0, topological sort failed with message: The graph couldn't be sorted in topological order.
2019-04-17 10:05:26.303694: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:666] Iteration = 1, topological sort failed with message: The graph couldn't be sorted in topological order.

Confidence loss is NaN?

Hi,

It returns an error:

'Train 1/200: 19%'
'Confidence loss is NaN'

And

'...tensorflow.python.framework.errors_impl.InvalidArgumentError: Nan in summary histogram for: filter_summaries/conv1_1'

Any help?

AttributeError: Can't pickle local object 'TrainingData.__batch_generator.<locals>.batch_producer'

Hi, I am getting the following error. When I debugged, I found that the error comes from "w.start()" in training_data.py. Please let me know what went wrong.
Full error:

AttributeError: Can't pickle local object 'TrainingData.__batch_generator.<locals>.batch_producer'

d:\tsr\ssd-tensorflow-master\training_data.py(175)gen_batch()
-> w.start()
(Pdb) C:\Users\VNP2KOR\AppData\Local\conda\conda\envs\tf_gpu\lib\site-packages\h5py_init_.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
from ._conv import register_converters as _register_converters
Traceback (most recent call last):
File "", line 1, in
File "C:\Users\VNP2KOR\AppData\Local\conda\conda\envs\tf_gpu\lib\multiprocessing\spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "C:\Users\VNP2KOR\AppData\Local\conda\conda\envs\tf_gpu\lib\multiprocessing\spawn.py", line 115, in _main
self = reduction.pickle.load(from_parent)
EOFError: Ran out of input

extra_scale is defined wrong

hi, thanks for your excellent code.

There is a small mistake in the extra_scale settings: your values are 100 times larger than the correct ones.

For SSD_PRESET['vgg300'], extra_scale should be 1.075
For SSD_PRESET['vgg512'], extra_scale should be 1.05

With values 100 times smaller than your original numbers, the anchors stay inside the original image.
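To make the arithmetic concrete, here is a minimal sketch. It assumes that extra_scale acts as the scale that follows the last feature map, and that the extra square anchor uses the geometric mean sqrt(s_k * s_{k+1}) described in the SSD paper. The last-map scale of 0.9 is also an assumption (the paper's s_max), not a value read from this repository, and the function name is made up for the illustration.

```python
from math import sqrt

LAST_MAP_SCALE = 0.9  # assumed scale of the last feature map (s_max in the SSD paper)


def extra_anchor_size(last_scale, extra_scale):
    """Relative side of the extra square anchor: sqrt(s_k * s_{k+1})."""
    return sqrt(last_scale * extra_scale)


# 107.5 is the proposed 1.075 multiplied by 100, as described in the report.
for name, value in [("vgg300, 100x too large", 107.5), ("vgg300, proposed", 1.075)]:
    size = extra_anchor_size(LAST_MAP_SCALE, value)
    verdict = "fits inside" if size <= 1.0 else "exceeds"
    print(f"{name}: anchor side = {size:.3f} x image side, {verdict} the image")
```

Under these assumptions the larger value yields an anchor several times wider than the image, while the proposed value keeps it just under the image size.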

exporting model

I am trying to export the model. I see that I need to give a list of output nodes, but I am not sure what to pass for that argument. Could someone help, please?

Error when training: "OpenCV Error: Assertion failed ((scn == 3 || scn == 4) && (depth == 0 || depth == 5)) in cvtColor, file /tmp/build/80754af9/opencv_1525313247723/work/modules/imgproc/src/color.cpp, line 11115 Process Process-34:"

Hi,
I just ran train.py, but when training starts I get this error:
"[i] Train 1/200: 0%| | 0/2688 [00:00<?, ?batches/s]OpenCV Error: Assertion failed ((scn == 3 || scn == 4) && (depth == 0 || depth == 5)) in cvtColor, file /tmp/build/80754af9/opencv_1525313247723/work/modules/imgproc/src/color.cpp, line 11115
Process Process-34:
Traceback (most recent call last):
File "/home/hientt/anaconda3/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/home/hientt/anaconda3/lib/python3.6/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/home/hientt/ssd-tensorflow/training_data.py", line 119, in batch_producer
images, labels, gt_boxes = process_samples(samples)
File "/home/hientt/ssd-tensorflow/training_data.py", line 95, in process_samples
image, label, gt = run_transforms(s)
File "/home/hientt/ssd-tensorflow/training_data.py", line 83, in run_transforms
args = t(*args)
File "/home/hientt/ssd-tensorflow/transforms.py", line 159, in call
return self.transforms[pick](data, label, gt)
File "/home/hientt/ssd-tensorflow/transforms.py", line 148, in call
args = t(*args)
File "/home/hientt/ssd-tensorflow/transforms.py", line 136, in call
return self.transform(data, label, gt)
File "/home/hientt/ssd-tensorflow/transforms.py", line 215, in call
data = cv2.cvtColor(data, cv2.COLOR_BGR2HSV)
cv2.error: /tmp/build/80754af9/opencv_1525313247723/work/modules/imgproc/src/color.cpp:11115: error: (-215) (scn == 3 || scn == 4) && (depth == 0 || depth == 5) in function cvtColor

Process Process-35:
Traceback (most recent call last):
File "/home/hientt/anaconda3/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/home/hientt/anaconda3/lib/python3.6/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/home/hientt/ssd-tensorflow/training_data.py", line 119, in batch_producer
images, labels, gt_boxes = process_samples(samples)
File "/home/hientt/ssd-tensorflow/training_data.py", line 95, in process_samples
image, label, gt = run_transforms(s)
File "/home/hientt/ssd-tensorflow/training_data.py", line 83, in run_transforms
args = t(*args)
File "/home/hientt/ssd-tensorflow/transforms.py", line 136, in call
return self.transform(data, label, gt)
File "/home/hientt/ssd-tensorflow/transforms.py", line 168, in call
data = data.astype(np.float32)
AttributeError: 'NoneType' object has no attribute 'astype'
OpenCV Error: Assertion failed ((scn == 3 || scn == 4) && (depth == 0 || depth == 5)) in cvtColor, file /tmp/build/80754af9/opencv_1525313247723/work/modules/imgproc/src/color.cpp, line 11115
Process Process-36:
OpenCV Error: Assertion failed ((scn == 3 || scn == 4) && (depth == 0 || depth == 5)) in cvtColor, file /tmp/build/80754af9/opencv_1525313247723/work/modules/imgproc/src/color.cpp, line 11115
OpenCV Error: Assertion failed ((scn == 3 || scn == 4) && (depth == 0 || depth == 5)) in cvtColor, file /tmp/build/80754af9/opencv_1525313247723/work/modules/imgproc/src/color.cpp, line 11115
Process Process-37:
Process Process-38:
Process Process-39:
Traceback (most recent call last):
File "/home/hientt/anaconda3/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/home/hientt/anaconda3/lib/python3.6/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/home/hientt/ssd-tensorflow/training_data.py", line 119, in batch_producer
images, labels, gt_boxes = process_samples(samples)
File "/home/hientt/ssd-tensorflow/training_data.py", line 95, in process_samples
image, label, gt = run_transforms(s)
File "/home/hientt/ssd-tensorflow/training_data.py", line 83, in run_transforms
args = t(*args)
File "/home/hientt/ssd-tensorflow/transforms.py", line 159, in call
return self.transforms[pick](data, label, gt)
File "/home/hientt/ssd-tensorflow/transforms.py", line 148, in call
args = t(*args)
File "/home/hientt/ssd-tensorflow/transforms.py", line 136, in call
return self.transform(data, label, gt)
File "/home/hientt/ssd-tensorflow/transforms.py", line 215, in call
data = cv2.cvtColor(data, cv2.COLOR_BGR2HSV)
cv2.error: /tmp/build/80754af9/opencv_1525313247723/work/modules/imgproc/src/color.cpp:11115: error: (-215) (scn == 3 || scn == 4) && (depth == 0 || depth == 5) in function cvtColor

Process Process-40:
Traceback (most recent call last):
Traceback (most recent call last):
Process Process-41:
File "/home/hientt/anaconda3/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/home/hientt/anaconda3/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/home/hientt/anaconda3/lib/python3.6/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/home/hientt/anaconda3/lib/python3.6/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/home/hientt/ssd-tensorflow/training_data.py", line 119, in batch_producer
images, labels, gt_boxes = process_samples(samples)
File "/home/hientt/ssd-tensorflow/training_data.py", line 119, in batch_producer
images, labels, gt_boxes = process_samples(samples)
Traceback (most recent call last):
File "/home/hientt/ssd-tensorflow/training_data.py", line 95, in process_samples
image, label, gt = run_transforms(s)
File "/home/hientt/ssd-tensorflow/training_data.py", line 95, in process_samples
image, label, gt = run_transforms(s)
File "/home/hientt/ssd-tensorflow/training_data.py", line 83, in run_transforms
args = t(*args)
File "/home/hientt/ssd-tensorflow/training_data.py", line 83, in run_transforms
args = t(*args)
File "/home/hientt/ssd-tensorflow/transforms.py", line 136, in call
return self.transform(data, label, gt)
File "/home/hientt/ssd-tensorflow/transforms.py", line 159, in call
return self.transforms[pick](data, label, gt)
File "/home/hientt/ssd-tensorflow/transforms.py", line 148, in call
args = t(*args)
File "/home/hientt/ssd-tensorflow/transforms.py", line 168, in call
data = data.astype(np.float32)
File "/home/hientt/ssd-tensorflow/transforms.py", line 136, in call
return self.transform(data, label, gt)
File "/home/hientt/ssd-tensorflow/transforms.py", line 215, in call
data = cv2.cvtColor(data, cv2.COLOR_BGR2HSV)
AttributeError: 'NoneType' object has no attribute 'astype'
cv2.error: /tmp/build/80754af9/opencv_1525313247723/work/modules/imgproc/src/color.cpp:11115: error: (-215) (scn == 3 || scn == 4) && (depth == 0 || depth == 5) in function cvtColor

File "/home/hientt/anaconda3/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/home/hientt/anaconda3/lib/python3.6/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/home/hientt/ssd-tensorflow/training_data.py", line 119, in batch_producer
images, labels, gt_boxes = process_samples(samples)
File "/home/hientt/ssd-tensorflow/training_data.py", line 95, in process_samples
image, label, gt = run_transforms(s)
File "/home/hientt/ssd-tensorflow/training_data.py", line 83, in run_transforms
args = t(*args)
File "/home/hientt/ssd-tensorflow/transforms.py", line 159, in call
return self.transforms[pick](data, label, gt)
File "/home/hientt/ssd-tensorflow/transforms.py", line 148, in call
args = t(*args)
File "/home/hientt/ssd-tensorflow/transforms.py", line 136, in call
return self.transform(data, label, gt)
File "/home/hientt/ssd-tensorflow/transforms.py", line 215, in call
data = cv2.cvtColor(data, cv2.COLOR_BGR2HSV)
cv2.error: /tmp/build/80754af9/opencv_1525313247723/work/modules/imgproc/src/color.cpp:11115: error: (-215) (scn == 3 || scn == 4) && (depth == 0 || depth == 5) in function cvtColor

Traceback (most recent call last):
File "/home/hientt/anaconda3/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/home/hientt/anaconda3/lib/python3.6/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/home/hientt/ssd-tensorflow/training_data.py", line 119, in batch_producer
images, labels, gt_boxes = process_samples(samples)
File "/home/hientt/ssd-tensorflow/training_data.py", line 95, in process_samples
image, label, gt = run_transforms(s)
File "/home/hientt/ssd-tensorflow/training_data.py", line 83, in run_transforms
args = t(*args)
File "/home/hientt/ssd-tensorflow/transforms.py", line 159, in call
return self.transforms[pick](data, label, gt)
File "/home/hientt/ssd-tensorflow/transforms.py", line 148, in call
args = t(*args)
File "/home/hientt/ssd-tensorflow/transforms.py", line 136, in call
return self.transform(data, label, gt)
File "/home/hientt/ssd-tensorflow/transforms.py", line 183, in call
data = data.astype(np.float32)
AttributeError: 'NoneType' object has no attribute 'astype'
"
Can anyone help me solve this?
Thanks,
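
The "'NoneType' object has no attribute 'astype'" lines suggest that the image array was already None when the transforms ran. cv2.imread returns None instead of raising when a file is missing or cannot be decoded, so one thing worth checking is whether every image path in the dataset points to a real file. A minimal sketch of such a check, assuming you can obtain the list of image file names from your dataset (the helper name and the example paths are hypothetical):

```python
import os


def find_missing_images(paths):
    """Return the paths that are not non-empty regular files."""
    missing = []
    for path in paths:
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            missing.append(path)
    return missing


if __name__ == "__main__":
    # Hypothetical file names; replace them with the paths from your dataset.
    candidates = ["pascal-voc/example-1.jpg", "pascal-voc/example-2.jpg"]
    for path in find_missing_images(candidates):
        print("[!] missing or empty image:", path)
```

This only catches missing or empty files. A file that exists but is corrupt would still need to be opened with the image library to be detected.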

Problem occurs when running train.py

Hi, I followed the instructions, and when I run train.py it stops while creating the model and shows "ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host".

Can someone help me solve this problem, please? @ljanyst

Question about understanding the code

Hi, @ljanyst
I'm studying your code; thank you for your excellent work.
Line 337 in ssdvgg.py reads:
X = l2_normalization (self.vgg_conv4_3, 20, 512,'l2_norm_conv4_3')
Why is initial_scale set to 20?
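
For context, the SSD paper applies the L2 normalization technique from ParseNet to conv4_3, scaling the feature norm at each location to 20 and then learning the scale during training. The sketch below shows what such a layer computes at a single spatial location. It is an illustration under that assumption, not the code from ssdvgg.py, and the function name is made up.

```python
from math import sqrt


def l2_normalize_and_scale(channels, scale=20.0, eps=1e-12):
    """Normalize one location's channel vector to unit L2 norm, then rescale it."""
    norm = sqrt(sum(c * c for c in channels))
    return [scale * c / max(norm, eps) for c in channels]


features = [3.0, 4.0]                      # toy 2-channel feature vector, norm 5
scaled = l2_normalize_and_scale(features)  # same direction, norm set to 20
print(scaled)
```

In a trainable layer the scale would typically be a per-channel variable initialized to 20 rather than a fixed constant.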

A problem saving the .pb file

When we export the model to an inference-optimized graph with ./export_model.py (using result/result as the name of the output tensor), we get the following error:
Traceback (most recent call last):
File "E:/defPerson-ssd-tf/export_model.py", line 42, in
help='names of the output tensors')
File "D:\Anaconda\envs\py3.5\lib\argparse.py", line 1342, in add_argument
raise ValueError("length of metavar tuple does not match nargs")
ValueError: length of metavar tuple does not match nargs

We would like to find out what happened, please!
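
For reference, argparse raises this ValueError while the argument is being registered, when the metavar cannot be formatted for the given nargs. The sketch below reproduces one way to trigger it and shows a matching combination. The flag name and metavar values are hypothetical, and it is not confirmed that export_model.py line 42 uses these exact arguments.

```python
import argparse


def build_parser(metavar):
    parser = argparse.ArgumentParser()
    # nargs=2 means exactly two values, so a tuple metavar needs two names.
    parser.add_argument('--output-tensors', nargs=2, metavar=metavar,
                        help='names of the output tensors')
    return parser


try:
    build_parser(('A', 'B', 'C'))          # three names for nargs=2
except ValueError as err:
    print('rejected:', err)

parser = build_parser(('FIRST', 'SECOND'))  # tuple length agrees with nargs
args = parser.parse_args(['--output-tensors', 'result/result', 'other'])
print(args.output_tensors)
```

If the script passes a plain string as metavar, the same message can still appear on some Python versions, so comparing the Python version with the one the author used may also be worthwhile.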

Can't pickle local object 'TrainingData.__batch_generator.<locals>.batch_producer'

runfile('G:/ssd-tensorflow/train.py', wdir='G:/ssd-tensorflow')
[i] Project name: test
[i] Data directory: pascal-voc
[i] VGG directory: vgg_graph
[i] # epochs: 200
[i] Batch size: 8
[i] Tensorboard directory: tb
[i] Checkpoint interval: 5
[i] Learning rate values: 0.001;0.0001;0.00001
[i] Learning rate boundaries: 320000;400000
[i] Momentum: 0.9
[i] Weight decay: 0.0005
[i] Continue: False
[i] Number of workers: 8
[i] Creating directory test...
[i] Starting at epoch: 1
[i] Configuring the training data...
[i] # training samples: 21503
[i] # validation samples: 17125
[i] # classes: 20
[i] Image size: Size(w=300, h=300)
[i] Creating the model...
INFO:tensorflow:Restoring parameters from b'vgg_graph/vgg\variables\variables'
WARNING:tensorflow:From C:\ProgramData\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\contrib\learn\python\learn\datasets\base.py:198: retry (from tensorflow.contrib.learn.python.learn.datasets.base) is deprecated and will be removed in a future version.
Instructions for updating:
Use the retry module or similar alternatives.
[i] Training...
[i] Train 1/200: 0%| | 0/2688 [00:00<?, ?batches/s]Traceback (most recent call last):

File "", line 1, in
runfile('G:/ssd-tensorflow/train.py', wdir='G:/ssd-tensorflow')

File "C:\ProgramData\Anaconda\envs\tensorflow\lib\site-packages\spyder\utils\site\sitecustomize.py", line 710, in runfile
execfile(filename, namespace)

File "C:\ProgramData\Anaconda\envs\tensorflow\lib\site-packages\spyder\utils\site\sitecustomize.py", line 101, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)

File "G:/ssd-tensorflow/train.py", line 344, in
sys.exit(main())

File "G:/ssd-tensorflow/train.py", line 253, in main
desc=description, unit='batches'):

File "C:\ProgramData\Anaconda\envs\tensorflow\lib\site-packages\tqdm_tqdm.py", line 927, in iter
for obj in iterable:

File "G:\ssd-tensorflow\training_data.py", line 172, in gen_batch
w.start()

File "C:\ProgramData\Anaconda\envs\tensorflow\lib\multiprocessing\process.py", line 105, in start
self._popen = self._Popen(self)

File "C:\ProgramData\Anaconda\envs\tensorflow\lib\multiprocessing\context.py", line 212, in _Popen
return _default_context.get_context().Process._Popen(process_obj)

File "C:\ProgramData\Anaconda\envs\tensorflow\lib\multiprocessing\context.py", line 313, in _Popen
return Popen(process_obj)

File "C:\ProgramData\Anaconda\envs\tensorflow\lib\multiprocessing\popen_spawn_win32.py", line 66, in init
reduction.dump(process_obj, to_child)

File "C:\ProgramData\Anaconda\envs\tensorflow\lib\multiprocessing\reduction.py", line 59, in dump
ForkingPickler(file, protocol).dump(obj)

AttributeError: Can't pickle local object 'TrainingData.__batch_generator.<locals>.batch_producer'

I am training this model on Windows 10, but after the model is created it says "can't pickle". Is it a version problem?
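
On Windows, multiprocessing starts child processes with the spawn method, which has to pickle the Process object together with its target function. Functions are pickled by qualified name, so a function defined inside another function (here batch_producer inside __batch_generator) cannot be pickled. The sketch below reproduces that limitation without starting any process; the function names are illustrative and not taken from training_data.py.

```python
import pickle


def make_local_producer():
    # Nested function: its qualified name contains '<locals>', so pickle
    # cannot look it up again by name.
    def batch_producer():
        return "batch"
    return batch_producer


def can_pickle(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (AttributeError, pickle.PicklingError):
        return False


print(can_pickle(make_local_producer()))  # nested function
print(can_pickle(len))                    # function importable by name
```

A common workaround is to move the worker function to module level and pass its inputs as arguments. Start methods that fork the parent process do not need to pickle the target, which may be why the same code runs on Linux.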

urlopen error [Errno 110] Connection timed out

Hello, when I run train.py I always get this error:

urllib.error.URLError: <urlopen error [Errno 110] Connection timed out>

Could you tell me how to solve it? Thank you very much!

How to interpret the TensorBoard?

Thank you for the great implementation!

And my question is what are the intuitions behind the "Distribution" and "Histograms" in the TensorBoard?

Thank you!

How to use the trained model to test on a picture and show the detection results?

mod_conv6/filter doesn't exist in graph

Hi,
I modified your SSD net to implement one of the face detection papers. The mod_conv6 and mod_conv7 layers are not present in the graph. The function __build_vgg_mods is executing, but the layers are not getting added to the graph, even though the later layers (i.e. conv8_1 etc.) are added.

NOTE:

My system only supports tensorflow-1.4.1 because of CUDA 8, so I made the necessary changes.
The fc layers (to which the weights are assigned) are in the graph. Should I use those?

Training on VOC dataset

@ljanyst Hi, I followed the steps in your README file to train on the VOC dataset:

1. Ran process_dataset.py, which produced a .pkl file in the pascal-voc folder.
2. Ran train.py, which created save_model.pb in the vgg_graph dir and vgg-graph.zip.
3. Ran detect.py to test it, but it does not give any error or output.

Have I followed the right steps? Can you please elaborate on the procedure?
Thanks in advance.

Training on custom dataset

Thanks for your work. I have to train ssdvgg on a custom dataset. I have 5 classes (person, chair, desk, computer and door). I will add more classes later, but for now that's all. How can I train on a custom dataset? I have images and annotations. When I run process_dataset it outputs training-data.pkl, trainval-samples.pkl and valid-samples.pkl. But when I run train.py, after it downloads the pretrained model and prints the parameters, I get the following error:

[i] Project name: test
[i] Data directory: pascal-voc
[i] VGG directory: vgg_graph
[i] # epochs: 200
[i] Batch size: 8
[i] Tensorboard directory: tb
[i] Checkpoint interval: 5
[i] Learning rate: 0.0001
[i] Learning rate decay: 0.97
[i] Optimizer epsilon: 0.1
[i] Weight decay: 0.0005
[i] Continue: False
[i] Number of workers: 8
Backend TkAgg is interactive backend. Turning interactive mode on.
[i] Creating directory test...
[i] Starting at epoch: 1
[i] Configuring the training data...
[i] # training samples: 6825
[i] # validation samples: 175
[i] # classes: 5
[i] Image size: Size(w=300, h=300)
2018-01-15 23:04:31.870082: I C:\tf_jenkins\home\workspace\rel-win\M\windows\PY\35\tensorflow\core\platform\cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
[i] Creating the model...
[i] Training...
[i] Epoch 1/200: 0%| | 0/854 [00:00<?, ?batches/s]Traceback (most recent call last):
File "C:\Program Files\JetBrains\PyCharm Community Edition 2017.2.3\helpers\pydev\pydevd.py", line 1599, in
globals = debugger.run(setup['file'], None, None, is_module)
File "C:\Program Files\JetBrains\PyCharm Community Edition 2017.2.3\helpers\pydev\pydevd.py", line 1026, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "C:\Program Files\JetBrains\PyCharm Community Edition 2017.2.3\helpers\pydev_pydev_imps_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "C:/Users/Burak/Desktop/ssd-tensorflow-master/train.py", line 327, in
sys.exit(main())
File "C:/Users/Burak/Desktop/ssd-tensorflow-master/train.py", line 241, in main
desc=description, unit='batches'):
File "C:\Users\Burak\AppData\Local\Programs\Python\Python35\lib\site-packages\tqdm_tqdm.py", line 949, in iter
for obj in iterable:
File "C:/Users/Burak/Desktop/ssd-tensorflow-master\training_data.py", line 166, in gen_batch
w.start()
File "C:\Users\Burak\AppData\Local\Programs\Python\Python35\lib\multiprocessing\process.py", line 105, in start
self._popen = self._Popen(self)
File "C:\Users\Burak\AppData\Local\Programs\Python\Python35\lib\multiprocessing\context.py", line 212, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "C:\Users\Burak\AppData\Local\Programs\Python\Python35\lib\multiprocessing\context.py", line 313, in _Popen
return Popen(process_obj)
File "C:\Users\Burak\AppData\Local\Programs\Python\Python35\lib\multiprocessing\popen_spawn_win32.py", line 66, in init
reduction.dump(process_obj, to_child)
File "C:\Users\Burak\AppData\Local\Programs\Python\Python35\lib\multiprocessing\reduction.py", line 59, in dump
ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'TrainingData.__batch_generator.<locals>.batch_producer'

Process finished with exit code 1

export_model.py gives 'utf-8' codec can't decode byte 0xe6 in position 1

Thanks for sharing the code. When I tried to run export_model.py, it failed with the error "'utf-8' codec can't decode byte 0xe6 in position 1". The file also does not open in gedit, which shows "Unexpected error: Invalid byte sequence in conversion input". Can you please help?

Update: It's working now. The file location was not correct. Thanks.

How to train this model on the COCO dataset?

Thanks for sharing. Could you please tell me how to train this SSD model on COCO? How should I change source_pascal_voc.py?

values of label_defs in source_pascal_voc.py

Hello,
Thank you very much for sharing your source code.
I am trying to use your code with a custom dataset.

How can I assign the values of "label_defs" in "source_pascal_voc.py" to my custom dataset classes?

Thanks in advance.