dmlc / gluon-cv

Gluon CV Toolkit

License: Apache License 2.0

Makefile 0.01% Python 86.32% CMake 0.04% C++ 3.78% Shell 0.10% Jupyter Notebook 9.69% Cython 0.07%

action-recognition computer-vision deep-learning gan gluon image-classification machine-learning mxnet neural-network object-detection person-reid pose-estimation semantic-segmentation

gluon-cv's Introduction

Distributed Machine Learning Common Codebase

DMLC-Core is the backbone library that supports all DMLC projects, providing the building blocks for efficient and scalable distributed machine learning libraries.

Developer Channel

What's New

Note on Parameter Module for Machine Learning

Known Issues

The RecordIO format is not portable across processors with different endianness. For example, a RecordIO file saved on an x86 machine cannot be loaded on a SPARC machine, because x86 is little-endian while SPARC is big-endian.
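To see why byte order matters, the sketch below packs a 32-bit unsigned integer the way a little-endian (x86) writer stores it natively, then reads the same four bytes back the way a big-endian (SPARC-style) reader would. It assumes, as the note implies, that integer fields are written in the machine's native byte order; it illustrates the endianness problem only and is not RecordIO's actual on-disk layout. The helper names are hypothetical.

```python
import struct

def write_u32_little_endian(value):
    """Serialize a 32-bit unsigned int the way an x86 (little-endian) machine stores it."""
    return struct.pack("<I", value)

def read_u32_big_endian(raw):
    """Deserialize 4 bytes the way a big-endian machine (e.g. SPARC) interprets them."""
    return struct.unpack(">I", raw)[0]

length = 10                              # e.g. a hypothetical record-length field
raw = write_u32_little_endian(length)    # bytes 0a 00 00 00
misread = read_u32_big_endian(raw)       # 0x0a000000 == 167772160, not 10
print(raw, misread)
```

The same four bytes decode to a completely different length, so a reader on the other architecture would skip to the wrong offset.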

Contributing

Contributions to dmlc-core are welcome! dmlc-core follows Google's C style guide. If you are interested in contributing, take a look at the feature wishlist and open a new issue if you would like to add something.

DMLC-Core uses the C++11 standard. Ensure that your C++ compiler supports C++11.
Try to introduce as few dependencies as possible.

Checklist before submitting code

Type make lint and fix all the style problems.
Type make doc and fix all the warnings.

NOTE

deps:

libcurl4-openssl-dev

gluon-cv's People

Contributors

Stargazers

Watchers

Forkers

zhreshold astonzhang wzhang1 kyocen cvrocker cupwater baifengbai ml-lab lewiszhao hzhang57 hncz003 cclauss labimage jithinraj caoaries fendaq zhanghang1989 iwanggp shaunstanislauslau zzmjohn liyuanyaun szha gpsbird watkyns jing-luo dmortem winnerineast aust-hansen lizhi3158 strongwolf skair39 edgeowner myfortune110 hesitationer george86028 pengboxiangshang jason8kang wh-forker zfxu wkcn ai3dvision zssasa leedorado yuewu001 abc3436645 hetong007 ericyao2013 thomasdelteil benzei ibnnafis007 fanofjava newenglandml vic4key ocabrisses hanman-aws tianweiy donnyyou wisvison ijkguo eric-haibin-lin mengqhui ixhorse difficultly-name ascenoputing detectdimples joeyteng ankkhedia zhuohuiyuan walterma ishitori aiaini66 grandesty-ml anzhao0503 jangocheng ellinier ifeherva wfus allonbrooks stonegiggity piiswrong leezu tangtangchx abhinavs95 liben2018 davisliang dyjng weiniuzhu shubhamgoel27 shady-cs15 ragavvenkatesan softwaregift moodmax iscas007 stjordanis noahliot karlind husonchen smilejx hefv57 sufeidechabei

gluon-cv's Issues

Converting from numpy to ndarray slowdown...

Hi, I'm continuously capturing images from a webcam (using cv2), converting them to NDArrays and reshaping them to evaluate a MobileNet SSD network (from the Gluon model zoo). However, the time to process each frame varies considerably and can reach 2 seconds (for the record, my code runs on an armv7hf platform, where I would expect <0.5 s). I suspect this is caused by memory allocation/fragmentation issues. I'm also using NNPACK.

Any idea what can be done to avoid this issue? Thanks,

The code and example output are below (the frame is captured outside the loop to take capturing out of the equation). The slow operation seems to be the call to mx.nd.array(frame); however, the slowdown does not happen if I remove the call to net(x).

CODE:

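import time

import cv2
import mxnet as mx
from gluoncv import model_zoo
from gluoncv import data as gdata

net = model_zoo.get_model('ssd_512_mobilenet1_0_voc', pretrained=True)
cam = cv2.VideoCapture(args.src)  # args comes from the reporter's argparse setup (not shown)
ret, frame = cam.read()           # a single frame, captured once outside the loop

while True:  # fps._numFrames < 120

    t1 = time.time()
    q = mx.nd.array(frame).as_in_context(mx.cpu(0))  # numpy (H, W, C) -> NDArray
    t2 = time.time()
    q = gdata.transforms.image.imresize(q, 512, 512)
    # note: reshape keeps the HWC memory order; a true CHW layout needs q.transpose((2, 0, 1))
    x = q.reshape(1, 3, 512, 512)
    t3 = time.time()
    class_IDs, scores, bounding_boxs = net(x)
    t4 = time.time()

    print('[INFO] Conversion to NDArray: {:.2f}'.format(t2 - t1))
    print('[INFO] Resize and reshape: {:.2f}'.format(t3 - t2))
    print('[INFO] Forward pass: {:.2f}'.format(t4 - t3))
    print('[INFO] Total time elapsed: {:.2f}'.format(t4 - t1))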

Output:

[INFO] Conversion to NDArray: 0.03
[INFO] Resize and reshape: 0.00
[INFO] Forward pass: 0.17
[INFO] Total time elapsed: 0.21

[INFO] Conversion to NDArray: 2.06
[INFO] Resize and reshape: 0.00
[INFO] Forward pass: 0.13
[INFO] Total time elapsed: 2.19

[INFO] Conversion to NDArray: 0.84
[INFO] Resize and reshape: 0.00
[INFO] Forward pass: 0.14
[INFO] Total time elapsed: 0.98

[INFO] Conversion to NDArray: 0.63
[INFO] Resize and reshape: 0.00
[INFO] Forward pass: 0.11
[INFO] Total time elapsed: 0.73

[INFO] Conversion to NDArray: 0.57
[INFO] Resize and reshape: 0.00
[INFO] Forward pass: 0.12
[INFO] Total time elapsed: 0.69

[INFO] Conversion to NDArray: 0.25
[INFO] Resize and reshape: 0.00
[INFO] Forward pass: 0.16
[INFO] Total time elapsed: 0.41

[INFO] Conversion to NDArray: 1.98
[INFO] Resize and reshape: 0.00
[INFO] Forward pass: 0.21
[INFO] Total time elapsed: 2.19

[INFO] Conversion to NDArray: 0.81
[INFO] Resize and reshape: 0.00
[INFO] Forward pass: 0.20
[INFO] Total time elapsed: 1.01

[INFO] Conversion to NDArray: 0.50
[INFO] Resize and reshape: 0.00
[INFO] Forward pass: 0.15
[INFO] Total time elapsed: 0.65

[INFO] Conversion to NDArray: 0.69
[INFO] Resize and reshape: 0.00
[INFO] Forward pass: 0.10
[INFO] Total time elapsed: 0.79

[INFO] Conversion to NDArray: 0.06
[INFO] Resize and reshape: 0.00
[INFO] Forward pass: 0.17
[INFO] Total time elapsed: 0.24
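One plausible explanation, not confirmed in this thread, is MXNet's asynchronous execution: net(x) can return before the forward pass has actually finished, so its cost shows up in whichever later call has to wait, possibly mx.nd.array(frame) in the next iteration. That would also fit the observation that the slowdown disappears without net(x). The sketch below shows the measurement pattern: a timer that calls a synchronization function before stopping the clock. So it runs without MXNet, the "engine" is a hypothetical stand-in built from a thread pool; stage_timer, submit, waitall and fake_forward are illustrative names.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from contextlib import contextmanager

@contextmanager
def stage_timer(name, timings, sync=None):
    """Store the wall-clock seconds spent in a `with` block under timings[name].

    If `sync` is given it is called before the clock stops, so queued
    asynchronous work is charged to this stage (with MXNet, sync could be
    mx.nd.waitall).
    """
    start = time.perf_counter()
    try:
        yield
    finally:
        if sync is not None:
            sync()
        timings[name] = time.perf_counter() - start

# Stand-in for an asynchronous engine: submit() returns before the work is done.
_pool = ThreadPoolExecutor(max_workers=1)
_pending = []

def submit(fn):
    _pending.append(_pool.submit(fn))

def waitall():
    """Block until every submitted job has finished (mimics mx.nd.waitall)."""
    while _pending:
        _pending.pop().result()

def fake_forward():
    time.sleep(0.2)  # pretend this is the network's forward pass

naive, synced = {}, {}
with stage_timer("forward", naive):
    submit(fake_forward)   # returns almost immediately
waitall()                  # drain the queue so the two measurements don't overlap
with stage_timer("forward", synced, sync=waitall):
    submit(fake_forward)
print(naive["forward"], synced["forward"])
```

In the loop above, timing net(x) with stage_timer('forward', t, sync=mx.nd.waitall) would show whether the time currently reported under "Conversion to NDArray" really belongs to the forward pass.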

Data Parallel Fails with a Single GPU During Evaluation in the Segmentation Training Script

When running evaluation in the segmentation training script train.py, I get the following error:

Traceback (most recent call last):
  File "train.py", line 192, in <module>
    trainer.validation(args.start_epoch)
  File "train.py", line 162, in validation
    for (correct, labeled, inter, union) in outputs:
TypeError: 'numpy.float32' object is not iterable

I think the cause is the following line:

gluon-cv/gluoncv/model_zoo/segbase.py

Line 82 in c9d5b79

return correct, labeled, inter, union

The line above returns a tuple, which is then converted at the following line:

gluon-cv/gluoncv/utils/parallel.py

Line 53 in c9d5b79

return tuple(self.module(*inputs[0], **kwargs[0]))

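I think this conversion does not change the returned result: with a single GPU, outputs is the flat tuple (correct, labeled, inter, union) instead of a list with one such tuple per device, so the loop in validation tries to unpack each numpy.float32 scalar.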

Related Issue: #95
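A minimal sketch of one possible fix, assuming callers always expect a list containing one result tuple per device: the single-device branch wraps its tuple in a list instead of returning it bare. single_device_forward and fake_eval are hypothetical names, not the gluoncv API.

```python
def single_device_forward(module, inputs, kwargs):
    """Hypothetical single-device branch: return a list with one output tuple per device."""
    # Wrapping in a list keeps the return type identical to the multi-device case,
    # so callers can always write `for (correct, labeled, inter, union) in outputs`.
    return [tuple(module(*inputs[0], **kwargs[0]))]

def fake_eval(pred, target):
    """Stand-in for the segmentation evaluation returning (correct, labeled, inter, union)."""
    return 3.0, 4.0, 1.0, 2.0

outputs = single_device_forward(fake_eval, [("pred", "target")], [{}])
for (correct, labeled, inter, union) in outputs:
    print(correct, labeled, inter, union)
```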

Some comments

Here are some high-level comments about the documentation. Please consider addressing at least the easy ones before the first release.

General comments

Avoid formatting mxnet, gluon and gluonvision as code; it creates too many red blocks in the docs. Use MXNet, Gluon and GluonVision as plain words. The same goes for other terms such as CIFAR10 and ImageNet, because they are not variables.

Homepage

Needs an empty line after "GluonVision features: ".
Installation: should we ask users to install MXNet first? Maybe remove the instructions for installing from source?

Getting Started with Pre-trained Model on CIFAR10

Replace Gluon Model Zoo with Gluon Model Zoo
Replace img = image.imdecode(f.read()) with img = image.imread(f)

Getting Started with Pre-trained Models on ImageNet

Add a tutorial similar to the CIFAR10 one?

Transfer Learning with Your Own Image Dataset

Why is the accuracy so low?

Train Your Own Model on ImageNet

Draw the figure with a grid so readers can read off the numbers.

Object Detection

Dive deep into the third one.
Why do the image classification tutorials have numbers while these don't? It would be better to make them consistent.

Model zoo

The CIFAR10 models take up too much space, which makes it easy for readers to overlook the ImageNet models. I suggest removing some CIFAR10 models and adding at least some popular ImageNet models to this page.

APIs

I feel we lack code examples showing how to use these APIs. Consider adding at least one code example per category, and more later.

gluonvision.data

Needs the TODOs fixed.
Since we introduce the usage of the dataset API in the "prepare dataset" sections, explicitly refer readers there for usage examples.

AttributeError: 'module' object has no attribute 'SumSquare'

When I tried to use sync BatchNorm (on only one GPU, because I could not get multi-GPU training working with DataParallelCriterion), it failed with "AttributeError: 'module' object has no attribute 'SumSquare'":

  File "/data1/zyx/yks/sources/gluon-cv/gluoncv/utils/parallel.py", line 53, in __call__
  File "/data1/zyx/yks/sources/incubator-mxnet/python/mxnet/gluon/block.py", line 414, in __call__
    return self.forward(*args)
  File "/data1/zyx/yks/sources/incubator-mxnet/python/mxnet/gluon/block.py", line 620, in forward
    return self._call_cached_op(x, *args)
  File "/data1/zyx/yks/sources/incubator-mxnet/python/mxnet/gluon/block.py", line 525, in _call_cached_op
    self._build_cache(*args)
  File "/data1/zyx/yks/sources/incubator-mxnet/python/mxnet/gluon/block.py", line 481, in _build_cache
    inputs, out = self._get_graph(*args)
  File "/data1/zyx/yks/sources/incubator-mxnet/python/mxnet/gluon/block.py", line 473, in _get_graph
    out = self.hybrid_forward(symbol, *grouped_inputs, **params)  # pylint: disable=no-value-for-parameter
  File "/data1/zyx/yks/resnet38w/models/drn_gcn.py", line 97, in hybrid_forward
    fm0 = self.layer0(x)  # 256
  File "/data1/zyx/yks/sources/incubator-mxnet/python/mxnet/gluon/block.py", line 414, in __call__
    return self.forward(*args)
  File "/data1/zyx/yks/sources/incubator-mxnet/python/mxnet/gluon/block.py", line 637, in forward
    return self.hybrid_forward(symbol, x, *args, **params)
  File "/data1/zyx/yks/sources/incubator-mxnet/python/mxnet/gluon/nn/basic_layers.py", line 117, in hybrid_forward
    x = block(x)
  File "/data1/zyx/yks/sources/incubator-mxnet/python/mxnet/gluon/block.py", line 414, in __call__
    return self.forward(*args)
  File "/data1/zyx/yks/sources/incubator-mxnet/python/mxnet/gluon/block.py", line 637, in forward
    return self.hybrid_forward(symbol, x, *args, **params)
  File "/data1/zyx/yks/sources/gluon-cv/gluoncv/model_zoo/syncbn.py", line 117, in hybrid_forward
AttributeError: 'module' object has no attribute 'SumSquare'

And here's the code for training:

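import mxnet as mx
from mxnet import autograd as ag

# net, criterion, data, target and mask are defined elsewhere in the training script
with ag.record():
    outputs = net(data)
    losses = criterion(outputs, target, mask)
    mx.nd.waitall()
    ag.backward(losses)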
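The traceback shows that the symbol namespace used inside syncbn.py has no SumSquare operator. One likely reading, which is an assumption and not confirmed in the issue, is that the installed MXNet build does not provide an operator this version of gluoncv expects. A small guard that checks for the operator up front turns the bare AttributeError into an actionable message. require_operator is a hypothetical helper, and fake_ops stands in for a namespace such as mx.sym.

```python
from types import SimpleNamespace

def require_operator(namespace, op_name, hint):
    """Return namespace.<op_name>, or raise an AttributeError that says what to check."""
    op = getattr(namespace, op_name, None)
    if op is None:
        raise AttributeError(
            "operator %r is not available in this build; %s" % (op_name, hint))
    return op

# Stand-in for an operator namespace such as mx.sym or mx.nd (illustration only).
fake_ops = SimpleNamespace(sum=sum)

try:
    require_operator(fake_ops, "SumSquare",
                     "check that the installed MXNet build provides it")
except AttributeError as err:
    message = str(err)
    print(message)
```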

Error in calculating bbox_iou

https://github.com/dmlc/gluon-cv/blob/master/gluoncv/utils/bbox.py#L30

It seems that
area_a = np.prod(bbox_a[:, 2:4] - bbox_a[:, :2], axis=1)

should be:

area_a = np.prod(bbox_a[:, 2:4] - bbox_a[:, :2] + 1, axis=1)
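Which formula is right depends on the box convention. If coordinates are continuous, xmax - xmin is the width; if they are inclusive pixel indices (the convention followed by the Pascal VOC evaluation code, where a box from pixel 0 to pixel 9 covers 10 pixels), the + 1 is needed. The sketch below makes the choice an explicit offset parameter; pairwise_iou is a hypothetical helper, not gluoncv's bbox_iou.

```python
import numpy as np

def pairwise_iou(bbox_a, bbox_b, offset=0):
    """IoU between every box in bbox_a (N, 4) and bbox_b (M, 4), as an (N, M) array.

    Boxes are (xmin, ymin, xmax, ymax). offset=0 treats coordinates as continuous;
    offset=1 treats them as inclusive pixel indices (the "+ 1" proposed above).
    """
    bbox_a = np.asarray(bbox_a, dtype=np.float64)
    bbox_b = np.asarray(bbox_b, dtype=np.float64)
    tl = np.maximum(bbox_a[:, None, :2], bbox_b[None, :, :2])    # top-left of the overlap
    br = np.minimum(bbox_a[:, None, 2:4], bbox_b[None, :, 2:4])  # bottom-right of the overlap
    wh = np.clip(br - tl + offset, 0, None)                     # zero when boxes don't overlap
    area_i = np.prod(wh, axis=2)
    area_a = np.prod(bbox_a[:, 2:4] - bbox_a[:, :2] + offset, axis=1)
    area_b = np.prod(bbox_b[:, 2:4] - bbox_b[:, :2] + offset, axis=1)
    return area_i / (area_a[:, None] + area_b[None, :] - area_i)

a = [[0, 0, 9, 9]]
b = [[5, 5, 14, 14]]
print(pairwise_iou(a, b, offset=0))  # 16 / 146
print(pairwise_iou(a, b, offset=1))  # 25 / 175
```

With offset=1 the 0..9 box is 10 x 10 pixels and overlaps the other box in a 5 x 5 region; with offset=0 the same numbers describe a 9 x 9 box with a 4 x 4 overlap.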

Failed to find any forward convolution algorithm.

mxnet version: mxnet_cu80-1.2.0.0b20180509
gluoncv version: 0.1.0
When I ran train_ssd.py with the default parameters, this problem occurred.

Traceback (most recent call last):
  File "/media/ryan/F/Projects/PycharmProjects/mxnetSSD/train_ssd.py", line 257, in <module>
    train(net, train_data, val_data, classes, args)
  File "/media/ryan/F/Projects/PycharmProjects/mxnetSSD/train_ssd.py", line 184, in train
    anchors, cls_preds, gt_boxes, gt_ids)
  File "/media/ryan/F/virtualenv/mxnet/lib/python3.5/site-packages/mxnet/gluon/block.py", line 413, in __call__
    return self.forward(*args)
  File "/media/ryan/F/virtualenv/mxnet/lib/python3.5/site-packages/gluoncv/model_zoo/ssd/target.py", line 40, in forward
    samples = self._sampler(matches, cls_preds, ious)
  File "/media/ryan/F/virtualenv/mxnet/lib/python3.5/site-packages/mxnet/gluon/block.py", line 413, in __call__
    return self.forward(*args)
  File "/media/ryan/F/virtualenv/mxnet/lib/python3.5/site-packages/gluoncv/model_zoo/samplers.py", line 78, in forward
    y[np.where(x.asnumpy() >= 0)] = 1  # assign positive samples
  File "/media/ryan/F/virtualenv/mxnet/lib/python3.5/site-packages/mxnet/ndarray/ndarray.py", line 1890, in asnumpy
    ctypes.c_size_t(data.size)))
  File "/media/ryan/F/virtualenv/mxnet/lib/python3.5/site-packages/mxnet/base.py", line 149, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [18:06:16] src/operator/nn/./cudnn/cudnn_convolution-inl.h:744: Failed to find any forward convolution algorithm.
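This error is commonly reported when cuDNN cannot allocate the workspace it needs, for example when GPU memory is tight; that diagnosis is an assumption and is not confirmed in the issue. Two things worth trying are a smaller batch size and disabling cuDNN autotuning with MXNET_CUDNN_AUTOTUNE_DEFAULT=0, the environment variable MXNet itself names in its autotune log message. Since it is safest to set the variable before MXNet is loaded, the sketch sets it for a child process; env_without_cudnn_autotune and run_script are hypothetical helpers.

```python
import os
import subprocess
import sys

def env_without_cudnn_autotune(base=None):
    """Copy of the environment with MXNet's cuDNN autotuning switched off."""
    env = dict(os.environ if base is None else base)
    env["MXNET_CUDNN_AUTOTUNE_DEFAULT"] = "0"  # variable named in MXNet's autotune log line
    return env

def run_script(script, *args):
    """Run a training script (e.g. train_ssd.py) in a child process with autotuning disabled."""
    return subprocess.run([sys.executable, script, *args],
                          env=env_without_cudnn_autotune(), check=False)
```

For example, run_script("train_ssd.py") would start training with the same arguments as before but without the autotuning pass.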

SyntaxError in /scripts/detection/ssd/eval_ssd.py

/data/yunfan.lu/home/anaconda2/envs/py3/bin/python /data/yunfan.lu/App/github.data/gluon-cv/scripts/detection/ssd/eval_ssd.py
  File "/data/yunfan.lu/App/github.data/gluon-cv/scripts/detection/ssd/eval_ssd.py", line 59
    batch_size, False, last_batch='keep', num_workers=num_workers)
    ^
SyntaxError: positional argument follows keyword argument

def get_dataloader(val_dataset, data_shape, batch_size, num_workers):
    """Get dataloader."""
    width, height = data_shape, data_shape
    batchify_fn = Tuple(Stack(), Pad(pad_val=-1))
    val_loader = gluon.data.DataLoader(
        val_dataset.transform(SSDDefaultValTransform(width, height)), batchify_fn=batchify_fn,
        batch_size, False, last_batch='keep', num_workers=num_workers)
    return val_loader
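Python requires positional arguments to come before keyword arguments, so batch_size, False cannot follow batchify_fn=batchify_fn. Either move them ahead of the keyword argument or pass them by name (batch_size= and shuffle=, assuming Gluon DataLoader's parameter names). The snippet below checks the failing call and both corrected spellings with compile(), which only parses the source, so DataLoader, val_ds and the other names do not need to exist.

```python
# The failing call from eval_ssd.py, plus two corrected orderings.
BROKEN = ("DataLoader(val_ds, batchify_fn=fn, batch_size, False, "
          "last_batch='keep', num_workers=num_workers)")
FIXED_POSITIONAL_FIRST = ("DataLoader(val_ds, batch_size, False, batchify_fn=fn, "
                          "last_batch='keep', num_workers=num_workers)")
FIXED_ALL_KEYWORDS = ("DataLoader(val_ds, batchify_fn=fn, batch_size=batch_size, "
                      "shuffle=False, last_batch='keep', num_workers=num_workers)")

def parses(source):
    """Return True if `source` is a syntactically valid Python expression."""
    try:
        compile(source, "<call>", "eval")
    except SyntaxError:
        return False
    return True

for label, src in [("broken", BROKEN),
                   ("positional first", FIXED_POSITIONAL_FIRST),
                   ("all keywords", FIXED_ALL_KEYWORDS)]:
    print(label, parses(src))
```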

Unnecessary dependency on VOC dataset for model in Model Zoo

When trying out fcn_resnet50_voc after a fresh install of GluonCV, you get the error detailed below, which shows that the VOC dataset is required. Since it is common to want only the model without downloading the whole dataset, a different check should be used here.

After downloading the VOC dataset with python pascal_voc.py as found here, I was able to use the model from the Model Zoo, but this shouldn't be required.

model = gluoncv.model_zoo.get_model('fcn_resnet50_voc', pretrained=True)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/gluoncv/model_zoo/model_zoo.py in get_model(name, **kwargs)
     54     try:
---> 55         net = gluon.model_zoo.vision.get_model(name, **kwargs)
     56     except ValueError as e:

~/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/gluon/model_zoo/vision/__init__.py in get_model(name, **kwargs)
    150             'Model %s is not supported. Available options are\n\t%s' % (
--> 151                 name, '\n\t'.join(sorted(models.keys()))))
    152     return models[name](**kwargs)

ValueError: Model fcn_resnet50_voc is not supported. Available options are
	alexnet
	densenet121
	densenet161
	densenet169
	densenet201
	inceptionv3
	mobilenet0.25
	mobilenet0.5
	mobilenet0.75
	mobilenet1.0
	mobilenetv2_0.25
	mobilenetv2_0.5
	mobilenetv2_0.75
	mobilenetv2_1.0
	resnet101_v1
	resnet101_v2
	resnet152_v1
	resnet152_v2
	resnet18_v1
	resnet18_v2
	resnet34_v1
	resnet34_v2
	resnet50_v1
	resnet50_v2
	squeezenet1.0
	squeezenet1.1
	vgg11
	vgg11_bn
	vgg13
	vgg13_bn
	vgg16
	vgg16_bn
	vgg19
	vgg19_bn

During handling of the above exception, another exception occurred:

OSError                                   Traceback (most recent call last)
<ipython-input-4-9baa5158fc03> in <module>()
----> 1 model = gluoncv.model_zoo.get_model('fcn_resnet50_voc', pretrained=True)

~/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/gluoncv/model_zoo/model_zoo.py in get_model(name, **kwargs)
     58         if name not in models:
     59             raise ValueError('%s\n\t%s' % (str(e), '\n\t'.join(sorted(models.keys()))))
---> 60         net = models[name](**kwargs)
     61     return net

~/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/gluoncv/model_zoo/fcn.py in get_fcn_voc_resnet50(**kwargs)
    140     >>> print(model)
    141     """
--> 142     return get_fcn('pascal_voc', 'resnet50', **kwargs)
    143 
    144 def get_fcn_voc_resnet101(**kwargs):

~/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/gluoncv/model_zoo/fcn.py in get_fcn(dataset, backbone, pretrained, root, ctx, **kwargs)
    113     # infer number of classes
    114     from ..data.segbase import get_segmentation_dataset
--> 115     data = get_segmentation_dataset(dataset)
    116     model = FCN(data.num_class, backbone=backbone, **kwargs)
    117     if pretrained:

~/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/gluoncv/data/segbase.py in get_segmentation_dataset(name, **kwargs)
     19         'pascal_aug': VOCAugSegmentation,
     20     }
---> 21     return datasets[name](**kwargs)
     22 
     23 

~/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/gluoncv/data/pascal_voc/segmentation.py in __init__(self, root, split, transform)
     35     def __init__(self, root=os.path.expanduser('~/.mxnet/datasets/voc'),
     36                  split='train', transform=None):
---> 37         super(VOCSegmentation, self).__init__(root)
     38         self.root = root
     39         _voc_root = os.path.join(self.root, self.BASE_DIR)

~/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/gluoncv/data/segbase.py in __init__(self, root, base_size, crop_size)
     37     # pylint: disable=abstract-method
     38     def __init__(self, root, base_size=520, crop_size=480):
---> 39         super(SegmentationDataset, self).__init__(root)
     40         self.base_size = base_size
     41         self.crop_size = crop_size

~/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/gluoncv/data/base.py in __init__(self, root)
     26                          datasets described in `gluon-cv/scripts/datasets`? You need \
     27                          to initialize each dataset only once.".format(root)
---> 28             raise OSError(helper_msg)
     29 
     30     @property

OSError: /home/ubuntu/.mxnet/datasets/voc is not a valid dir. Did you forget to initalize                          datasets described in `gluon-cv/scripts/datasets`? You need                          to initialize each dataset only once.
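The traceback shows why the dataset is needed: get_fcn instantiates the dataset only to read data.num_class, and the dataset constructor checks that the data directory exists. A minimal sketch of one alternative, assuming the class count is stored as a class attribute so it can be read without calling the constructor. All names below are hypothetical, not gluoncv's.

```python
import os

class SegmentationDatasetSketch:
    """Hypothetical base class: checks the dataset directory only when instantiated."""
    NUM_CLASS = None

    def __init__(self, root):
        if not os.path.isdir(os.path.expanduser(root)):
            raise OSError("%s is not a valid dir." % root)
        self.root = root

class VOCSegmentationSketch(SegmentationDatasetSketch):
    NUM_CLASS = 21  # 20 Pascal VOC object classes + background

DATASETS = {"pascal_voc": VOCSegmentationSketch}

def num_class_for(name):
    """Read the class count from the class itself; no dataset directory is needed."""
    return DATASETS[name].NUM_CLASS

print(num_class_for("pascal_voc"))
```

The model constructor could then call num_class_for(dataset) instead of building the dataset, so downloading the VOC data is only needed for training or evaluation.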

ssd_512_resnet50_v1_voc has low mAP on the VOC 2007 test set!

# -*- coding: utf-8 -*-
# @Time    : 5/23/18 10:40 PM
# @Author  : yunfan
# @File    : test_in_voc_2007.py

from gluoncv import model_zoo
from gluoncv import data
from gluoncv import utils
import mxnet
import json

DEBUG=False

TEST_FILE_PATH = ' '
DEBUG_FILE_PATH = ' '

FILE_PATH = [TEST_FILE_PATH, DEBUG_FILE_PATH][DEBUG]

JPGImages = ' '
 

def filtre_an_image(bboxes, scores, labels, class_names, thresh=0.5):
    """
    TODO TOO MUCH TIME IN HERE
    :param bboxes:
    :param scores:
    :param labels:
    :param class_names:
    :param thresh:
    :return:
    """
    if isinstance(bboxes, mxnet.nd.NDArray):
        bboxes = bboxes.asnumpy()
    if isinstance(labels, mxnet.nd.NDArray):
        labels = labels.asnumpy()
    if isinstance(scores, mxnet.nd.NDArray):
        scores = scores.asnumpy()
    res = []
    for i, bbox in enumerate(bboxes):
        if scores is not None and scores.flat[i] < thresh:
            continue
        if labels is not None and labels.flat[i] < 0:
            continue
        label_idx = int(labels.flat[i])

        iter_info = {}
        iter_info['soc'] = str(scores.flat[i])
        iter_info['loc'] = bbox.astype('int').tolist()
        iter_info['clsid'] = label_idx
        iter_info['clsna'] = class_names[label_idx]
        print(iter_info)
        res.append(iter_info)
    return res


def getVOCTestFiles(JPGImages=JPGImages, testFilePath=FILE_PATH):
    res = []
    with open(testFilePath, 'r') as f:
        for line in f.readlines():
            res.append("{}/{}.jpg".format(JPGImages, str(line.strip())))
    return res


if __name__ == '__main__':
    ctx = mxnet.gpu(3)
    testFileList = getVOCTestFiles()
    ssdNet = model_zoo.get_model('ssd_512_resnet50_v1_voc', pretrained=True, ctx=ctx)
    x, img = data.transforms.presets.ssd.load_test(testFileList, short=512)
    # x = mxnet.nd.array(x, ctx=ctx)
    ans = {}
    ind = 0
    for xx in x:
        xx = mxnet.nd.array(xx, ctx=ctx)
        class_IDs, scores, bounding_boxs = ssdNet(xx)
        tt = filtre_an_image(
            bboxes=bounding_boxs[0],
            scores=scores[0],
            labels=class_IDs,
            class_names=ssdNet.classes)
        ans[testFileList[ind]] = tt
        ind += 1
    with open('VOC2007-ssd_512_resnet50_v1_voc.json', 'w') as outfile:
        json.dump(ans, outfile)

# -*- coding: utf-8 -*-
# @Time    : 5/24/18 10:54 AM
# @Author  : yunfan
# @File    : voc_eval.py

import numpy as np
import json
from detectionLib.dataset.pascal_voc import PascalVOC
from gluoncv.data.pascal_voc.detection import VOCDetection

DEBUG = False

VOC_2007_JSON_PATH = './VOC2007-SSD-512.json'


def res_to_allbbox(voc_classes, path=VOC_2007_JSON_PATH):
    with open(path, 'r') as f:
        res = json.load(f)
    detections = {}
    for res_id in res.keys():
        val = res[res_id]  # [{"loc":[xmin, ymin, xmax, ymax], "soc":0.8, "clsna":"car", "clsid":6},{}...]
        im_ind = res_id[-10:-4]  # 006907
        if im_ind == '000128':
            print("000128")
        for bbox in val:
            soc = bbox['soc']
            clsna = bbox['clsna']
            clsid = bbox['clsid']
            loc = bbox['loc']
            loc.append(float(soc))
            cls_ind = voc_classes.index(clsna)
            if cls_ind not in detections.keys():
                detections[cls_ind] = {}
            if im_ind not in detections[cls_ind].keys():
                detections[cls_ind][im_ind] = []
            detections[cls_ind][im_ind].append(loc)

    for cls_ind in detections.keys():
        rr = detections[cls_ind]
        for im_ind in rr:
            detections[cls_ind][im_ind] = np.array(detections[cls_ind][im_ind])
    return detections


def from_VOC_label():
    voc_2007_test_set = VOCDetection(
        root='/data/yunfan.lu/App/cc.hobot.yunfan/mx-detectron-old/data/VOCdevkit',
        splits=((2007, 'test'),)
    )

    voc_2007_det_label = {}

    for ind in range(len(voc_2007_test_set)):
        image_ind = voc_2007_test_set._items[ind][1]
        bbox_list = voc_2007_test_set[ind][1]
        print(image_ind)
        for bbox in bbox_list:
            bbox_coord = bbox[:4].tolist()
            class_id = int(bbox[4])
            class_name = voc_2007_test_set.CLASSES[class_id]
            class_new_id = pascalVOC.classes.index(class_name)

            if class_new_id not in voc_2007_det_label.keys():
                voc_2007_det_label[class_new_id] = {}
            if image_ind not in voc_2007_det_label[class_new_id].keys():
                voc_2007_det_label[class_new_id][image_ind] = []
            bbox_coord.append(1.0)
            voc_2007_det_label[class_new_id][image_ind].append(bbox_coord)

    for i in voc_2007_det_label.keys():
        for j in voc_2007_det_label[i].keys():
            voc_2007_det_label[i][j] = np.array(voc_2007_det_label[i][j])
    return voc_2007_det_label

if __name__ == '__main__':
    pascalVOC = PascalVOC(
        image_set='2007_test',
        root_path='',
        devkit_path='/data/yunfan.lu/App/cc.hobot.yunfan/mx-detectron-old/data/VOCdevkit',
    )
    if DEBUG:
        detections = from_VOC_label()
    else:
        detections = res_to_allbbox(voc_classes=pascalVOC.classes)
    pascalVOC.evaluate_detections(detections=detections)

With mAP@0.7, the result is only 0.0022?

Has anyone tested this model?
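One thing worth checking in the first script (a hypothesis, not confirmed in the issue): load_test(..., short=512) resizes each image before inference, so the predicted boxes are in the coordinates of the resized image, while the VOC ground truth is in the original image's coordinates. If the boxes are not scaled back, their IoU with the ground truth stays low, and so does the mAP. A sketch of the rescaling, where rescale_boxes is a hypothetical helper and sizes are given as (height, width):

```python
import numpy as np

def rescale_boxes(boxes, resized_hw, orig_hw):
    """Map (xmin, ymin, xmax, ymax) boxes from a resized image back to the original image."""
    rh, rw = resized_hw
    oh, ow = orig_hw
    scale = np.array([ow / rw, oh / rh, ow / rw, oh / rh])  # x factors, then y factors
    return np.asarray(boxes, dtype=np.float64)[:, :4] * scale

print(rescale_boxes([[100, 50, 300, 200]], resized_hw=(512, 1024), orig_hw=(256, 512)))
```

In the script, the resized size is xx.shape[2:] (the network input is (1, 3, H, W)), and the original size comes from the image file itself.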

The training speed of gluon-ssd is slower than that of mxnet-ssd (39 vs 120 images / second)

Hi,
This is great work.
The mAP of ssd_300_vgg16_atrous_voc reaches 0.77.

However, I found the training speed of gluon-ssd is slower than that of mxnet-ssd.

I use a Tesla M40 24GB to train the SSD (VGG16, 300 x 300) model.

Environment

Operating System: Ubuntu 14.04
GPU: Tesla M40 24GB
Python: 2.7.12
MXNet: 1.2.0 (installed by pip)
GluonCV: 0.1.0 (installed by pip)

Performance

Name            #GPUs  Training speed (images/s)
SSD in GluonCV  1      30
SSD in GluonCV  2      37
SSD in GluonCV  4      39
SSD in GluonCV  8      40
mxnet-ssd       4      120
SSD in Caffe    4      40

Reading 32 images takes 5e-5 seconds with 32 workers.
It seems that the bottleneck is the data-parallel part.

net.target_generator takes 0.13 to 0.8 s per device, so its total cost grows with the number of devices: it runs in a sequential Python loop rather than in parallel.
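If the per-device target generation is what serializes each step, one option is to run the per-device calls concurrently instead of in a plain Python loop. Threads only help if the work releases the GIL while it runs; that is an assumption here and depends on how much of target generation is pure Python. run_per_device and fake_target_generator are hypothetical names. The sketch compares a sequential loop with a thread pool:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_per_device(fn, per_device_args):
    """Call fn(*args) once per device concurrently; results keep the device order."""
    with ThreadPoolExecutor(max_workers=len(per_device_args)) as pool:
        return list(pool.map(lambda args: fn(*args), per_device_args))

def fake_target_generator(device_id, batch):
    time.sleep(0.1)  # stands in for work that releases the GIL (e.g. blocking native calls)
    return device_id, len(batch)

work = [(i, list(range(8))) for i in range(4)]  # 4 hypothetical devices

start = time.perf_counter()
sequential = [fake_target_generator(*args) for args in work]
t_seq = time.perf_counter() - start

start = time.perf_counter()
threaded = run_per_device(fake_target_generator, work)
t_thr = time.perf_counter() - start

print(sequential == threaded, round(t_seq, 2), round(t_thr, 2))
```

If the real target generator holds the GIL, moving it into the DataLoader workers (separate processes) would be the alternative to consider.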

Error when running the 'Image Classification' example

I ran it on my local machine as well as on colab.research.google.com and got the same error:

img = transform_fn(img)
plt.imshow(nd.transpose(img, (1,2,0)).asnumpy())
plt.show()

MXNetError: Error in operator normalize0_normalize0: [16:37:05] src/operator/image/./image_random-inl.h:110: Check failed: nchannels == 3 || nchannels == 1 The first dimension of input tensor must be the channel dimension with either 1 or 3 elements, but got input with shape [32,32,32]

Stack trace returned 10 entries:
[bt] (0) /usr/local/lib/python3.6/dist-packages/mxnet/libmxnet.so(+0x1b7aea) [0x7f55c44b4aea]
[bt] (1) /usr/local/lib/python3.6/dist-packages/mxnet/libmxnet.so(+0x1b8121) [0x7f55c44b5121]
[bt] (2) /usr/local/lib/python3.6/dist-packages/mxnet/libmxnet.so(+0x366083) [0x7f55c4663083]
[bt] (3) /usr/local/lib/python3.6/dist-packages/mxnet/libmxnet.so(+0x2540e1a) [0x7f55c683de1a]
[bt] (4) /usr/local/lib/python3.6/dist-packages/mxnet/libmxnet.so(+0x2543710) [0x7f55c6840710]
[bt] (5) /usr/local/lib/python3.6/dist-packages/mxnet/libmxnet.so(+0x25514e7) [0x7f55c684e4e7]
[bt] (6) /usr/local/lib/python3.6/dist-packages/mxnet/libmxnet.so(+0x25522cc) [0x7f55c684f2cc]
[bt] (7) /usr/local/lib/python3.6/dist-packages/mxnet/libmxnet.so(+0x2557715) [0x7f55c6854715]
[bt] (8) /usr/local/lib/python3.6/dist-packages/mxnet/libmxnet.so(MXInvokeCachedOp+0x48f) [0x7f55c679e1ff]
[bt] (9) /usr/local/lib/python3.6/dist-packages/mxnet/libmxnet.so(MXInvokeCachedOpEx+0x4f) [0x7f55c679e7ef]

'module' object has no attribute 'BilinearResize2D'

I want to train on my own dataset for semantic segmentation, so first I just want to test the FCN model. My code is as follows:

import mxnet as mx
from gluoncv.model_zoo.fcn import FCN

model = FCN(19, backbone='resnet50')
x = mx.nd.random.uniform(shape=(1, 3, 224, 224))
predict = model.forward(x)

but I got the error:

'module' object has no attribute 'BilinearResize2D'

My mxnet version is 1.1.0.
Is there any solution?
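The error suggests that the installed MXNet (1.1.0) does not ship the BilinearResize2D operator that this version of gluoncv's FCN calls, so upgrading MXNet is the likely fix; the minimum release used below (1.2.0) is an assumption, so check the requirements of the gluoncv release you use. A small guard turns the AttributeError into a clearer message. parse_version and require_mxnet are hypothetical helpers.

```python
import re

def parse_version(version):
    """Turn '1.1.0' or '1.2.0b20180509' into a comparable (major, minor, patch) tuple."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError("unrecognized version string: %r" % version)
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)

def require_mxnet(installed, minimum):
    """Raise early with a clear message if the installed MXNet is older than `minimum`."""
    if parse_version(installed) < parse_version(minimum):
        raise RuntimeError("MXNet %s is installed but %s or newer is required"
                           % (installed, minimum))

# Usage (assumed minimum version):
#   import mxnet as mx
#   require_mxnet(mx.__version__, "1.2.0")
```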

Weight initialization in FeatureExpander of SSD

The symbolic SSD implementation uses the following initialization method:

weight_init = mx.init.Xavier(rnd_type='gaussian', factor_type='out', magnitude=2)
for i, f in enumerate(num_filters):
    if use_1x1_transition:
        num_trans = max(min_depth, int(round(f * reduce_ratio)))
        y = mx.sym.Convolution(
            y, num_filter=num_trans, kernel=(1, 1), no_bias=use_bn,
            name='expand_trans_conv{}'.format(i), attr={'__init__': weight_init})

However, SymbolBlock does not read this attribute when it creates its parameters:

for i in out.list_arguments():
    if i not in input_names:
        self.params.get(i, allow_deferred_init=True)

for i in out.list_auxiliary_states():
    if i not in input_names:
        self.params.get(i, grad_req='null', allow_deferred_init=True)

I also found that this kind of initialization does not work in my own program. Did I misunderstand the process, or is there another trick?

Shape consistency check needed.

If I don't use any transform, it crashes with a batch size greater than 1.
A shape consistency check is missing before stacking.
It can get the correct shape, but getting the data leads to a crash.

from gluoncv.data import DetectionDataLoader

# train_dataset is the detection dataset created earlier in the tutorial
batch_size = 2  # for the tutorial, we use a smaller batch size
num_workers = 0  # increase this (if your CPU has more cores) to speed up data loading
train_loader = DetectionDataLoader(train_dataset, batch_size, shuffle=True,
                                   last_batch='rollover', num_workers=num_workers)
for ib, batch in enumerate(train_loader):
    if ib > 3:
        break
    print(batch[0].asnumpy())

@zhreshold
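Without a resizing transform, detection images generally have different sizes, so stacking them into one batch cannot work. A minimal sketch of the check requested above, using NumPy for illustration; checked_stack is a hypothetical helper, not gluoncv's batchify API.

```python
import numpy as np

def checked_stack(samples):
    """Stack samples into one batch, failing with a clear message if their shapes differ."""
    shapes = sorted({tuple(np.shape(s)) for s in samples})
    if len(shapes) > 1:
        raise ValueError("cannot stack samples with different shapes: %s" % shapes)
    return np.stack(samples)

batch = checked_stack([np.zeros((3, 4, 4)), np.ones((3, 4, 4))])
print(batch.shape)  # (2, 3, 4, 4)
```

Raising before the stack gives the user the offending shapes instead of a crash deep inside the batching code.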

socket.error in gluoncv ssd

1. When I run SSD on a single GPU, I encounter the following problem:

INFO:root:Namespace(batch_size=15, data_shape=512, dataset='voc', epochs=240, gpus='0', log_interval=100, lr=0.001, lr_decay=0.1, lr_decay_epoch='160,200', momentum=0.9, network='resnet50_v1', num_workers=32, resume='', save_interval=10, save_prefix='ssd_512_resnet50_v1_voc', seed=233, start_epoch=0, val_interval=1, wd=0.0005)
INFO:root:Start training from [Epoch 0]
[02:21:20] src/operator/nn/./cudnn/./cudnn_algoreg-inl.h:107: Running performance tests to find the best convolution algorithm, this can take a while... (setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)
INFO:root:[Epoch 0][Batch 99], Speed: 37.221 samples/sec, CrossEntropy=7.690, SmoothL1=3.245
INFO:root:[Epoch 0][Batch 199], Speed: 36.276 samples/sec, CrossEntropy=6.458, SmoothL1=3.105
INFO:root:[Epoch 0][Batch 299], Speed: 38.109 samples/sec, CrossEntropy=5.929, SmoothL1=2.991
Traceback (most recent call last):
File "scripts/ssd/train_ssd.py", line 259, in
train(net, train_data, val_data, eval_metric, args)
File "scripts/ssd/train_ssd.py", line 192, in train
for i, batch in enumerate(train_data):
File "/usr/local/lib/python2.7/dist-packages/mxnet/gluon/data/dataloader.py", line 222, in next
return self.next()
File "/usr/local/lib/python2.7/dist-packages/mxnet/gluon/data/dataloader.py", line 218, in next
idx, batch = self._data_queue.get()
File "/usr/lib/python2.7/multiprocessing/queues.py", line 117, in get
res = self._recv()
File "/usr/local/lib/python2.7/dist-packages/mxnet/gluon/data/dataloader.py", line 88, in recv
return pickle.loads(buf)
File "/usr/lib/python2.7/pickle.py", line 1388, in loads
return Unpickler(file).load()
File "/usr/lib/python2.7/pickle.py", line 864, in load
dispatchkey
File "/usr/lib/python2.7/pickle.py", line 1139, in load_reduce
value = func(*args)
File "/usr/local/lib/python2.7/dist-packages/mxnet/gluon/data/dataloader.py", line 53, in rebuild_ndarray
fd = multiprocessing.reduction.rebuild_handle(fd)
File "/usr/lib/python2.7/multiprocessing/reduction.py", line 155, in rebuild_handle
conn = Client(address, authkey=current_process().authkey)
File "/usr/lib/python2.7/multiprocessing/connection.py", line 169, in Client
c = SocketClient(address)
File "/usr/lib/python2.7/multiprocessing/connection.py", line 308, in SocketClient
s.connect(address)
File "/usr/lib/python2.7/socket.py", line 228, in meth
return getattr(self._sock,name)(*args)
socket.error: [Errno 111] Connection refused

2. When I run it on multiple GPUs (e.g. 2), the above error is not solved. Also, although the batch size is twice that of the single-GPU run, the training throughput does not double: it is only about 43 to 48 samples/sec. I don't think this is reasonable; it implies the training process is not stable.

3. Resuming still does not work in GluonCV, and it is very inconvenient to train from epoch 0 each time.

Can you give me some suggestions? Thanks a lot!
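Until resuming works, one workaround is to locate the newest saved parameters yourself and pass them, with the next epoch number, to the script's resume and start_epoch arguments (both appear in the Namespace printed above). A minimal sketch, assuming checkpoints are saved as <save_prefix>_<epoch>...params; that naming pattern and latest_checkpoint are assumptions, so adjust the regex to match your files.

```python
import os
import re
import tempfile

def latest_checkpoint(directory, prefix):
    """Return (path, epoch) of the newest '<prefix>_<epoch>...params' file, or (None, -1)."""
    pattern = re.compile(re.escape(prefix) + r"_(\d+).*\.params$")
    best_path, best_epoch = None, -1
    for name in os.listdir(directory):
        match = pattern.match(name)
        if match and int(match.group(1)) > best_epoch:
            best_path, best_epoch = os.path.join(directory, name), int(match.group(1))
    return best_path, best_epoch

# Demo in a throwaway directory with hypothetical checkpoint names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["ssd_512_resnet50_v1_voc_0009.params",
                 "ssd_512_resnet50_v1_voc_0019.params",
                 "notes.txt"]:
        open(os.path.join(tmp, name), "w").close()
    path, epoch = latest_checkpoint(tmp, "ssd_512_resnet50_v1_voc")
    resume_name = os.path.basename(path)
    print(resume_name, "-> resume from epoch", epoch + 1)
```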

'No space left on device' when enumerating over the DataLoader object in train.py (Semantic Segmentation)

Hi,
I was trying to run train.py (for semantic segmentation) on the ADE20K dataset using the PSPNet model (all other arguments were left at their defaults). I am running this setup in Docker.

oxe = Trainer(model = 'psp', dataset = 'ade20k', batch_size = 100)
print('Starting Epoch:', oxe.start_epoch)
print('Total Epoches:', oxe.epochs)
for epoch in range(oxe.start_epoch, oxe.epochs):
    oxe.training(epoch)
    oxe.validation(epoch)

mxnet version: 1.2.0
OS: Ubuntu 16.04
On executing the script, I got the error: No space left on device.

Here is the stack trace:

('Starting Epoch:', 0)
('Total Epoches:', 50)

OSError                                   Traceback (most recent call last)
<ipython-input-28-41717107d429> in <module>()
      2 print('Total Epoches:', oxe.epochs)
      3 for epoch in range(oxe.start_epoch, oxe.epochs):
----> 4     oxe.training(epoch)
      5     oxe.validation(epoch)

<ipython-input-14-b9dd13a653e4> in training(self, epoch)
     82         tbar = tqdm(self.train_data)
     83         train_loss = 0.0
---> 84         for i, (data, target) in enumerate(tbar):
     85             self.lr_scheduler.update(i, epoch)
     86             with autograd.record(True):

/usr/local/lib/python2.7/dist-packages/tqdm/_tqdm.pyc in __iter__(self)
    928 """, fp_write=getattr(self.fp, 'write', sys.stderr.write))
    929 
--> 930             for obj in iterable:
    931                 yield obj
    932                 # Update and possibly print the progressbar.

/mxnet/python/mxnet/gluon/data/dataloader.pyc in __iter__(self)
    282         # multi-worker
    283         return _MultiWorkerIter(self._num_workers, self._dataset,
--> 284                                 self._batchify_fn, self._batch_sampler)
    285 
    286     def __len__(self):

/mxnet/python/mxnet/gluon/data/dataloader.pyc in __init__(self, num_workers, dataset, batchify_fn, batch_sampler)
    128         self._batchify_fn = batchify_fn
    129         self._batch_sampler = batch_sampler
--> 130         self._key_queue = Queue()
    131         self._data_queue = Queue(2*self._num_workers)
    132         self._data_buffer = {}

/mxnet/python/mxnet/gluon/data/dataloader.pyc in __init__(self, *args, **kwargs)
     74     def __init__(self, *args, **kwargs):
     75         if sys.version_info[0] <= 2:
---> 76             super(Queue, self).__init__(*args, **kwargs)
     77         else:
     78             super(Queue, self).__init__(*args, ctx=multiprocessing.get_context(),

/usr/lib/python2.7/multiprocessing/queues.pyc in __init__(self, maxsize)
     61         self._maxsize = maxsize
     62         self._reader, self._writer = Pipe(duplex=False)
---> 63         self._rlock = Lock()
     64         self._opid = os.getpid()
     65         if sys.platform == 'win32':

/usr/lib/python2.7/multiprocessing/synchronize.pyc in __init__(self)
    145 
    146     def __init__(self):
--> 147         SemLock.__init__(self, SEMAPHORE, 1, 1)
    148 
    149     def __repr__(self):

/usr/lib/python2.7/multiprocessing/synchronize.pyc in __init__(self, kind, value, maxvalue)
     73 
     74     def __init__(self, kind, value, maxvalue):
---> 75         sl = self._semlock = _multiprocessing.SemLock(kind, value, maxvalue)
     76         debug('created semlock with handle %s' % sl.handle)
     77         self._make_methods()

OSError: [Errno 28] No space left on device

Thanks a lot for helping!
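The traceback fails while creating a multiprocessing semaphore, which on Linux is backed by /dev/shm. Docker gives containers a small /dev/shm by default (64 MB), so a common remedy is to start the container with a larger `--shm-size` (or `--ipc=host`). As a guard inside the script, the hypothetical helpers below check free space on /dev/shm and fall back to `num_workers=0` (single-process loading in Gluon's DataLoader) when it looks too small; the 1 GiB threshold is an arbitrary assumption.

```python
import os


def shm_free_bytes(path="/dev/shm"):
    """Return the free bytes on a mount point, or None if the directory does not exist."""
    if not os.path.isdir(path):
        return None
    stats = os.statvfs(path)  # POSIX only
    return stats.f_bavail * stats.f_frsize


def pick_num_workers(requested, min_free_bytes=1 << 30, path="/dev/shm"):
    """Hypothetical guard: use 0 workers (no multiprocessing) if shared memory looks small."""
    free = shm_free_bytes(path)
    if free is not None and free < min_free_bytes:
        return 0
    return requested
```

For example, the DataLoader could be built with `num_workers=pick_num_workers(4)`.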

transform in data api

@zhreshold @zhanghang1989

I remember Gluon changed the API to use transform_first instead of passing a transform, right?

Also, target_transform is confusing. Shouldn't transform be able to transform the label as well?
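In Gluon, `Dataset.transform(fn)` passes every field of a sample to `fn`, so it can modify the label too, while `Dataset.transform_first(fn)` applies `fn` only to the first field (usually the image). The toy class below is a hypothetical, eager stand-in (Gluon's versions are lazy by default) that only illustrates the difference.

```python
class ToyDataset(object):
    """Hypothetical stand-in for a Gluon dataset of (image, label) samples."""

    def __init__(self, samples):
        self._samples = list(samples)

    def __len__(self):
        return len(self._samples)

    def __getitem__(self, idx):
        return self._samples[idx]

    def transform(self, fn):
        # fn receives every field of the sample, so it can change image and label
        return ToyDataset(fn(*s) for s in self._samples)

    def transform_first(self, fn):
        # fn receives only the first field; the remaining fields pass through untouched
        return ToyDataset((fn(s[0]),) + tuple(s[1:]) for s in self._samples)
```

With this split there is no need for a separate target_transform: a joint image/label augmentation goes through `transform`.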

Code Style Issues

Should be a metric class instead of a bunch of functions. Also, why is it specific to VOC? https://github.com/dmlc/gluon-cv/blob/master/gluoncv/utils/metrics/voc_segmentation.py
Why put nn under util?
There are both nn.BBoxCornerToCenter and utils.bbox.bbox_xyxy_to_xywh. Are they duplicates?
data.transforms should be Block-based transformers instead of functions.
Why is a test function in the main codebase? https://github.com/dmlc/gluon-cv/blob/master/gluoncv/data/__init__.py#L12
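As an illustration of the first point, a dataset-agnostic segmentation metric could be a small class with reset/update/get methods, in the spirit of MXNet's EvalMetric interface. The class below is a hypothetical NumPy sketch (not gluoncv code) that accumulates a confusion matrix and reports pixel accuracy and mean IoU. Labels outside [0, num_classes) are ignored, and predictions are assumed to be valid class indices.

```python
import numpy as np


class SegmentationMetricSketch(object):
    """Hypothetical class-based metric: pixel accuracy and mean IoU for any dataset."""

    def __init__(self, num_classes):
        self.num_classes = num_classes
        self.reset()

    def reset(self):
        # rows index the ground-truth class, columns the predicted class
        self.confusion = np.zeros((self.num_classes, self.num_classes), dtype=np.int64)

    def update(self, label, pred):
        label = np.asarray(label).ravel()
        pred = np.asarray(pred).ravel()
        valid = (label >= 0) & (label < self.num_classes)  # drop ignore labels such as -1
        idx = self.num_classes * label[valid].astype(np.int64) + pred[valid].astype(np.int64)
        counts = np.bincount(idx, minlength=self.num_classes ** 2)
        self.confusion += counts.reshape(self.num_classes, self.num_classes)

    def get(self):
        tp = np.diag(self.confusion).astype(np.float64)
        pix_acc = tp.sum() / max(self.confusion.sum(), 1)
        union = self.confusion.sum(axis=0) + self.confusion.sum(axis=1) - tp
        iou = tp / np.maximum(union, 1)
        present = union > 0  # average only over classes that actually occur
        miou = iou[present].mean() if present.any() else 0.0
        return pix_acc, miou
```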

Error in image segmentation tutorial

Following the FCN getting-started tutorial on image segmentation, I get an error when trying to download the model:

model = gluoncv.model_zoo.get_model('fcn_resnet50_voc', pretrained=True)
------------------------------------------------------------------------------------------------ 
ValueError                                Traceback (most recent call last)
~/anaconda3/lib/python3.6/site-packages/gluoncv/model_zoo/model_zoo.py in get_model(name, **kwargs)
     86     try:
---> 87         net = gluon.model_zoo.vision.get_model(name, **kwargs)
     88     except ValueError as e:

~/anaconda3/lib/python3.6/site-packages/mxnet/gluon/model_zoo/vision/__init__.py in get_model(name, **kwargs)
    143             'Model %s is not supported. Available options are\n\t%s'%(
--> 144                 name, '\n\t'.join(sorted(models.keys()))))
    145     return models[name](**kwargs)

ValueError: Model fcn_resnet50_voc is not supported. Available options are
	alexnet
	densenet121
	densenet161
	densenet169
	densenet201
	inceptionv3
	mobilenet0.25
	mobilenet0.5
	mobilenet0.75
	mobilenet1.0
	resnet101_v1
	resnet101_v2
	resnet152_v1
	resnet152_v2
	resnet18_v1
	resnet18_v2
	resnet34_v1
	resnet34_v2
	resnet50_v1
	resnet50_v2
	squeezenet1.0
	squeezenet1.1
	vgg11
	vgg11_bn
	vgg13
	vgg13_bn
	vgg16
	vgg16_bn
	vgg19
	vgg19_bn

During handling of the above exception, another exception occurred:

mx.__version__
'1.2.0'

gluoncv.__version__
'0.1.0'

Are the tutorials not tested in CI?

[Feature Request] Add Other State-of-the-art Detection Models

Hi,
Do you have any plans to add other new state-of-the-art detection models (e.g., RefineDet, S3FD, RFBNet)?

Training speed of mxnet-ssd slows down?

I used a record file (VOC07+12) to train the old-style SSD at about 40 images/s; with the new train_ssd.py in gluoncv the speed is about 25 images/s.
I used a rec dataset and transforms to replace the original file-based datasets in the new SSD code. But when I set **num-workers=4**, gdata.DetectionDataLoader fails, while with **num-workers=1** it works but is almost as slow as the original data-reading method.
The error information is as follows:

Process Process-3:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/deep/workssd/mxnet/incubator-mxnet/python/mxnet/gluon/data/dataloader.py", line 134, in worker_loop
    batch = batchify_fn([dataset[i] for i in samples])
  File "/home/deep/workssd/mxnet/incubator-mxnet/python/mxnet/gluon/data/dataset.py", line 126, in __getitem__
    item = self._data[idx]
  File "/home/deep/workssd/mxnet/incubator-mxnet/python/mxnet/gluon/data/vision/datasets.py", line 257, in __getitem__
    record = super(ImageRecordDataset, self).__getitem__(idx)
  File "/home/deep/workssd/mxnet/incubator-mxnet/python/mxnet/gluon/data/dataset.py", line 180, in __getitem__
    return self._record.read_idx(self._record.keys[idx])
  File "/home/deep/workssd/mxnet/incubator-mxnet/python/mxnet/recordio.py", line 265, in read_idx
    return self.read()
  File "/home/deep/workssd/mxnet/incubator-mxnet/python/mxnet/recordio.py", line 163, in read
    ctypes.byref(size)))
  File "/home/deep/workssd/mxnet/incubator-mxnet/python/mxnet/base.py", line 149, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
MXNetError: [16:12:48] src/recordio.cc:65: Check failed: header[0] == RecordIOWriter::kMagic Invalid RecordIO File

Is this a multiprocessing problem with the old rec-file dataset?
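A plausible explanation, stated as an assumption: a record-file handle opened in the parent is inherited by the forked DataLoader workers, and since forked processes share one file offset, interleaved seek/read calls can land mid-record and fail the magic-number check. A common pattern is to reopen the file in each process. The hypothetical wrapper below shows that pattern on a plain file; it is not MXNet's recordio code.

```python
import os


class PerProcessReader(object):
    """Hypothetical wrapper that gives each process its own file handle."""

    def __init__(self, path):
        self.path = path
        self._pid = None
        self._fh = None

    def _handle(self):
        pid = os.getpid()
        if self._fh is None or self._pid != pid:
            # A handle inherited through fork shares its offset with the parent and
            # sibling workers, so open a fresh one owned by this process instead.
            self._fh = open(self.path, 'rb')
            self._pid = pid
        return self._fh

    def read_at(self, offset, size):
        """Seek and read using this process's private handle."""
        fh = self._handle()
        fh.seek(offset)
        return fh.read(size)
```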

Undefined names in model_zoo

flake8 testing of https://github.com/dmlc/gluon-cv on Python 3.6.3

$ flake8 . --count --select=E901,E999,F821,F822,F823 --show-source --statistics

./gluoncv/model_zoo/cifarresnext.py:122:31: F821 undefined name 'Block'
            total_expansion = Block.expansion ** len(layers)
                              ^
./gluoncv/model_zoo/resnext.py:116:35: F821 undefined name '_conv3x3'
                self.features.add(_conv3x3(64, 1, 0))
                                  ^
./gluoncv/model_zoo/segbase.py:102:21: F821 undefined name '_reshape_like'
            label = _reshape_like(F, label, pred)
                    ^
./gluoncv/model_zoo/segbase.py:102:45: F821 undefined name 'pred'
            label = _reshape_like(F, label, pred)
                                            ^
./gluoncv/model_zoo/segbase.py:103:27: F821 undefined name 'pred'
            loss = -F.sum(pred*label, axis=self._axis, keepdims=True)
                          ^
5     F821 undefined name 'Block'
5

'ParameterDict' object has no attribute 'get_constant'

I ran 'model_zoo.ssd.ssd_512_resnet152_v2_voc(pretrained=True)' and got this error.
The command 'model_zoo.ssd.ssd_512_resnet50_v1_voc(pretrained=True)' raises the same error.
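An error like this usually means the installed MXNet is older than the one gluoncv expects; `ParameterDict.get_constant` appears to have been introduced with MXNet 1.3 (an assumption worth checking against the release notes). A small version check can make such mismatches fail early with a clear message. The helpers below are hypothetical; feature detection with `hasattr` is an equally valid alternative.

```python
def parse_version(text):
    """Turn a version string such as '1.2.0' or '1.3.0b20180725' into a tuple of ints."""
    parts = []
    for piece in text.split('.'):
        digits = ''
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) != len(piece):
            # a suffix such as 'b20180725' marks a pre-release; ignore everything after it
            break
    return tuple(parts)


def at_least(installed, required):
    """True if installed >= required; pre-release builds count as the release they precede."""
    a, b = parse_version(installed), parse_version(required)
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))
```

For example, a script could assert `at_least(mxnet.__version__, '1.3.0')` at startup (the 1.3.0 requirement is an assumption).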

Feedback on gluon ssd!

gluon-cv commit id 411f61c
mxnet version: both 1.2 and 1.3
cuda-9.0, cudnn7.0
python 2.7
GPU: Tesla P100

I tried gluon-ssd-vgg-300 on my own dataset with COCO-format input:

orig_height and orig_width should be floats under Python 2.x:
https://github.com/dmlc/gluon-cv/blob/c114f6c4b7d64e00ab2b5be747d5e46cfd68e04a/gluoncv/utils/metrics/coco_detection.py#L184:51
TypeError: unicode argument expected, got 'str'
Changing https://github.com/dmlc/gluon-cv/blob/c114f6c4b7d64e00ab2b5be747d5e46cfd68e04a/gluoncv/utils/metrics/coco_detection.py#L128:33 to sys.stdout = io.BytesIO() fixed this error.
Error: num_slice does not evenly divide data.shape[batch_axis].
The validation data loader uses last_batch='keep', which causes this error when the number of validation samples is not divisible by the batch size.
Adding even_split=False at https://github.com/dmlc/gluon-cv/blob/c114f6c4b7d64e00ab2b5be747d5e46cfd68e04a/scripts/detection/ssd/train_ssd.py#L123:23 fixed this error.
Training speed is very unstable with mxnet 1.2: sometimes 140-150 samples/s and sometimes 10-20 samples/s. Upgrading mxnet to 1.3 fixed the problem, and training speed now stays around 140-150 samples/s.
Results for gluon-ssd are better than for mxnet-ssd on my own dataset: 3-4 mAP higher with the COCO metric.
FPS for gluon-ssd-vgg-300:
b=1, fps=55
b=8, fps=117
b=16, fps=148
FPS for mxnet-ssd:
b=1, fps=65
b=8, fps=124
b=16, fps=156

Finally, thanks to the gluon-cv team!
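The `even_split=False` fix matters when a batch (for example the smaller last batch kept by last_batch='keep') cannot be divided evenly into one slice per device. The function below is a simplified, hypothetical illustration of how uneven splitting can assign slice bounds, with the last device taking the remainder; it is not MXNet's implementation.

```python
def slice_bounds(batch_size, num_slice, even_split=True):
    """Hypothetical sketch: return [(start, stop), ...], one slice per device on the batch axis."""
    if even_split and batch_size % num_slice != 0:
        raise ValueError('num_slice (%d) does not evenly divide batch size (%d)'
                         % (num_slice, batch_size))
    if batch_size < num_slice:
        raise ValueError('fewer samples (%d) than slices (%d)' % (batch_size, num_slice))
    step = batch_size // num_slice
    bounds = [(i * step, (i + 1) * step) for i in range(num_slice - 1)]
    # the last slice absorbs any remainder when the split is uneven
    bounds.append(((num_slice - 1) * step, batch_size))
    return bounds
```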

Exported fcn_resnet50_voc model has duplicate layer names

It seems that exporting the fcn_resnet50_voc model produces duplicate names for all of the Activation layers. This causes problems when using the network in MXNet, since sym.get_internals() fails when there are duplicate names.

Example of duplicate names:

{
      "op": "Activation", 
      "name": "fcn0_resnetv1b0_layers1_relu1_fwd", 
      "attrs": {"act_type": "relu"}, 
      "inputs": [[65, 0, 0]]
},

and

{
      "op": "Activation", 
      "name": "fcn0_resnetv1b0_layers1_relu1_fwd", 
      "attrs": {"act_type": "relu"}, 
      "inputs": [[56, 0, 0]]
}

Here is the error when I call get_internals():

ValueError: There are multiple outputs with name "fcn0_resnetv1b0_layers1_relu0_fwd_output"
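To list every clash in an exported symbol file before calling `get_internals()`, the graph nodes can be scanned directly. The sketch assumes the usual MXNet symbol JSON layout (a top-level "nodes" list whose entries carry "op" and "name", with op "null" for inputs and parameters); `duplicate_node_names` is a hypothetical helper.

```python
import json
from collections import Counter


def duplicate_node_names(symbol_json_text):
    """Return {name: count} for operator nodes whose name appears more than once."""
    graph = json.loads(symbol_json_text)
    # skip 'null' nodes (data inputs and parameters); only operator names matter here
    names = [node['name'] for node in graph.get('nodes', []) if node.get('op') != 'null']
    return {name: n for name, n in Counter(names).items() if n > 1}
```

Usage: read the exported `*-symbol.json` file and pass its text to `duplicate_node_names`.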

voc_detection.py has an error about AttributeError: 'NDArray' object has no attribute 'flat'

python /data/yunfan.lu/App/App/github.yunfan/gluon-cv/train_ssd.py

INFO:root:[Epoch 0] Training cost: 3146.936, CrossEntropy=5.021, SmoothL1=2.144
Traceback (most recent call last):
  File "/data/yunfan.lu/App/App/github.yunfan/gluon-cv/train_ssd.py", line 248, in <module>
    train(net, train_data, val_data, eval_metric, args)
  File "/data/yunfan.lu/App/App/github.yunfan/gluon-cv/train_ssd.py", line 213, in train
    map_name, mean_ap = validate(net, val_data, ctx, eval_metric)
  File "/data/yunfan.lu/App/App/github.yunfan/gluon-cv/train_ssd.py", line 138, in validate
    eval_metric.update(det_bboxes, det_ids, det_scores, gt_bboxes, gt_ids, gt_difficults)
  File "/data/yunfan.lu/App/github.data/gluon-cv/gluoncv/utils/metrics/voc_detection.py", line 99, in update
    valid_pred = np.where(pred_label.flat >= 0)[0]
AttributeError: 'NDArray' object has no attribute 'flat'
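The metric calls `.flat`, which exists on NumPy arrays but not on MXNet NDArrays, so the inputs need converting first (NDArray does provide `.asnumpy()`). The hypothetical helpers below perform that conversion; in the training script they would wrap the arguments passed to `eval_metric.update(...)`.

```python
import numpy as np


def as_numpy(x):
    """Convert an MXNet NDArray (anything with .asnumpy()) or an array-like to numpy."""
    if hasattr(x, 'asnumpy'):
        return x.asnumpy()
    return np.asarray(x)


def to_numpy_list(values):
    """Accept a single array or a list of per-device arrays; return a list of numpy arrays."""
    if not isinstance(values, (list, tuple)):
        values = [values]
    return [as_numpy(v) for v in values]
```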

Data Parallel Fails with Single GPU

Keeping track of the issue reported by @mylxiaoyi
#93 (comment)

"I have only one GPU. When using aux=True in FCN, the output is a tuple of length 2, which triggers the assert error because len(self.ctx_list) is 1. With aux=False, the length of the output is the batch size, e.g. 4, which also triggers the assert error."
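The core ambiguity is that, with one device, a single device's result (a `(main, aux)` tuple or a batch tensor) looks like a per-device list. A hypothetical helper that always groups results as one entry per device avoids that confusion. This is a sketch of the idea only; gluoncv's parallel wrapper may structure its outputs differently.

```python
def per_device_outputs(outputs, num_devices):
    """Hypothetical helper: return model outputs as a list with one entry per device."""
    if num_devices == 1:
        # With a single device, whatever the model returned (a (main, aux) tuple or a
        # batch tensor) is one device's result, so wrap it instead of iterating over it.
        return [outputs]
    outputs = list(outputs)
    if len(outputs) != num_devices:
        raise ValueError('expected %d per-device outputs, got %d'
                         % (num_devices, len(outputs)))
    return outputs
```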

VOC07MApMetric will raise error when rec or prec is None

VOC07MApMetric._average_precision() does not handle the case where rec or prec is None, so it raises an error when it reaches this line:

gluon-cv/gluoncv/utils/metrics/voc_detection.py

Line 278 in 089b9c5

p = np.max(np.nan_to_num(prec)[rec >= t])

How about adding the following lines to the function:

if rec is None or prec is None:
    return np.nan
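For reference, the VOC2007 metric uses 11-point interpolated AP: for each recall threshold t in {0, 0.1, ..., 1}, take the maximum precision among points with recall >= t (0 if there are none), then average the 11 values. The sketch below folds the proposed None guard into that computation; it is a standalone NumPy illustration, not gluoncv's method.

```python
import numpy as np


def voc07_average_precision(rec, prec):
    """11-point interpolated AP (VOC2007 style); returns NaN if rec or prec is missing."""
    if rec is None or prec is None:
        return np.nan
    rec = np.asarray(rec, dtype=np.float64)
    prec = np.nan_to_num(np.asarray(prec, dtype=np.float64))
    ap = 0.0
    for t in np.arange(0.0, 1.1, 0.1):
        mask = rec >= t
        # an empty selection would make np.max raise, so use 0 precision instead
        p = prec[mask].max() if mask.any() else 0.0
        ap += p / 11.0
    return ap
```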

About the target.py and rpn_target.py

Hi, I noticed that the SSDTargetGenerator defined in nn/ssd/target.py is not used anywhere in gluoncv, and the same goes for nn/rpn/rpn_target.py. What is the reason? Are you concerned that a gluon.Block would slow down training, so you use fully hybridized/symbolic code instead? Or do you plan to use them in a future version?

Can not run classification experiment on docker

It seems that inside Docker the process always fails to create tensors on "cpu_shared", while the same experiment runs fine on the host machine.

Even this minimal code fails:

import mxnet as mx
from mxnet import gluon
from mxnet.gluon.data.vision import transforms

data_dir = './cv/val'

transform_train = transforms.Compose([
    transforms.Resize(480),
    transforms.RandomResizedCrop(224),
    transforms.ToTensor()
])

train_data = gluon.data.DataLoader(
    gluon.data.vision.ImageFolderDataset(data_dir).transform_first(transform_train),
    batch_size=4, shuffle=True, last_batch='discard', num_workers=1)

for batch in train_data:
    print(type(batch))

It gets stuck here:
https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/gluon/data/dataloader.py#L101

AttributeError: '_thread._local' object has no attribute 'value'

I have tried to use GluonCV on a gRPC server and in a Django server. In both cases I get the "AttributeError: '_thread._local' object has no attribute 'value'" error. It apparently has something to do with multithreading. For example, this line raises the error:
net = model_zoo.get_model('ssd_512_resnet50_v1_voc', pretrained=True)

Example stack trace:

Traceback (most recent call last):
  File "/venv/lib/python3.6/site-packages/django/core/handlers/exception.py", line 35, in inner
    response = get_response(request)
  File "/venv/lib/python3.6/site-packages/django/core/handlers/base.py", line 128, in _get_response
    response = self.process_exception_by_middleware(e, request)
  File "/venv/lib/python3.6/site-packages/django/core/handlers/base.py", line 126, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/server/posts/views.py", line 28, in index
    net = model_zoo.get_model('ssd_512_resnet50_v1_voc', pretrained=True)
  File "/venv/lib/python3.6/site-packages/gluoncv/model_zoo/model_zoo.py", line 60, in get_model
    net = _models[name](**kwargs)
  File "/venv/lib/python3.6/site-packages/gluoncv/model_zoo/ssd/ssd.py", line 340, in ssd_512_resnet50_v1_voc
    pretrained_base=pretrained_base, **kwargs)
  File "/venv/lib/python3.6/site-packages/gluoncv/model_zoo/ssd/ssd.py", line 236, in get_ssd
    pretrained=pretrained_base, classes=classes, **kwargs)
  File "/venv/lib/python3.6/site-packages/gluoncv/model_zoo/ssd/ssd.py", line 92, in __init__
    super(SSD, self).__init__(**kwargs)
  File "/venv/lib/python3.6/site-packages/mxnet/gluon/block.py", line 621, in __init__
    super(HybridBlock, self).__init__(prefix=prefix, params=params)
  File "/venv/lib/python3.6/site-packages/mxnet/gluon/block.py", line 171, in __init__
    self._prefix, self._params = _BlockScope.create(prefix, params, self._alias())
  File "/venv/lib/python3.6/site-packages/mxnet/gluon/block.py", line 53, in create
    prefix = _name.NameManager._current.value.get(None, hint) + '_'
AttributeError: '_thread._local' object has no attribute 'value'
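The last frame looks up a `value` attribute on a `threading.local` object, and attributes set on a thread-local in one thread are not visible in other threads. That would explain why only the server's worker threads fail. A possible workaround is to build the model once in the main thread at startup and reuse it in request handlers. The toy class below (hypothetical, not MXNet code) shows the lazy-default pattern that avoids the missing attribute.

```python
import threading


class Scope(object):
    """Hypothetical name manager with a per-thread 'current' instance."""
    _current = threading.local()

    @classmethod
    def current(cls):
        # A thread-local attribute set in the importing thread does not exist in
        # other threads, so create a default on first access instead of assuming it.
        if not hasattr(cls._current, 'value'):
            cls._current.value = cls()
        return cls._current.value
```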

Having problem importing batchify

Hi, I'm following the Model Zoo instructions and trying to train Faster R-CNN with train_faster_rcnn.py.
But importing batchify, rcnn and coco_detection fails with errors like "ImportError: cannot import name 'batchify'".

I've installed mxnet, gluon and the latest gluoncv, which should include modules like batchify.

Could you please help me figure out what might have gone wrong?

Thanks!

[Feature Request] Download datasets from code

For people following the tutorials, it is quite a bit of overhead to get the dataset scripts and run them.

I would suggest adding a download=True flag to let the user choose to download the dataset automatically rather than asking them to run the scripts.
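A minimal sketch of what such a flag could look like, assuming a single file per dataset (real datasets would also need checksum verification and archive extraction, which are omitted here). `ensure_dataset` and its parameters are hypothetical, not an existing gluoncv API.

```python
import os

try:
    from urllib.request import urlretrieve  # Python 3
except ImportError:
    from urllib import urlretrieve  # Python 2


def ensure_dataset(root, filename, url, download=False):
    """Hypothetical helper: return the local path, fetching the file only when asked to."""
    path = os.path.join(root, filename)
    if os.path.exists(path):
        return path
    if not download:
        raise IOError('%s not found; pass download=True or run the preparation script' % path)
    if not os.path.isdir(root):
        os.makedirs(root)
    urlretrieve(url, path)
    return path
```

Defaulting to `download=False` keeps existing behavior, so large downloads only happen when the user opts in.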

Failed to get_fcn_voc_resnet50 or get_fcn_voc_resnet101 with ctx=mx.gpu()

I tried to get a pretrained model with mx.gpu() as the context, but it complains that the parameters were previously initialized on [cpu(0)].

In [5]: net = fcn.get_fcn_voc_resnet50(pretrained=True, ctx=mx.gpu())
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-5-a5a5eb1eae21> in <module>()
----> 1 net = fcn.get_fcn_voc_resnet50(pretrained=True, ctx=mx.gpu())

~/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/gluoncv/model_zoo/fcn.py in get_fcn_voc_resnet50(**kwargs)
    140     >>> print(model)
    141     """
--> 142     return get_fcn('pascal_voc', 'resnet50', **kwargs)
    143
    144 def get_fcn_voc_resnet101(**kwargs):

~/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/gluoncv/model_zoo/fcn.py in get_fcn(dataset, backbone, pretrained, root, ctx, **kwargs)
    118         from .model_store import get_model_file
    119         model.load_params(get_model_file('fcn_%s_%s'%(backbone, acronyms[dataset]),
--> 120                                          root=root), ctx=ctx)
    121     return model
    122

~/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/gluon/block.py in load_params(self, filename, ctx, allow_missing, ignore_extra)
    339             del loaded
    340             self.collect_params().load(
--> 341                 filename, ctx, allow_missing, ignore_extra, self.prefix)
    342             return
    343

~/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/gluon/parameter.py in load(self, filename, ctx, allow_missing, ignore_extra, restore_prefix)
    796                         name[lprefix:], filename, _brief_print_list(self._params.keys()))
    797                 continue
--> 798             self[name]._load_init(arg_dict[name], ctx)

~/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/gluon/parameter.py in _load_init(self, data, ctx)
    218                 "Failed to load Parameter '%s' on %s because it was " \
    219                 "previous initialized on %s."%(
--> 220                     self.name, str(ctx), str(self.list_ctx()))
    221             self.set_data(data)
    222         self._deferred_init = ()

AssertionError: Failed to load Parameter 'fcn1_dilatedresnetv00_conv0_weight' on [gpu(0)] because it was previous initialized on [cpu(0)].

Undefined name 'stage' in model_zoo/ssd/vgg_atrous.py

https://github.com/dmlc/gluon-cv/blob/master/gluoncv/model_zoo/ssd/vgg_atrous.py#L121

flake8 testing of https://github.com/dmlc/gluon-cv on Python 3.6.3

$ flake8 . --count --select=E901,E999,F821,F822,F823 --show-source --statistics

./gluoncv/data/mscoco/pycocotools/coco.py:272:41: F821 undefined name 'm'
                        img = np.ones( (m.shape[0], m.shape[1], 3) )
                                        ^
./gluoncv/data/mscoco/pycocotools/coco.py:272:53: F821 undefined name 'm'
                        img = np.ones( (m.shape[0], m.shape[1], 3) )
                                                    ^
./gluoncv/data/mscoco/pycocotools/coco.py:279:52: F821 undefined name 'm'
                        ax.imshow(np.dstack( (img, m*0.5) ))
                                                   ^
./gluoncv/data/mscoco/pycocotools/coco.py:441:16: F821 undefined name 'm'
        return m
               ^
./gluoncv/model_zoo/ssd/vgg_atrous.py:121:29: F821 undefined name 'stage'
                            stage.add(nn.BatchNorm())
                            ^
5     F821 undefined name 'm'
5

[Feature Request] attach some symbol file for model zoo(details of model connection)

Hi,

To see the connections of a pretrained network, one can easily use the visualization approach in the code below.
However, I think a symbol file is clearer than visualizing the network (similar to a Caffe prototxt), because a visualization may omit some details of the network connections or some parameters.
Is there any plan to attach symbol files? They would also show that the model matches the paper.
Thanks!

import gluoncv as gcv
import mxnet as mx
net = gcv.model_zoo.get_model('ssd_512_mobilenet1.0_voc', pretrained=True)
sym = net(mx.sym.var('data'))
# visualize it like usual

Image Segmentation Network, strange looking images

Using gluoncv 0.2, this is the output I get when running the image segmentation tutorial:

Is this the expected output? The mask looks strange and the denormalized image has artifacts.
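One possible source of artifacts in a denormalized image is converting values outside [0, 1] to uint8 without clipping. Since the images are not included here, this is only an assumption. Below is a NumPy sketch of denormalizing with the ImageNet mean/std values commonly used in the tutorials; whether those exact values match the preprocessing in use is also an assumption.

```python
import numpy as np

IMAGENET_MEAN = np.array([0.485, 0.456, 0.406])
IMAGENET_STD = np.array([0.229, 0.224, 0.225])


def denormalize(chw):
    """Map a normalized CHW float image back to an HWC uint8 image for display."""
    hwc = np.transpose(np.asarray(chw, dtype=np.float64), (1, 2, 0))
    img = hwc * IMAGENET_STD + IMAGENET_MEAN
    # Clip before the uint8 cast; out-of-range values would otherwise wrap or saturate
    # unpredictably and show up as speckle artifacts.
    return (np.clip(img, 0.0, 1.0) * 255.0).round().astype(np.uint8)
```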

The training loss is nan in SSD

I used the default training script (train_ssd.py) to train SSD300. However, the training loss stays large and does not converge (SmoothL1 is nan from the start). The log is shown below; what could be the problem? Thanks!

Namespace(batch_size=32, data_shape=300, dataset='voc', epochs=240, gpus='0', log_interval=100, lr=0.001, lr_decay=0.1, lr_decay_epoch='160,200', momentum=0.9, network='vgg16_atrous', num_workers=4, resume='', save_interval=10, save_prefix='ssd_300_vgg16_atrous_voc', seed=233, start_epoch=0, wd=0.0005)
Start training from [Epoch 0]
[Epoch 0][Batch 99], Speed: 31.855627 samples/sec, CrossEntropy=12.250115, SmoothL1=nan
[Epoch 0][Batch 199], Speed: 31.908488 samples/sec, CrossEntropy=12.212773, SmoothL1=nan
[Epoch 0][Batch 299], Speed: 31.279352 samples/sec, CrossEntropy=12.199556, SmoothL1=nan
[Epoch 0][Batch 399], Speed: 30.720933 samples/sec, CrossEntropy=12.192253, SmoothL1=nan
[Epoch 0][Batch 499], Speed: 31.459032 samples/sec, CrossEntropy=12.187363, SmoothL1=nan
[Epoch 0] Training cost: 548.135802, CrossEntropy=12.186622, SmoothL1=nan
[Epoch 0] Validation: 
aeroplane=0.000389
bicycle=0.000102
bird=0.004152
boat=0.000011
bottle=0.000183
bus=0.000084
car=0.000175
cat=0.004419
chair=0.001955
cow=0.000036
diningtable=0.000058
dog=0.001009
horse=0.000108
motorbike=0.005600
person=0.000281
pottedplant=0.000103
sheep=0.000045
sofa=0.000598
train=0.002716
tvmonitor=0.000453
mAP=0.001124
[Epoch 1][Batch 99], Speed: 31.706873 samples/sec, CrossEntropy=12.165724, SmoothL1=nan

SSD nms_topk params will be changed in validate() and will not be changed back in train() for train_ssd.py

It seems net.set_nms() is only invoked in validate() in train_ssd.py, which changes the SSD nms_topk from the default -1 to 400.
These NMS parameters are not set back to their defaults in train().

Is this intended, and will it influence the training result?
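If the goal is to keep validation-time NMS settings from leaking into later code, a context manager can save and restore them. This sketch assumes the network exposes `set_nms(nms_thresh=..., nms_topk=..., post_nms=...)` and keeps the current values in attributes of the same names; please verify both against the installed gluoncv version. If the training-mode forward pass returns raw predictions without NMS, as appears to be the case, the leak would not change the loss, but restoring the settings keeps behavior explicit.

```python
from contextlib import contextmanager


@contextmanager
def temporary_nms(net, **nms_kwargs):
    """Apply eval-time NMS settings, then restore the previous ones on exit.

    Assumes `net` stores nms_thresh, nms_topk and post_nms as attributes and accepts
    them as keyword arguments of set_nms (an assumption, not a confirmed API contract).
    """
    saved = {k: getattr(net, k) for k in ('nms_thresh', 'nms_topk', 'post_nms')}
    net.set_nms(**nms_kwargs)
    try:
        yield net
    finally:
        net.set_nms(**saved)
```

Usage: wrap the validation loop in `with temporary_nms(net, nms_thresh=0.45, nms_topk=400):`.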

SSD results on datasets

Hi guys,
Can you please publish your latest results for MSCOCO & VOC datasets?

Thanks for this awesome repo!

SSD_512_mobilenet1_0: 'ssd0_expand_bn3_gamma' has not been initialized

Hi,

I got the following error when using my trained model in the demo:

RuntimeError: Parameter 'ssd0_expand_bn3_gamma' has not been initialized. Note that you should initialize parameters and create Trainer with Block.collect_params() instead of Block.params because the later does not include Parameters of nested child Blocks

Error on train Faster-RCNN

When I train Faster-RCNN with mxnet-cu80, I hit this error:
mxnet.base.MXNetError: Cannot find argument 'static_alloc', Possible Arguments:
I also cannot use RoiAlign with this version.

Image Segmentation: gluoncv.model_zoo.resnet50_v1b(pretrained=True) not available

pretrained_net = gluoncv.model_zoo.resnet50_v1b(pretrained=True)

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-10-4bcb8b012a53> in <module>()
----> 1 pretrained_net = gluoncv.model_zoo.resnet50_v1b(pretrained=True)

AttributeError: module 'gluoncv.model_zoo' has no attribute 'resnet50_v1b'

Does the tutorial need to be updated?

Bad file descriptor ERROR when run train_faster_rcnn.py

Hi, when I run train_faster_rcnn.py or eval_faster_rcnn.py, they fail with OSError: [Errno 9] Bad file descriptor. Thanks for your attention.

INFO:root:Namespace(batch_size=2, dataset='voc', epochs=30, gpus='0,1', log_interval=100, lr=0.001, lr_decay=0.1, lr_decay_epoch='14,20', momentum=0.9, network='resnet50_v2a', num_workers=8, resume='', save_interval=1, save_prefix='faster_rcnn_resnet50_v2a_voc', seed=233, start_epoch=0, val_interval=1, verbose=False, wd=0.0005)
INFO:root:Start training from [Epoch 0]
Process Process-2:
Traceback (most recent call last):
  File "/home/users/yunfan.lu/anaconda2/envs/py3/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/home/users/yunfan.lu/anaconda2/envs/py3/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/home/users/yunfan.lu/anaconda2/envs/py3/lib/python3.6/site-packages/mxnet-1.2.0-py3.6.egg/mxnet/gluon/data/dataloader.py", line 157, in worker_loop
    data_queue.put((idx, batch))
  File "/home/users/yunfan.lu/anaconda2/envs/py3/lib/python3.6/multiprocessing/queues.py", line 341, in put
    obj = _ForkingPickler.dumps(obj)
  File "/home/users/yunfan.lu/anaconda2/envs/py3/lib/python3.6/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
  File "/home/users/yunfan.lu/anaconda2/envs/py3/lib/python3.6/site-packages/mxnet-1.2.0-py3.6.egg/mxnet/gluon/data/dataloader.py", line 64, in reduce_ndarray
    fd = multiprocessing.reduction.DupFd(fd)
  File "/home/users/yunfan.lu/anaconda2/envs/py3/lib/python3.6/multiprocessing/reduction.py", line 191, in DupFd
    return resource_sharer.DupFd(fd)
  File "/home/users/yunfan.lu/anaconda2/envs/py3/lib/python3.6/multiprocessing/resource_sharer.py", line 48, in __init__
    new_fd = os.dup(fd)
OSError: [Errno 9] Bad file descriptor

Saving file with `?` on Windows machine

One of the tutorials downloads a file from a URL that contains a query string.

When saving it on a Windows machine, an exception is raised because the question mark is a reserved character and cannot be used in a filename.

Here is a user report about it: https://discuss.mxnet.io/t/invalid-argument-street-small-jpg-raw-true/1065
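One way to avoid this is to derive the local filename from the URL path only, dropping the query string, and to replace any characters Windows forbids. `filename_from_url` is a hypothetical helper sketched under that assumption, not an existing gluoncv utility.

```python
import os

try:
    from urllib.parse import urlsplit, unquote  # Python 3
except ImportError:
    from urlparse import urlsplit  # Python 2
    from urllib import unquote

# characters that are not allowed in Windows file names
_RESERVED = '<>:"/\\|?*'


def filename_from_url(url, default='download'):
    """Hypothetical helper: build a safe local filename from a URL, ignoring '?query'."""
    path = unquote(urlsplit(url).path)  # the query string is not part of .path
    name = os.path.basename(path) or default
    return ''.join('_' if ch in _RESERVED else ch for ch in name)
```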