Comments (19)
@pyupcgithub If you want to train the model with pycaffe, you have to write the data layer yourself and then train the model. That is really difficult, though, because the data layer is quite complex.
from cascade-rcnn.
@Peng-wei-Yu If so, how can I use the MATLAB script to run detection on a single image with the pretrained model?
from cascade-rcnn.
Have a look at:
https://gist.github.com/makefile/6731ca0e311b6401681c15635bb97330
from cascade-rcnn.
@Shadow992 Why can't I open the URL?
from cascade-rcnn.
@pyupcgithub @Peng-wei-Yu @Shadow992 Hi, have you successfully trained the resnet101+fpn+cascade or resnet50+fpn+cascade model on your own dataset?
from cascade-rcnn.
@huinsysu
I have trained my own model based on Inception v3 on my own data. I trained it on two different datasets:
- Detecting license plates in images. This one worked amazingly well. Without much fine-tuning I reached quite good results (about 90% accuracy on different datasets).
- Detecting characters on the cropped license plates. This one still does not work after two months of testing, fine-tuning, etc.
I found other threads, and especially papers like "DSOD: Learning Deeply Supervised Object Detectors from Scratch" ( http://openaccess.thecvf.com/content_ICCV_2017/papers/Shen_DSOD_Learning_Deeply_ICCV_2017_paper.pdf ), which suggest that training detectors from scratch may fail because of the RoI pooling layer. Therefore you should always start from pre-trained models.
I tried training a classifier on the dataset that only classifies the characters, then kept the convolutional layers and replaced the rest with the R-CNN components. It still fails to learn. No matter which hyperparameters, regions, aspect ratios, anchor points, etc. I choose, it always fails. If there were not a paper that uses Faster R-CNN for OCR, I would have said that Faster R-CNN simply cannot do it.
But I guess there is something wrong with my dataset, training, or hyperparameters. However, I have not been able to find it so far...
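One cheap check for the anchor scales and aspect ratios mentioned above is whether any anchor shape overlaps the ground-truth box sizes at all. The following is a minimal sketch, assuming the usual Faster R-CNN convention that an anchor has area (base_size * scale)^2 and ratio = height / width; the function names and example values are illustrative and are not taken from the cascade-rcnn code.

```python
import numpy as np


def anchor_shapes(base_size, scales, ratios):
    """Return an array of anchor (width, height) pairs.

    Assumed convention: ratio = height / width, and every anchor for a
    given scale has area (base_size * scale) ** 2.
    """
    shapes = []
    for s in scales:
        area = float(base_size * s) ** 2
        for r in ratios:
            w = np.sqrt(area / r)
            shapes.append((w, w * r))
    return np.array(shapes)


def best_anchor_iou(gt_wh, anchors_wh):
    """Best IoU of each ground-truth (w, h) with any anchor shape,
    with both boxes centred on the same point."""
    gt = np.asarray(gt_wh, dtype=np.float64).reshape(-1, 2)
    an = np.asarray(anchors_wh, dtype=np.float64).reshape(-1, 2)
    inter = (np.minimum(gt[:, None, 0], an[None, :, 0]) *
             np.minimum(gt[:, None, 1], an[None, :, 1]))
    union = gt[:, 0:1] * gt[:, 1:2] + (an[:, 0] * an[:, 1])[None, :] - inter
    return (inter / union).max(axis=1)


# Hypothetical example: large default-style anchors against a 10x20 px character box.
anchors = anchor_shapes(16, scales=[8], ratios=[0.5, 1, 2])
print(best_anchor_iou([[10, 20]], anchors))
```

If the best IoU for typical character boxes stays far below the foreground IoU threshold used in training, very few anchors are ever labelled as positives, which would be one possible explanation for a model that predicts background almost everywhere.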
Edit:
"Does not work" does not mean bad results; it means an accuracy of maybe 0.1% for detection/localization. It just does not work. However, when I overfit the model on my training data and use the same C++ inference code I have always used, it works like a charm. Therefore I strongly suspect that something in the way I do training does not work properly or needs more fine-tuning. Localization in particular seems to work extremely poorly. Classification seems okay-ish to me, but is still quite bad.
from cascade-rcnn.
@Shadow992 Can you show me the URL? I can't view it.
from cascade-rcnn.
@Shadow992 Thanks for your detailed reply. When I trained the res50-15s-800-fpn-cascade model on my dataset, I just used the ImageNet pre-trained model. But the result of my model was very bad. The scores of the boxes the model detected were very low, which means the model classified those boxes as background. I have no idea how to solve this problem, so I plan to dive into the training code and hope to find the cause.
from cascade-rcnn.
@huinsysu
This also happened to me when training on character recognition. My model nearly always predicts background with a probability of about 90% or higher, so I am suffering from exactly the same problem. However, as mentioned, nearly the same network works great for license plate detection. I guess something must be wrong, or at least needs more fine-tuning. I am now trying to make the RoI pooling output much bigger (now 15x9). Hopefully this helps, but I guess not.
A "tiny workaround" would be to treat every box whose background probability is below 90% (or similar) as foreground and take its highest-scoring foreground class. But this still does not work very well, especially when you think about bbox regression and similar...
What kind of objects do you want to detect?
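The workaround described above can be sketched as follows. This is a minimal illustration, assuming the class scores come as an (N, C+1) array with background in column 0 (the layout the inference script in this thread also relies on); the function name and the 0.9 threshold are illustrative, not part of cascade-rcnn.

```python
import numpy as np


def relaxed_foreground(cls_prob, bg_thresh=0.9):
    """Treat rows with background probability below bg_thresh as foreground.

    cls_prob: (N, C+1) scores, column 0 is background (assumed layout).
    Returns (row_indices, class_ids, scores), where class_ids start at 1.
    """
    cls_prob = np.asarray(cls_prob, dtype=np.float32)
    fg_rows = np.where(cls_prob[:, 0] < bg_thresh)[0]
    fg_scores = cls_prob[fg_rows, 1:]           # drop the background column
    class_ids = fg_scores.argmax(axis=1) + 1    # +1 because column 0 was dropped
    scores = fg_scores.max(axis=1)
    return fg_rows, class_ids, scores


probs = [[0.95, 0.03, 0.02],   # stays background
         [0.80, 0.05, 0.15],   # relaxed to class 2
         [0.10, 0.70, 0.20]]   # class 1
rows, ids, scores = relaxed_foreground(probs)
print(rows, ids, scores)
```

Note that this only changes the labels; the boxes still come from the class-specific bbox regression, which is why the workaround may not help much with localization.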
Python inference:
from __future__ import division, print_function  # true division and print() on Python 2 too

import os
import sys
import argparse
import numpy as np
from PIL import Image, ImageDraw
import cv2
import time
# Make sure that caffe is on the python path:
caffe_root = '../../..'
#os.chdir(caffe_root)
sys.path.insert(0, os.path.join(caffe_root, 'python'))
import caffe
# from google.protobuf import text_format
# from caffe.proto import caffe_pb2


class CaffeDetection:
    def __init__(self, gpu_id, model_def, model_weights,
                 cascade=0, FPN=0):
        if gpu_id < 0:
            caffe.set_mode_cpu()
        else:
            caffe.set_device(gpu_id)
            caffe.set_mode_gpu()
        # Load the net in the test phase for inference, and configure input preprocessing.
        self.net = caffe.Net(model_def,      # defines the structure of the model
                             model_weights,  # contains the trained weights
                             caffe.TEST)     # use test mode (e.g., don't perform dropout)
        # input preprocessing: 'data' is the name of the input blob == net.inputs[0]
        #self.transformer = caffe.io.Transformer({'data': self.net.blobs['data'].data.shape})
        #self.transformer.set_transpose('data', (2, 0, 1))
        #self.transformer.set_mean('data', np.array([104, 117, 123])) # mean pixel
        ## the reference model operates on images in [0,255] range instead of [0,1]
        #self.transformer.set_raw_scale('data', 255)
        ## the reference model has channels in BGR order instead of RGB
        #self.transformer.set_channel_swap('data', (2, 1, 0))
        self.cascade = cascade > 0
        self.FPN = FPN > 0
        print(cascade, FPN)
        if not self.cascade:
            # baseline model
            if self.FPN:
                self.proposal_blob_names = ['proposals_to_all']
            else:
                self.proposal_blob_names = ['proposals']
            self.bbox_blob_names = ['output_bbox_1st']
            self.cls_prob_blob_names = ['cls_prob_1st']
            self.output_names = ['1st']
        else:
            # cascade-rcnn model
            if self.FPN:
                self.proposal_blob_names = ['proposals_to_all', 'proposals_to_all_2nd',
                                            'proposals_to_all_3rd', 'proposals_to_all_2nd',
                                            'proposals_to_all_3rd']
            else:
                self.proposal_blob_names = ['proposals', 'proposals_2nd', 'proposals_3rd',
                                            'proposals_2nd', 'proposals_3rd']
            self.bbox_blob_names = ['output_bbox_1st', 'output_bbox_2nd', 'output_bbox_3rd',
                                    'output_bbox_2nd', 'output_bbox_3rd']
            self.cls_prob_blob_names = ['cls_prob_1st', 'cls_prob_2nd', 'cls_prob_3rd',
                                        'cls_prob_2nd_avg', 'cls_prob_3rd_avg']
            self.output_names = ['1st', '2nd', '3rd', '2nd_avg', '3rd_avg']
        self.num_outputs = len(self.proposal_blob_names)
        assert self.num_outputs == len(self.bbox_blob_names)
        assert self.num_outputs == len(self.cls_prob_blob_names)
        assert self.num_outputs == len(self.output_names)
        # detection configuration
        # detect_final_boxes = np.zeros(nImg, num_outputs)
        #self.det_thr = 0.001 # threshold for testing
        self.det_thr = 0.3      # threshold for demo
        self.max_per_img = 100  # max number of detections
        self.nms_thresh = 0.5   # NMS
        if self.FPN:
            self.shortSize = 800
            self.longSize = 1312
        else:
            self.shortSize = 600
            self.longSize = 1000
        # float32 so that mean subtraction cannot wrap around as it would in uint8
        self.PIXEL_MEANS = np.array([104, 117, 123], dtype=np.float32)
        self.num_cls = 80  # number of foreground classes; change it for your own dataset

    def detect(self, image_file):
        '''
        rcnn detection
        '''
        #image = caffe.io.load_image(image_file)
        image = cv2.imread(image_file)  # BGR, default is cv2.IMREAD_COLOR 3-channel
        orgH, orgW, channel = image.shape
        print("image shape:", image.shape)
        # true division is required here; Python 2 integer division would truncate the ratio
        rzRatio = self.shortSize / min(orgH, orgW)
        imgH = min(rzRatio * orgH, self.longSize)
        imgW = min(rzRatio * orgW, self.longSize)
        imgH = round(imgH / 32) * 32
        imgW = round(imgW / 32) * 32  # must be a multiple of 32
        hwRatios = [imgH / orgH, imgW / orgW]
        #transformed_image = self.transformer.preprocess('data', image)
        #image = cv2.resize(im_orig, None, None, fx=im_scale, fy=im_scale,
        resized_w = int(imgW)
        resized_h = int(imgH)
        print('resized -> ', (resized_w, resized_h))
        image = cv2.resize(image, (resized_w, resized_h), interpolation=cv2.INTER_LINEAR)
        # convert to float before subtracting the mean (uint8 arithmetic wraps around)
        image = image.astype(np.float32) - self.PIXEL_MEANS
        #cv2.imwrite("transformed_image.jpg", image)
        transformed_image = np.transpose(image, (2, 0, 1))  # C H W
        # set net to batch size of 1
        self.net.blobs['data'].reshape(1, 3, resized_h, resized_w)
        # copy the preprocessed image into the input blob
        self.net.blobs['data'].data[...] = transformed_image.astype(np.float32, copy=False)
        start = time.time()
        # Forward pass.
        blobs_out = self.net.forward()
        print('output_bbox_1st---', self.net.blobs['output_bbox_1st'].data.shape)
        #print blobs_out
        end = time.time()
        cost_millis = int((end - start) * 1000)
        print("detection cost ms: ", cost_millis)
        detect_final_boxes = []
        for nn in range(self.num_outputs):
            # detect_boxes = cell(num_cls, 1);
            tmp = self.net.blobs[self.bbox_blob_names[nn]].data.copy()  # copy because it is modified below
            print(self.bbox_blob_names[nn], tmp.shape)
            #tmp = tmp.reshape((-1,5))
            tmp = tmp[:, :, 0, 0]
            # map boxes back to the original image scale
            tmp[:, 1] /= hwRatios[1]
            tmp[:, 3] /= hwRatios[1]
            tmp[:, 2] /= hwRatios[0]
            tmp[:, 4] /= hwRatios[0]
            # clipping bbs to image borders
            tmp[:, 1] = np.maximum(0, tmp[:, 1])
            tmp[:, 2] = np.maximum(0, tmp[:, 2])
            tmp[:, 3] = np.minimum(orgW, tmp[:, 3])
            tmp[:, 4] = np.minimum(orgH, tmp[:, 4])
            tmp[:, 3] = tmp[:, 3] - tmp[:, 1] + 1  # w
            tmp[:, 4] = tmp[:, 4] - tmp[:, 2] + 1  # h
            output_bboxs = tmp[:, 1:]
            tmp = self.net.blobs[self.cls_prob_blob_names[nn]].data
            print(self.cls_prob_blob_names[nn], tmp.shape)
            cls_prob = tmp.reshape((-1, self.num_cls + 1))
            tmp = self.net.blobs[self.proposal_blob_names[nn]].data.copy()
            print(self.proposal_blob_names[nn], tmp.shape)
            tmp = tmp[:, 1:]
            tmp[:, 2] = tmp[:, 2] - tmp[:, 0] + 1  # w
            tmp[:, 3] = tmp[:, 3] - tmp[:, 1] + 1  # h
            proposals = tmp
            # drop degenerate proposals
            keep_id = np.where((proposals[:, 2] > 0) & (proposals[:, 3] > 0))[0]
            proposals = proposals[keep_id, :]
            output_bboxs = output_bboxs[keep_id, :]
            cls_prob = cls_prob[keep_id, :]
            detect_boxes = []
            for i in range(self.num_cls):
                cls_id = i + 1
                prob = cls_prob[:, cls_id][:, np.newaxis]  # 0 is background
                #print (output_bboxs.shape, prob.shape)
                bbset = np.hstack([output_bboxs, prob])
                if self.det_thr > 0:
                    keep_id = np.where(prob >= self.det_thr)[0]
                    bbset = bbset[keep_id, :]
                keep = self.cpu_nms_single_cls(bbset, self.nms_thresh)
                if len(keep) == 0:
                    continue
                bbset = bbset[keep, :]
                cls_ids = np.array([cls_id] * len(bbset))[:, np.newaxis]
                #print "cls_ids.shape", cls_ids.shape, bbset.shape
                detect_boxes.extend(np.hstack([cls_ids, bbset]).tolist())
            print("detected box num: ", len(detect_boxes))
            detect_boxes = np.asarray(detect_boxes)
            if self.max_per_img > 0 and len(detect_boxes) > self.max_per_img:
                # keep the max_per_img highest-scoring boxes (scores sorted in descending order)
                rank_scores = np.sort(detect_boxes[:, 5])[::-1]
                keep_id = np.where(detect_boxes[:, 5] >= rank_scores[self.max_per_img - 1])[0]
                detect_boxes = detect_boxes[keep_id, :]
            #detect_final_boxes.extend(detect_boxes.tolist())
            detect_final_boxes.append(detect_boxes.tolist())
        return detect_final_boxes

    def cpu_nms_single_cls(self, dets, thresh):
        """Pure Python NMS baseline. dets rows are (x, y, w, h, score)."""
        x1 = dets[:, 0]
        y1 = dets[:, 1]
        w = dets[:, 2]
        h = dets[:, 3]
        scores = dets[:, 4]
        x2 = x1 + w - 1
        y2 = y1 + h - 1
        # areas = (x2 - x1 + 1) * (y2 - y1 + 1)
        areas = w * h
        order = scores.argsort()[::-1]
        keep = []
        while order.size > 0:
            i = order[0]
            keep.append(i)
            xx1 = np.maximum(x1[i], x1[order[1:]])
            yy1 = np.maximum(y1[i], y1[order[1:]])
            xx2 = np.minimum(x2[i], x2[order[1:]])
            yy2 = np.minimum(y2[i], y2[order[1:]])
            inter_w = np.maximum(0.0, xx2 - xx1 + 1)
            inter_h = np.maximum(0.0, yy2 - yy1 + 1)
            inter = inter_w * inter_h
            ovr = inter / (areas[i] + areas[order[1:]] - inter)
            inds = np.where(ovr <= thresh)[0]
            order = order[inds + 1]
        return keep


def main(args):
    '''main '''
    wordname_15 = ['__background__', 'plane', 'baseball-diamond', 'bridge', 'ground-track-field',
                   'small-vehicle', 'large-vehicle', 'ship', 'tennis-court',
                   'basketball-court', 'storage-tank', 'soccer-ball-field', 'roundabout',
                   'harbor', 'swimming-pool', 'helicopter']
    wordname_5 = ['__background__', '1:plane', '2:ship', '3:storage', '4:harbor', '5:bridge']
    # {cls_name: cls_id} # start from 1
    #cls_ids = {k: idx+1 for idx, k in enumerate(wordname_15)}
    detection = CaffeDetection(args.gpu_id,
                               args.model_def, args.model_weights,
                               cascade=args.cascade, FPN=args.FPN)
    results = detection.detect(args.image_file)
    #print(results)
    img = Image.open(args.image_file)
    draw = ImageDraw.Draw(img)
    width, height = img.size
    for item in results[-1]:  # the last output (3rd_avg for the cascade model)
        xmin = int(round(item[1]))
        ymin = int(round(item[2]))
        xmax = int(round(item[1] + item[3] - 1))
        ymax = int(round(item[2] + item[4] - 1))
        cls_id = int(item[0])
        draw.rectangle([xmin, ymin, xmax, ymax], outline=(255, 0, 0))
        draw.text([xmin, ymin], str(cls_id), (0, 0, 255))
        print([cls_id, xmin, ymin, xmax, ymax, round(item[-1] * 1000) / 1000])
    img.save('detect_result.jpg')


def parse_args():
    '''parse args'''
    parser = argparse.ArgumentParser()
    parser.add_argument('--gpu_id', type=int, default=0, help='gpu id')
    parser.add_argument('--model_def',
                        default='models/deploy.prototxt')
    parser.add_argument('--cascade', default=0, type=int)
    parser.add_argument('--FPN', default=0, type=int)
    parser.add_argument('--model_weights',
                        default='models/models_iter_120000.caffemodel')
    parser.add_argument('--image_file', default='examples/images/fish-bike.jpg')
    return parser.parse_args()


if __name__ == '__main__':
    main(parse_args())
from cascade-rcnn.
@Shadow992
I am participating in a tank detection competition and there are 189 classes in the dataset. I tested the RPN on some pictures and found that the boxes it provided were not so bad; at least the high-scoring boxes were around the ground truth. So I guess the RPN works. It seems strange to me that classification performs well in stage 1 but badly in stage 2. Since there are so many classes in the dataset, I want to try reducing the training classes to see the effect on a subset.
If you find any solution to this problem, please inform me. Thanks!
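The RPN check described above can be made quantitative by measuring proposal recall, i.e. the fraction of ground-truth boxes that are covered by at least one proposal above some IoU. A minimal sketch, assuming boxes are (x1, y1, x2, y2) arrays in continuous pixel coordinates; the function names and the 0.5 threshold are illustrative choices, not taken from the cascade-rcnn code.

```python
import numpy as np


def iou_matrix(boxes_a, boxes_b):
    """Pairwise IoU between two sets of (x1, y1, x2, y2) boxes."""
    a = np.asarray(boxes_a, dtype=np.float64).reshape(-1, 4)
    b = np.asarray(boxes_b, dtype=np.float64).reshape(-1, 4)
    x1 = np.maximum(a[:, None, 0], b[None, :, 0])
    y1 = np.maximum(a[:, None, 1], b[None, :, 1])
    x2 = np.minimum(a[:, None, 2], b[None, :, 2])
    y2 = np.minimum(a[:, None, 3], b[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    union = area_a[:, None] + area_b[None, :] - inter
    return inter / np.maximum(union, 1e-12)


def proposal_recall(gt_boxes, proposals, thresh=0.5):
    """Fraction of ground-truth boxes matched by some proposal with IoU >= thresh."""
    gt = np.asarray(gt_boxes, dtype=np.float64).reshape(-1, 4)
    props = np.asarray(proposals, dtype=np.float64).reshape(-1, 4)
    if len(gt) == 0 or len(props) == 0:
        return 0.0
    best = iou_matrix(gt, props).max(axis=1)
    return float((best >= thresh).mean())
```

Running this at the IoU thresholds each cascade stage is trained with would show whether later stages receive enough high-quality proposals, which is one way to narrow down why stage 2 behaves differently from stage 1.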
from cascade-rcnn.
@huinsysu Can we stay in closer contact? E.g. by using Skype/Discord or similar? So we can update each other on a regular basis? I would also be highly interested in solving this issue...
You can simply write me a mail for contact, if you want to:
Removed Email
from cascade-rcnn.
@Shadow992 Thank you for the Python inference code.
from cascade-rcnn.
@Shadow992 I tested the Python inference code you provided, but I find it does not give results as good as the MATLAB inference code. I only tested the trained model provided by the author.
Do you know why?
from cascade-rcnn.
@Shadow992 For example, I use the same model with different inference code, and I only detect people.
With the MATLAB inference code, the result is 4.
With the Python inference code, the result is 5.
This is one of the differences. Can you help me solve this problem?
from cascade-rcnn.
You probably have to tune parameters and change algorithms, and maybe apply some post-processing. However, I am not the author and this goes beyond the scope of this issue. I suggest closing it and experimenting with the code yourself.
from cascade-rcnn.
@Shadow992 Glad to receive your reply.
One more question: if I run the MATLAB inference code and the Python inference code on the same image, the outputs are different, especially the confidence scores.
I cannot understand why the outputs differ although I use the same model on the same image.
Is it just because of the difference between the MATLAB and Python interfaces?
I feel really confused. Looking forward to your reply.
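When two inference paths disagree on scores for the same model and image, the preprocessing is usually worth ruling out first. One pitfall to check (a guess, not something confirmed in this thread): cv2.imread returns a uint8 array, and subtracting the pixel means while the data is still uint8 wraps around modulo 256 instead of producing negative values. The sketch below only demonstrates this NumPy behaviour on a single made-up pixel; the helper names are illustrative.

```python
import numpy as np

PIXEL_MEANS = np.array([104, 117, 123])  # BGR means, same values as in the script above


def subtract_mean_uint8(image):
    """Mean subtraction kept in uint8: values below the mean wrap around."""
    return image - PIXEL_MEANS.astype(np.uint8)


def subtract_mean_float(image):
    """Mean subtraction after converting to float32: negatives are preserved."""
    return image.astype(np.float32) - PIXEL_MEANS.astype(np.float32)


pixel = np.array([[[50, 200, 123]]], dtype=np.uint8)  # one made-up BGR pixel
print(subtract_mean_uint8(pixel))   # first channel wraps to 202 instead of -54
print(subtract_mean_float(pixel))
```

Dumping the input blob from both the MATLAB and the Python run and comparing them numerically would show directly whether the network sees the same data in both cases.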
from cascade-rcnn.
@Shadow992
Here is the output, for example.
In MATLAB:
detect_boxes =
1.0e+03 *
0.0010 0.0010 0.0411 0.2543 0.1361 0.1713 0.0010
0.0010 0.0010 1.0196 0.5097 0.1098 0.1028 0.0010
0.0010 0.0010 0.8632 0.4844 0.1704 0.1621 0.0010
0.0010 0.0010 0.4153 0.2646 0.1221 0.1774 0.0009
0.0010 0.0010 0.8185 0.3617 0.0951 0.1131 0.0009
0.0010 0.0010 0.5891 0.4763 0.1830 0.2173 0.0009
0.0010 0.0010 0 0.0624 0.0936 0.1204 0.0009
0.0010 0.0010 0.7620 0.4800 0.1422 0.1488 0.0009
0.0010 0.0010 0.5406 0.2861 0.1220 0.1924 0.0009
0.0010 0.0010 0.4313 0.1939 0.1162 0.1401 0.0009
0.0010 0.0010 0.7104 0.3299 0.1051 0.1452 0.0009
0.0010 0.0010 0.6834 0.2532 0.0991 0.1096 0.0009
0.0010 0.0010 0.2715 0.1798 0.1358 0.3119 0.0008
0.0010 0.0010 0.5974 0.2326 0.0972 0.1194 0.0007
In Python:
[1.0, 880.21435546875, 481.3125305175781, 157.732666015625, 167.05300903320312, 0.9509992599487305]
[1.0, 17.491165161132812, 251.82504272460938, 161.8278350830078, 180.31094360351562, 0.8915714025497437]
[1.0, 1019.8265991210938, 510.274169921875, 117.43792724609375, 108.07086181640625, 0.8827307224273682]
[1.0, 3.20751953125, 62.82538986206055, 89.68313598632812, 127.01773071289062, 0.8688411116600037]
[1.0, 413.26153564453125, 263.39776611328125, 172.83038330078125, 176.29830932617188, 0.7869312763214111]
[1.0, 872.2628784179688, 398.31329345703125, 47.70599365234375, 77.80911254882812, 0.7195702791213989]
[1.0, 482.8349304199219, 190.6676788330078, 141.91409301757812, 143.6922149658203, 0.6592983603477478]
[1.0, 678.8290405273438, 250.62388610839844, 103.9158935546875, 187.9010467529297, 0.6096863746643066]
[1.0, 417.4122009277344, 269.6662902832031, 256.9304504394531, 228.03402709960938, 0.5993428826332092]
[1.0, 543.610595703125, 287.9579772949219, 129.0999755859375, 187.22341918945312, 0.5436170697212219]
[1.0, 603.4822998046875, 230.12550354003906, 91.89520263671875, 231.51707458496094, 0.5290579199790955]
[1.0, 1256.5042724609375, 627.1253051757812, 23.7042236328125, 65.6826171875, 0.4823896884918213]
the score of confidence is
[1.0, 691.6392211914062, 254.77044677734375, 87.100341796875, 107.69497680664062, 0.47900158166885376]
[1.0, 641.8724365234375, 228.37530517578125, 102.33233642578125, 231.7137451171875, 0.4411845803260803]
[1.0, 634.2685546875, 228.98475646972656, 55.02569580078125, 123.33226013183594, 0.42556729912757874]
[1.0, 584.3050537109375, 463.4409484863281, 319.29437255859375, 186.97891235351562, 0.40026265382766724]
Especially the last column, the confidence score, is different.
from cascade-rcnn.
@pyupcgithub
As mentioned earlier, I am not the author of this code and I barely code in MATLAB/Python. However, it looks like the Python code applies a different kind of NMS or uses a different IoU threshold. Just play around with the parameters. I am not able to help you with this problem, sorry.
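To see how much the NMS IoU threshold alone can change the number of surviving boxes, here is a minimal greedy NMS sketch. It assumes detections in (x, y, w, h, score) format, like the Python output posted above, with continuous coordinates; the toy boxes and thresholds are made up for illustration and this is not the repository's NMS implementation.

```python
import numpy as np


def greedy_nms(dets, iou_thresh):
    """Greedy NMS over (x, y, w, h, score) rows; returns kept row indices."""
    dets = np.asarray(dets, dtype=np.float64).reshape(-1, 5)
    x1, y1 = dets[:, 0], dets[:, 1]
    x2, y2 = x1 + dets[:, 2], y1 + dets[:, 3]
    areas = dets[:, 2] * dets[:, 3]
    order = dets[:, 4].argsort()[::-1]  # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        iw = np.maximum(0.0, np.minimum(x2[i], x2[rest]) - np.maximum(x1[i], x1[rest]))
        ih = np.maximum(0.0, np.minimum(y2[i], y2[rest]) - np.maximum(y1[i], y1[rest]))
        inter = iw * ih
        iou = inter / (areas[i] + areas[rest] - inter)
        order = rest[iou <= iou_thresh]  # drop boxes that overlap box i too much
    return keep


toy = [[0, 0, 100, 100, 0.9],      # box A
       [20, 0, 100, 100, 0.8],     # box B, IoU with A is 2/3
       [200, 200, 100, 100, 0.7]]  # box C, no overlap
print(len(greedy_nms(toy, 0.5)), len(greedy_nms(toy, 0.7)))
```

With these toy boxes a threshold of 0.5 suppresses box B while 0.7 keeps it, so a count of 4 versus 5 people on the same image could come from nothing more than a different threshold or a different box convention in the overlap computation.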
from cascade-rcnn.
@Shadow992 Yes, I think you are right.
Anyway, thank you.
from cascade-rcnn.