Paper-Reading

R-CNN & Fast R-CNN & Faster R-CNN

tags: Deep Learning, Computer Vision, Detection, CVPR 2014, ICCV 2015

R-CNN: Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation

Paper: http://www.cs.berkeley.edu/~rbg/#girshick2014rcnn
Tech report: http://arxiv.org/pdf/1311.2524v5.pdf
Project: https://github.com/rbgirshick/rcnn
Slides: http://www.cs.berkeley.edu/~rbg/slides/rcnn-cvpr14-slides.pdf

Reference: a blog

Object detection system

Three modules:

  1. Generate region proposals (~2k/image)
  2. Compute CNN features
  3. Classify regions using linear SVM
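The three modules can be sketched as a small pipeline. This is only an illustration of the data flow, not the authors' code: `propose_regions` and `extract_features` are hypothetical stand-ins (random boxes instead of selective search, a random projection instead of a CNN forward pass), and the SVM weights are random. The 4096-dimensional feature size follows the paper; the reduced proposal count in the usage lines is just to keep the example fast.

```python
import numpy as np

FEATURE_DIM = 4096  # length of the CNN feature vector used in the R-CNN paper


def propose_regions(image, num_proposals=2000, rng=None):
    """Hypothetical stand-in for selective search: random (x1, y1, x2, y2) boxes."""
    if rng is None:
        rng = np.random.default_rng(0)
    h, w = image.shape[:2]
    x1 = rng.integers(0, w - 1, size=num_proposals)
    y1 = rng.integers(0, h - 1, size=num_proposals)
    x2 = rng.integers(x1 + 1, w + 1)  # guarantees x2 > x1
    y2 = rng.integers(y1 + 1, h + 1)  # guarantees y2 > y1
    return np.stack([x1, y1, x2, y2], axis=1)


def extract_features(image, boxes, rng=None):
    """Hypothetical stand-in for warp + CNN forward pass: one vector per box."""
    if rng is None:
        rng = np.random.default_rng(1)
    projection = rng.standard_normal((3, FEATURE_DIM))
    feats = np.empty((len(boxes), FEATURE_DIM))
    for i, (x1, y1, x2, y2) in enumerate(boxes):
        crop = image[y1:y2, x1:x2]  # the real system warps this crop to a fixed size
        feats[i] = crop.mean(axis=(0, 1)) @ projection
    return feats


def classify_regions(feats, svm_weights, svm_bias):
    """Score every region with one linear SVM per class."""
    return feats @ svm_weights.T + svm_bias


# Usage with a random image and random (untrained) SVM weights.
image = np.random.default_rng(2).random((240, 320, 3))
boxes = propose_regions(image, num_proposals=200)
feats = extract_features(image, boxes)
num_classes = 20
svm_weights = np.random.default_rng(3).standard_normal((num_classes, FEATURE_DIM))
scores = classify_regions(feats, svm_weights, np.zeros(num_classes))
print(scores.shape)
```

Note that the feature step runs once per proposal, which is the cost that Fast R-CNN later removes by sharing convolutional computation.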

R-CNN at test time

Training R-CNN

Fast R-CNN

Paper: http://arxiv.org/pdf/1504.08083v1.pdf

Project: https://github.com/rbgirshick/fast-rcnn

Reference: blog

Motivation

Drawbacks of R-CNN and how Fast R-CNN addresses them:

  1. Training is a multi-stage pipeline. -> End-to-end joint training.
  2. Training is expensive in space and time. -> Convolutional layers are shared across proposals; classification is done in memory (no feature caching on disk).
    For SVM and bounding-box regressor training, features are extracted from each warped object proposal in each image and written to disk. With VGG16 this takes 2.5 GPU-days for the 5k VOC07 trainval images, and the features need hundreds of gigabytes of storage.
  3. Test-time detection is slow. -> Single-scale testing; fc layers compressed with truncated SVD.
    At test time, features are extracted from each warped proposal in each image (VGG16: 47 s/image).
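The truncated-SVD compression of an fc layer mentioned in item 3 can be sketched with NumPy. A u x v weight matrix W is approximated as U Σ_t V^T, so one layer with u*v parameters becomes two thinner layers with t*(u + v) parameters. The matrix sizes and the value of t below are illustrative assumptions, and `truncated_svd_fc` is a hypothetical helper name, not part of the released code.

```python
import numpy as np


def truncated_svd_fc(W, t):
    """Split a u x v fc weight matrix into two thinner layers using the top t singular values."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    first = s[:t, None] * Vt[:t]  # t x v: first layer (Sigma_t V^T), no bias
    second = U[:, :t]             # u x t: second layer, keeps the original bias
    return first, second


rng = np.random.default_rng(0)
u, v, t = 64, 256, 16             # illustrative sizes only
W = rng.standard_normal((u, v))
x = rng.standard_normal(v)

first, second = truncated_svd_fc(W, t)
y_exact = W @ x                   # original layer output
y_approx = second @ (first @ x)   # compressed two-layer output

params_before = u * v
params_after = t * (u + v)
print(params_before, params_after)
```

A random matrix like this one compresses poorly; the technique pays off when the singular values of the trained layer decay quickly, so a small t loses little accuracy.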

Contributions:

  1. Higher detection quality (mAP) than R-CNN
  2. Training is single-stage, using a multi-task loss
  3. All network layers can be updated during training
  4. No disk storage is required for feature caching

Fast R-CNN training

Fast R-CNN detection

Faster R-CNN

Paper: http://arxiv.org/abs/1506.01497
Caffe Project: https://github.com/ShaoqingRen/caffe

Reference: blog1 blog2

Region Proposal Networks

RPN input: an image of any size. Output: a set of rectangular object proposals, each with an objectness score.
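The proposals are predicted relative to reference boxes (anchors) placed at every sliding-window position of the conv feature map. A minimal sketch of anchor generation follows, assuming the paper's default of 3 scales and 3 aspect ratios (k = 9 anchors per position) and a feature stride of 16; the half-cell centre offset and the function name `generate_anchors` are illustrative choices here, not taken from the released code.

```python
import numpy as np


def generate_anchors(feat_h, feat_w, stride=16,
                     scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return (feat_h * feat_w * k, 4) anchors as (x1, y1, x2, y2) in image coordinates."""
    # k base shapes centred on the origin; each has area scale**2 and height / width = ratio.
    shapes = []
    for s in scales:
        for r in ratios:
            w = s / np.sqrt(r)
            h = s * np.sqrt(r)
            shapes.append([-w / 2, -h / 2, w / 2, h / 2])
    shapes = np.array(shapes)                                   # (k, 4)

    # Centre of every feature-map cell, mapped back to image pixels.
    cx = (np.arange(feat_w) + 0.5) * stride
    cy = (np.arange(feat_h) + 0.5) * stride
    cx, cy = np.meshgrid(cx, cy)                                # each (feat_h, feat_w)
    centres = np.stack([cx, cy, cx, cy], axis=-1).reshape(-1, 1, 4)

    # Broadcast every centre against every base shape.
    return (centres + shapes[None]).reshape(-1, 4)


anchors = generate_anchors(feat_h=2, feat_w=3)
areas = (anchors[:, 2] - anchors[:, 0]) * (anchors[:, 3] - anchors[:, 1])
print(anchors.shape)
```

For each of the k anchors at a position, the RPN outputs objectness scores and four box-regression values, so the number of proposals grows with the feature-map size rather than being fixed, which is why the input image can have any size.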