目录
1.ROIAlign层
2.Mask分支
3.整体框架回顾
1.ROIAlign层
在上文中,我们现在手里已经有了正负样本数据以及它们对应的标签。接下来我们就要进行预测的操作了!
但在预测之前,还有些小问题:
①每个ROI大小也不一样,在哪个特征图中也不知道。
②应该把每个ROI中的特征都提取出来,连接一些全连接层实现预测分类功能。
我们应该找到每个ROI所在的特征图(P?)、以及所在特征图中的区域。再接着对这个特征图连上一些全连接层得到分类结果和回归参数的预测。由于全连接层的限制(输入维度相同),因此我们必须将所有ROI转换成相同大小。
ROIPooing和ROIAlign的区别详见我的MaskRCNN部分博客。
# Network Heads # TODO: verify that this handles zero padded ROIs mrcnn_class_logits, mrcnn_class, mrcnn_bbox =\ fpn_classifier_graph(rois, mrcnn_feature_maps, config.IMAGE_SHAPE, config.POOL_SIZE, config.NUM_CLASSES)
mrcnn_feature_maps = P2 - P6特征图。
############################################################ # Feature Pyramid Network Heads ############################################################ def fpn_classifier_graph(rois, feature_maps, image_shape, pool_size, num_classes): """Builds the computation graph of the feature pyramid network classifier and regressor heads. rois: [batch, num_rois, (y1, x1, y2, x2)] Proposal boxes in normalized coordinates. feature_maps: List of feature maps from diffent layers of the pyramid, [P2, P3, P4, P5]. Each has a different resolution. image_shape: [height, width, depth] pool_size: The width of the square feature map generated from ROI Pooling. num_classes: number of classes, which determines the depth of the results Returns: logits: [N, NUM_CLASSES] classifier logits (before softmax) probs: [N, NUM_CLASSES] classifier probabilities bbox_deltas: [N, (dy, dx, log(dh), log(dw))] Deltas to apply to proposal boxes """ # ROI Pooling # Shape: [batch, num_boxes, pool_height, pool_width, channels] x = PyramidROIAlign([pool_size, pool_size], image_shape, name="roi_align_classifier")([rois] + feature_maps) # Two 1024 FC layers (implemented with Conv2D for consistency) x = KL.TimeDistributed(KL.Conv2D(1024, (pool_size, pool_size), padding="valid"), name="mrcnn_class_conv1")(x) x = KL.TimeDistributed(BatchNorm(axis=3), name='mrcnn_class_bn1')(x) x = KL.Activation('relu')(x) x = KL.TimeDistributed(KL.Conv2D(1024, (1, 1)), name="mrcnn_class_conv2")(x) x = KL.TimeDistributed(BatchNorm(axis=3), name='mrcnn_class_bn2')(x) x = KL.Activation('relu')(x) shared = KL.Lambda(lambda x: K.squeeze(K.squeeze(x, 3), 2), name="pool_squeeze")(x) # Classifier head mrcnn_class_logits = KL.TimeDistributed(KL.Dense(num_classes), name='mrcnn_class_logits')(shared) mrcnn_probs = KL.TimeDistributed(KL.Activation("softmax"), name="mrcnn_class")(mrcnn_class_logits) # BBox head # [batch, boxes, num_classes * (dy, dx, log(dh), log(dw))] x = KL.TimeDistributed(KL.Dense(num_classes * 4, activation='linear'), name='mrcnn_bbox_fc')(shared) # Reshape to [batch, boxes, num_classes, (dy, dx, log(dh), log(dw))] s = K.int_shape(x) mrcnn_bbox = KL.Reshape((s[1], num_classes, 4), name="mrcnn_bbox")(x) return mrcnn_class_logits, mrcnn_probs, mrcnn_bbox
①进行ROIAlign操作:计算ROI对应的feature_map属于哪一层并将每个ROI进行ROIAlign操作使其缩放到同一尺寸(7*7)。
经过ROI Align操作后,和传统的Faster R-CNN做法相同了:
展平处理得到1024维的特征,将其BatchNorm及ReLU处理:
x = KL.TimeDistributed(KL.Conv2D(1024, (pool_size, pool_size), padding="valid"), name="mrcnn_class_conv1")(x) x = KL.TimeDistributed(BatchNorm(axis=3), name='mrcnn_class_bn1')(x) x = KL.Activation('relu')(x) x = KL.TimeDistributed(KL.Conv2D(1024, (1, 1)), name="mrcnn_class_conv2")(x) x = KL.TimeDistributed(BatchNorm(axis=3), name='mrcnn_class_bn2')(x) x = KL.Activation('relu')(x)
预测目标概率分数:
# Classifier head mrcnn_class_logits = KL.TimeDistributed(KL.Dense(num_classes), name='mrcnn_class_logits')(shared) mrcnn_probs = KL.TimeDistributed(KL.Activation("softmax"), name="mrcnn_class")(mrcnn_class_logits)
这里由于是气球数据集,因此预测类别只有两类(是气球、不是气球),在COCO数据集里面这里就是81了。再通过softmax层将预测分数转成目标概率。
预测边界框回归参数:
# BBox head # [batch, boxes, num_classes * (dy, dx, log(dh), log(dw))] x = KL.TimeDistributed(KL.Dense(num_classes * 4, activation='linear'), name='mrcnn_bbox_fc')(shared) # Reshape to [batch, boxes, num_classes, (dy, dx, log(dh), log(dw))] s = K.int_shape(x) mrcnn_bbox = KL.Reshape((s[1], num_classes, 4), name="mrcnn_bbox")(x)
对于每个类别(2)均预测了回归边界框的四个参数。
向上层函数返回每个边界框的预测分数、概率、偏移量的指标值。
2.Mask分支
和上文差不多:
def build_fpn_mask_graph(rois, feature_maps, image_shape, pool_size, num_classes): """Builds the computation graph of the mask head of Feature Pyramid Network. rois: [batch, num_rois, (y1, x1, y2, x2)] Proposal boxes in normalized coordinates. feature_maps: List of feature maps from diffent layers of the pyramid, [P2, P3, P4, P5]. Each has a different resolution. image_shape: [height, width, depth] pool_size: The width of the square feature map generated from ROI Pooling. num_classes: number of classes, which determines the depth of the results Returns: Masks [batch, roi_count, height, width, num_classes] """ # ROI Pooling # Shape: [batch, boxes, pool_height, pool_width, channels] x = PyramidROIAlign([pool_size, pool_size], image_shape, name="roi_align_mask")([rois] + feature_maps) # Conv layers x = KL.TimeDistributed(KL.Conv2D(256, (3, 3), padding="same"), name="mrcnn_mask_conv1")(x) x = KL.TimeDistributed(BatchNorm(axis=3), name='mrcnn_mask_bn1')(x) x = KL.Activation('relu')(x) x = KL.TimeDistributed(KL.Conv2D(256, (3, 3), padding="same"), name="mrcnn_mask_conv2")(x) x = KL.TimeDistributed(BatchNorm(axis=3), name='mrcnn_mask_bn2')(x) x = KL.Activation('relu')(x) x = KL.TimeDistributed(KL.Conv2D(256, (3, 3), padding="same"), name="mrcnn_mask_conv3")(x) x = KL.TimeDistributed(BatchNorm(axis=3), name='mrcnn_mask_bn3')(x) x = KL.Activation('relu')(x) x = KL.TimeDistributed(KL.Conv2D(256, (3, 3), padding="same"), name="mrcnn_mask_conv4")(x) x = KL.TimeDistributed(BatchNorm(axis=3), name='mrcnn_mask_bn4')(x) x = KL.Activation('relu')(x) x = KL.TimeDistributed(KL.Conv2DTranspose(256, (2, 2), strides=2, activation="relu"), name="mrcnn_mask_deconv")(x) x = KL.TimeDistributed(KL.Conv2D(num_classes, (1, 1), strides=1, activation="sigmoid"), name="mrcnn_mask")(x) return x
在上文的ROI Align中,我们是将ROI统一大小为7*7的,在这里,我们debug一下:
是14*14的。
回想一下我们的标签:是28*28的。我们应该也把我们的预测结果做成28*28大小的,才能和标签对应起来才能用sigmoid交叉熵计算每一个像素的0/1值是否匹配。
这里每个ROI经过4个3*3卷积,这里由于参数padding = same,ROI大小不变。
连接一个转置卷积的操作,每个ROI大小变成了28*28。
最后再连接一个1*1的卷积层,输出mask信息。
最后的结果是28*28*2,2代表类别个数(气球、非气球)即每一个ROI我们计算出2个mask信息。(对于COCO数据集每一个ROI计算81个mask信息)