CoAE Source Code Analysis

侯德柱

Data Loading

lib/roi_data_layer/roibatchLoader.py

class roibatchLoader(data.Dataset):

__init__

def __init__(self,
             roidb: List[Dict[str, Any]],
             ratio_list: np.ndarray,
             ratio_index: np.ndarray,
             query: List[Dict[int, Dict[str, Any]]],
             batch_size: int,
             num_classes: int,
             training: bool = True,
             normalize=None,
             seen: bool = True):

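Member variables
- self._cat_ids: COCO category ids, in the range 1~90
- self._classes: dictionary mapping class indices 1~80 to category ids 1~90
- self.list
  Note: after filter has run, self.list should equal imdb.list
Intermediate steps
- filter and probability are called at the end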

__getitem__
def __getitem__(self, index: int) -> (torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor, int):
- Intermediate steps
  - blob: the return value of get_minibatch, described below
  - query: np.ndarray, 1x128x128x3
    Converted to a torch.Tensor at line 138,
    then permuted to 1x3x128x128 and squeezed to 3x128x128
  - data: blob['data'] converted to a Tensor, dtype=torch.float32, shape=[1,600,960,3], already standardized
- Return values (training)
  - padding_data: dtype=torch.float32, shape=[3,600,906]
  - query: dtype=torch.float32, shape=[3,128,128]
  - im_info, e.g.
    torch.Tensor([600,906,1.4151])
  - gt_boxes_padding: dtype=torch.float32, shape=[50,5]
  - num_bboxes: 1
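The two shape changes noted above (NHWC to CHW for the query, and padding the boxes to a fixed 50 rows) can be sketched with NumPy. This is only an illustration under assumptions: the helper names `to_chw` and `pad_gt_boxes` and the constant `MAX_NUM_GT_BOXES` are hypothetical, and the real loader works on torch tensors.

```python
import numpy as np

MAX_NUM_GT_BOXES = 50  # assumed padding length, matching the [50,5] shape noted above


def to_chw(image_nhwc: np.ndarray) -> np.ndarray:
    """Turn a (1, H, W, 3) array into (3, H, W)."""
    return np.squeeze(np.transpose(image_nhwc, (0, 3, 1, 2)), axis=0)


def pad_gt_boxes(gt_boxes: np.ndarray, max_boxes: int = MAX_NUM_GT_BOXES):
    """Zero-pad an (n, 5) box array to (max_boxes, 5) and return the real count."""
    padded = np.zeros((max_boxes, 5), dtype=np.float32)
    num_boxes = min(len(gt_boxes), max_boxes)
    padded[:num_boxes] = gt_boxes[:num_boxes]
    return padded, num_boxes


# Hypothetical inputs: an empty query patch and a single ground-truth box.
query = np.zeros((1, 128, 128, 3), dtype=np.float32)
boxes = np.array([[10, 20, 110, 220, 1]], dtype=np.float32)
query_chw = to_chw(query)
gt_boxes_padding, num_boxes = pad_gt_boxes(boxes)
```

Padding to a fixed row count is what allows samples with different numbers of boxes to be stacked into one batch; the true count is carried separately.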
self.filter

lib/roi_data_layer/minibatch.py

get_minibatch
def get_minibatch(roidb, num_classes)-> Dict[str, Any]:
- Return value
  key	type	notes
  data	np.ndarray	shape=(1,600,800,3), the loaded image, float32
  gt_boxes	List[np.ndarray]	n rows of 5 values (x1,y1,x2,y2,class), float32
  im_info	np.ndarray	shape=(1,3), float32
  img_id	int


lib/roi_data_layer/roidb.py

Type of query: List[Dict[int, Dict[str, Any]]]

Type of roidb: List[Dict[str, Any]]

combined_roidb

def combined_roidb(imdb_names: str, training=True, seen=1) \
        -> (imdb_, List[Dict[str, Any]], np.ndarray, np.ndarray, List[Dict[int, Dict[str, Any]]]):

Parameters
- imdb_names: dataset name(s), such as "coco_2017_train" or "voc_2007_trainval"; multiple datasets are joined with +
Return values
- imdb
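As a small illustration of the + convention for imdb_names, the string can be split into individual dataset names before each one is loaded. The helper name `split_imdb_names` is hypothetical and is not taken from the repository.

```python
from typing import List


def split_imdb_names(imdb_names: str) -> List[str]:
    """Split a '+'-joined dataset string into individual dataset names."""
    return [name for name in imdb_names.split('+') if name]


names = split_imdb_names("coco_2017_train+voc_2007_trainval")
single = split_imdb_names("coco_2017_train")
```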

get_training_roidb
def get_training_roidb(imdb, training):
A helper called by combined_roidb

get_roidb

def get_roidb(imdb_name: str, training: bool) \
        -> (imdb_, List[Dict[str, Any]], List[Dict[int, Dict[str, Any]]], List[int]):

A helper called by combined_roidb

Parameters
- imdb_name: dataset name, such as "coco_2017_train" or "voc_2007_trainval"
Process
imdb.set_proposal_method(cfg.TRAIN.PROPOSAL_METHOD)
With the default configuration in cfgs, this statement sets self.roidb_handler = self.gt_roidb
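The dispatch described above, where a method name selects the handler that later produces the roidb, can be sketched as follows. `TinyImdb` is a hypothetical stand-in, the lookup via `getattr` is an assumption about how the name is resolved, and the value 'gt' is assumed to be what cfg.TRAIN.PROPOSAL_METHOD holds by default.

```python
class TinyImdb:
    """Hypothetical stand-in for imdb; only the handler dispatch is modelled."""

    def __init__(self):
        self.roidb_handler = None

    def gt_roidb(self):
        # Placeholder: the real method builds the ground-truth roidb.
        return [{'boxes': [], 'gt_classes': []}]

    def set_proposal_method(self, method: str) -> None:
        # 'gt' resolves to self.gt_roidb; an unknown name raises AttributeError.
        self.roidb_handler = getattr(self, method + '_roidb')


db = TinyImdb()
db.set_proposal_method('gt')  # assumes cfg.TRAIN.PROPOSAL_METHOD == 'gt'
roidb = db.roidb_handler()
```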

Return values

imdb
roidb: see prepare_roidb below

cat_data: length 81, keys 0~80; each value is a List holding the bboxes and image paths of the images for that coco_class_ind, i.e. the query

e.g.

{
    0: [],

    1: [{
        'boxes': [339.88, 22.16, 492.76, 321.89000000000004],
        'image_path': '<path to image>\\000000391895.jpg'
    }, {
        'boxes': [471.64, 172.82, 506.56, 219.92],
        'image_path': '<path to image>\\000000391895.jpg'
    },
        ...
    ],

    2: [
        ...
    ]
}
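Given a cat_data structure like the one above, choosing a query patch for a class amounts to picking one entry from that class's list. The sketch below assumes uniform random choice; the helper name `pick_query` and the image paths are hypothetical, and the loader's actual sampling may be weighted differently.

```python
import random
from typing import Any, Dict, List


def pick_query(cat_data: Dict[int, List[Dict[str, Any]]], class_ind: int,
               rng: random.Random) -> Dict[str, Any]:
    """Return one random query entry (box + image path) for a class index."""
    candidates = cat_data[class_ind]
    if not candidates:
        raise ValueError(f"no query patches stored for class {class_ind}")
    return rng.choice(candidates)


# Hypothetical miniature cat_data with placeholder image paths.
cat_data = {
    0: [],
    1: [
        {'boxes': [339.88, 22.16, 492.76, 321.89], 'image_path': 'a.jpg'},
        {'boxes': [471.64, 172.82, 506.56, 219.92], 'image_path': 'a.jpg'},
    ],
}
query_entry = pick_query(cat_data, 1, random.Random(0))
```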

imdb.inverse_list: length 60, the selected coco_cat_id values (1~90)

prepare_roidb

def prepare_roidb(imdb) -> List[Dict[str, Any]]:

After this call, imdb.roidb looks like this

[
    {'width': 640, 
     'height': 360, 
     'boxes': array([
       [359, 146, 470, 358],
       [339,  22, 492, 321],
       [471, 172, 506, 219],
       [486, 183, 515, 217]], dtype=uint16), 
     'gt_classes': array([4, 1, 1, 2]),
     'gt_overlaps': <4x81 sparse matrix of type '<class 'numpy.float32'>'
	with 4 stored elements in Compressed Sparse Row format>, 
     'flipped': False, 
     'seg_areas': array(
         [12190.445  , 14107.271  ,   708.26056,   626.9852 ], dtype=float32), 
     'img_id': 391895, 
     'image': '<path to image>\\000000391895.jpg', 
     'max_classes': array([4, 1, 1, 2], dtype=int64), 
     'max_overlaps': array([1., 1., 1., 1.], dtype=float32)}
 ...
]

rank_roidb_ratio
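def rank_roidb_ratio(roidb: List[Dict[str, Any]])->(np.ndarray,np.ndarray):
For every image in roidb, records width/height in ratio_list. Returns ratio_list sorted in ascending order of ratio, together with ratio_index; both have length equal to the total number of training images. The first has a float dtype and the second int64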
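The ranking described above is essentially an argsort over aspect ratios. A minimal sketch, assuming each roidb entry carries 'width' and 'height' and leaving out any clamping or crop flags the real function may also apply; the name `rank_by_ratio` is hypothetical:

```python
import numpy as np


def rank_by_ratio(roidb):
    """Return (sorted width/height ratios, indices that sort the roidb by ratio)."""
    ratios = np.array([entry['width'] / float(entry['height']) for entry in roidb])
    ratio_index = np.argsort(ratios)
    return ratios[ratio_index], ratio_index


# Hypothetical image sizes.
roidb = [
    {'width': 640, 'height': 360},
    {'width': 480, 'height': 640},
    {'width': 500, 'height': 500},
]
ratio_list, ratio_index = rank_by_ratio(roidb)
```

Sorting by ratio lets images with similar shapes end up in the same batch, which reduces the amount of padding.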
test_rank_roidb_ratio
Called at test time

trainval_net.py

sampler

class sampler(Sampler):

Inherits from torch.utils.data.sampler.Sampler

Datasets

lib/datasets/imdb.py

imdb

class imdb(object):
    """Image database."""

__init__
def __init__(self, name: str, classes=None):
- Parameters
  - classes: None in every call
- Member variables
  - self._roidb_handler
  - self._classes: implemented in subclasses
  - self.roidb: the value returned by calling self._roidb_handler
  - self.roidb_handler: equals self.gt_roidb in coco.py, and equals self.gt_roidb in roidb.py
  - self.cat_data: this is the query

lib/datasets/coco.py

coco

__init__

def __init__(self, image_set: str, year: str):

Parameters
- image_set: one of "train", "val", "test"
- year: "2014" or "2017"

Member variables

self._data_path: this directory should contain two folders, annotations and images
self._roidb_handler
self._classes: Tuple[str], including background
self._class_to_ind: maps class names to 0~80
self._class_to_coco_cat_id: maps class names to 1~90
self.coco_cat_id_to_class_ind: maps 1~90 to 1~80
self.coco_class_ind_to_cat_id: maps 1~80 to 1~90
In short, class is the class name, coco_class_ind is in 1~80, and coco_cat_id is in 1~90
self._image_index = COCO.getImgIds()
self._gt_splits = ('train', 'val', 'minival')
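The relationship between the four mappings is easier to see on a toy example. The sketch below uses a truncated, hypothetical class list of three foreground classes; the id 13 is used only to show that category ids have gaps while class indices are contiguous.

```python
# Truncated class list for illustration only; the real tuple has 81 entries.
classes = ('__background__', 'person', 'bicycle', 'stop sign')
cat_ids = [1, 2, 13]  # category ids are not contiguous

class_to_ind = {name: i for i, name in enumerate(classes)}
class_to_coco_cat_id = dict(zip(classes[1:], cat_ids))
coco_class_ind_to_cat_id = {
    class_to_ind[name]: class_to_coco_cat_id[name] for name in classes[1:]
}
coco_cat_id_to_class_ind = {
    cat_id: ind for ind, cat_id in coco_class_ind_to_cat_id.items()
}
```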

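self.reference_image: Dict[int,Dict[int,Dict[str,Any]]]

Dictionary of per-image detection results: the key is the image id, and the value holds that image's detected bboxes compared against the ground truth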

e.g.

{
    391895: {
        1: {
            'category_id': 1, 
            'category_name': 'person', 
            'score': 0.9975148439407349, 
            'iou': 0.7170171752233037
        }, 
        2: {
            'category_id': 1, 
            'category_name': 'person', 
            'score': 0.9939727187156677, 
            'iou': 0.8616664860812628
        }, 
        0: {'category_id': 4, 
            'category_name': 'motorcycle', 
            'score': 0.9700725078582764, 
            'iou': 0.8821412520463373
        }
    }, 
    522418: {
        0: {
            'category_id': 1, 
            'category_name': 'person', 
            'score': 0.9950047135353088, 
            'iou': 0.8602898932977073
        }, 
        1: {
            'category_id': 49, 
            'category_name': 'knife', 
            'score': 0.9600722193717957, 
            'iou': 0.8965148825297287
        }, 
        2: {
            'category_id': 65, 
            'category_name': 'bed', 
            'score': 0.8499319553375244, 
            'iou': 0.910496123229155
        }
    }, 
  184613: {
        ...
  }
    ...
}

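self.cat_data: Dict[int,List], keys in the range 1~90
self.list: length 60, the COCO cat_id values selected for training; the maximum is 90

filter
def filter(self, seen: int = 1):
Decides which classes are visible according to the command-line argument seen
- Intermediate variables
  - cfg.train_categories and cfg.test_categories are both [1]; self.list starts as [1]
  - self.list
    When the two cfg variables above have length 1, self.list takes one out of every 4 numbers in 1~80 and stores the corresponding cat_id values (1~90)
  - self.inverse_list: the 1~80 values corresponding to self.list above
  - all_index: starts as 0~(number of images); first, the images whose classes are in self.list are removed from it
  - self.gt_roidb is called here; self.roidb is the return value of self.gt_roidb
  - Finally, the images remaining in all_index are deleted from self._image_index and self.roidb, so the images that are kept are those containing classes in self.list
  - Control then returns to get_roidb in roidb.py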
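The selection logic can be sketched in plain Python. Everything here is an assumption made for illustration: the modulo-4 rule, the `group` parameter, and the helper names `select_class_inds` and `keep_images` are hypothetical, and the real filter also maps the chosen indices to cat_id values.

```python
from typing import Dict, List, Set


def select_class_inds(num_classes: int, group: int, seen: bool) -> List[int]:
    """Pick class indices in 1..num_classes by their value modulo 4.

    Assumption: one index in every four (index % 4 == group % 4) is held out;
    seen=True keeps the other three, seen=False keeps only the held-out one.
    """
    held_out = [i for i in range(1, num_classes + 1) if i % 4 == group % 4]
    if seen:
        return [i for i in range(1, num_classes + 1) if i not in held_out]
    return held_out


def keep_images(image_classes: Dict[int, Set[int]], selected: List[int]) -> List[int]:
    """Return ids of images that contain at least one selected class."""
    wanted = set(selected)
    return [img for img, classes in image_classes.items() if classes & wanted]


seen_inds = select_class_inds(80, group=1, seen=True)
unseen_inds = select_class_inds(80, group=1, seen=False)
# Hypothetical image id -> set of class indices present in that image.
kept = keep_images({10: {1, 2}, 11: {5}, 12: {3}}, seen_inds)
```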

gt_roidb

def gt_roidb(self)->List[Dict[str, Any]]:

Updates the member variable self.cat_data, i.e. the query

Return value

e.g.

[
    {
        'width': 640, 
        'height': 360, 
        'boxes': array([
           [359, 146, 470, 358],
           [339,  22, 492, 321],
           [471, 172, 506, 219],
           [486, 183, 515, 217]], dtype=uint16), 
        'gt_classes': array([4, 1, 1, 2]), 
        'gt_overlaps': <4x81 sparse matrix of type <class numpy.float32>
        with 4 stored elements in Compressed Sparse Row format>, 
        'flipped': False, 
        'seg_areas': array(
           [12190.445  , 14107.271  , 708.26056,   626.9852 ], dtype=float32)
    },
	...
]

gt_classes takes index values in 1~80

_load_coco_annotation
def _load_coco_annotation(self, index):
Builds the roidb and self.cat_data, i.e. the query
_get_ann_file(self)
def _get_ann_file(self):
Builds the path of the annotation file corresponding to "train", "val", or "test"

lib/datasets/factory.py

get_imdb
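def get_imdb(name:str)->imdb:
- name: dataset name, such as "coco_2017_train" or "voc_2007_trainval"

Custom datasets

Using coco as an example

In coco.py, change the image path read by image_path_from_index
In coco.py, change the ann_file path read by _get_ann_file
Add the corresponding dataset name in test_val.py
Add __sets[name] in factory.py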
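The last step, registering a name in factory.py, follows a name-to-constructor registry pattern. A minimal sketch under assumptions: `MyCoco`, the `mycoco_` prefix, and the module-level `_sets` dict are hypothetical stand-ins, not the repository's code.

```python
from typing import Callable, Dict


class MyCoco:
    """Hypothetical dataset class standing in for coco in coco.py."""

    def __init__(self, image_set: str, year: str):
        self.name = f"mycoco_{year}_{image_set}"


# Registry from dataset name to a zero-argument factory.
_sets: Dict[str, Callable[[], MyCoco]] = {}

for year in ['2017']:
    for split in ['train', 'val']:
        name = f"mycoco_{year}_{split}"
        # Default arguments bind the loop values at definition time.
        _sets[name] = (lambda split=split, year=year: MyCoco(split, year))


def get_imdb(name: str) -> MyCoco:
    """Look up a registered dataset by name and construct it."""
    if name not in _sets:
        raise KeyError(f"Unknown dataset: {name}")
    return _sets[name]()


dataset = get_imdb("mycoco_2017_train")
```

Binding split and year as lambda defaults matters: without it every registered factory would see the values from the last loop iteration.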

Data Processing

lib/model/utils/blob.py

crop
def crop(image: np.ndarray, purpose: List[float], size: int) -> np.ndarray:
Crops the image
- Parameters
  - image: an np.ndarray
  - purpose: the region to crop
  - size: the size to resize to after cropping
prep_im_for_blob
def prep_im_for_blob(im: np.ndarray, pixel_means: np.ndarray, target_size: int, max_size: int) -> (np.ndarray, float):
Normalizes and standardizes the image
- Parameters
  Only im and target_size are actually used; target_size = 108
- Return values
  - the processed image
  - the scale factor, which is unused

im_list_to_blob

def im_list_to_blob(ims: List[np.ndarray])->np.ndarray:
    """Convert a list of images into a network input.

    Assumes images are already prepared (means subtracted, BGR order, ...).
    """

Converts a List[np.ndarray] into a single np.ndarray whose shape[0] is the length of the input list

Return value
- shape is (1x128x128x3); the first dimension is the number of images
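A common way to implement this kind of conversion is to allocate a zero array sized to the largest image and copy each image into its top-left corner. The sketch below assumes that approach; the name `im_list_to_blob_sketch` is hypothetical and the repository's version may differ in detail.

```python
from typing import List

import numpy as np


def im_list_to_blob_sketch(ims: List[np.ndarray]) -> np.ndarray:
    """Stack HxWx3 images into one (N, Hmax, Wmax, 3) float32 array, zero-padded."""
    max_shape = np.array([im.shape for im in ims]).max(axis=0)
    blob = np.zeros((len(ims), max_shape[0], max_shape[1], 3), dtype=np.float32)
    for i, im in enumerate(ims):
        blob[i, :im.shape[0], :im.shape[1], :] = im
    return blob


# A single 128x128 query patch gives the (1, 128, 128, 3) shape noted above.
blob = im_list_to_blob_sketch([np.ones((128, 128, 3), dtype=np.float32)])
# Two images of different sizes are padded to the larger height and width.
mixed = im_list_to_blob_sketch([
    np.ones((2, 3, 3), dtype=np.float32),
    np.ones((4, 2, 3), dtype=np.float32),
])
```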

Training

When training on coco, dataset.list and imdb.list are
[2, 3, 4, 6, 7, 8, 10, 11, 13, 15, 16, 17, 19, 20, 21, 23, 24, 25, 28, 31, 32, 34, 35, 36, 38, 39, 40, 42, 43, 44, 47, 48, 49, 51, 52, 53, 55, 56, 57, 59, 60, 61, 63, 64, 65, 70, 72, 73, 75, 76, 77, 79, 80, 81, 84, 85, 86, 88, 89, 90]
When testing on coco
[2, 6, 10, 15, 19, 23, 28, 34, 38, 42, 47, 51, 55, 59, 63, 70, 75, 79, 84, 88]

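Modifications

roibatchLoader.filter(seen)

I do not know what self.filter(seen) in roibatchLoader is for. When testing on coco, both
assert dataset.list == imdb.list, (dataset.list, imdb.list)
assert dataset.list_ind == imdb.inverse_list, (dataset.list_ind, imdb.inverse_list)
pass.
I therefore pass imdb in as a parameter and use it when roibatchLoader is initialized, replacing filter as the way to initialize list and list_ind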

Network

oneshotbase.py

matchnet

__init__
def __init__(self, inplanes: int):
- Parameters
  - inplanes: assigned to self.in_channels; 1024 for coco
- Member variables
  - self.in_channels = 1024 (coco)
  - self.inter_channels = self.in_channels // 2 (512 for coco)
  - self.g: 1x1 convolution from self.in_channels to self.inter_channels
  - self.W: 1x1 convolution from self.inter_channels to self.in_channels, followed by batch normalization
  - self.Q: same as self.W
  - self.theta, self.phi: 1x1 convolutions from self.in_channels to self.inter_channels
forward
def forward(self, detect: torch.Tensor, aim: torch.Tensor):
- Parameters
  - detect: the input image (after ResNet feature extraction), size=(1,1024,38,w), i.e. (batch_size, channels, h, w)
  - aim: the query image (after ResNet feature extraction), size=(1,1024,8,8), i.e. (batch_size, channels, h, w)
- Intermediate variables
  - d_x: detect after self.g and reshaping, size=(1,38×w,512); 2546=38×w, w={57 38 50}
  - a_x: aim after self.g and reshaping, size=(1,64,512); 64=8×8
  - theta_x: aim after self.theta and reshaping, size=(1,64,512)
  - phi_x: detect after self.phi and reshaping, size=(1,512,38×w)
  - f = matmul(theta_x, phi_x), size=(1,64,38×w), a matrix multiplication
  - f_div_C: f divided by the length of its last dimension (38×w), size=(1,64,38×w)
  - fi_div_C: f permuted to (1,38×w,64) and divided by the length of its last dimension (64)
  - non_aim: non-local aim, size=(1,1024,8,8), = W(matmul(f_div_C,d_x))+aim
  - non_det: non-local detect, size=(1,1024,38,w), = Q(matmul(fi_div_C,a_x))+detect. Corresponds to F(I)
  - c_weight: size=(1,1024,1,1), see ChannelGate below
  - act_aim: = non_aim * c_weight, size=(1,1024,8,8), activation
  - act_det: = non_det * c_weight, size=(1,1024,38,w), activation
    \[
    \begin{aligned}
    non\_aim &= W\left(\frac{\theta(aim)\times\phi(detect)}{38\times w}\times g(detect)\right)+aim\\
    non\_det &= Q\left(\frac{\left(\theta(aim)\times\phi(detect)\right)^{\top}}{64}\times g(aim)\right)+detect\\
    act\_aim &= c\_weight * non\_aim\\
    act\_det &= c\_weight * non\_det
    \end{aligned}
    \]
    In the paper's notation, non_det is F(I), non_aim is F(p), act_det is ~F(I), and act_aim is ~F(p).
- Return values
  non_det, act_det, act_aim, c_weight
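The shape bookkeeping in forward can be checked with a small NumPy sketch. It is an illustration under assumptions, not the module itself: 1x1 convolutions are modelled as per-pixel matrix products, batch normalization and the channel gate are left out, the channel counts are shrunk from 1024/512 to 8/4, and the names `conv1x1` and `co_attention` are hypothetical.

```python
import numpy as np


def conv1x1(x, weight):
    """1x1 convolution without bias: weight is (C_out, C_in), x is (B, C_in, H, W)."""
    return np.einsum('oc,bchw->bohw', weight, x)


def co_attention(detect, aim, g, theta, phi, W, Q):
    b, _, hd, wd = detect.shape
    _, _, ha, wa = aim.shape
    inter = g.shape[0]
    d_x = conv1x1(detect, g).reshape(b, inter, -1).transpose(0, 2, 1)       # (b, hd*wd, inter)
    a_x = conv1x1(aim, g).reshape(b, inter, -1).transpose(0, 2, 1)          # (b, ha*wa, inter)
    theta_x = conv1x1(aim, theta).reshape(b, inter, -1).transpose(0, 2, 1)  # (b, ha*wa, inter)
    phi_x = conv1x1(detect, phi).reshape(b, inter, -1)                      # (b, inter, hd*wd)
    f = theta_x @ phi_x                                                     # (b, ha*wa, hd*wd)
    f_div_C = f / f.shape[-1]                     # divide by hd*wd
    fi_div_C = f.transpose(0, 2, 1) / f.shape[1]  # (b, hd*wd, ha*wa), divide by ha*wa
    non_aim = (f_div_C @ d_x).transpose(0, 2, 1).reshape(b, inter, ha, wa)
    non_aim = conv1x1(non_aim, W) + aim           # residual connection
    non_det = (fi_div_C @ a_x).transpose(0, 2, 1).reshape(b, inter, hd, wd)
    non_det = conv1x1(non_det, Q) + detect        # residual connection
    return non_det, non_aim


rng = np.random.default_rng(0)
C, INTER = 8, 4  # stand-ins for 1024 and 512
detect = rng.standard_normal((1, C, 3, 5))
aim = rng.standard_normal((1, C, 2, 2))
g, theta, phi = (rng.standard_normal((INTER, C)) for _ in range(3))
W, Q = (rng.standard_normal((C, INTER)) for _ in range(2))
non_det, non_aim = co_attention(detect, aim, g, theta, phi, W, Q)
```

Both outputs keep the shape of their residual input, which is why non_det can be fed to the RPN in place of the plain backbone feature.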

OneShotBase

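__init__
def __init__(self, classes:Tuple[str], class_agnostic:bool):
- Parameters
  - classes: this is imdb.classes, including __background__
  - class_agnostic: class-agnostic flag, = False
- Member variables
  - RCNN_base: the initial feature-extraction network
  - RCNN_bbox_pred: fully connected layer, 2048->4
  - RCNN_cls_score: the 2N->8->2 network
  - RCNN_proposal_target: does not take part in backpropagation. This layer receives the proposals from rpn_proposal_layer, mainly matches proposals with gt, and passes the result on to the loss for learning
  - triplet_loss = torch.nn.MarginRankingLoss \[ loss(x1,x2,y)=\max(0,-y*(x1-x2)+margin) \]
  - _head_to_tail: in the ResNet version this is ResNet layer4. It appears to be the R-CNN feature extractor applied after RoI pooling; both the RoI pooling output and the ~F(p) generated from the query pass through it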
forward
def forward(self, im_data:torch.Tensor, query:torch.Tensor, im_info:torch.Tensor, gt_boxes:torch.Tensor, num_boxes:torch.Tensor):
- Parameters
  - im_data: the image, size=(1,3,600,600), presumably (batch_size, channel, height, width)
  - query: the query image, size=(1,3,128,128)
  - im_info = (600,600,0.9804), size=(1,3); used in RCNN_RPN
  - gt_boxes: size=(1,50,5)
  - num_boxes: size=1
- Intermediate variables
  1. Siamese network
  - detect_feat: detect feature, the torch.Tensor obtained by passing im_data through self.RCNN_base, shape=(1,1024,38,38)
  - query_feat: query feature, the torch.Tensor obtained by passing query through self.RCNN_base, shape=(1,1024,8,8)
  2. non-local feature
  - rpn_feat: the Non-local Feature of the target image, size=(1,1024,57,38)
  - act_feat: the activated feature map of the target image, ~F(I), size=(1,1024,57,38)
  - act_aim: the activated feature map of the query image, ~F(p), size=(1,1024,8,8)
  - c_weight: the joint attention weights, size=(1,1024,1,1)
  3. rpn
  - rois: size=(1,2000,5)
  - rpn_loss_cls = 0.6905, scalar, requires_grad=True
  - rpn_loss_bbox = 0.0162, scalar, requires_grad=True
  4. self.RCNN_proposal_target (training only)
  - rois: size=(1,128,5)
  - rois_label: size=(1,128)
  - rois_target: size=(1,128,4)
  - rois_inside_ws: size=(1,128,4)
  - rois_outside_ws: size=(1,128,4)
  5. self.RCNN_roi_align
  - pooled_feat: size=(128,1024,7,7)
  6. head_to_tail
  - pooled_feat: size=(128,2048)
  - query_feat: size=(1,2048)
  7. bbox_pred
  - size=(128,4)
  8. cat: concatenates pooled_feat and query_feat, the features of the RoIs and of the query image after the R-CNN head
  - pooled_feat: size=(128,4096)
    The concatenated result is presumably
    	roi[0:2048]	query[2048:4096]
    0	roi0	query
    1	roi1	query
    ...
    127	roi127	query
    The bbox loss uses the features before concatenation; the score loss uses the features after concatenation
  9. RCNN_cls_score
  - score: size=(128,2)
  - score_prob: size=128, the result of applying softmax to score
  10. R-CNN Loss cls
  - RCNN_loss_cls: the cross entropy between score and rois_label
  11. margin_loss
  - score_label: size becomes (1,128)
  - score_prob: size becomes (1,128)
  - gt_map = |score_label_i - score_label_j| over all pairs (i, j), the ground truth, size=(1,128,128)
  - pr_map = |score_prob_i - score_prob_j| over all pairs (i, j), the prediction, size=(1,128,128)
  - target = -gt_map^2 + 3*gt_map - 1, i.e. -1 where gt_map = 0 and +1 where gt_map = 1
  - margin_loss = 3*self.triplet_loss()
  12. RCNN_loss_bbox
  - RCNN_loss_bbox: smooth-L1 function
- Return values
  - rois
  - cls_prob
  - bbox_pred
  - rpn_loss_cls: used for backpropagation
  - rpn_loss_bbox: used for backpropagation
  - RCNN_loss_cls: used for backpropagation
  - margin_loss: used for backpropagation
  - RCNN_loss_bbox: used for backpropagation
  - rois_label
  - c_weight
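The margin_loss step of forward can be sketched with NumPy. Several things here are assumptions made for illustration: labels are binary, the target is +1 for pairs with different labels and -1 for pairs with equal labels, the loss is averaged over all pairs, the argument order is (pr_map, gt_map, target), and the margin value 0.5 is hypothetical.

```python
import numpy as np


def margin_ranking_loss(x1, x2, y, margin):
    """Mean of max(0, -y * (x1 - x2) + margin) over all elements."""
    return np.maximum(0.0, -y * (x1 - x2) + margin).mean()


def pairwise_abs_diff(v):
    """v has shape (batch, n); the result has shape (batch, n, n)."""
    return np.abs(v[:, None, :] - v[:, :, None])


labels = np.array([[1.0, 0.0, 1.0, 0.0]])  # hypothetical rois_label values
probs = np.array([[0.9, 0.2, 0.6, 0.4]])   # hypothetical foreground probabilities
gt_map = pairwise_abs_diff(labels)
pr_map = pairwise_abs_diff(probs)
target = 2.0 * gt_map - 1.0                # +1 for different labels, -1 for equal labels
margin_loss = 3.0 * margin_ranking_loss(pr_map, gt_map, target, margin=0.5)
```

Under these assumptions the loss pushes the predicted probability gap of a pair towards the gap between its labels, so proposals with the same label are scored alike.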


oneshotresnet.py

lib/model/utils/net_utils.py

ChannelGate

Used for the squeeze step

__init__
def __init__(self, gate_channels: int, reduction_ratio: int = 16, pool_types: List[str] = ['avg', 'max']):
- Parameters
  - gate_channels: = 1024 (coco)
- Member variables
  - self.mlp: an encoder-decoder network, gate_channels -> gate_channels/16 -> ReLU -> gate_channels
forward
- Parameters
  - x: non_aim, size=(1,1024,8,8)
- Return value
  size=(1,1024,1,1)
- Process
  1. Global average pooling of x
  2. self.mlp
  3. Global max pooling of x
  4. self.mlp
  5. Add the results of steps 4 and 2
  6. Apply sigmoid and return
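The six steps above can be sketched with NumPy. This is an illustration under assumptions: the MLP is modelled as two dense layers with explicit weight arrays, the channel count is shrunk from 1024 to 16 with a reduction ratio of 4, and the function name `channel_gate` is hypothetical.

```python
import numpy as np


def channel_gate(x, w1, b1, w2, b2):
    """Return per-channel weights of shape (B, C, 1, 1) for x of shape (B, C, H, W)."""

    def mlp(v):
        # Shared bottleneck: C -> C/r -> ReLU -> C
        hidden = np.maximum(0.0, v @ w1 + b1)
        return hidden @ w2 + b2

    avg = x.mean(axis=(2, 3))            # step 1: global average pooling
    mx = x.max(axis=(2, 3))              # step 3: global max pooling
    att = mlp(avg) + mlp(mx)             # steps 2, 4, 5: shared MLP, then sum
    scale = 1.0 / (1.0 + np.exp(-att))   # step 6: sigmoid
    return scale[:, :, None, None]


rng = np.random.default_rng(0)
C, R = 16, 4  # stand-ins for 1024 channels and reduction ratio 16
x = rng.standard_normal((1, C, 8, 8))
w1, b1 = rng.standard_normal((C, C // R)), np.zeros(C // R)
w2, b2 = rng.standard_normal((C // R, C)), np.zeros(C)
c_weight = channel_gate(x, w1, b1, w2, b2)
```

Because the result has size (B, C, 1, 1), it broadcasts over the spatial dimensions when multiplied with non_aim or non_det.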

lib/model/rpn/rpn.py

_RPN

__init__
def __init__(self, din: int):
The parameter is the number of channels, 1024 for coco
forward
- Parameters, all torch.Tensor
  - base_feat: the Non-local Feature of the target image, size=(1,1024,57,38)
  - im_info: size=(1,3), =[[900,600,1.4052]]
  - gt_boxes: size=(1,50,5)
  - num_boxes: the number of boxes, size=1
- Return values, all torch.Tensor
  - rois: size=(1,2000,5)
  - rpn_loss_cls = 0.6905, scalar, requires_grad=True
  - rpn_loss_bbox = 0.0162, scalar, requires_grad=True

lib/model/rpn/proposal_target_layer_cascade.py

_ProposalTargetLayer

__init__
def __init__(self, nclasses: int):
forward
- Parameters, all torch.Tensor
  - rois: size=(1,2000,5)
  - gt_boxes: size=(1,50,5)
  - num_boxes: the number of boxes, size=1
- Return values, all torch.Tensor
  - rois: size=(1,128,5)
  - labels: size=(1,128)
  - bbox_targets: size=(1,128,4)
  - bbox_inside_weights: size=(1,128,4)
  - bbox_outside_weights: size=(1,128,4)