1. nuScenes Dataloader

对nuScenes数据集处理的了解，大体上的核心还是getitem函数、prepare_data函数，以及collate_batch函数三个部分的处理。其中prepare_data函数和collate_batch函数是在Dataset这个父类实现的，基本的处理流程基本不变，这里记录data_dict的数据处理部分。

1.1 NuScenesDataset初始化

在初始化的步骤中，进行了两个重要的操作，一个是加载了在create_nuscenes_infos 函数处理后的pkl文件，另外一个就是对类别不平衡进行了重采样的操作。
ps：如果构建出这个pkl文件在后续会介绍到

class NuScenesDataset(DatasetTemplate):
    def __init__(self, dataset_cfg, class_names, training=True, root_path=None, logger=None):
        root_path = (root_path if root_path is not None else Path(dataset_cfg.DATA_PATH)) / dataset_cfg.VERSION
        super().__init__(
            dataset_cfg=dataset_cfg, class_names=class_names, training=training, root_path=root_path, logger=logger
        )
        self.infos = []
        .....
        self.include_nuscenes_data(self.mode)
        if self.training and self.dataset_cfg.get('BALANCED_RESAMPLING', False):    # train time
            self.infos = self.balanced_infos_resampling(self.infos)

这里是对[‘nuscenes_infos_10sweeps_train.pkl’]文件的加载，其加载的结果如下：
在这里插入图片描述

每个列表是一个scense场景，每个scenes场景还有对应了多帧sweep场景，对于每个scense是一个字典结构，包含了14个key，详细如下所示：
在这里插入图片描述

随后会执行类别不均衡函数，这个处理是统计每个sences场景中的gt，然后按照gt的类别进行划分，此时构架的样本是重复的，因为一个sensces一般是包含了多个类别的gt，那么这个scenes会被同时存放在这些类别gt的对应列表结构中。
在这里插入图片描述

在进行了按类别的scense存储之后，再按照每个gt的数量从列表中进行随机采样，重新构建出一个scense列表结构。此时的列表数量要比原始的多，也包含了重复的样本，但总体上各类别的gt样本会均衡一点。

def balanced_infos_resampling(self, infos):
    """
    Class-balanced sampling of nuScenes dataset from https://arxiv.org/abs/1908.09492
    """
    if self.class_names is None:
        return infos

    # 每个场景包含了某个类别都进行存储
    cls_infos = {name: [] for name in self.class_names}
    for info in infos:
        for name in set(info['gt_names']):
            if name in self.class_names:
                cls_infos[name].append(info)

    duplicated_samples = sum([len(v) for _, v in cls_infos.items()])    # 1639
    cls_dist = {k: len(v) / duplicated_samples for k, v in cls_infos.items()}   # 类别样本在每个scene中的分布占比

    sampled_infos = []

    frac = 1.0 / len(self.class_names)    # 0.1
    ratios = [frac / v for v in cls_dist.values()]  # 分布占比越大权重越小

    # 从每个类别的样本中按权重进行随机采样，每个类别采样的数量是：len(cur_cls_infos) * ratio，采样后的数据构建成一个列表存储
    for cur_cls_infos, ratio in zip(list(cls_infos.values()), ratios):
        sampled_infos += np.random.choice(
            cur_cls_infos, int(len(cur_cls_infos) * ratio)
        ).tolist()
    self.logger.info('Total samples after balanced resampling: %s' % (len(sampled_infos)))  # 1630

    # 对类别不平衡采样后重新组织类别数据
    cls_infos_new = {name: [] for name in self.class_names}
    for info in sampled_infos:
        for name in set(info['gt_names']):
            if name in self.class_names:
                cls_infos_new[name].append(info)

    cls_dist_new = {k: len(v) / len(sampled_infos) for k, v in cls_infos_new.items()}
    # 返回的是重新采样后的数据，此时样本量会扩充
    return sampled_infos

此时的采样数据中更改如下：
在这里插入图片描述

可以看出，从原来的nuscenes_infos_10sweeps_train.pkl文件只有300+的样本，此时采样数量更改到了1600+个样本，里面的很多scenes都是重复的。getitem函数每次也是从这里随机采样一个场景来进行处理，也可以理解为重复对类别gt较少的样本进行重复处理，来减少长尾分布带来的性能影响。这个是从数据层面的上采样实现，简单来说就是对类别gt较少的样本进行重复多遍实现的数据上采样。
在这里插入图片描述

1.2 getitem函数对单场景处理

在上述提到，在初始化中进行了对样本的重采样，从而实现对样本数量的扩充，对一些类别数目较少的类别gt场景进行重复的处理。

info = copy.deepcopy(self.infos[index])
points = self.get_lidar_with_sweeps(index, max_sweeps=self.dataset_cfg.MAX_SWEEPS)  # 整合sweep点数据

随机获取了一个scenes数据，随后就是根据对应的路径读取点云数据，此外还包含了对个sweep数据的补充，这里会将多个sweep点云场景与scenes场景进行拼接操作。

def get_sweep(self, sweep_info):
    def remove_ego_points(points, center_radius=1.0):
        mask = ~((np.abs(points[:, 0]) < center_radius) & (np.abs(points[:, 1]) < center_radius))   # ~取反
        return points[mask]

    lidar_path = self.root_path / sweep_info['lidar_path']
    points_sweep = np.fromfile(str(lidar_path), dtype=np.float32, count=-1).reshape([-1, 5])[:, :4]
    points_sweep = remove_ego_points(points_sweep).T    # 删除中间范围内-1~1的点
    if sweep_info['transform_matrix'] is not None:
        num_points = points_sweep.shape[1]    # 26182
        points_sweep[:3, :] = sweep_info['transform_matrix'].dot(
            np.vstack((points_sweep[:3, :], np.ones(num_points))))[:3, :]   # 提取xyz信息并添加1行‘1’元素进行矩阵点乘实现转换

    cur_times = sweep_info['time_lag'] * np.ones((1, points_sweep.shape[1]))    # (1, 26182)
    return points_sweep.T, cur_times.T

def get_lidar_with_sweeps(self, index, max_sweeps=1):
    info = self.infos[index]
    lidar_path = self.root_path / info['lidar_path']

    # 提取scene点：(x,y,z,intensity,ring index)
    points = np.fromfile(str(lidar_path), dtype=np.float32, count=-1).reshape([-1, 5])[:, :4]
    sweep_points_list = [points]
    sweep_times_list = [np.zeros((points.shape[0], 1))]    # scene场景的时间戳设置为0

    # 提取sweep点：从当前的sweep数量中最多提取max_sweeps-1个
    for k in np.random.choice(len(info['sweeps']), max_sweeps - 1, replace=False):
        points_sweep, times_sweep = self.get_sweep(info['sweeps'][k])   # 获取每个sweep场景的点信息与时间戳信息
        sweep_points_list.append(points_sweep)
        sweep_times_list.append(times_sweep)

    # 将单帧的scene场景和多帧的sweep场景点云进行拼接
    points = np.concatenate(sweep_points_list, axis=0)
    times = np.concatenate(sweep_times_list, axis=0).astype(points.dtype)

    points = np.concatenate((points, times), axis=1)    # 拼接 (270396, 5)
    return points

然后开始构建data_dict的字典。这里对于少于1个点的gt进行筛选，从事构建gt和gt的类别信息到data_dict中：

# 筛选点数少于等于1的gt,更新gt_names和gt_boxes
if 'gt_boxes' in info:
    if self.dataset_cfg.get('FILTER_MIN_POINTS_IN_GT', False):
        mask = (info['num_lidar_pts'] > self.dataset_cfg.FILTER_MIN_POINTS_IN_GT - 1)
    else:
        mask = None

    input_dict.update({
        'gt_names': info['gt_names'] if mask is None else info['gt_names'][mask],
        'gt_boxes': info['gt_boxes'] if mask is None else info['gt_boxes'][mask]
    })

此时的data_dict结构如下：
在这里插入图片描述

随后会进行prepare_data操作，这包含了点云数据增强、特征选择、点云数据处理，将点云场景量化为voxel的几个操作，操作完后的data_dict更下如下：
在这里插入图片描述

此时，算是处理完了一个scenes场景，在同时处理完batch_size个之后，就会执行collect_batch函数，将多个scene数据重建成多维矩阵，进行矩阵的并行计算。这里我设置的batch_size是4，所以这里一个batch数据的最高维度是4，如下所示：
在这里插入图片描述

这里的point和voxek_coords由于在对应的维度中添加了一列来区分其所属的scenes，所以直接拼接起来就好了。随后，这个batch_data的数据就会丢给模型来进行处理，处理的是voxel这个特征，其他的key是用来辅助的。

2. nuScenes CreateDatabase

以上过程是介绍了从pkl序列信息读取内容到batch数据的构建整个过程，那么如果构建从pkl文件是这里的内容。首先在pycharm中进行设置，以实现可以debug一些子模块
在这里插入图片描述

可以看见，上面设置的参数可以正常传入args中。

这一部分包含了两个函数的处理，一个是create_groundtruth_database来进行gt sample的数据增强，另外的是create_nuscenes_info函数来进行nuScenes数据集的处理，来生成nuscenes_infos_10sweeps_train.pkl文件，随后在nuScense数据集处理的时候进行初始化的操作。

2.1 create_nuscenes_info

create_nuscenes_info函数的核心其实是调用了nuscenes_utils.fill_trainval_infos函数，在此之前create_nuscenes_info函数对nuScenes mini版本中划分的训练集和测试集的scenes场景路径进行判断，看是否存在
在这里插入图片描述

随后将name转化成token的形式，传入nuscenes_utils.fill_trainval_infos函数进行pkl文件的生成处理。（这里的filter内置函数配置lambda的写法很值得学习）

train_scenes = list(filter(lambda x: x in available_scene_names, train_scenes))
val_scenes = list(filter(lambda x: x in available_scene_names, val_scenes))
train_scenes = set([available_scenes[available_scene_names.index(s)]['token'] for s in train_scenes])   # 将name转化成token的形式来表示
val_scenes = set([available_scenes[available_scene_names.index(s)]['token'] for s in val_scenes])

相关信息保存
在nuscenes_utils.fill_trainval_infos函数中，会依次遍历每个nusc.sample，随后将相关信息构造成一个info的字典，随后存储在一个列表中。

for index, sample in enumerate(nusc.sample):    # len(nusc.sample): 404
    progress_bar.update()

    ref_sd_token = sample['data'][ref_chan]
    ref_sd_rec = nusc.get('sample_data', ref_sd_token)
    ref_cs_rec = nusc.get('calibrated_sensor', ref_sd_rec['calibrated_sensor_token'])
    ref_pose_rec = nusc.get('ego_pose', ref_sd_rec['ego_pose_token'])
    ref_time = 1e-6 * ref_sd_rec['timestamp']

    ref_lidar_path, ref_boxes, _ = get_sample_data(nusc, ref_sd_token)

    ref_cam_front_token = sample['data']['CAM_FRONT']
    ref_cam_path, _, ref_cam_intrinsic = nusc.get_sample_data(ref_cam_front_token)

    # Homogeneous transform from ego car frame to reference frame
    ref_from_car = transform_matrix(
        ref_cs_rec['translation'], Quaternion(ref_cs_rec['rotation']), inverse=True
    )

    # Homogeneous transformation matrix from global to _current_ ego car frame
    car_from_global = transform_matrix(
        ref_pose_rec['translation'], Quaternion(ref_pose_rec['rotation']), inverse=True,
    )

    # 构建每个sample的info列表
    info = {
        'lidar_path': Path(ref_lidar_path).relative_to(data_path).__str__(),
        'cam_front_path': Path(ref_cam_path).relative_to(data_path).__str__(),
        'cam_intrinsic': ref_cam_intrinsic,
        'token': sample['token'],
        'sweeps': [],
        'ref_from_car': ref_from_car,
        'car_from_global': car_from_global,
        'timestamp': ref_time,
    }

sweep信息保存
此外，由于每个sample场景多有多个密集的sweep帧与之相匹配，这里还会筛选挑选MAX_SWEEP个sweep场景的点云与sample的点进行拼接操作。并且，首个sample是没有prev的，所以这里会重复的采样同一个sweep，而其他用prev的会重复采样不同的max_sweeps个sweep场景。

sample_data_token = sample['data'][chan]
curr_sd_rec = nusc.get('sample_data', sample_data_token)
sweeps = []
while len(sweeps) < max_sweeps - 1:     # 挑选sweep（不超过max_sweeps）
    if curr_sd_rec['prev'] == '':
        if len(sweeps) == 0:
            sweep = {
                'lidar_path': Path(ref_lidar_path).relative_to(data_path).__str__(),
                'sample_data_token': curr_sd_rec['token'],
                'transform_matrix': None,
                'time_lag': curr_sd_rec['timestamp'] * 0,
            }
            sweeps.append(sweep)
        else:
            sweeps.append(sweeps[-1])
    else:
        curr_sd_rec = nusc.get('sample_data', curr_sd_rec['prev'])

        # Get past pose
        current_pose_rec = nusc.get('ego_pose', curr_sd_rec['ego_pose_token'])
        global_from_car = transform_matrix(
            current_pose_rec['translation'], Quaternion(current_pose_rec['rotation']), inverse=False,
        )

        # Homogeneous transformation matrix from sensor coordinate frame to ego car frame.
        current_cs_rec = nusc.get(
            'calibrated_sensor', curr_sd_rec['calibrated_sensor_token']
        )
        car_from_current = transform_matrix(
            current_cs_rec['translation'], Quaternion(current_cs_rec['rotation']), inverse=False,
        )

        tm = reduce(np.dot, [ref_from_car, car_from_global, global_from_car, car_from_current])

        lidar_path = nusc.get_sample_data_path(curr_sd_rec['token'])

        time_lag = ref_time - 1e-6 * curr_sd_rec['timestamp']

        sweep = {
            'lidar_path': Path(lidar_path).relative_to(data_path).__str__(),
            'sample_data_token': curr_sd_rec['token'],
            'transform_matrix': tm,
            'global_from_car': global_from_car,
            'car_from_current': car_from_current,
            'time_lag': time_lag,
        }
        sweeps.append(sweep)

info['sweeps'] = sweeps

gt信息保存
随后，对当前sample的gt进行统计，利用的是get_sample_data函数来实现。

def get_sample_data(nusc, sample_data_token, selected_anntokens=None):
    """
    Returns the data path as well as all annotations related to that sample_data.
    Note that the boxes are transformed into the current sensor's coordinate frame.
    Args:
        nusc:
        sample_data_token: Sample_data token.
        selected_anntokens: If provided only return the selected annotation.

    Returns:

    """
    # Retrieve sensor & pose records
    sd_record = nusc.get('sample_data', sample_data_token)
    cs_record = nusc.get('calibrated_sensor', sd_record['calibrated_sensor_token'])
    sensor_record = nusc.get('sensor', cs_record['sensor_token'])
    pose_record = nusc.get('ego_pose', sd_record['ego_pose_token'])

    data_path = nusc.get_sample_data_path(sample_data_token)

    if sensor_record['modality'] == 'camera':
        cam_intrinsic = np.array(cs_record['camera_intrinsic'])
        imsize = (sd_record['width'], sd_record['height'])
    else:
        cam_intrinsic = imsize = None

    # Retrieve all sample annotations and map to sensor coordinate system.
    if selected_anntokens is not None:
        boxes = list(map(nusc.get_box, selected_anntokens))
    else:
        boxes = nusc.get_boxes(sample_data_token)

    # Make list of Box objects including coord system transforms.
    box_list = []
    for box in boxes:
        box.velocity = nusc.box_velocity(box.token)
        # Move box to ego vehicle coord system
        box.translate(-np.array(pose_record['translation']))
        box.rotate(Quaternion(pose_record['rotation']).inverse)

        #  Move box to sensor coord system
        box.translate(-np.array(cs_record['translation']))
        box.rotate(Quaternion(cs_record['rotation']).inverse)

        box_list.append(box)

    return data_path, box_list, cam_intrinsic

annotations的结构信息如下所示：
在这里插入图片描述

这里将相关的信息，比如位置，尺寸，旋转角度，速度的信息拼接在一起组成了gt的维度信息，这里的维度是9维，相比与KITTI数据集多了两个维度。

if not test:
    annotations = [nusc.get('sample_annotation', token) for token in sample['anns']]

    # the filtering gives 0.5~1 map improvement
    num_lidar_pts = np.array([anno['num_lidar_pts'] for anno in annotations])
    num_radar_pts = np.array([anno['num_radar_pts'] for anno in annotations])
    mask = (num_lidar_pts + num_radar_pts > 0)

    # 构造gt信息(dim=8)
    locs = np.array([b.center for b in ref_boxes]).reshape(-1, 3)
    dims = np.array([b.wlh for b in ref_boxes]).reshape(-1, 3)[:, [1, 0, 2]]  # wlh == > dxdydz (lwh)
    velocity = np.array([b.velocity for b in ref_boxes]).reshape(-1, 3)
    rots = np.array([quaternion_yaw(b.orientation) for b in ref_boxes]).reshape(-1, 1)
    names = np.array([b.name for b in ref_boxes])
    tokens = np.array([b.token for b in ref_boxes])
    gt_boxes = np.concatenate([locs, dims, rots, velocity[:, :2]], axis=1)

    assert len(annotations) == len(gt_boxes) == len(velocity)

    info['gt_boxes'] = gt_boxes[mask, :]
    info['gt_boxes_velocity'] = velocity[mask, :]
    info['gt_names'] = np.array([map_name_from_general_to_detection[name] for name in names])[mask]
    info['gt_boxes_token'] = tokens[mask]
    info['num_lidar_pts'] = num_lidar_pts[mask]
    info['num_radar_pts'] = num_radar_pts[mask]

if sample['scene_token'] in train_scenes:
    train_nusc_infos.append(info)   # 追加train序列信息
else:
    val_nusc_infos.append(info)     # 追加val序列信息

2.2 create_groundtruth_database

现在获取到当前数据集中的待处理sample的序列信息，那么可以对这些sample的gt构建出一个database来进行gt的数据增强。

首选依次读取每一个sample的序列信息，对每个sample及其sweep场景的点云进行拼接操作，提取出其gt和gt_name

for idx in tqdm(range(len(self.infos))):
    sample_idx = idx
    info = self.infos[idx]
    points = self.get_lidar_with_sweeps(idx, max_sweeps=max_sweeps)
    gt_boxes = info['gt_boxes']
    gt_names = info['gt_names']

随后利用roiaware_pool3d_utils.points_in_boxes_gpu函数来对每个点的所属的gt进行判别，已实现后续每个gt的点提取。-1表示背景点，不属于任何一个gt，其他根据gt位置尺寸一一进行判别，当前的sample的gt数量是66，那么该函数的返回结果中，最大值就是65，表示第65个gt的点。

box_idxs_of_pts = roiaware_pool3d_utils.points_in_boxes_gpu(    # 判断每个点在哪个gt中,-1表示背景,其他的属于哪一类就为i
    torch.from_numpy(points[:, 0:3]).unsqueeze(dim=0).float().cuda(),
    torch.from_numpy(gt_boxes[:, 0:7]).unsqueeze(dim=0).float().cuda()
).long().squeeze(dim=0).cpu().numpy()

随后，在拼接后的sample场景中，开始对其gt进行一一遍历。分别提取出每个gt的点，随后将其平移到原点坐标上，保存点序列文件。随后保存在对于类别的列表all_db_infos中。

for i in range(gt_boxes.shape[0]):  # 对sample场景每个gt均进行构建
    filename = '%s_%s_%d.bin' % (sample_idx, gt_names[i], i)
    filepath = database_save_path / filename
    gt_points = points[box_idxs_of_pts == i]

    gt_points[:, :3] -= gt_boxes[i, :3]
    with open(filepath, 'w') as f:    # gt点云序列的提取
        gt_points.tofile(f)

    if (used_classes is None) or gt_names[i] in used_classes:
        db_path = str(filepath.relative_to(self.root_path))  # gt_database/xxxxx.bin
        db_info = {'name': gt_names[i], 'path': db_path, 'image_idx': sample_idx, 'gt_idx': i,  # gt信息保存在对应的类别列表中
                   'box3d_lidar': gt_boxes[i], 'num_points_in_gt': gt_points.shape[0]}
        if gt_names[i] in all_db_infos:
            all_db_infos[gt_names[i]].append(db_info)
        else:
            all_db_infos[gt_names[i]] = [db_info]

这里的列表有9个表示设置了检测9个类别，所有的sample场景的gt全部都会保存在这个gt列表中，最后会将这个序列信息进行保存，后续在进行preprare_data函数中进行gt_sample读取时再进行操作，来实现gt的采样操作实现copy_paste的数据增强。
在这里插入图片描述

其中gt的database文件夹中存储的是每一个gt的点信息，也就是一个bin文件。完整的处理代码如下所示：

def create_groundtruth_database(self, used_classes=None, max_sweeps=10):
    import torch

    database_save_path = self.root_path / f'gt_database_{max_sweeps}sweeps_withvelo'
    db_info_save_path = self.root_path / f'nuscenes_dbinfos_{max_sweeps}sweeps_withvelo.pkl'

    database_save_path.mkdir(parents=True, exist_ok=True)
    all_db_infos = {}

    for idx in tqdm(range(len(self.infos))):
        sample_idx = idx
        info = self.infos[idx]
        points = self.get_lidar_with_sweeps(idx, max_sweeps=max_sweeps)
        gt_boxes = info['gt_boxes']
        gt_names = info['gt_names']

        box_idxs_of_pts = roiaware_pool3d_utils.points_in_boxes_gpu(    # 判断每个点在哪个gt中,-1表示背景,其他的属于哪一类就为i
            torch.from_numpy(points[:, 0:3]).unsqueeze(dim=0).float().cuda(),
            torch.from_numpy(gt_boxes[:, 0:7]).unsqueeze(dim=0).float().cuda()
        ).long().squeeze(dim=0).cpu().numpy()

        for i in range(gt_boxes.shape[0]):  # 对sample场景每个gt均进行构建
            filename = '%s_%s_%d.bin' % (sample_idx, gt_names[i], i)
            filepath = database_save_path / filename
            gt_points = points[box_idxs_of_pts == i]

            gt_points[:, :3] -= gt_boxes[i, :3]
            with open(filepath, 'w') as f:    # gt点云序列的提取
                gt_points.tofile(f)

            if (used_classes is None) or gt_names[i] in used_classes:
                db_path = str(filepath.relative_to(self.root_path))  # gt_database/xxxxx.bin
                db_info = {'name': gt_names[i], 'path': db_path, 'image_idx': sample_idx, 'gt_idx': i,  # gt信息保存在对应的类别列表中
                           'box3d_lidar': gt_boxes[i], 'num_points_in_gt': gt_points.shape[0]}
                if gt_names[i] in all_db_infos:
                    all_db_infos[gt_names[i]].append(db_info)
                else:
                    all_db_infos[gt_names[i]] = [db_info]
    for k, v in all_db_infos.items():
        print('Database %s: %d' % (k, len(v)))

    with open(db_info_save_path, 'wb') as f:
        pickle.dump(all_db_infos, f)