SparseDrive

2025/4/17 17:26:45

An end-to-end (e2e) autonomous driving framework developed jointly by Tsinghua University and Horizon Robotics.

SparseDrive resources
Paper: https://arxiv.org/pdf/2405.19620
Code: https://github.com/swc-17/SparseDrive

What I find impressive about this paper

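Purely sparse mapping and 3D detection. The detection head is essentially the same as Sparse4D V3 (see sparsev1v2v3), and the map is also handled as sparse detection. This greatly improves the chances of real-world deployment.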
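End-to-end training is done in two stages: the first stage trains mapping and 3D detection, and the second stage adds planning and motion prediction on top of the pretrained mapping and 3D detection (according to the End-to-End Learning section, no weights are frozen in this stage). This addresses the difficulty of convergence. It takes two steps, but it is a fairly practical approach.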

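Previously I only paid attention to the obstacle part, since that is what my own work is about. Recently I noticed that its map is also implemented in a sparse way, which I found rather remarkable. According to its ablation study, though, it still loses some accuracy compared with MapTR: 58.7 drops to 56.2. For details on the sparse head, see sparsev1v2v3.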

Overall structure


The overall SparseDrive framework consists of an image encoder, symmetric sparse perception, and a parallel motion planner.

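image encoder: consists of a backbone and a neck;
symmetric sparse perception module: the feature maps I are aggregated into two groups of instances. The two groups represent the surrounding obstacles and the map respectively, and are then fed into the parallel motion planner, where they interact with an initialized ego instance.
motion planner: predicts the trajectories of the surrounding obstacles and of the ego vehicle at the same time, and uses a hierarchical planning selection strategy to pick one safe trajectory as the final planning result.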

Symmetric Sparse Perception

Implementation notes: both map and detection have an InstanceBank

get: fetch anchor, feature, cached_feature, cached_anchor, time_interval
update: after refinement, update anchor and feature according to the cls score
cache: after the current batch finishes, cache topk(confidence, self.num_temp_instances, instance_feature, anchor)
get_instance_id
update_instance_id
MotionPlanning has an InstanceQueue that manages the storage of the map and detection results
get: takes the detection results, the map mask, the features, etc. as input
- prepare_motion: updates instance_feature_queue and anchor_queue with the detection results
- prepare_planning: updates ego_feature_queue, ego_anchor_queue, and ego_period with feature_maps and the map mask
- fuses temporal information and outputs ego_feature, ego_anchor, temp_instance_feature, temp_anchor, temp_mask, i.e. the inputs to planning
cache_motion
cache_planning
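
To make the caching step concrete, here is a minimal pure-Python sketch of how a bank could carry its top-k instances over to the next frame. ToyInstanceBank and its simplified get/cache signatures are hypothetical stand-ins, not the SparseDrive code; only the names num_temp_instances, instance_feature, anchor, and confidence come from the notes above, and plain lists stand in for tensors.

```python
from dataclasses import dataclass
from typing import List, Optional

Vector = List[float]


@dataclass
class ToyInstanceBank:
    """Hypothetical toy bank: remembers the most confident instances of the last frame."""

    num_temp_instances: int
    cached_feature: Optional[List[Vector]] = None
    cached_anchor: Optional[List[Vector]] = None

    def get(self, init_feature: List[Vector], init_anchor: List[Vector]):
        """Return the current queries together with whatever was cached last frame."""
        return init_feature, init_anchor, self.cached_feature, self.cached_anchor

    def cache(self, instance_feature: List[Vector], anchor: List[Vector],
              confidence: List[float]) -> List[int]:
        """Keep the num_temp_instances instances with the highest confidence."""
        order = sorted(range(len(confidence)), key=lambda i: confidence[i], reverse=True)
        keep = order[: self.num_temp_instances]
        self.cached_feature = [instance_feature[i] for i in keep]
        self.cached_anchor = [anchor[i] for i in keep]
        return keep


# Toy data: 4 instances with 1-d features and 2-d anchors (made-up values).
bank = ToyInstanceBank(num_temp_instances=2)
features = [[0.1], [0.2], [0.3], [0.4]]
anchors = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]
kept = bank.cache(features, anchors, confidence=[0.2, 0.9, 0.1, 0.7])
print(kept, bank.cached_anchor)
```

In the real model the same idea would presumably be applied per batch element on tensors; the sketch only shows the select-and-store logic.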

Understanding anchors

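This module is the basic building block of the sparse head. The head is built by repeating this module 6 times, and every layer updates the anchor and the feature to arrive at the final feature and anchor. bs is the batch size.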

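3d anchor: bs * 900 * 11, represented as [x, y, z, w, l, h, sin(yaw), cos(yaw), vx, vy, vz] (VCS, the vehicle coordinate system)
- 900 is the number of 3D queries
- 11 is the number of attributes of one 3D detection box: center point, width/length/height, yaw angle (stored as sin and cos, which is why there are 11 values rather than 10), and velocity along 3 axes
map anchor: bs * instance num * sample num * 2 (VCS) => reshaped to bs * [instance num * sample num] * 2
- instance num: the maximum number of instances
- sample num: the number of points sampled per instance
- 2: the x, y coordinates of each point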
  
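  The anchors are projected to obtain points on the image, and deformable aggregation samples the feature maps at those points, which links 2D and 3D. This is the most important thing to understand about the deformable aggregation approach.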

Parallel Motion Planner

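Ego Instance Initialization: the ego feature Fe is initialized from the smallest feature map of the front camera; the ego anchor Be is initialized to roughly match the ego vehicle itself.
Spatial-Temporal Interactions: the ego instance has no temporal information at initialization, yet time matters a lot for planning, so an instance memory queue is designed for temporal modeling, where H, the number of stored frames, is 4. The ego instance is concatenated with the surrounding agents to obtain the agent-level instances.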
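
A minimal sketch of the memory-queue idea, assuming a fixed-length queue that drops the oldest frame once H frames are stored. ToyEgoQueue and concat_instances are hypothetical names; only ego_feature_queue and the value H = 4 come from the notes, and lists stand in for feature tensors.

```python
from collections import deque
from typing import Deque, List

Vector = List[float]


class ToyEgoQueue:
    """Hypothetical toy queue that keeps the ego feature of the last H frames."""

    def __init__(self, queue_length: int = 4) -> None:
        # deque(maxlen=...) silently discards the oldest entry when full.
        self.ego_feature_queue: Deque[Vector] = deque(maxlen=queue_length)

    def push(self, ego_feature: Vector) -> None:
        self.ego_feature_queue.append(ego_feature)

    def temporal_features(self) -> List[Vector]:
        """Oldest-to-newest ego features available for temporal interaction."""
        return list(self.ego_feature_queue)


def concat_instances(agent_features: List[Vector], ego_feature: Vector) -> List[Vector]:
    """Append the ego instance to the surrounding agents to get agent-level instances."""
    return agent_features + [ego_feature]


queue = ToyEgoQueue(queue_length=4)
for frame in range(6):
    queue.push([float(frame)])  # made-up 1-d feature per frame

history = queue.temporal_features()
agent_level = concat_instances([[10.0], [11.0], [12.0]], history[-1])
print(history, agent_level)
```

After six frames only the last four remain in the queue, which is the behaviour the stored-frame count H describes.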

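The planner outputs predicted trajectories and corresponding scores for both the surrounding objects and the ego vehicle.

Km and Kp are the numbers of modes for motion prediction and planning, Tm and Tp are the numbers of future timestamps for motion prediction and planning, and Ncmd is the number of driving commands for planning.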
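
As an illustration of how a final trajectory could be picked from these outputs, here is a simplified pure-Python sketch. It reflects my reading of the hierarchical selection (filter by driving command, zero the score of modes that collide with predicted obstacles, then take the highest score); the point-distance collision check, safe_dist, and the function name select_plan are assumptions made for the example, not the paper's or the repo's exact procedure.

```python
from math import hypot
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Trajectory = List[Point]


def select_plan(plan_trajs: Sequence[Sequence[Trajectory]],
                plan_scores: Sequence[Sequence[float]],
                command: int,
                obstacle_trajs: Sequence[Trajectory],
                safe_dist: float = 1.0) -> Tuple[int, Trajectory]:
    """plan_trajs has shape [Ncmd][Kp][Tp] of (x, y); plan_scores has shape [Ncmd][Kp].

    Each obstacle trajectory must cover at least Tp steps on the same time grid.
    """
    trajs = plan_trajs[command]          # step 1: keep only the modes of this command
    scores = list(plan_scores[command])
    for mode, traj in enumerate(trajs):  # step 2: suppress modes that get too close
        for t, (x, y) in enumerate(traj):
            if any(hypot(x - obs[t][0], y - obs[t][1]) < safe_dist for obs in obstacle_trajs):
                scores[mode] = 0.0
                break
    best = max(range(len(scores)), key=lambda m: scores[m])  # step 3: best remaining score
    return best, trajs[best]


# Toy case: 1 command, Kp = 2 modes, Tp = 3 steps (made-up numbers).
plan_trajs = [[
    [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)],
    [(1.0, 0.5), (2.0, 1.5), (3.0, 2.5)],
]]
plan_scores = [[0.8, 0.6]]
obstacles = [[(5.0, 0.0), (3.5, 0.0), (3.2, 0.0)]]

best_mode, best_traj = select_plan(plan_trajs, plan_scores, command=0, obstacle_trajs=obstacles)
free_mode, _ = select_plan(plan_trajs, plan_scores, command=0, obstacle_trajs=[])
print(best_mode, free_mode)
```

Without obstacles the higher-scoring mode 0 wins; with the obstacle ending up next to the end of mode 0, its score is zeroed and mode 1 is selected instead.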

End-to-End Learning

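Staged training
- stage1: train the symmetric sparse perception module from scratch to learn the sparse scene representation;
- stage2: train the sparse perception module and the parallel motion planner together, with no model weights frozen, to take full advantage of end-to-end optimization.
Loss Functions: det + map + motion + plan + depth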
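
The staged objective can be sketched as a plain sum over named loss terms. Which terms are active in stage 1 (here det, map, and depth, i.e. the perception-related ones) and the optional per-term weights are assumptions for illustration; the notes above only state the full sum. STAGE_TERMS and total_loss are hypothetical names.

```python
from typing import Dict, Optional

# Assumed split: stage 1 trains perception only, stage 2 trains everything.
STAGE_TERMS = {
    1: ("det", "map", "depth"),
    2: ("det", "map", "motion", "plan", "depth"),
}


def total_loss(losses: Dict[str, float], stage: int,
               weights: Optional[Dict[str, float]] = None) -> float:
    """Sum the loss terms that are active in the given stage (weight 1.0 by default)."""
    weights = weights or {}
    return sum(weights.get(name, 1.0) * losses[name] for name in STAGE_TERMS[stage])


# Made-up loss values, chosen so that the sums are exact in floating point.
losses = {"det": 1.0, "map": 0.5, "motion": 0.25, "plan": 0.125, "depth": 0.0625}
stage1 = total_loss(losses, stage=1)
stage2 = total_loss(losses, stage=2)
stage2_weighted = total_loss(losses, stage=2, weights={"plan": 2.0})
print(stage1, stage2, stage2_weighted)
```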

Ablation study

Backbone: SparseDrive-S uses ResNet50 with an image size of 256×704; SparseDrive-B uses ResNet101 with 512×1408
3D detection: 49.6% mAP and 58.8% NDS
multi-object tracking: 50.1% AMOTA, 632 ID switches
online mapping: 56.2% mAP
motion prediction: minADE 0.60, EPA↑ 0.555
planning: L2↓ 1s 0.29m, 3s 0.91m; Col. Rate (collision rate)↓ 1s 0.01, 3s 0.13
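
For readers unfamiliar with the motion prediction metric, minADE is the average displacement error of the best of the predicted modes. The sketch below uses the standard definition with made-up trajectories; ade and min_ade are names chosen for the example, and it is not the evaluation code behind the numbers above.

```python
from math import hypot
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def ade(pred: Sequence[Point], gt: Sequence[Point]) -> float:
    """Average Euclidean distance between predicted and ground-truth points."""
    return sum(hypot(px - gx, py - gy) for (px, py), (gx, gy) in zip(pred, gt)) / len(gt)


def min_ade(pred_modes: Sequence[Sequence[Point]], gt: Sequence[Point]) -> float:
    """ADE of the mode that is closest to the ground truth."""
    return min(ade(mode, gt) for mode in pred_modes)


gt: List[Point] = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
pred_modes = [
    [(1.0, 1.0), (2.0, 1.0), (3.0, 1.0)],   # constant 1 m lateral offset
    [(1.0, 0.0), (2.0, 0.0), (3.0, 0.75)],  # only the last point is off
]
result = min_ade(pred_modes, gt)
print(result)
```

The first mode has an ADE of 1.0 and the second 0.25, so the reported minADE for this toy agent is 0.25.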