3D Object Tracking Using LiDAR and Camera

Time-to-Collision (TTC) Calculation Using Camera and LiDAR

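In my previous articles, I covered 2D feature tracking by detecting keypoints and matching descriptors. In this article, I build on those concepts, and a few more, to develop a software pipeline that detects and tracks objects in 3D space from camera and LiDAR measurements and uses both sensors to estimate the time-to-collision (TTC) with the preceding vehicle at each time step (as shown in the GIF at the top of this article). I completed this project as part of my Udacity Sensor Fusion Nanodegree program.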

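To understand the overall process, refer to the flowchart below. My previous articles cover points 5, 6, and 7 of the flowchart in detail. This article briefly walks through the remaining parts, with code snippets.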

Basic building blocks of the TTC calculation

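The project is divided into 4 parts:

1. First, develop a method to match 3D objects over time using keypoint correspondences.

2. Second, compute the TTC based on LiDAR measurements.

3. Then, do the same with the camera, which requires first associating keypoint matches with regions of interest and then computing the TTC from those matches.

4. Finally, run various tests on the framework. The goal is to identify the detector/descriptor combination best suited for TTC estimation, while also looking for problems that can lead to faulty measurements from the camera or the LiDAR sensor.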

Detecting and Classifying Objects

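We need a way to detect vehicles in the image so that we can isolate the matched keypoints and the projected LiDAR points and associate them with a specific object. To compute the time-to-collision for a specific vehicle, we have to isolate the keypoints on that vehicle so that the TTC estimate is not distorted by matches on, for example, the road surface, stationary objects, or other vehicles in the scene. One way to achieve this is to identify the vehicles in the scene automatically using object detection. With the resulting bounding boxes, we can easily assign keypoint matches to objects and obtain a stable TTC estimate.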

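An easy-to-use, deep-learning-based approach is YOLO, a very fast detection framework that can be run through the OpenCV library. I use YOLOv3 trained on COCO (a large-scale object detection, segmentation, and captioning dataset), which can recognize 80 different object classes in images and videos.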

This is the final output after the image has passed through the neural network. Every detected vehicle is enclosed in a bounding box.

Cropping LiDAR Points and Clustering Them with Bounding Boxes

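Since our goal is to compute the TTC to the preceding vehicle, we should focus only on the vehicle directly in front of the ego vehicle and ignore all others. After clustering the LiDAR points, we filter out all points that lie in adjacent lanes or on the road surface. We then loop over all LiDAR points, project them onto the camera image plane, and associate each with its bounding box. This is what it looks like before filtering; each object's LiDAR points are enclosed in its bounding box.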
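The crop, project, and associate steps described above can be sketched with plain structs. This is a minimal illustration, not the project's code: `Point3D`, `Box2D`, `Mat3x4`, and the three function names are hypothetical, the 3x4 matrix `P` is assumed to already combine the LiDAR-to-camera transform with the camera intrinsics, and discarding points that fall into more than one box is an assumed way of resolving overlaps.

```cpp
#include <array>
#include <cmath>
#include <vector>

// Hypothetical minimal types for illustration only.
struct Point3D { double x, y, z; };          // metres: x forward, y left, z up
struct Box2D { int id; double x, y, w, h; }; // pixel rectangle (top-left corner, size)
using Mat3x4 = std::array<std::array<double, 4>, 3>;

// Keep only points inside an assumed ego-lane corridor and above the road surface.
std::vector<Point3D> cropToEgoLane(const std::vector<Point3D> &pts,
                                   double laneWidth, double minZ, double maxX)
{
    std::vector<Point3D> kept;
    for (const Point3D &p : pts)
    {
        if (p.x > 0.0 && p.x <= maxX && std::abs(p.y) <= laneWidth / 2.0 && p.z >= minZ)
        {
            kept.push_back(p);
        }
    }
    return kept;
}

// Project a 3D point with a 3x4 projection matrix using homogeneous coordinates.
// Returns false when the point does not lie in front of the camera.
bool projectToImage(const Mat3x4 &P, const Point3D &p, double &u, double &v)
{
    const double X[4] = {p.x, p.y, p.z, 1.0};
    double r[3] = {0.0, 0.0, 0.0};
    for (int i = 0; i < 3; ++i)
    {
        for (int j = 0; j < 4; ++j)
        {
            r[i] += P[i][j] * X[j];
        }
    }
    if (r[2] <= 0.0)
    {
        return false;
    }
    u = r[0] / r[2];
    v = r[1] / r[2];
    return true;
}

// Return the ID of the single box containing pixel (u, v), or -1 if the pixel
// lies in no box or in more than one box (ambiguous points are discarded).
int findEnclosingBox(const std::vector<Box2D> &boxes, double u, double v)
{
    int found = -1;
    for (const Box2D &b : boxes)
    {
        bool inside = u >= b.x && u < b.x + b.w && v >= b.y && v < b.y + b.h;
        if (!inside)
        {
            continue;
        }
        if (found != -1)
        {
            return -1; // second hit: the point is ambiguous
        }
        found = b.id;
    }
    return found;
}
```

A caller would crop first, then project each remaining point and append it to the box returned by `findEnclosingBox`, skipping points for which the result is -1.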

After filtering out the objects that are irrelevant to our use case, the image looks like this:

Tracking 3D Object Bounding Boxes

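Once we have run the steps above on several images, we can start tracking bounding boxes across consecutive images. To match bounding boxes between frames, we loop over all keypoint match pairs and look up the bounding box that encloses each keypoint in both frames. We then store the matched box IDs as a candidate bounding box pair.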

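For each bounding box in the previous frame, the pair with the highest number of keypoint matches is selected as the best match (bbBestMatches), which lets us track the same object uniquely across consecutive frames.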

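The following images illustrate the concept. The left image is the frame at time T-1, and the right image is the frame at time T. The bounding box in the previous frame (box ID 1) has keypoint matches in two bounding boxes of the current frame (box IDs 3 and 4). Because the pair BB1 and BB3 has more keypoint matches than the pair BB1 and BB4, the bounding box pair 1-3 is selected as the best match and is taken to represent the same object.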

Here is the code that implements this:

/**
 * @brief Match list of 3D objects (vector<BoundingBox>) between current and previous frame
 *
 * @param matches       List of best matches between previous and current frame
 * @param bbBestMatches Output list of best matches between previous and current frame (matched boxID pairs)
 * @param prevFrame     Previous frame
 * @param currFrame     Current frame
 */
void matchBoundingBoxes(std::vector<cv::DMatch> &matches, std::map<int, int> &bbBestMatches, DataFrame &prevFrame, DataFrame &currFrame)
{


    /* STEP 1 : Go through all keypoint matches and associate them with their respective bounding boxes in both images */


    std::multimap<int, int> bbTempMatches;


    // Loop over all the keypoint match pairs
    for (auto matchIt = matches.begin(); matchIt != matches.end(); ++matchIt)
    {
        int prevImgBoxID = -1;
        int currImgBoxID = -1;


        // Loop through all the BBoxes in previous image and find the box ID of the 'query' keypoint(match keypoint in previous image)
        for (auto it = prevFrame.boundingBoxes.begin(); it != prevFrame.boundingBoxes.end(); ++it)
        {
            cv::KeyPoint keyPt;
            keyPt = prevFrame.keypoints[matchIt->queryIdx];
            if (it->roi.contains(keyPt.pt))
            // if ((keyPt.pt.x > (it->roi.x)) && (keyPt.pt.x < (it->roi.x + it->roi.width)) &&
            //     (keyPt.pt.y > (it->roi.y)) && (keyPt.pt.y < (it->roi.y + it->roi.height)))
            {
                it->keypoints.push_back(keyPt);
                prevImgBoxID = it->boxID;
                break;
            }
        }


        // Loop through all the BBoxes in current image and find the box ID of the 'train' keypoint(match keypoint in current image)
        for (auto it = currFrame.boundingBoxes.begin(); it != currFrame.boundingBoxes.end(); ++it)
        {
            cv::KeyPoint keyPt;
            keyPt = currFrame.keypoints[matchIt->trainIdx];
            if (it->roi.contains(keyPt.pt))
            // if ((keyPt.pt.x > (it->roi.x)) && (keyPt.pt.x < (it->roi.x + it->roi.width)) &&
            //     (keyPt.pt.y > (it->roi.y)) && (keyPt.pt.y < (it->roi.y + it->roi.height)))
            {
                it->keypoints.push_back(keyPt);
                currImgBoxID = it->boxID;
                break;
            }
        }


        // Store the box ID pairs in a temporary multimap
        if ((prevImgBoxID != -1) && (currImgBoxID != -1)) // Exclude pairs which are not part of either BBoxes
        {
            bbTempMatches.insert(std::make_pair(prevImgBoxID, currImgBoxID));
        }


    } // eof Loop over all keypoint match pairs


    /* STEP 2: For each BBox pair count the number of keypoint matches*/


    // Find all the unique keys (BBox IDs in the prev image) from the multimap
    std::set<int> unique_keys;
    int last_key = INT_MIN; // some value that won't appear


    for (auto it = bbTempMatches.begin(); it != bbTempMatches.end(); ++it)
    {
        if (it->first != last_key)
        {
            unique_keys.insert(it->first);
            last_key = it->first;
        }
    }


    // Display contents of multimap
    if (false)
    {
        for (auto itr = bbTempMatches.begin(); itr != bbTempMatches.end(); ++itr)
        {
            std::cout << '\t' << itr->first << '\t' << itr->second
                 << '\n';
        }
    }


    // Create a map to count occurrences of each key-value pair
    std::map<std::pair<int, int>, int> count_map;


    // Loop through each element in the multimap
    for (auto it = bbTempMatches.begin(); it != bbTempMatches.end(); ++it)
    {
        // Create a pair from the key and value of multimap
        std::pair<int, int> key_value_pair = std::make_pair(it->first, it->second);


        // Check if the pair already exists in count_map
        if (count_map.find(key_value_pair) == count_map.end())
        {
            // If not, insert it with a count of 1
            count_map.insert(std::make_pair(key_value_pair, 1));
        }
        else
        {
            // If it exists, increment the count
            count_map[key_value_pair]++;
        }
    }


    // Display the count of each key-value pair
    if (false)
    {
        for (auto it = count_map.begin(); it != count_map.end(); ++it)
        {
            std::cout << "(" << it->first.first << ", " << it->first.second << "): " << it->second << std::endl;
        }
    }


    /* STEP 3: The BBox pair with highest number of keypoint match occurrences is the best matched BBox pair*/


    // Iterate through each unique bounding box IDs in the previous image
    for (auto it = unique_keys.begin(); it != unique_keys.end(); ++it)
    {
        int BBoxIdx1 = -1; // BBox index
        int BBoxIdx2 = -1;
        int maxKyPtCnt = INT_MIN;


        // Loop through all the BBox matched pairs and find the ones with highest keypoint occurrences
        for (auto it1 = count_map.begin(); it1 != count_map.end(); ++it1)
        {
            int currKyPtCnt = it1->second; // Number of occurrences for the current BBox pair


            if (it1->first.first == *it)
            {
                if (currKyPtCnt >= maxKyPtCnt)
                {
                    maxKyPtCnt = currKyPtCnt;
                    BBoxIdx1 = it1->first.first;
                    BBoxIdx2 = it1->first.second;
                }
            }
        }


        if ((BBoxIdx1 != -1) && (BBoxIdx2 != -1)) // Exclude pairs which are not part of either BBoxes
        {
            bbBestMatches.insert(std::make_pair(BBoxIdx1, BBoxIdx2));
        }
    }


    // Display the count of each BBox pair
    if (false)
    {
        for (auto it = bbBestMatches.begin(); it != bbBestMatches.end(); ++it)
        {
            std::cout << "(" << it->first << ", " << it->second << "): " << std::endl;
        }
    }
}
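The vote counting in steps 2 and 3 can also be written more compactly. The sketch below is an alternative formulation rather than the function above: the name `selectBestBoxMatches` is hypothetical, it takes the (previous box ID, current box ID) pairs directly, and it breaks ties in favour of the lowest current box ID, whereas the `>=` comparison in the listing above favours the highest.

```cpp
#include <map>
#include <utility>
#include <vector>

// Given one (prevBoxID, currBoxID) pair per keypoint match that falls inside a
// box in both frames, return for each previous box the current box that shares
// the most matches with it.
std::map<int, int> selectBestBoxMatches(const std::vector<std::pair<int, int>> &boxPairs)
{
    // Count how often each box pair occurs.
    std::map<std::pair<int, int>, int> votes;
    for (const auto &boxPair : boxPairs)
    {
        ++votes[boxPair]; // operator[] value-initialises a missing count to 0
    }

    std::map<int, int> best;      // prevBoxID -> chosen currBoxID
    std::map<int, int> bestCount; // prevBoxID -> vote count of the chosen pair
    for (const auto &entry : votes)
    {
        int prevID = entry.first.first;
        int currID = entry.first.second;
        if (entry.second > bestCount[prevID])
        {
            bestCount[prevID] = entry.second;
            best[prevID] = currID;
        }
    }
    return best;
}
```

For the example in the figure, three matches for the pair (1, 3) and one match for the pair (1, 4) yield the single entry 1 -> 3.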

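All keypoint matches that belong to a bounding box in the current frame are clustered, and outlier matches are removed based on their Euclidean distance relative to all the matches in the bounding box.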

This step is needed to compute a robust camera-based TTC, as we will see later.

The code below shows how this filter is implemented.

/**
 * @brief Cluster keypoint matches with the current bounding box
 *
 * @param boundingBox   Current bounding box
 * @param kptsPrev      Previous frame keypoints
 * @param kptsCurr      Current frame keypoints
 * @param kptMatches    Keypoint matches between previous and current frame
 */
void clusterKptMatchesWithROI(BoundingBox &boundingBox, std::vector<cv::KeyPoint> &kptsPrev, std::vector<cv::KeyPoint> &kptsCurr, std::vector<cv::DMatch> &kptMatches)
{
    /* STEP 1: Associate the keypoint match with the bounding box and calculate keypoint match distance*/


    // Loop through all the keypoint matches and find the ones which are part of the current bounding box
    // Also find the distance between the matched keypoints
    std::vector<double> distKptMatches;
    for (auto it = kptMatches.begin(); it != kptMatches.end(); ++it)
    {
        cv::KeyPoint keyPtPrev, keyPtCurr;
        double dist;
        keyPtPrev = kptsPrev[it->queryIdx];
        keyPtCurr = kptsCurr[it->trainIdx];


        if (boundingBox.roi.contains(keyPtCurr.pt))
        {
            boundingBox.kptMatches.push_back(*it);
            dist = cv::norm(keyPtCurr.pt - keyPtPrev.pt);
            distKptMatches.push_back(dist);
        }
    }


    /*STEP 2: Remove outlier keypoint matches from the bounding box*/


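    // Nothing to filter if no keypoint match falls inside the bounding box
    if (distKptMatches.empty())
    {
        return;
    }

    // Get the Q1 and Q3 percentile for the distance vector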
    double q1 = percentile(distKptMatches, 0.25);
    double q3 = percentile(distKptMatches, 0.75);
    // Find the IQR for the distance vector
    double iqr = q3 - q1;


    // Go through all the matched keypoint pairs in the current bounding box and remove the ones which are outliers from the bounding box
    auto it1 = boundingBox.kptMatches.begin();
    while (it1 != boundingBox.kptMatches.end())
    {
        cv::KeyPoint keyPtPrev, keyPtCurr;
        double dist;
        keyPtPrev = kptsPrev[it1->queryIdx];
        keyPtCurr = kptsCurr[it1->trainIdx];
        dist = cv::norm(keyPtCurr.pt - keyPtPrev.pt);


        if ((dist < (q1 - 1.5 * iqr)) || (dist > (q3 + 1.5 * iqr)))
        {
            it1 = boundingBox.kptMatches.erase(it1); // erase() returns the next iterator
        }
        else
        {
            ++it1;
        }
    }
}

Computing Time-to-Collision (TTC) Using LiDAR Points

This step computes the TTC to the preceding vehicle based on LiDAR measurements alone.

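As shown in the figure above, d0 is the distance between the ego vehicle and the preceding vehicle at time t0, and d1 is the distance at time t1. Based on the constant velocity model (CVM), the equations for computing the TTC are as follows: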

Computing TTC using LiDAR

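Once the relative velocity v0 is known, the time to collision is simply the remaining distance between the two vehicles divided by v0. So, given a LiDAR sensor that can take precise distance measurements, a TTC estimation system can be built on the CVM and the set of equations above. Note, however, that a radar sensor would be the better choice for TTC computation because it measures relative velocity directly, whereas with LiDAR we have to compute v0 from two (noisy) distance measurements.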
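For reference, the constant-velocity relations can be written out as below. This is a reconstruction from the definitions of d0, d1, and v0 in the text, with Δt assumed to be the time between the two measurements (1 / frameRate in the code); the last expression matches the TTC line in computeTTCLidar.

```latex
\begin{aligned}
d_1 &= d_0 - v_0 \,\Delta t \\
v_0 &= \frac{d_0 - d_1}{\Delta t} \\
\mathrm{TTC} &= \frac{d_1}{v_0} = \frac{d_1 \,\Delta t}{d_0 - d_1}
\end{aligned}
```

As an illustration with made-up numbers: for d0 = 8.0 m, d1 = 7.9 m, and Δt = 0.1 s, the relative velocity is v0 = 1 m/s and the TTC is 7.9 s.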

To make the TTC robust to outliers, I filter the distance measurements with the IQR (interquartile range) method. Here is the TTC implementation:

/**
 * @brief Compute time-to-collision (TTC) based on Lidar measurements
 *
 * @param lidarPointsPrev Previous Lidar points
 * @param lidarPointsCurr Current Lidar points
 * @param frameRate       Frame rate of the camera
 * @param TTC             Output TTC
 * @param velLidar        Output relative velocity computed from the two Lidar distances
 */
void computeTTCLidar(std::vector<LidarPoint> &lidarPointsPrev,
                     std::vector<LidarPoint> &lidarPointsCurr, double frameRate, double &TTC, double &velLidar)
{
    // auxiliary variables
    double dT = 1.0 / frameRate; // time between two measurements in seconds
    std::vector<double> lidarPointsPrevX, lidarPointsCurrX;
    std::vector<double> filtLidarPointsPrevX, filtLidarPointsCurrX;


    // Create vector of all x points from previous and current Lidar points
    for (auto it = lidarPointsPrev.begin(); it != lidarPointsPrev.end(); ++it)
    {
        lidarPointsPrevX.push_back(it->x);
    }


    for (auto it = lidarPointsCurr.begin(); it != lidarPointsCurr.end(); ++it)
    {
        lidarPointsCurrX.push_back(it->x);
    }


    // Filter out outliers from the X-distance vector
    filtLidarPointsPrevX = removeOutliers(lidarPointsPrevX);
    filtLidarPointsCurrX = removeOutliers(lidarPointsCurrX);


    // find closest distance to Lidar points within ego lane
    double minXPrev = INT_MAX, minXCurr = INT_MAX;
    for (auto it = filtLidarPointsPrevX.begin(); it != filtLidarPointsPrevX.end(); ++it)
    {
        minXPrev = minXPrev > *it ? *it : minXPrev;
    }


    for (auto it = filtLidarPointsCurrX.begin(); it != filtLidarPointsCurrX.end(); ++it)
    {
        minXCurr = minXCurr > *it ? *it : minXCurr;
    }


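    // Without any Lidar points in either frame there is nothing to measure
    if (filtLidarPointsPrevX.empty() || filtLidarPointsCurrX.empty())
    {
        TTC = NAN;
        velLidar = NAN;
        return;
    }

    // compute TTC from both measurements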
    TTC = minXCurr * dT / (minXPrev - minXCurr);
    // Compute velocity
    velLidar = (minXPrev - minXCurr) / dT;
}




/**
 * @brief Function to remove outliers based on IQR
 *
 * @param data                  Input dataset
 * @return std::vector<double>  Filtered dataset
 */
std::vector<double> removeOutliers(const std::vector<double> &data)
{
    std::vector<double> sorted_data = data;
    std::sort(sorted_data.begin(), sorted_data.end());


    double q1 = percentile(sorted_data, 0.25);
    double q3 = percentile(sorted_data, 0.75);


    double iqr = q3 - q1;


    std::vector<double> filtered_data;
    for (double x : data)
    {
        if (x >= q1 - 1.5 * iqr && x <= q3 + 1.5 * iqr)
        {
            filtered_data.push_back(x);
        }
    }


    return filtered_data;
}


/**
 * @brief Function to calculate the percentile of a dataset (the data is sorted internally)
 *
 * @param dataIn    Input dataset
 * @param p         Percentile
 * @return double   Value of the percentile
 */
double percentile(const std::vector<double> &dataIn, double p)
{
    std::vector<double> data = dataIn;
    std::sort(data.begin(), data.end());


    int N = data.size();
    if (N == 0)
    {
        return NAN; // the percentile of an empty dataset is undefined
    }
    double n = (N - 1) * p + 1;


    // If n is an integer, then percentile is a data point
    if (n == floor(n))
    {
        return data[static_cast<int>(n) - 1];
    }
    else
    {
        int k = floor(n);
        double d = n - k;
        return data[k - 1] + d * (data[k] - data[k - 1]);
    }
}
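To see what the IQR filter does on concrete numbers, here is a small standalone sketch. The names `quantile` and `iqrFilter` are hypothetical; the code restates the same rank convention, (N - 1) * p + 1 with linear interpolation, in 0-based form, and is not a replacement for the functions above.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Linear-interpolation quantile. Assumes a non-empty input and 0 <= p <= 1.
double quantile(std::vector<double> data, double p)
{
    std::sort(data.begin(), data.end());
    double rank = (data.size() - 1) * p; // 0-based fractional rank
    std::size_t lo = static_cast<std::size_t>(std::floor(rank));
    std::size_t hi = std::min(lo + 1, data.size() - 1);
    return data[lo] + (rank - lo) * (data[hi] - data[lo]);
}

// Keep the values inside the fences [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR].
std::vector<double> iqrFilter(const std::vector<double> &data)
{
    if (data.empty())
    {
        return {};
    }
    double q1 = quantile(data, 0.25);
    double q3 = quantile(data, 0.75);
    double iqr = q3 - q1;

    std::vector<double> kept;
    for (double x : data)
    {
        if (x >= q1 - 1.5 * iqr && x <= q3 + 1.5 * iqr)
        {
            kept.push_back(x);
        }
    }
    return kept;
}
```

With the made-up distances {7.9, 8.0, 8.1, 8.2, 2.0}, Q1 is 7.9 and Q3 is 8.1, so the fences sit at roughly 7.6 and 8.4: the stray reading of 2.0 is dropped and the other four values are kept.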

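The plot below shows the LiDAR TTC values for all 18 consecutive frames. The TTC values are inconsistent and vary a lot. This may be because the TTC is computed with a constant velocity model, which is not a good assumption since the speed of the preceding vehicle is not constant. The red line in the plot shows the TTC if the constant velocity model is used.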

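Another reason may be that the TTC is computed from the minimum x-distance of the LiDAR points, which is a single point and may itself be noise. Using the median of the LiDAR x-distances instead could improve this.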

Computing TTC Using the Camera

This step computes the TTC to the preceding vehicle using the camera alone.

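With a 3D LiDAR sensor, the TTC can be computed by measuring successive distances. Computing the TTC with a 2D camera is more challenging, because the camera provides no 3D measurements and the vehicle must be identified accurately for motion tracking.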

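Monocular cameras cannot measure metric distance. They are passive sensors that rely on ambient light being reflected off objects into the camera lens. Unlike LiDAR, they therefore cannot measure the time of flight of light.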

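To measure the distance to objects, two aligned cameras can be used to locate common points of interest in both images and triangulate their distance. However, because of their bulk, their cost, and the computational load of finding corresponding features, stereo cameras have become less popular in ADAS and autonomous vehicles.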

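Despite the limitations of monocular cameras, we can still try to compute the TTC without distance measurements. We use the constant velocity motion model and replace the metric distances with pixel distances in the image plane, which the camera can measure accurately.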

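In the figure below, you can see how the height H of the preceding vehicle is mapped onto the image plane by perspective projection. The same height H maps to different heights h0 and h1 in the image plane, depending on the vehicle distances d0 and d1. Clearly there is a geometric relationship between h, H, d, and the focal length f of the pinhole camera, and this is what we exploit below.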

The following set of equations shows how to compute the TTC using the projection only.

Computing TTC using the camera
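Written out, the derivation runs as follows. This is a reconstruction from the pinhole geometry described above, with h0 and h1 assumed to be the projected heights at distances d0 and d1 and Δt the time between the two frames; the final expression matches the TTC line in the camera code, where the distance ratio plays the role of h1/h0.

```latex
\begin{aligned}
h_0 &= \frac{f H}{d_0}, \qquad h_1 = \frac{f H}{d_1}
\quad\Rightarrow\quad \frac{h_1}{h_0} = \frac{d_0}{d_1} \\
d_1 &= d_0 - v_0 \,\Delta t = d_1 \frac{h_1}{h_0} - v_0 \,\Delta t
\quad\Rightarrow\quad v_0 \,\Delta t = d_1 \left( \frac{h_1}{h_0} - 1 \right) \\
\mathrm{TTC} &= \frac{d_1}{v_0} = \frac{-\Delta t}{1 - h_1 / h_0}
\end{aligned}
```

Neither the focal length f nor the true height H appears in the result, which is why no metric distance is needed.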

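The time-to-collision can therefore be measured by observing the relative height change on the image sensor. No distance measurement is needed, so a monocular camera can estimate the TTC directly from the change in relative height (also called the scale change) in the image.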

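The neural network (YOLO) returns a bounding box for each vehicle, but the box dimensions may not be accurate. Using the height or width of the bounding box for the TTC computation would lead to significant errors.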

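Instead of relying on the detection of the vehicle as a whole, we now analyze its structure at a smaller scale. If we can find uniquely identifiable keypoints that can be tracked from one frame to the next, we can use the distances between all keypoints on the vehicle, relative to each other, to compute a robust estimate of the height ratio in the TTC equation. The figure below illustrates this concept.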

Keypoint scaling

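In (a), a set of keypoints has been detected and the relative distances between keypoints 1-7 have been computed. In (b), 4 keypoints have been matched between successive images (with keypoint 3 being a mismatch) using a high-dimensional similarity measure called a descriptor.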

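A reliable TTC estimate can be computed from the ratios of all relative distances between keypoints, by replacing the height ratio h1/h0 with the mean or median of all distance ratios dk/dk'.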

The following code implements this concept. Again, we use the IQR method to make the TTC robust to outliers.

/**
 * @brief Compute time-to-collision (TTC) based on keypoint correspondences in successive images
 * 
 * @param kptsPrev      Previous frame keypoints
 * @param kptsCurr      Current frame keypoints
 * @param kptMatches    Keypoint matches between previous and current frame
 * @param frameRate     Frame rate of the camera 
 * @param TTC           Output TTC
 * @param visImg        Output visualization image
 */
void computeTTCCamera(std::vector<cv::KeyPoint> &kptsPrev, std::vector<cv::KeyPoint> &kptsCurr,
                      std::vector<cv::DMatch> kptMatches, double frameRate, double &TTC, cv::Mat *visImg)
{
    /* STEP 1: Compute distance ratios between all matched keypoints*/


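    std::vector<double> distRatios; // stores the distance ratios for all keypoints between curr. and prev. frame
    if (kptMatches.size() < 2)
    { // at least two matches are needed to form a distance ratio
        TTC = NAN;
        return;
    }
    for (auto it1 = kptMatches.begin(); it1 != kptMatches.end() - 1; ++it1)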
    { // outer keypoint match  loop


        // get current keypoint and its matched partner in the prev. frame
        cv::KeyPoint kpOuterCurr = kptsCurr.at(it1->trainIdx);
        cv::KeyPoint kpOuterPrev = kptsPrev.at(it1->queryIdx);


        for (auto it2 = kptMatches.begin() + 1; it2 != kptMatches.end(); ++it2)
        { // inner keypoint match loop


            double minDist = 100.0; // min. required distance in pixels


            // get next keypoint and its matched partner in the prev. frame
            cv::KeyPoint kpInnerCurr = kptsCurr.at(it2->trainIdx);
            cv::KeyPoint kpInnerPrev = kptsPrev.at(it2->queryIdx);


            // compute distances and distance ratios
            double distCurr = cv::norm(kpOuterCurr.pt - kpInnerCurr.pt);
            double distPrev = cv::norm(kpOuterPrev.pt - kpInnerPrev.pt);


            if (distPrev > std::numeric_limits<double>::epsilon() && distCurr >= minDist)
            { // avoid division by zero


                double distRatio = distCurr / distPrev;
                distRatios.push_back(distRatio);
            }
        } // eof inner loop over all matched kpts
    }     // eof outer loop over all matched kpts


    // only continue if list of distance ratios is not empty
    if (distRatios.size() == 0)
    {
        TTC = NAN;
        return;
    }


    /* STEP 2: Filter out outliers from the distance ratio vector*/


    // Remove the outliers from the distance ratio vector
    std::vector<double> filtDistRatios = removeOutliers(distRatios);


    /* STEP 3: Compute camera-based TTC from distance ratios*/


    // compute camera-based TTC from mean distance ratios
    double meanDistRatio = std::accumulate(filtDistRatios.begin(), filtDistRatios.end(), 0.0) / filtDistRatios.size();


    double dT = 1 / frameRate;
    // TTC = -dT / (1 - meanDistRatio);


    // Alternate method to compute camera-based TTC from distance ratios using median
    double medianDistRatio;


    std::sort(filtDistRatios.begin(), filtDistRatios.end());
    if (filtDistRatios.size() % 2 == 0) // the parity check must use the filtered vector being indexed
    {
        medianDistRatio = (filtDistRatios[filtDistRatios.size() / 2 - 1] + filtDistRatios[filtDistRatios.size() / 2]) / 2;
    }
    else
    {
        medianDistRatio = filtDistRatios[filtDistRatios.size() / 2];
    }


    TTC = -dT / (1 - medianDistRatio);
}
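As a sanity check of the formula, the sketch below applies the same median-of-ratios idea to a handful of synthetic points. It is a simplified illustration, not the function above: `Pixel` and `ttcFromScaleChange` are hypothetical names, `prev[i]` is assumed to be matched with `curr[i]`, and the minimum-distance threshold and IQR filter are left out.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

struct Pixel { double x, y; }; // hypothetical stand-in for cv::Point2f

// TTC from the median ratio of pairwise keypoint distances (current / previous).
// Returns NaN when no ratio can be formed or the scale did not change.
double ttcFromScaleChange(const std::vector<Pixel> &prev,
                          const std::vector<Pixel> &curr, double frameRate)
{
    const double nan = std::numeric_limits<double>::quiet_NaN();
    const double eps = std::numeric_limits<double>::epsilon();
    std::size_t n = std::min(prev.size(), curr.size());

    std::vector<double> ratios;
    for (std::size_t i = 0; i < n; ++i)
    {
        for (std::size_t j = i + 1; j < n; ++j) // each unordered pair once
        {
            double dPrev = std::hypot(prev[i].x - prev[j].x, prev[i].y - prev[j].y);
            double dCurr = std::hypot(curr[i].x - curr[j].x, curr[i].y - curr[j].y);
            if (dPrev > eps)
            {
                ratios.push_back(dCurr / dPrev);
            }
        }
    }
    if (ratios.empty() || frameRate <= 0.0)
    {
        return nan;
    }

    std::sort(ratios.begin(), ratios.end());
    std::size_t m = ratios.size();
    double median = (m % 2 == 0) ? 0.5 * (ratios[m / 2 - 1] + ratios[m / 2]) : ratios[m / 2];
    if (std::abs(1.0 - median) < eps)
    {
        return nan; // no scale change, so the TTC is undefined
    }
    return -(1.0 / frameRate) / (1.0 - median);
}
```

If every keypoint distance grows by a factor of 1.25 between two frames taken at 10 Hz, the formula gives -0.1 / (1 - 1.25) = 0.4 s.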

The TTC computation above was run for a variety of detector-descriptor combinations. Of all combinations, the following gave the best camera TTC values:

For fast execution time: FAST + ORB

For good accuracy: SHITOMASI + BRIEF

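As the plot shows, the TTC values for the SHITOMASI + BRIEF combination are very consistent and vary less than those of the other combinations. The TTC values for the FAST + ORB combination are not as close to the LiDAR TTC values and vary more. This may be because the FAST detector is not very robust to changes in illumination and viewpoint, and the ORB descriptor is also not very robust to viewpoint changes, which could explain the weaker performance of FAST + ORB. In addition, the bounding box does not always enclose only the preceding vehicle; sometimes it also contains keypoint matches from parts of the road.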
