[Paper Deep Dive] RayMVSNet

2025/4/14 18:53:00

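Today's paper is a learning-based MVS paper published at CVPR 2022; the authors are from the National University of Defense Technology (NUDT).
Paper link: RayMVSNet
Project page: GitHub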

Abstract

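The authors want to directly optimize the depth value on each camera ray, so they propose RayMVSNet, which learns sequential prediction of a 1D implicit field along the ray. The method follows the epipolar line search of traditional MVS, extracts features with a transformer, and uses multi-task learning.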

1 Introduction

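The main contributions are:

A novel formulation that learns a 1D implicit field along each camera ray.
An epipolar transformer to learn features.
LSTM-based multi-task learning for modeling and prediction.
Good results.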

2 Related Work

Covers deep-learning-based MVS and implicit representations.

3 Method

Overview

3.1 3D Cost Volume and Coarse Depth Prediction

Build a variance-based 3D cost volume and obtain a coarse depth map from it.
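To make the variance-based cost concrete, here is a minimal per-pixel sketch in plain Python. It assumes the features of every view have already been warped to the reference pixel for each depth hypothesis. The soft-argmin step that turns costs into a depth is borrowed from MVSNet-style pipelines and is an assumption, since these notes do not describe that step; real networks also regularize the cost volume with a 3D CNN first, which is omitted here. The function names are hypothetical.

```python
import math

def variance_cost(view_features):
    """Matching cost at one pixel and one depth hypothesis.

    view_features: list of N feature vectors (each a list of C floats),
    one per view, assumed to be already warped to the reference pixel.
    Returns the per-channel variance across views, averaged over channels
    (the averaging is a simplification for this sketch).
    """
    n = len(view_features)
    c = len(view_features[0])
    total = 0.0
    for ch in range(c):
        vals = [f[ch] for f in view_features]
        mean = sum(vals) / n
        total += sum((v - mean) ** 2 for v in vals) / n
    return total / c

def soft_argmin_depth(costs, depths):
    """Expected depth under softmax(-cost): a low cost gets a high weight."""
    m = min(costs)  # shift for numerical stability
    weights = [math.exp(-(cost - m)) for cost in costs]
    z = sum(weights)
    return sum(w * d for w, d in zip(weights, depths)) / z
```

Features that agree across views give a low variance, so the depth hypothesis where the views are most consistent dominates the weighted average.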

3.2 Epipolar Transformer

The goal is to estimate the location of the zero-crossing point on each ray, from which we obtain the depth map of the reference image.
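As an illustration of what "locating the zero-crossing" means, the sketch below finds the first sign change in a sequence of SDF values sampled along one ray and interpolates linearly between the two neighbouring samples. This is only a geometric sketch and not necessarily how the network predicts the location; the sign convention (positive in front of the surface) and the function name are assumptions.

```python
def zero_crossing_depth(depths, sdfs):
    """Depth at which the SDF along a ray first changes from positive to non-positive.

    depths: increasing depths of the sampled points on the ray.
    sdfs:   signed distances at those points (assumed positive in front
            of the surface and negative behind it).
    Returns the linearly interpolated depth, or None if no sign change exists.
    """
    for i in range(len(depths) - 1):
        s0, s1 = sdfs[i], sdfs[i + 1]
        if s0 == 0.0:
            return depths[i]
        if s0 > 0.0 and s1 <= 0.0:
            t = s0 / (s0 - s1)  # fraction of the interval before the crossing
            return depths[i] + t * (depths[i + 1] - depths[i])
    return None
```

Because the SDF is expected to decrease monotonically along the ray, at most one such crossing should exist within the sampled range.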

Why ray-based?

A depth map is view-dependent, so optimizing depth along each camera ray is more straightforward and lightweight.
All the 1D implicit fields share an identical spatial property, i.e. the monotonicity of the SDF along the ray direction.

Zero-crossing hypothesis sampling

Starting from the coarse depth map, uniformly sample $K$ points $P=\{p_k\}_{k=1}^{K}$ on the ray within $\pm \delta$ of the coarse depth.
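The sampling step can be sketched as follows. Whether the two endpoints at $\pm \delta$ are included is not stated in these notes, so including them is an assumption, as are the function names and the convention that the ray direction is scaled to unit z so that a depth maps directly to a camera-space point.

```python
def sample_hypotheses(coarse_depth, delta, k):
    """Return k depths evenly spaced over [coarse_depth - delta, coarse_depth + delta]."""
    if k < 2:
        raise ValueError("k must be at least 2")
    step = 2.0 * delta / (k - 1)
    return [coarse_depth - delta + i * step for i in range(k)]

def point_on_ray(origin, direction, depth):
    """3D point at the given depth on a ray.

    direction is assumed to be scaled so that its z-component is 1,
    which makes `depth` the camera-space z of the returned point.
    """
    return [o + depth * d for o, d in zip(origin, direction)]
```

For example, `sample_hypotheses(10.0, 1.0, 5)` gives five depths from 9.0 to 11.0 in steps of 0.5, and each one can be lifted to a 3D point with `point_on_ray`.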

Epipolar transformer

Use 4 self-attention layers, each followed by 2 AddNorm and 1 feed-forward layer.
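One such layer might look like the dependency-free sketch below. The ordering (self-attention, AddNorm, feed-forward, AddNorm) follows the standard transformer encoder and is an assumption, as are the single attention head, the missing biases, and the layer norm without learned scale. These notes do not say what the tokens are, so `x` is simply a list of token feature vectors, and all names are hypothetical.

```python
import math

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    m = max(row)  # shift for numerical stability
    e = [math.exp(v - m) for v in row]
    s = sum(e)
    return [v / s for v in e]

def layer_norm(row, eps=1e-5):
    mean = sum(row) / len(row)
    var = sum((v - mean) ** 2 for v in row) / len(row)
    return [(v - mean) / math.sqrt(var + eps) for v in row]

def add_norm(x, sub):
    """Residual connection followed by layer normalization, per token."""
    return [layer_norm([a + b for a, b in zip(rx, rs)]) for rx, rs in zip(x, sub)]

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention over the tokens in x."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    scale = math.sqrt(len(q[0]))
    scores = [[sum(a * b for a, b in zip(qi, kj)) / scale for kj in k] for qi in q]
    return matmul([softmax(r) for r in scores], v)

def feed_forward(x, w1, w2):
    """Two linear maps with a ReLU in between."""
    hidden = [[max(0.0, v) for v in row] for row in matmul(x, w1)]
    return matmul(hidden, w2)

def transformer_layer(x, wq, wk, wv, w1, w2):
    x = add_norm(x, self_attention(x, wq, wk, wv))
    return add_norm(x, feed_forward(x, w1, w2))
```

Applying `transformer_layer` four times in a row, each time with its own weights, would match the layer count given above.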

3.3 Ray-based 1D Implicit Field

Given the features of the hypothesized points, the ray-based 1D implicit fields are learned with an LSTM. Crucially, we leverage two attributes of LSTM.

The mechanism of sequential processing inherently facilitates the learning of the SDF monotonicity along the ray direction.
The property of time invariance increases the network robustness by allowing the zero-crossing position to appear at any place (time-step) on the ray.
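The two properties can be seen in a minimal LSTM sketch: the points are processed in order from near to far, and the same weights are reused at every step, so nothing ties the zero-crossing to a particular step index. The linear readout that maps the hidden state to an SDF value, the parameter layout, and the function names are assumptions made for this illustration and are not taken from the paper.

```python
import math

def sigmoid(x):
    # tanh form avoids overflow for large negative inputs
    return 0.5 * (1.0 + math.tanh(0.5 * x))

def lstm_step(x, h, c, params):
    """One LSTM step.

    params maps a gate name to (W, b); each row of W has
    len(x) + len(h) entries and produces one hidden unit.
    """
    z = list(x) + list(h)  # concatenate input and previous hidden state

    def gate(name, act):
        w, b = params[name]
        return [act(sum(wi * zi for wi, zi in zip(row, z)) + bi)
                for row, bi in zip(w, b)]

    i = gate("input", sigmoid)
    f = gate("forget", sigmoid)
    o = gate("output", sigmoid)
    g = gate("cell", math.tanh)
    c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
    return h_new, c_new

def predict_sdf_along_ray(point_features, params, w_out, b_out, hidden_size):
    """Run the LSTM over the sampled points (near to far), one SDF value per point."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    sdfs = []
    for x in point_features:  # the same params are shared by every step
        h, c = lstm_step(x, h, c, params)
        sdfs.append(sum(w * v for w, v in zip(w_out, h)) + b_out)
    return sdfs
```

The hidden state carries information from the points already visited, which is what lets the sequence model favour SDF values that decrease along the ray.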

Ray-based 1D implicit field