PredRecon: 3d Reconstruction by Purposeful Movement Based on Surface Prediction
Today, we will cover this (opens in a new tab) ICRA2023 paper: PredRecon: A Prediction-boosted Planning Framework for Fast and High-quality Autonomous Aerial Reconstruction. Before start, let us you have a 3d scanner, and you want to have 3d mesh model of a chair.
Here is what a human moves to full scanning:
We know how a chair looks like. This will somewhat guide our scanning movement.
We keep track of covered and uncovered region by watching the real-time 3d scanning result. Of course, we move toward the uncovered region to choose the next viewpoint.
If you can understand the above, you might not have much difficulty in understanding this paper (although there are many technical terminologies, you can just study them)
- PCN (opens in a new tab) point completion network
- Hierarchical path planning: 2 step planner = global + local
- PointNet backbone (opens in a new tab)
- Huber loss (opens in a new tab)
- Chamfer distance (opens in a new tab) (measure of reconstruction perforamance)
- GHPR (opens in a new tab): hiddle point removal from pointcloud
- Traveling salesman problem (opens in a new tab)
- B-spline representation
- Dijkstra algorithm (the way this paper used this is very similar to my previous paper (opens in a new tab))
This paper contains two topics: 1) surface prediction module (SPM) and 2) viewpoint planning for a good MVS (multi-view stereo) performance. (here, MVS was used as an synonym for 3d reconstruction)
First, we are interested in predicting the structure of the target. The idea is that having a rough guess of the structure can be helpful when planning the future viewpoints (next section).
You can think of SPM as a kind of PCN combined with a scale estimator network (to assist better performance of PCN). But the authors did several variations.
Input: incomplete pointcloud
output: normalized incomplete pointcloud
For any network, scale normalization is important. This step estimates the scale (x,y,z) of total point cloud. Although you can think of this as a 3d bounding box (BB) of a given pointcloud, it has the ability to derive BB even for missing regions. As a very simplified example, if we have only half of a circle, it can deduce its total scale as 2*radius.
Input: normalized incomplete pointcloud
output: prediction (completed) pointcloud (inverse normalized) + inner space / surface discrimination
With the aid of the step 1 network, this network receives the normalized version of the incomplete point cloud and fill the missing region. Although it has a very similar role with PCN, the authors removed convolution operation of real time feasibility. This network is trained by reducing the Chamfer distance. After filling the missing regions, a post-process step is involved to discriminate internal space (hidden space from outside) and surface. In the following viewpoint planning step, we will not sample view points inside the internal region.
In the previous section, we predicted the surface of structure from SPM along with internal space where a viewpoint should not be selected. Using the prediction and the history of the covered region by a robot, we now plan the view path.
First step here is to generate a viewpoint skeleton which can cover each uncovered cluster (assume that we computed clusters for surface points).
- As can be seen in the above picture, each cluster has a region inside which we can observe the cluster. This region is depicted as gray 2d cones in the figure.
- When we sample a set of view points inside each cone, we can choose the best view point which can observe the largest number of uncovered surface points.
- Having them, we can compute a single-shot traveling order by ATSP (See eq (9)). The authors used Lin-Kernighan-Helsguan algorithm (opens in a new tab) to solve the TSP.
In this way, we can get a sequence of viewpoints which can observe uncovered region very wall while minimizing total travel distance and the deviation from the previous movement (the authors referred this as global consistency).
Given two viewpoints from the previous step, the second step outputs a local trajectory represented with a B-spline. According to eq (10), the control points of the B-spline are determined so that:
- surface points are observed with enough magnitude,
- surface points observed with consistent distance along timesteps,
- relative view angles along time steps are similar to desired triangulation angles.
The local control point sequence is obtained with Dijkstra algorithm.