Learning-based Localizability Estimation for Robust LiDAR Localization


LiDAR-based localization and mapping systems are a core component of many modern robotic systems. They directly capture the depth and geometry of the environment, enabling accurate motion estimation and high-quality map generation in real time. However, insufficient environmental constraints can lead to localization failures, which often occur in symmetric scenes such as tunnels. This paper addresses precisely this issue by using a neural network to detect degradation in the robot's surroundings. It focuses specifically on LiDAR frame-to-frame matching, a critical component of many LiDAR odometry pipelines. Unlike previous methods, the proposed method detects the possibility of localization failure directly from the raw point cloud measurements rather than during the registration process. In addition, previous methods generalize poorly because they require manually tuned degradation-detection thresholds; the proposed method avoids this by training on a range of different environments so that the network performs well in each scenario. Furthermore, the network is trained exclusively on simulated data, avoiding arduous data collection in challenging, degraded, and often inaccessible environments. The method is tested without any modification in field experiments in a challenging environment and on two different sensor types. The observed detection performance is comparable to that of state-of-the-art methods whose thresholds were specially tuned.

Main contributions

1. A learning-based method is proposed to detect, in six degrees of freedom, whether a single point-cloud frame is degraded.

2. A verification method is proposed to validate the localization ability of quadruped robots in challenging, degraded scenarios.

3. All parts are fully designed and implemented, including dataset collection and generation; the relevant components will be open-sourced to the robotics community.

1. Problem modeling

This article aims to detect the robot's localization ability, or localizability, at a given moment k. It is represented by a 6-dimensional vector:

d_k = (d_x, d_y, d_z, d_Φ, d_θ, d_ψ)

where x, y, z are the translational components and Φ, θ, ψ the rotational components (expressed as Euler angles). Each component of d_k is binary: 0 indicates that the localization information along that axis is reliable, 1 that it is unreliable.

Estimating d_k can thus be cast as a multi-label binary classification problem, solved by a neural-network classifier.
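The 6-DoF label convention above can be made concrete with a small sketch. Everything here (axis names, the helper function) is illustrative, not from the paper:

```python
import numpy as np

# One binary flag per DoF, ordered (x, y, z, phi, theta, psi).
# 0 = localization along that axis is reliable, 1 = unreliable.
AXES = ["x", "y", "z", "phi", "theta", "psi"]

def describe(d_k):
    """Map a 6-dim binary localizability vector to the names of the degraded axes."""
    d_k = np.asarray(d_k)
    assert d_k.shape == (6,) and set(np.unique(d_k)) <= {0, 1}
    return [AXES[i] for i in range(6) if d_k[i] == 1]

# A straight tunnel typically under-constrains translation along its axis:
print(describe([1, 0, 0, 0, 0, 0]))  # ['x']
```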

2. Training data generation

Given the above problem formulation, an important question is how to generate labeled training data. Specifically, for a given point cloud, how do we evaluate its degradation in six degrees of freedom?

The method proposed in this paper adds perturbations and inspects the resulting registration error. For a point cloud s and the robot pose T at which it was collected, M sub-point clouds are generated as follows:

1. Perturb the pose. The perturbation is sampled from a zero-mean Gaussian distribution; its standard deviation in each dimension is:

2. Ray-cast a new sub-point cloud from the perturbed pose in the simulation environment.

Each of the M sub-point clouds is registered against the original point cloud using point-to-plane ICP. For each sub-point cloud, compute the difference e between the transform recovered by registration and the applied perturbation, decompose it into six dimensions, take absolute values, and average over the M samples to obtain e_p. Whether e_p exceeds a specific threshold in each of the 6 dimensions then determines whether the original point cloud is prone to degradation in that dimension. The thresholds in the 6 dimensions are:
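The labeling loop above can be sketched as follows. The σ values, thresholds, and the `register` callback are all placeholders: the paper specifies concrete values, and `register` stands in for ray casting at the perturbed pose plus point-to-plane ICP against the original scan:

```python
import numpy as np

rng = np.random.default_rng(0)

# Per-axis perturbation std-devs and error thresholds, ordered
# (x, y, z, phi, theta, psi). Placeholder values, not the published ones.
SIGMA = np.array([0.1, 0.1, 0.1, 0.05, 0.05, 0.05])
THRESH = np.array([0.02, 0.02, 0.02, 0.01, 0.01, 0.01])

def degradation_label(register, M=20):
    """Label one scan: sample M 6-DoF perturbations, register each ray-cast
    sub-cloud against the original scan, average the per-axis absolute error
    e_p, and threshold it per dimension.

    `register(delta)` abstracts ICP: it returns the estimated 6-DoF offset,
    which ideally recovers the applied perturbation `delta`."""
    errors = np.zeros(6)
    for _ in range(M):
        delta = rng.normal(0.0, SIGMA)       # zero-mean Gaussian perturbation
        estimated = register(delta)          # ICP result for this sub-cloud
        errors += np.abs(estimated - delta)  # per-axis registration error
    e_p = errors / M
    return (e_p > THRESH).astype(int)        # 1 = degraded on that axis

# Usage: a stand-in registrar whose estimate is biased 5 cm along x
# (mimicking a scene that under-constrains translation along that axis):
biased = lambda delta: delta + np.array([0.05, 0.0, 0.0, 0.0, 0.0, 0.0])
print(degradation_label(biased))  # [1 0 0 0 0 0]
```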

3. Network structure

The previous step produced labeled point clouds; this step trains a network on them. The network adopted in this paper is based on a 3D ResUNet; its structure is:

The classification head is a 5-layer MLP whose output is a 6-dimensional vector.
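A minimal sketch of such a classification head, assuming ReLU hidden layers and a sigmoid output; the layer widths are made up for illustration (the paper's actual sizes are not reproduced here), and the weights are random rather than learned:

```python
import numpy as np

rng = np.random.default_rng(1)

def mlp_head(feature, widths=(256, 128, 64, 32, 6)):
    """5-layer MLP head: maps a global point-cloud feature vector to six
    per-DoF probabilities via ReLU hidden layers and a sigmoid output."""
    x = feature
    for i, w in enumerate(widths):
        W = rng.normal(0.0, 0.1, size=(x.shape[-1], w))  # random stand-in weights
        b = np.zeros(w)
        x = x @ W + b
        if i < len(widths) - 1:
            x = np.maximum(x, 0.0)       # ReLU on the hidden layers
    return 1.0 / (1.0 + np.exp(-x))      # sigmoid -> probabilities in (0, 1)

probs = mlp_head(rng.normal(size=512))   # assumed 512-dim backbone feature
```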

The loss function of the network is the multi-label binary cross-entropy:

L = − Σ_k Σ_i [ t_{k,i} · log p_{k,i} + (1 − t_{k,i}) · log(1 − p_{k,i}) ]

where t_{k,i} is the label of the k-th point cloud on the i-th component and p_{k,i} is the corresponding probability predicted by the network.
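A multi-label binary cross-entropy of this shape can be written in a few lines (this is a generic sketch, not the paper's implementation; the ε clip is a standard numerical guard I have added):

```python
import numpy as np

def localizability_bce(p, t, eps=1e-7):
    """Multi-label BCE over K scans x 6 DoF components.

    p: predicted probabilities, shape (K, 6); t: binary labels, shape (K, 6).
    Averages -[t*log(p) + (1-t)*log(1-p)] over all scans and components."""
    p = np.clip(p, eps, 1.0 - eps)  # avoid log(0)
    return float(np.mean(-(t * np.log(p) + (1 - t) * np.log(1 - p))))

t = np.array([[1, 0, 0, 0, 0, 0]])
good = localizability_bce(np.array([[0.9, 0.1, 0.1, 0.1, 0.1, 0.1]]), t)
bad = localizability_bce(np.array([[0.1, 0.9, 0.9, 0.9, 0.9, 0.9]]), t)
# `good` is small because the predictions match the labels; `bad` is larger.
```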

Since the network outputs probabilities, at inference time each dimension is judged degraded by comparing against a per-dimension threshold; the threshold values are:
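The per-dimension decision step amounts to an elementwise comparison. The threshold values below are placeholders, not the ones tuned in the paper:

```python
import numpy as np

# Hypothetical per-axis decision thresholds, ordered (x, y, z, phi, theta, psi).
TAU = np.array([0.5, 0.5, 0.5, 0.4, 0.4, 0.4])

def decide(probs):
    """Binarize the network's 6-dim probability output with per-axis thresholds."""
    return (np.asarray(probs) > TAU).astype(int)

print(decide([0.9, 0.2, 0.1, 0.45, 0.3, 0.05]))  # [1 0 0 1 0 0]
```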


1. Ablation experiment

In this experiment, the feature-extraction network was replaced with PointNet, and the classification performance before and after the replacement was compared:

The table shows that ResUNet outperforms PointNet on most metrics.

2. Field experiments

Field experiments were carried out in three scenes: a tunnel, an open outdoor area, and a town. The SLAM framework used is CompSLAM: when it detects LiDAR degradation, it falls back on the VIO result for pose propagation; when no degradation is detected, the VIO result serves as the prior for LOAM's frame-to-frame registration. This paper replaces CompSLAM's degradation detection with the proposed learning-based method and compares against the original version of CompSLAM.

a. Tunnel scene:

The figure below shows the mapping results in the tunnel scene. The three pictures on the left are the results of Ji Zhang's method with different thresholds, and the rightmost is the result of the method proposed in this paper. The bar chart below shows when degradation is detected in each dimension.

b. Open outdoor scene:

Same setup as the tunnel scene.

c. Urban scene:

In this scenario, no comparative experiment is performed; only the proposed method is tested. The test covers the interior and exterior of a coffee shop: the robot walks from indoors to outdoors and back indoors. The map remains well consistent throughout, and the learned 6-dimensional localizability vector reports no degradation during the entire run.

3. Generalization experiments

For the tunnel scene, a VLP-16 and a 128-line Ouster LiDAR are used for testing. As the figure below shows, with Ji Zhang's method (i.e., thresholding the minimum eigenvalue of a specific matrix), the minimum eigenvalues differ hugely between the two sensors. The method proposed in this paper, however, still detects the degradation of the same tunnel segment well, consistent with the tunnel result in the field experiments.

Personal evaluation

The innovation of this work lies in evaluating degradation directly on the point cloud rather than on the registration result. The data-generation and training methods are straightforward and easy to understand, and the experimental results are strong.

This method targets LOAM-like systems, so the training labels are generated only from the matching result of a single-frame point cloud. For direct scan-to-map registration methods such as Fast-LIO, labels generated this way are not necessarily reliable: in some scenes, scan-to-scan registration may degrade while scan-to-map registration does not, because the local map carries richer information. One way to address this is to evaluate the registration error with a scan-to-map registration method when generating the training labels.
