3D urban point cloud analysis has produced a wide range of semi-automatic and automatic methods, yet no consensus has emerged on the best approaches to detection, segmentation, and classification. To encourage new work in these areas, we have gathered a set of LiDAR datasets that anyone can access.
1. WHU-TLS Point Cloud
The Wuhan University Institute of Space Intelligence, in collaboration with the Technical University of Munich, the Finnish Geospatial Research Institute, the Norwegian University of Science and Technology, and Delft University of Technology, has released the world’s largest and most diverse TLS (Terrestrial Laser Scanning) point cloud registration benchmark. The publicly available WHU-TLS benchmark covers 11 environments: subway stations, high-speed railway stations, mountains, forests, parks, campuses, residential areas, riverbanks, cultural heritage buildings, underground mines, and tunnels. It comprises 115 scanning stations, 1.74 billion 3D points, and the ground-truth transformation matrices between point clouds.
The benchmark also provides valuable data for applications such as railway operation safety, river surveying and management, forest structure assessment, cultural heritage preservation, landslide monitoring, and underground asset management. With its diverse scenes and massive data volume, WHU-TLS is an essential resource for developing and evaluating TLS registration algorithms and applications.
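Since the benchmark ships ground-truth transformations between scans, a typical first step is to apply such a matrix to bring one station into the frame of another. Below is a minimal NumPy sketch; the file names, and the assumption that points and matrices are stored as plain-text arrays, are hypothetical, so adapt the loading code to the actual WHU-TLS file formats.

```python
import numpy as np

# Hypothetical file names/formats; adapt to how the WHU-TLS archive stores scans and matrices.
points = np.loadtxt("station_02.xyz")          # (N, 3) array of x, y, z per scanned point
T = np.loadtxt("gt_station_02_to_01.txt")      # (4, 4) ground-truth rigid transformation

# Apply the rigid transform in homogeneous coordinates: p' = R @ p + t, with T = [[R, t], [0, 1]].
homogeneous = np.hstack([points, np.ones((len(points), 1))])
registered = (T @ homogeneous.T).T[:, :3]

np.savetxt("station_02_in_frame_01.xyz", registered)
```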

[Download]
[Reference]: Dong Z, Liang F, Yang B, et al. Registration of large-scale terrestrial laser scanner point clouds: A review and benchmark. ISPRS Journal of Photogrammetry and Remote Sensing, 2020, 163: 327-342.
2. Oakland 3D Point Cloud
The Oakland 3D dataset was collected with the Navlab11 vehicle and side-looking LMS (Laser Measurement System) laser scanners. Data acquisition took place around the Carnegie Mellon University campus in Oakland, Pittsburgh, Pennsylvania. The dataset is provided in ASCII format, one point per line with space-separated x, y, z coordinates, a label, and a confidence value. Corresponding VRML files (.wrl) and label count files (.stats) are also provided.
The dataset consists of two subsets, part 2 and part 3, each with its own local reference frame, and each file contains 100,000 3D points. The data have been filtered and split into training, validation, and testing sets, and the original 44 labels have been remapped to 5 labels for ease of use and analysis.
|  | Complete | Training | Validation | Testing |
| --- | --- | --- | --- | --- |
| Statistics | 17 files, 1.6 million 3D points, 44 labels | 1 file, 36,932 3D points, 5 labels | 1 file, 91,579 3D points, 5 labels | 15 files, 1.3 million 3D points, 5 labels |

Label-distribution plots and snapshots are provided for each subset.
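Because each point is stored as a space-separated line, the files can be loaded with plain NumPy. The sketch below assumes the x, y, z, label, confidence ordering described above and uses a hypothetical file name.

```python
import numpy as np

# One point per line: x y z label confidence (space separated); the file name is hypothetical.
data = np.loadtxt("oakland_part3_sample.xyz_label_conf")

xyz = data[:, :3]                # 3D coordinates in the subset's local reference frame
labels = data[:, 3].astype(int)  # remapped label (5 classes in the filtered subsets)
confidence = data[:, 4]          # per-point label confidence

# Quick sanity check: how many points fall into each class?
classes, counts = np.unique(labels, return_counts=True)
print(dict(zip(classes.tolist(), counts.tolist())))
```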
[Download]
[Reference]: Daniel Munoz, J. Andrew Bagnell, Nicolas Vandapel, and Martial Hebert. Contextual Classification with Functional Max-Margin Markov Networks. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), June 2009.
3. Paris-rue-Madame Dataset
Captured along rue Madame in Paris, the Paris-rue-Madame dataset provides 3D mobile laser scanner (MLS) data with manual annotations, supporting urban detection, segmentation, and classification methods.
The dataset contains two PLY files of 10 million points each. Every point carries six attributes (x, y, z, reflectance, label, class). The x, y, z coordinates are georeferenced (E, N, U) in the Lambert 93 projection with IGN1969 altitudes (RAF09 grid). The “reflectance” attribute is the laser intensity, “label” is the object label obtained after segmentation, and “class” is the object category.
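A minimal reading sketch using the plyfile package is shown below. The file name is hypothetical, and the property names are assumed to match the attribute list above; check the PLY header of the actual files before relying on them.

```python
import numpy as np
from plyfile import PlyData  # pip install plyfile

# File name is hypothetical; property names assumed to be x, y, z, reflectance, label, class.
vertex = PlyData.read("rueMadame_1.ply")["vertex"]

xyz = np.column_stack([vertex["x"], vertex["y"], vertex["z"]])  # Lambert 93 / IGN1969 coordinates
reflectance = np.asarray(vertex["reflectance"])                 # laser intensity
label = np.asarray(vertex["label"])                             # per-object segmentation label
category = np.asarray(vertex["class"])                          # object class

print(xyz.shape[0], "points,", len(np.unique(category)), "classes")
```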

[Download]
[Reference]: A. Serna, B. Marcotegui, F. Goulette, and J.-E. Deschaud. “Paris-rue-Madame database: a 3D mobile laser scanner dataset for benchmarking urban detection, segmentation and classification methods”. ICPRAM 2014.
4. IQmulus & TerraMobilita Dataset
The IQmulus & TerraMobilita benchmark offers a competition dataset of roughly 300 million 3D points acquired by mobile laser scanning. The data are manually annotated and categorized, making them a cornerstone for semantic analysis of urban point clouds.
All coordinates are georeferenced (E, N, U) in the Lambert 93 projection with IGN1969 altitudes (RAF09 grid), and the reflectance is the laser intensity. An offset has been subtracted from the X and Y coordinates so they can be stored as 32-bit floats without losing precision. Each file contains the following attributes:
- (float32) X, Y, Z: Cartesian geographic reference coordinates in the Lambert 93 system.
- (float32) X, Y, Z: Original coordinates.
- (float32) Reflectance: Backscatter intensity corrected for distance.
- (uint8) Num_echo: Number of the echo (to handle pulses that return multiple echoes).
Each result file submitted by a participant must contain the original points, in the same order as the input PLY file, with two additional attributes:
- (uint32) Id: Unique identifier/label for each segmented object.
- (uint32) Class: Classification results for each segmented object using semantic class labels. Points with the same ID must have the same class.
Because every point carries an Id and a Class, the evaluation is performed point-wise. A minimal read/append/write sketch is shown below.
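The sketch below uses the plyfile package; the file names are hypothetical and the zero-filled Id/Class arrays stand in for the output of a real segmentation and classification method. It keeps every original field and preserves the point order, as the benchmark requires.

```python
import numpy as np
from plyfile import PlyData, PlyElement  # pip install plyfile

# Read the original benchmark file (file name is hypothetical).
ply = PlyData.read("benchmark_zone.ply")
vertex = ply["vertex"].data  # structured array; original point order is preserved

# Placeholder results; a real method fills these in.
ids = np.zeros(len(vertex), dtype=np.uint32)
classes = np.zeros(len(vertex), dtype=np.uint32)

# Append the two required uint32 attributes while copying every original field.
new_dtype = vertex.dtype.descr + [("id", "u4"), ("class", "u4")]
out = np.empty(len(vertex), dtype=new_dtype)
for name in vertex.dtype.names:
    out[name] = vertex[name]
out["id"] = ids
out["class"] = classes

PlyData([PlyElement.describe(out, "vertex")], text=False).write("benchmark_zone_result.ply")
```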

[Download]
[Reference]: Bruno Vallet, Mathieu Brédif, Andrés Serna, Beatriz Marcotegui, Nicolas Paparoditis. TerraMobilita/IQmulus urban point cloud analysis benchmark. Computers & Graphics, Elsevier, 2015, 49, pp. 126-133. https://hal.archives-ouvertes.fr/hal-01167995v1
5. The District of Columbia Dataset
Unlocking the secrets of Washington, D.C., this dataset provides public access to classified LiDAR point cloud data covering the district. Each point is assigned one of the following classification codes:
- Class 1: Processed, but unclassified
- Class 2: Bare earth
- Class 7: Low noise
- Class 9: Water
- Class 10: Ignored ground
- Class 11: Withheld
- Class 17: Bridge decks
- Class 18: High noise
Researchers, professionals, and the general public can use this dataset to study the geographical features of the District of Columbia. The classified point cloud supports geospatial analyses in fields such as urban planning, environmental assessment, and infrastructure management; a minimal class-filtering sketch follows.
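The classification codes above can be used directly to filter points. The sketch below uses the laspy package and a hypothetical tile name, assuming the tiles are distributed as LAS/LAZ files; the same pattern works for any of the listed classes.

```python
import laspy  # pip install laspy

# Tile name is hypothetical; adjust to the actual file naming of the DC distribution.
las = laspy.read("dc_tile_example.las")

# Keep only bare-earth points (class 2), dropping water, noise, bridges, etc.
mask = las.classification == 2
out = laspy.LasData(las.header)
out.points = las.points[mask]
out.write("dc_tile_example_ground.las")

print(f"kept {mask.sum()} of {len(las.points)} points")
```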
[Download]
6. Semantic3D Dataset
Built for in-depth semantic segmentation evaluation, Semantic3D encompasses over 4 billion labeled points from terrestrial laser scans of diverse outdoor scenes, enabling researchers and practitioners to benchmark segmentation algorithms effectively.
Within this framework, Semantic3D provides the following:
- A substantial point cloud dataset containing over 4 billion labeled points.
- Ground truth labels manually annotated by professional evaluators.
- A common evaluation tool built around a full confusion matrix, from which per-class intersection-over-union (IoU) and overall accuracy are derived (sketched below).
Researchers and practitioners can utilize Semantic3D to evaluate and benchmark the performance of semantic segmentation algorithms on complex and diverse 3D scenes. The dataset’s large scale and high-quality annotations make it a valuable resource for advancing the field of 3D point cloud processing and semantic understanding.
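Both headline numbers, per-class IoU and overall accuracy, can be computed from a confusion matrix. The sketch below shows the arithmetic with a toy 3-class matrix standing in for real Semantic3D results.

```python
import numpy as np

def scores_from_confusion(confusion: np.ndarray):
    """Per-class IoU and overall accuracy from a C x C confusion matrix
    (rows = ground truth, columns = predictions)."""
    tp = np.diag(confusion).astype(float)
    fp = confusion.sum(axis=0) - tp
    fn = confusion.sum(axis=1) - tp
    iou = tp / np.maximum(tp + fp + fn, 1)   # guard against empty classes
    overall_acc = tp.sum() / confusion.sum()
    return iou, overall_acc

# Toy 3-class confusion matrix, not real benchmark results.
conf = np.array([[50, 2, 1],
                 [3, 40, 5],
                 [0, 4, 45]])
iou, acc = scores_from_confusion(conf)
print("per-class IoU:", iou.round(3), "mean IoU:", iou.mean().round(3), "OA:", round(acc, 3))
```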

[Download]
[Reference]: Hackel T, Savinov N, Ladicky L, et al. Semantic3D.net: A new large-scale point cloud classification benchmark[J]. arXiv preprint arXiv:1704.03847, 2017.
7. Paris-Lille-3D Dataset
A benchmark for classification algorithms, the Paris-Lille-3D dataset contains mobile laser scanning data from Paris and Lille, hand-labeled into 50 categories to drive advances in automated point cloud analysis. Each point carries the following attributes (a small usage sketch follows the list):
- (float) x, y, z: The position of the point.
- (float) x_origin, y_origin, z_origin: The location of the LiDAR.
- (double) GPS_time: The time of point cloud acquisition.
- (uint8) reflectance: The reflectance of the point.
- (uint32) label: The label to which the point belongs.
- (uint32) class: The category to which the point belongs.
Researchers can leverage the Paris-Lille-3D dataset to evaluate and compare the performance of their point cloud classification algorithms. The availability of hand-labeled data with various categories makes it a valuable resource for advancing the field of point cloud analysis and semantic understanding in 3D scenes.
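Because each point also stores the scanner position at acquisition time, derived quantities such as the sensor-to-point range are easy to compute. The sketch below uses the plyfile package with a hypothetical file name; the field names are taken from the attribute list above and should be checked against the PLY header.

```python
import numpy as np
from plyfile import PlyData  # pip install plyfile

# File name is hypothetical; field names follow the attribute list above.
v = PlyData.read("Lille1_section.ply")["vertex"]

points = np.column_stack([v["x"], v["y"], v["z"]])
origins = np.column_stack([v["x_origin"], v["y_origin"], v["z_origin"]])

# Sensor-to-point range, useful e.g. for reflectance normalization or range-based filtering.
ranges = np.linalg.norm(points - origins, axis=1)
print(f"range: min {ranges.min():.2f} m, mean {ranges.mean():.2f} m, max {ranges.max():.2f} m")
```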

[Download]
[Reference]: Roynard X, Deschaud J E, Goulette F. Paris-Lille-3D: A Point Cloud Dataset for Urban Scene Segmentation and Classification[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 2018: 2027-2030.
8. DublinCity Dataset
Delving into Dublin’s heart, this dataset captures 1.4 billion laser-scanned points, annotated into 13 classes across 3 hierarchical levels. An invaluable resource for urban modeling and semantic understanding, DublinCity fosters research in complex urban environments.
The annotation hierarchy is organized as follows (a simple Level 2 → Level 1 mapping is sketched after this list):
- Level 1: This level contains coarse labels with four categories: (a) Building, (b) Ground, (c) Vegetation, and (d) Undefined. Buildings are habitable urban structures such as houses, offices, schools, and libraries. Ground consists mainly of points at terrain elevation. Vegetation covers all types of plants. Undefined covers points that are less relevant for urban modeling, such as garbage bins, decorative sculptures, cars, benches, lamp posts, mailboxes, and non-static objects; approximately 10% of points are labeled Undefined, mainly over rivers, railways, and construction sites.
- Level 2: In this level, the three categories from Level 1 are further classified. Buildings are labeled as roofs and outer walls; Vegetation is categorized into different types of plants, such as trees and shrubs; Ground points are classified as streets, sidewalks, and grass.
- Level 3: This level labels all types of windows and doors, both on roofs (e.g., skylights and rooftop windows) and on outer walls.
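The hierarchy lends itself to a simple coarsening step when only Level 1 labels are needed. The mapping below is purely illustrative, built from the category names in the description above; the released annotation files may use different names or numeric IDs.

```python
# Illustrative Level 2 -> Level 1 mapping based on the categories described above;
# the actual class names/IDs in the released annotations may differ.
LEVEL2_TO_LEVEL1 = {
    "roof": "building",
    "outer_wall": "building",
    "tree": "vegetation",
    "shrub": "vegetation",
    "street": "ground",
    "sidewalk": "ground",
    "grass": "ground",
}

def to_level1(level2_label: str) -> str:
    # Anything outside the mapping falls back to the Undefined category.
    return LEVEL2_TO_LEVEL1.get(level2_label, "undefined")

print(to_level1("sidewalk"))   # -> "ground"
print(to_level1("car"))        # -> "undefined"
```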

The DublinCity dataset provides a valuable resource for research and development in 3D urban modeling, point cloud analysis, and semantic understanding in urban environments.
[Download]
[Reference]: S M Iman Zolanvari, Susana Ruano, Aakanksha Rana, Alan Cummins, Rogerio Eduardo da Silva, Morteza Rahbar, Aljosa Smolic. DublinCity: Annotated LiDAR Point Cloud and its Applications. 30th BMVC, September 2019.
Join us in this exploration of LiDAR datasets, and share your insights on the parameters you prioritize when acquiring LiDAR technology. In exchange for your valued input, the most upvoted comment will be rewarded with a complimentary Vi-Beam1L LiDAR device (one unit), with free shipping included.