LAS formal class

A LAS class is a representation in R of a las file that aims to respect as closely as possible the official LAS specification that describes the file format. “As closely as possible” means that, due to R internal limitations, it is not possible to represent a las file exactly as it should be represented. Additionally, some aspects of the las format specifications are not supported in lidR. Still, the contents of a LAS object must reflect the fact that it is a representation of a standardized format, so some restrictions are imposed on users.

Build a LAS object reading a las file

The function readLAS reads one or several .las or .laz file(s) to build a LAS object.

LASfile <- system.file("extdata", "example.laz", package="rlas")
las <- readLAS(LASfile)
print(las)
#> class        : LAS (v1.0 format 1)
#> memory       : 5.5 Kb 
#> extent       : 339002.9, 339015.1, 5248000, 5248001 (xmin, xmax, ymin, ymax)
#> coord. ref.  : NAD83 / UTM zone 17N 
#> area         : 9.9 m²
#> points       : 30  points
#> type         : airborne
#> density      : 3.03 points/m²
#> density      : 2.63 pulses/m²

Basic structure of a LAS object

A LAS object is composed of four slots: @data, @header, @crs and @index.

@data: the point cloud

The slot data of a LAS object contains a data.table with the data read from .las or .laz file(s). The columns of the table are named after the LAS specification version 1.4. Each name is reserved and is associated with a given type:

  • X Y Z (dbl)
  • Intensity (int)
  • gpstime (dbl)
  • ReturnNumber (int)
  • NumberOfReturns (int)
  • ScanDirectionFlag (int)
  • EdgeOfFlightline(int)
  • Classification (int)
  • ScannerChannel (int) (point format >= 6)
  • Synthetic_flag (bool)
  • Withheld_flag (bool)
  • Keypoint_flag (bool)
  • Overlap_flag (bool) (point format >= 6)
  • ScanAngleRank (int) (point format < 6)
  • ScanAngle (dbl) (point format >= 6)
  • UserData (int)
  • PointSourceID (int)
  • R G B (int)
  • NIR (int)

Here we can already see some deviations from the official las format specifications. For example, the attribute ‘Classification’ should be an unsigned char stored on 8 bits. However, the R language does not support this data type and consequently this attribute is stored in a 32-bit signed int. One can read the official las specifications to figure out the other deviations from the original file format induced by the fact that R only has 32-bit signed integers and 64-bit signed decimal numbers.

@header: the header

A LAS object contains a slot @header that represents the header of the las file. The header is stored in a LASheader object. A LASheader object contains two slots: @PHB for the public header block and @VLR for the variable length records. Both slots are lists labelled according to the las file format specification. See public documentation of las file format for more information about las headers. Users should never normally have to worry about the header as long as they use functions from lidR. Everything is managed internally to ensure that objects are valid. However, users still need to know that the contents of the header are important, especially when writing LAS objects into las or laz files.

print(header(las))
#> File signature:           LASF 
#> File source ID:           0 
#> Global encoding:
#>  - GPS Time Type: GPS Week Time 
#>  - Synthetic Return Numbers: no 
#>  - Well Know Text: CRS is GeoTIFF 
#>  - Aggregate Model: false 
#> Project ID - GUID:        00000000-0000-0000-0000-000000000000 
#> Version:                  1.0
#> System identifier:        LAStools (c) by rapidlasso GmbH 
#> Generating software:      las2las (version 201011) 
#> File creation d/y:        343/2011
#> header size:              227 
#> Offset to point data:     323 
#> Num. var. length record:  1 
#> Point data format:        1 
#> Point data record length: 28 
#> Num. of point records:    30 
#> Num. of points by return: 26 4 0 0 0 
#> Scale factor X Y Z:       0.001 0.001 0.001 
#> Offset X Y Z:             6e+05 6500000 0 
#> min X Y Z:                339002.9 5248000 973.145 
#> max X Y Z:                339015.1 5248001 978.345 
#> Variable Length Records (VLR):
#>    Variable Length Record 1 of 1 
#>        Description: by LAStools of rapidlasso GmbH 
#>        Tags:
#>           Key 1024 value 1 
#>           Key 3072 value 26917 
#>           Key 3076 value 9001 
#>           Key 4099 value 9001 
#> Extended Variable Length Records (EVLR):  void

@crs: the CRS

The slot @crs contains a crs object from package sf and stores the coordinate reference system (CRS) of the las file. In the official las specifications the CRS is stored in the header. In a LAS object the CRS is stored in the header using the EPSG code of the CRS or a WKT string (depending how it has been recorded in the file), but it is also stored in the slot @crs. This is to ensure it meets R standards and is in accordance with other spatial data packages in the R ecosystem. Consequently, to get a valid LAS object properly written into a las file it is important to set the CRS using the function st_crs(). This function updates the header of the LAS object and the crs

st_crs(las) <- 26917
st_crs(las)

According to LAS specification the CRS system can also be a WKT string when the WKT bit flag is set to TRUE.

las@header@PHB[["Global Encoding"]][["WKT"]] = TRUE

@index: the spatial indexing mode

Records what kind of spatial index will be used internally to perform computations that require spatial indexing. See help("lidR-spatial-index").

Allowed and non-allowed manipulation of a LAS object

R users who are used to manipulating spatial data are likely to be very familiar with the sp or sf packages and all the classes used to store spatial data, such as SpatialPointsDataFrame, SpatialPolygonsDataFrame, sf and so on. The data contained in these classes are freely modifiable by the user because they can be of any type. A LAS object is not freely modifiable because it is a strongly standardized representation of a las file.

For example, users cannot replace the Classification attribute with the value 0 because 0 is a decimal number in R and the ‘Classification’ attribute is an integer. The following throws an error:

las$Classification <- 0
#> Error: Trying to replace data of type integer by data of type double: this action is not allowed

In R 0L is an integer and thus the following is allowed:

las$Classification <- 0L

It would be possible to automatically cast the input into the correct type without throwing an error. But for the lidR package we chose to be very pedantic on this point to avoid any potential problems and because we would prefer users to be careful about the content of their data.

The addition of a new column is also restricted. For example, one may want to add an attribute R corresponding to the red channel.

las$R <- 0
#> Error: Addition of a new column using [[ is forbidden for LAS objects. See ?add_attribute

This is not allowed because a LAS object should always be valid. By allowing the user to add an R column the LAS object would no longer be valid for two reasons:

  1. R is a reserved name of the core attributes and must be an integer. In the example above it is a decimal number.
  2. A LAS file with RGB attributes is of type 3, 7 or 8. As a result the header must be updated, but in the previous example it is not.

In consequence, adding a column must be done via the functions add_attribute, add_lasattribute, add_lasrgb. This way users are forced to read the documentation of these two functions. And yet some restrictions are still in place. For example, the following is not allowed for the same reasons as above:

las <- add_attribute(las, 0, "R")
#> Error: R is part of the core attributes and is a forbidden name.

But anyway, R being R, there is no way to completely restrict editing of objects. Users can always by-pass the restrictions to make LAS objects that are not strictly valid:

las@data$R <- 0

In conclusion, a LAS object is not actually immutable but at least there are some restrictions to ensure that the user is aware that not everything is authorized.

Extra attributes and extra bytes in a LAS object

As we have seen, a LAS object contains a core of attributes associated with reserved names in accordance with the las specifications. It is possible, however, to add more attributes to a LAS object even if they are not part of the core attributes imposed by the las specifications.

Extra attributes

Extra attributes are just like adding a column in a regular table in R. One can freely modify the data using the function add_attribute. It is thus possible to add an attribute to a LAS object. For example, it is possible to attribute an ID to each point and use this value in subsequent code:

las  <- add_attribute(las, 1:30, "ID")
las2 <- filter_poi(las, ID > 15)

But it is important to understand that this attribute is invalid with respect to the las specifications. Thus it can be used at the R level but will not be written in a las file and thus will be lost at write time. Depending on the purpose of this attribute it may or may not be useful to be able to write this extra data. Most of the time the information is only useful at the R level but sometimes it might be appropriate to store the data in a file.

Extra bytes attributes

The las specifications allow for storing extra attributes that are not part of the core attributes. The way to do this is more complex. Basically it is called extra bytes attributes and it implies modification of the LAS object header to indicate that the contents of the file contains more than the core attributes. This is abstracted with the function add_lasattribute.

las  <- add_lasattribute(las, 1:30, "ID", "An ID for each point")

Using this function, the header is updated according to the las specification and thus the extra bytes attributes can be written in the file. lidR supports up to 10 extra bytes attributes. The extra bytes attributes are limited to being of type numeric. Indeed, the las specifications do not allow for storing extra bytes attributes of type string or type boolean. Thus the following fails:

abc  <- sample(letters, 30, replace = TRUE)
las  <- add_lasattribute(las, abc, "ID", "An ID for each point")
#> Error: 'x' must be numeric. LAS format specifications do not allow storing of 'character' extra bytes.

Validation of LAS objects

It is common that users report bugs arising from the fact that a point cloud is invalid. This is why we introduced the function las_check to perform a deep inspection of LAS objects. This function checks if a LAS object is in accordance with the las specifications but also it checks for weird point clouds that could be valid with respect to the specifications but invalid for actual processing. For example, it often happens that a las file contains duplicated points for no valid reason. This may lead to trees being detected twice, to invalid metrics, or to errors in DTM generation, and so on…

las_check(las)
#> 
#>  Checking the data
#>   - Checking coordinates... ✓
#>   - Checking coordinates type... ✓
#>   - Checking coordinates range... ✓
#>   - Checking coordinates quantization... ✓
#>   - Checking attributes type... ✓
#>   - Checking ReturnNumber validity... ✓
#>   - Checking NumberOfReturns validity... ✓
#>   - Checking ReturnNumber vs. NumberOfReturns... ✓
#>   - Checking RGB validity... ✓
#>   - Checking absence of NAs... ✓
#>   - Checking duplicated points... ✓
#>   - Checking degenerated ground points... skipped
#>   - Checking attribute population... ✓
#>   - Checking gpstime incoherances ✓
#>   - Checking flag attributes... ✓
#>   - Checking user data attribute...
#>     🛈 30 points have a non 0 UserData attribute. This probably has a meaning
#>  Checking the header
#>   - Checking header completeness... ✓
#>   - Checking scale factor validity... ✓
#>   - Checking point data format ID validity... ✓
#>   - Checking extra bytes attributes validity... ✓
#>   - Checking the bounding box validity... ✓
#>   - Checking coordinate reference system... ✓
#>  Checking header vs data adequacy
#>   - Checking attributes vs. point format... ✓
#>   - Checking header bbox vs. actual content... ✓
#>   - Checking header number of points vs. actual content... ✓
#>   - Checking header return number vs. actual content... ✓
#>  Checking coordinate reference system...
#>   - Checking if the CRS was understood by R... ✓
#>  Checking preprocessing already done 
#>   - Checking ground classification... no
#>   - Checking normalization... no
#>   - Checking negative outliers... ✓
#>   - Checking flightline classification... yes
#>  Checking compression
#>   - Checking attribute compression... no

Display a LAS object

lidR provides a simple plot function to plot a LAS object in 3D. It is based on the rgl package. The rgl package is amazing but has some problems working with large point clouds. We are currently developing our own viewer to overcome this issue. This viewer is fully compatible with lidR but still in heavy development.

plot(las)

The parameter color expects the name of the attribute you want to use to colorize the points. Default is Z

plot(las, color = "Intensity")

If your file contains RGB data the string "RGB" is supported:

plot(las, color ="RGB")

Memory considerations

This section is of major importance because there are many instances where R is weak at memory management.

Firstly, it is important to note that R only enables manipulation of 32-bit integers and 64-bit decimal numbers. But the las specification states, for example, that the intensity is stored on 16 bits (see previous sections). When read in R it must be converted to 32 bits and therefore will use twice as much memory than is needed. Worse, the return numbers are stored on 3 bits in las files but 32 bits in R, therefore using 11 times more memory than is required. Last but not least, flags are stored on 1 bit, whereas R uses 32 bits. This is 32 times more memory than is needed. As a consequence, a LAS object is 2 to 3 times larger than it needs to be.

Secondly, the way the point cloud is stored and the way R works implies that copies will be made of the point cloud either in the user’s workspace or internally. Considering that point clouds can be huge it is important to be aware of this point.

Deep copies

Let’s assume we have loaded a large las file that uses 1 GB of R memory.

las.original <- readLAS("big_file.las")

Suppose we now want to remove a few outliers above 50 m. One can write the following:

las.denoised <- filter_poi(las.original, Z < 50)

And the user now has two objects:

  • las.original of size 1 GB
  • las.denoised that is also 1 GB, because we only removed a dozen or so points out of millions.

This uses 2 GB of memory. This is how R works. When a vector is subsetted it is necessarily copied. We talk about deep copies. In regular data processing it rarely matters and this behavior is barely noticeable. Indeed, it is rare that data uses a lot of memory. But LiDAR datasets are often massive, and this necessitates that users must carefully consider memory usage to avoid running out of RAM.

Shallow copies

In the previous example we showed a deep copy. A deep copy means that the point cloud is actually copied into the memory. A deep copy occurs when the number of points of the output is different from the number of points of the input. But many functions return the same number of point as the input. In such cases only shallow copies are made. For example, when classifying points into ground and non-ground:

las.classified <- classify_ground(las.original, csf())

In this case the vectors that store the X Y Z coordinates as well as those that store the Intensity, ReturnNumber, NumberOfReturn and other attributes were not modified by the function. Only the contents of the ‘Classification’ attribute were modified. In this case las.classified and las.original, even though they are two different objects, share the same memory for X Y Z, and so on, but the attributes ‘Classification’ are different. In conclusion:

  • las.original is of size 1 GB
  • las.classified is also 1 GB.

But both together they are not equal to 2 GB, but ~1.1 GB because they share the same memory. The content of the original LAS object was shallow copied. An understanding of the concepts of deep and shallow copies is important for optimizing your scripts.

As we have seen, because of the way R is designed, lidR uses a large amount of memory anyway. To deal with this limitation readLAS has two optimizations: the parameter select and the parameter filter.

Parameter select

To save memory only useful data can be loaded. readLAS can take an optional parameter select which enables the user to selectively load the data of interest. For example, one can load only the X Y Z fields. This selection is done at the C++ level while reading and is memory-optimized.

las = readLAS("file", select = "xyz")
las = readLAS("file", select = "xyzi")
las = readLAS("file", select = "* -i -u") # Negation works too

Parameter filter

While select enables the user to select “columns” (or attributes) while reading files, filter allows selection of “rows” (or points) while reading. Again, the selection is done at the C++ level and is memory-optimized so not a single bit is lost at the R level. Removing data at reading time that is superfluous for your purposes saves memory and decreases computation time.

las = readLAS("file", filter = "-keep_first")
las = readLAS("file", select = "xyzi", filter = "-keep_first -drop_z_below 5 -drop_z_above 50")