
Note: This release has yet to be submitted to CRAN.
This package provides tools for semantic segmentation of geospatial data using convolutional neural network-based deep learning. Utility functions allow for creating masks, image chips, data frames listing the image chips in a directory, and DataSets for use within DataLoaders. Additional functions serve as checks during data preparation and training. Training can also be conducted using dynamically generated chips, although this feature is still experimental. The package relies on torch for implementing deep learning, which does not require the installation of a Python environment. Raster geospatial data are handled with terra. Models can be trained using a CUDA-enabled GPU; however, multi-GPU training is not supported by torch in R. Both binary and multiclass models can be trained.
Full details about the package are documented in a PLOS One article:
Maxwell, A.E., Farhadpour, S., Das, S. and Yang, Y., 2024. geodl: An R package for geospatial deep learning semantic segmentation using torch and terra. PLoS One, 19(12), p.e0315127.
A UNet architecture can be defined with 4 blocks in the encoder, a bottleneck block, and 4 blocks in the decoder. The UNet can accept a variable number of input channels, and the user can define the number of feature maps produced in each encoder and decoder block and the bottleneck. Users can also choose to (1) replace all ReLU activation functions with leaky ReLU or swish, (2) implement attention gates along the skip connections, (3) implement squeeze and excitation modules within the encoder blocks, (4) add residual connections within all blocks, (5) replace the bottleneck with a modified atrous spatial pyramid pooling (ASPP) module, and/or (6) implement deep supervision using predictions generated at each stage in the decoder.
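To make the 4-block encoder, bottleneck, and 4-block decoder layout concrete, the following base R sketch tabulates the feature map count and spatial size at each stage. The chip size, the channel counts, and the assumption that spatial size is halved after each encoder block are illustrative choices only; they are not taken from geodl and are not its defaults.

```r
# Hypothetical settings for illustration; not geodl defaults
chip_size <- 256                  # input chip is chip_size x chip_size cells
en_chn    <- c(64, 128, 256, 512) # feature maps produced by each encoder block
btn_chn   <- 1024                 # feature maps produced by the bottleneck
dc_chn    <- rev(en_chn)          # feature maps produced by each decoder block

# Assumption: spatial size is halved after every encoder block
en_size  <- chip_size / 2^(0:3)
btn_size <- chip_size / 2^4
dc_size  <- rev(en_size)          # decoder restores the size stage by stage

unet_shapes <- data.frame(
  stage    = c(paste0("encoder", 1:4), "bottleneck", paste0("decoder", 1:4)),
  channels = c(en_chn, btn_chn, dc_chn),
  size     = c(en_size, btn_size, dc_size)
)
print(unet_shapes)
```

Under these assumptions, each decoder block operates at the same spatial size as one encoder block, which is what allows skip connections to pass encoder feature maps across to the decoder.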
A second UNet architecture is implemented with a MobileNet-v2 backbone. This model can be initialized using ImageNet weights for the encoder. The encoder can be frozen or trained during the training loop. If the number of input predictor variables or channels is not three, ImageNet weights are averaged for all input channels in the first layer. If three channels or predictor variables are provided, then the user can choose to use the ImageNet weights or average the weights in the first layer.
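The weight averaging described above can be illustrated with plain arrays. This is a minimal sketch of one plausible reading (average the pretrained kernels across the three RGB input channels, then reuse that mean kernel for every new input channel); it is not geodl's implementation. The function name `adapt_first_layer`, the weight shape, and the random stand-in weights are all assumptions made for the example.

```r
set.seed(42)

# Stand-in for pretrained first-layer weights with hypothetical shape
# (output filters, 3 RGB input channels, kernel height, kernel width).
# Random values are used here in place of real ImageNet weights.
w_rgb <- array(rnorm(32 * 3 * 3 * 3), dim = c(32, 3, 3, 3))

# Hypothetical helper: build first-layer weights for n_in input channels
adapt_first_layer <- function(w, n_in) {
  # Average over the input-channel dimension: result is (out, kh, kw)
  w_mean <- apply(w, c(1, 3, 4), mean)
  d <- dim(w)
  w_new <- array(0, dim = c(d[1], n_in, d[3], d[4]))
  # Copy the averaged kernel into every input channel
  for (ch in seq_len(n_in)) {
    w_new[, ch, , ] <- w_mean
  }
  w_new
}

w_5band <- adapt_first_layer(w_rgb, n_in = 5)
dim(w_5band)
```

Reusing the channel-averaged kernel keeps the pretrained filter shapes (edges, blobs, and so on) while removing any dependence on the original red, green, and blue ordering.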
Two additional models are provided: UNet3+ and a modified version of HRNet.
A unified focal loss framework is implemented after:
Yeung, M., Sala, E., Schönlieb, C.B. and Rundo, L., 2022. Unified focal loss: Generalising dice and cross entropy-based losses to handle class imbalanced medical image segmentation. Computerized Medical Imaging and Graphics, 95, p.102026.
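For readers unfamiliar with this family of losses, the base R sketch below shows a generic Tversky index with a focal exponent for the binary case. It is a simplified illustration, not the unified focal loss as parameterized in the paper and not the geodl implementation: the function names, the `w_fn` and `w_fp` weights, the `gamma` value, and the direction of the exponent are assumptions, and exponent conventions differ between publications.

```r
# Tversky index for a binary problem.
# pred: predicted foreground probabilities; truth: 0/1 reference labels.
# w_fn and w_fp weight false negatives and false positives, respectively.
tversky_index <- function(pred, truth, w_fn = 0.5, w_fp = 0.5, eps = 1e-7) {
  tp <- sum(pred * truth)        # soft true positives
  fn <- sum((1 - pred) * truth)  # soft false negatives
  fp <- sum(pred * (1 - truth))  # soft false positives
  (tp + eps) / (tp + w_fn * fn + w_fp * fp + eps)
}

# Focal variant: raise (1 - index) to a power (hypothetical default gamma)
focal_tversky_loss <- function(pred, truth, w_fn = 0.7, w_fp = 0.3,
                               gamma = 0.75) {
  (1 - tversky_index(pred, truth, w_fn, w_fp))^gamma
}

truth <- c(1, 1, 0, 0)
pred  <- c(0.9, 0.4, 0.2, 0.1)

loss_soft    <- focal_tversky_loss(pred, truth)   # imperfect prediction
loss_perfect <- focal_tversky_loss(truth, truth)  # prediction equals truth
```

With `w_fn = w_fp = 0.5` the Tversky index reduces to the Dice coefficient, which is one way region-based losses of this kind generalize Dice; unequal weights shift the penalty toward false negatives or false positives, which is useful when classes are imbalanced.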
We have also implemented assessment metrics using the luz package, including overall accuracy, F1-score, recall, and precision.
Trained models can be used to make predictions over large spatial extents without first breaking the data into chips. Functions are also available for performing accuracy assessment.
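As a reminder of how these metrics are defined, the following sketch computes them from a confusion matrix in base R. It does not use luz or geodl; the class labels and the toy predictions are made up for the example.

```r
# Toy reference and predicted labels (hypothetical data)
lev   <- c("a", "b", "c")
truth <- factor(c("a", "a", "a", "b", "b", "c", "c", "c", "c", "c"), levels = lev)
pred  <- factor(c("a", "a", "b", "b", "b", "c", "c", "c", "a", "c"), levels = lev)

# Confusion matrix: rows are predictions, columns are reference labels
cm <- table(predicted = pred, reference = truth)

# Overall accuracy: correctly labeled samples over all samples
overall_accuracy <- sum(diag(cm)) / sum(cm)

# Per-class precision: correct predictions over all predictions of the class
precision <- diag(cm) / rowSums(cm)

# Per-class recall: correct predictions over all reference samples of the class
recall <- diag(cm) / colSums(cm)

# Per-class F1-score: harmonic mean of precision and recall
f1 <- 2 * precision * recall / (precision + recall)

print(cm)
print(round(rbind(precision, recall, f1), 3))
```

Per-class values can be averaged across classes to obtain a single summary for multiclass problems; which averaging scheme to use is a separate choice not covered by this sketch.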
Utility functions are provided to generate a variety of land surface parameters (LSPs) from a digital terrain model (DTM).
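Slope is one widely used LSP, and the sketch below shows how it can be derived from a gridded elevation surface using central differences in base R. It is illustrative only: the function name `slope_degrees` is hypothetical, the tiny synthetic DTM is made up, and nothing here indicates which LSPs geodl produces or how it computes them.

```r
# Hypothetical helper: slope in degrees from an elevation matrix.
# z: elevation matrix (at least 3 x 3); cell_size: cell width in elevation units.
slope_degrees <- function(z, cell_size) {
  nr <- nrow(z)
  nc <- ncol(z)
  dz_dx <- matrix(NA_real_, nr, nc)
  dz_dy <- matrix(NA_real_, nr, nc)
  # Central differences; edge rows and columns are left as NA
  dz_dx[, 2:(nc - 1)] <- (z[, 3:nc] - z[, 1:(nc - 2)]) / (2 * cell_size)
  dz_dy[2:(nr - 1), ] <- (z[3:nr, ] - z[1:(nr - 2), ]) / (2 * cell_size)
  # Slope angle from the gradient magnitude
  atan(sqrt(dz_dx^2 + dz_dy^2)) * 180 / pi
}

# Synthetic DTM: a plane rising 1 elevation unit per column
dtm <- matrix(rep(0:4, each = 5), nrow = 5)

slope <- slope_degrees(dtm, cell_size = 1)
print(slope)
```

For real rasters, `terra::terrain()` can compute slope and related derivatives directly from a SpatRaster, which avoids handling edges and cell sizes by hand.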
This package is still experimental and is a work in progress. We are interested in finding additional contributors and collaborators.
You can install the development version of geodl from GitHub with:
# install.packages("devtools")
devtools::install_github("maxwell-geospatial/geodl")

Chapter 15 and Chapter 16 of the free and openly available online text Geospatial Supervised Learning using R serve as the documentation for this package.