
EarthScape: A Multimodal Dataset for Surficial Geologic Mapping and Earth Surface Analysis
Advancements in deep learning and the proliferation of remote sensing imagery present an opportunity to expand surficial geologic mapping, overcoming the limitations of tedious and biased traditional workflows.
An interesting paper from Massey and Imran (2025), with supporting open-source code and data on GitHub. The dataset includes surficial geologic map units that capture three dominant geological processes: fluvial transport and deposition, gravitational sedimentation, and in-situ weathering of bedrock.
Alluvium consists of unconsolidated sediments deposited by active river processes in floodplains and riverbeds. Terrace deposits are older alluvium, now elevated above current floodplains and left behind as rivers incised their valleys. Alluvial fans are fan-shaped deposits formed where high-gradient streams suddenly lose velocity, causing rapid sediment deposition; these deposits can indicate areas prone to hazardous debris flows. Colluvium represents unconsolidated materials on slopes that are actively eroding due to gravity, while colluvial aprons are more stable deposits found at the bases of slopes. Residuum consists of in-situ weathered material overlying its bedrock parent. Artificial fill represents anthropogenic materials used to modify landscapes for construction and infrastructure projects.
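To make the grouping of the seven map units concrete, here is a minimal sketch that maps each unit to the process that forms it and encodes the units present in a map tile as a multi-hot vector. The identifiers, the class order, and the "anthropogenic" group for artificial fill are assumptions made for illustration; they are not the label codes used in the EarthScape repository.

```python
# Hypothetical class names and process groups, taken from the descriptions above.
# The dataset's real label codes and ordering may differ.
PROCESS_BY_CLASS = {
    "alluvium": "fluvial",
    "terrace_deposit": "fluvial",
    "alluvial_fan": "fluvial",
    "colluvium": "gravitational",
    "colluvial_apron": "gravitational",
    "residuum": "in_situ_weathering",
    # Artificial fill is man-made, so it sits outside the three natural processes.
    "artificial_fill": "anthropogenic",
}

# Fixed class order so every label vector has the same layout.
CLASSES = list(PROCESS_BY_CLASS)


def multi_hot(present):
    """Return a 0/1 list marking which classes occur in a tile."""
    present = set(present)
    unknown = present - set(CLASSES)
    if unknown:
        raise ValueError(f"unknown classes: {sorted(unknown)}")
    return [1 if name in present else 0 for name in CLASSES]


def processes(present):
    """Return the sorted set of formative processes for the given classes."""
    return sorted({PROCESS_BY_CLASS[name] for name in present})


# Example: a valley-bottom tile with active alluvium next to weathered bedrock.
tile_units = {"alluvium", "residuum"}
tile_label = multi_hot(tile_units)
tile_processes = processes(tile_units)
```

A multi-hot vector is used here because a single tile can contain several units at once; whether the dataset frames its labels this way is not stated in the text above.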
Abstract
Surficial geologic mapping is essential for understanding Earth surface processes, addressing modern challenges such as climate change and national security, and supporting common applications in engineering and resource management. However, traditional mapping methods are labor-intensive, limiting spatial coverage and introducing potential biases. To address these limitations, we introduce EarthScape, a novel, AI-ready multimodal dataset specifically designed for surficial geologic mapping and Earth surface analysis. EarthScape integrates high-resolution aerial RGB and near-infrared (NIR) imagery, digital elevation models (DEM), multi-scale DEM-derived terrain features, and hydrologic and infrastructure vector data. The dataset provides detailed annotations for seven distinct surficial geologic classes encompassing various geological processes. We present a comprehensive data processing pipeline using open-sourced raw data and establish baseline benchmarks using different spatial modalities to demonstrate the utility of EarthScape. As a living dataset with a vision for expansion, EarthScape bridges the gap between computer vision and Earth sciences, offering a valuable resource for advancing research in multimodal learning, geospatial analysis, and geological mapping.
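As a rough illustration of what "multimodal" means for the raster inputs listed in the abstract, the sketch below stacks co-registered RGB, NIR, and DEM patches into a single channel-first array and standardizes each channel. It assumes NumPy and uses random arrays as stand-ins for real patches; the function name, patch size, and normalization choice are hypothetical and do not describe the authors' pipeline.

```python
import numpy as np


def stack_modalities(rgb, nir, dem):
    """Stack co-registered rasters into one (channels, H, W) float32 array.

    rgb has shape (H, W, 3); nir and dem have shape (H, W).
    """
    if rgb.shape[:2] != nir.shape or nir.shape != dem.shape:
        raise ValueError("all rasters must share the same height and width")
    # Move RGB to channel-first layout, then append NIR and DEM as extra channels.
    channels = np.concatenate(
        [np.moveaxis(rgb, -1, 0), nir[None], dem[None]], axis=0
    ).astype(np.float32)
    # Standardize each channel separately so reflectance values and elevations
    # (which live on very different scales) become comparable.
    mean = channels.mean(axis=(1, 2), keepdims=True)
    std = channels.std(axis=(1, 2), keepdims=True)
    return (channels - mean) / np.maximum(std, 1e-6)


# Synthetic stand-ins for a 32 x 32 patch (real data would come from the dataset).
rng = np.random.default_rng(0)
rgb = rng.integers(0, 256, size=(32, 32, 3))
nir = rng.integers(0, 256, size=(32, 32))
dem = rng.uniform(200.0, 400.0, size=(32, 32))  # elevations in metres (made up)

patch = stack_modalities(rgb, nir, dem)
```

DEM-derived terrain features (slope, curvature, and so on) could be appended as further channels in the same way, provided they are resampled to the same grid.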