Introduction

Landforms are the result of interior and exterior forces of ecological, hydrological, geomorphological, and geological processes that shape the Earth's surface1. The shapes and sizes of volcanic landforms range from small scoria cones to large flood basalt2, which are classified into polygenetic and monogenetic volcanoes3. The first type includes a shield, composite, and caldera volcanoes. The second category includes tuff rings and cones, mafic tiny centres (e.g., scoria and spatter cones), and maars and kimberlites4. Volcanoes are among the most complicated and dynamic landforms. Tectonic setting, eruption way, lava composition and volume, surface environment, and age contribute to the significant morphological diversity of volcanic landforms5. Volcanoes affect the terrain in the surrounding areas. Volcanism influences the landscape of a region in many ways, including the patterns and types of fissures and vents, the duration of its activity, the relative age of the volcanism, the composition and physical characteristics of the extruded material, and the amount and extent of erosion6. Therefore, their locations and features must be detected and mapped.

Glacial landforms in mountains are a critical water source for the future, particularly in semi-arid and arid regions7. Studies have revealed that approximately 2.15% of the world's drinkable water is stored as ice in polar and mountainous glaciers, with their residence times ranging from 20 to 100 years8,9. Typically, glacial landforms can be determined using a combination of the visible and shortwave infrared band ratios and Synthetic Aperture Radar (SAR) and topographic datasets10. These techniques, on the other hand, are insufficient to detect and map glacial landforms, which are spectrally indistinguishable from the surrounding paraglacial terrain11. In addition, atmospheric and earth factors may affect the spectral reflectance of glacial landforms12. Therefore, accurate information and inventories of glacial landforms are essential to their management.

Advances in satellite remote sensing methods have provided accurate information from vast and inaccessible volcanic areas. Satellite remote sensing overcomes the limitations of field-based methods such as global positioning system (GPS), which can only be applied locally and during specific seasons and months. Using this method, volcanoes can also be monitored and mapped despite their remoteness and harsh nature13,14. In previous studies, machine learning methods15,16,17,18,19 and object-based image analysis (OBIA)6,13,14,20,21 were used to detect and map the Earth's landforms. Image classification techniques based on machine learning have provided accurate information about the Earth's features22. As a strong alternative to traditional statistical methods, machine learning techniques that combine computational power with big data, are able to capture non-linear behaviors and learn as new data arrive23. These techniques like Support Vector Machine (SVM) and logistic regression, on the other hand, need to pre-process the datasets using Histogram of Oriented Gradients methods or smoothing filters to overcome a specific classification problem. Additionally, the detection of the Earth's features is largely carried out at the pixel level in machine learning-based approaches24. Since landforms are representative of complicated features, expert knowledge plays a key role in the accuracy of their detection with conventional rule-based methods in geographic object-based image analysis (GEOBIA)25. Nonetheless, determining the appropriate thresholds for grouping objects into landforms based on each geomorphological diagnostic factor is a challenging task26. Classification methods using only spectral information, such as the parallelepiped, the maximum likelihood, and the minimum distance from the mean are not sufficient for analyzing multispectral images at high resolutions. An object with a given earth's feature tends to be represented by pixels with heterogeneous spectral reflectance characteristics in high spatial resolution images27.

Recently, various machine learning-driven methods such as deep learning have been integrated with GEOBIA for the detection and mapping of land use/cover28,29,30,31, gully erosion32,33, and landslide34,35,36. According to the literature review, there is a limited number of research explored the efficiencies of an integrated GEOBIA and convolutional neural network (CNN) to delineate volcanic and glacial landforms.

Aim

In this study, we investigate the performance of an integrated GEOBIA and CNN approach for volcanic and glacial landforms mapping in Sahand Volcano, Iran by using Sentinel-2 imagery as the foundational dataset and secondary data, such as the Digital Elevation Model (DEM), slope, aspect, flow accumulation, and curvature. We also intend to compare the effectiveness of Ground Control Points (GCPs) gathered from the study area and objects that were generated by GEOBIA for training and validating the landform's CNN model.

Datasets and methodology

Datasets

For volcanic and glacial landforms mapping, we acquired freely available Sentinel-2 imagery (with bands 2 (Blue), 3 (Green), 4 (Red), and 8 (NIR)). Although a high-resolution, freely available DEM for the study area exists (with spatial resolution of 12.5 m), it was necessary to have consistent and comparable datasets that would be applicable to other locations. Due to this, a national topographic map at a scale of 1:25,000 was used to drive DEM. With the spatial analysis carried out in the ArcGIS environment, secondary datasets were generated, including aspect, slope, curvature, and flow accumulation with a spatial resolution of 12.5 m (Fig. 1). In Table 1, we list the characteristics of predisposing variables for volcanic and glacial landforms.

Figure 1

Various predisposing variables for volcanic and glacial landforms mapping, including (a) DEM, (b) aspect, (c) curvature, (d) slope, (e) flow accumulation, and (f) Sentinel-2 image.

Full size image
Table 1 Characteristics of predisposing variables for volcanic and glacial landforms mapping.
Full size table

To train the CNN models, we used an inventory map of volcanic and glacial landforms, delineated outlines generated from semi-automated GEOBIA, and ground control points (GCPs). All 935 GCPs (Fig. 7) were collected from the study area with GPS, geomorphological maps, and Google Earth to validate the accuracy of GEOBIA-generated objects and CNN. 70% of these datasets were used to train models, while the remainder (30%) were employed to verify classification accuracy.

Methodology

An overview of the methodology for detecting and mapping volcanic and glacial landforms is shown in Fig. 2. In the first step, we segmented our datasets using the Multi-Resolution Segmentation (MRS) algorithm in the eCognition software (www. geospatial.trimble.com). Our next step was to generate landform image objects based on geometrical, textural, spectral and contextual features in GEOBIA. To train the landform CNN models, not only generated objects from GEOBIA but also an inventory map of volcanic and glacial landforms as well as GCPs were used. Finally, seven evaluation indexes, including intersection over union (IOU) values, recall (RC), precision (PC), specificity (SP), F-measure (FM), accuracy (ACC), kappa (KP), Fivefold cross validation as well as fuzzy synthetic evaluation (FSE) were employed to validate the accuracy of the classification results.

Figure 2

An overview of the methodology used for detecting and mapping volcanic and glacial landforms.

Full size image

Geographic object-based image analysis

GEOBIA is an image analysis paradigm, which relies on groups of homogeneous pixels to classify images' features37. These image objects reveal real-world entities that are tested based on their textural, spectral, contextual, and geometrical features. Image objects are then classified on the basis of their features using a cohesive methodological platform, integrating multi-scale regionalization methods augmented with nested representations and rule-based classifiers38. Such an integrated framework is linking digital-based remote sensing and vector-based GIS domain39. GEOBIA includes two main steps in this study, image segmentation and feature extraction.

Image segmentation

It has been argued that determining an optimal segmentation parameter value is a heuristic, subjective, challenging, and time-consuming process in the GEOBIA literature40,41,42. As a result, several GEOBIA software options have been developed to increase objectivity and automate the process of determining the optimal value, which are categorized into two groups namely free and open-source (GRASS GIS, Orfeo Toolbox, InterIMAGE, and RSGISLib) and commercial software (Trimble eCognition, L3Harris Geospatial ENVI Features Extraction, Esri ArcGIS Pro, and PCI Geomatica) options43. Based on an overview of over 200 GEOBIA-based researches for earth's features classification40, found Trimble eCognition software the most popular (81%) image-segmentation method for satellite-based classification, while 4% employed L3Harris Geospatial ENVI Features Extraction. Therefore, this study used eCognition software to extract landform-derived objects for CNN training. Table 2 represents a brief description of all available software/tools for object-based segmentation. Figure 3 also portrays segmentation results for eCognition ESP2, ENVI Feature Extraction, and GRASS GIS SPUS PO.

Table 2 All available software/tools, their algorithms, and source for object-based segmentation.
Full size table
Figure 3

Segmentation results for, (a) ENVI feature extraction, (b) eCognition ESP2, (c) GRASS GIS SPUS PO, and (d) original Sentinel-2 image.

Full size image

Following the preparation of the image dataset and the configuration of the legend, selected bands of the image dataset and predisposing variables are used for segmentation and merging in order to split the scene into multiple components. As a result of this combined step, GEOBIA is able to produce real-world images representing real geographic objects44. Multi-resolution segmentation (MRS) algorithm was used in this study to generate a segmentation image using the software package eCognition Developer (Trimble Geospatial, Munich). The MRS method is a nearly complicated image and user-based algorithm, which generates a polygonal object based on the bottom-up strategy. The highly correlated adjacent pixels are initially segmented into objects. Through this process, random seed pixels are chosen that are suited appropriately for merging, and then homogeneity within the same object and heterogeneity between objects are maximized. This procedure is repeated until all the object conditions, which are controlled by color, compactness, and scale are met45. Of these parameters, scale is the most important factor46. In image segmentation, the scale parameter determines the size of the objects that appear in the image. As this parameter increases, the image becomes roughly divided. The shape parameter weights the object's shape based on its spectral color. In this regard, spectral characteristics are more influential in segmenting images when the value is small. Compactness is the ratio between the boundary and the entire object47. We tested several scale parameters based on previous knowledge and used an interactive "trial-and-error" approach to segment images into homogeneous objects. We investigated different scale, shape and compactness parameters, which are presented in Table 3. Figure 4 shows the applied scale parameter to select the most appropriate scale value.

Table 3 Variations of segmentation parameters using the MRS method to detect and map volcanic and glacial landforms.
Full size table
Figure 4

Various image segmentation scale used in MRS method: (a) with the scale factor of 200, (b) with the scale factor of 400, (c) with the scale factor of 600, and (d) original Sentinel-2 image.

Full size image

Object-based features extraction

As shown in Table 4, the segments were selected for volcanic and glacial landforms mapping based on their spectral, textural, and geometrical characteristics. 19 variables were derived from eCognition in order to get as many variables as possible (Table 5). AND fuzzy-based operator then employed in eCognition to classify volcanic and glacial landforms based on fuzzy membership values. Not only GCPs collected by GPS, but also the existing geomorphological map, as well as aerial images, were employed to acquire training data. In sum, 935 sample points were used to identify the most appropriate threshold values for object features and to train CNN models. A rule-based approach is necessary to identify and apply object features to landform classes in GEOBIA. As a result, we incorporated training data along with the efficiency of related spatial and spectral object features for each landform class, obtained through fuzzy threshold values (Table 5). Figure 5 illustrates the performance of some of the training data over selected object-based features.

Table 4 Various spectral, geometrical and textural features relevant to delineating volcanic and glacial landforms and their variables.
Full size table
Table 5 Volcanic and glacial landforms classes and their corresponding thresholds values for object-based features.
Full size table
Figure 5

Some samples from object-based features and targeted landforms, including: (a) GLCM Mean (volcanic tuff), (b) GLCM SD (glacial cirque), (c) length/width (glacial valley), (d) shape index (caldera), (e) density (suspended valley), and (f) shape index (volcanic cone).

Full size image

Convolutional neural network

A convolutional neural network (CNN) is a type of machine learning technique that works with arrays of data, such as one-dimensional signals or sequences as well as two-dimensional visible-light images or audio spectrograms48. Images are consisted of two-dimensional arrays of data, which makes CNN an appropriate tool for image analysis. The CNN is the most common type of deep neural network applied to remote sensing images due to its high generalization capabilities, derived from the features it extracts and its ability to train on extremely large datasets49,50. Neurons are the building blocks of all layers in a neural network. Each neuron represents a convolutional layer aimed at automatic feature extraction from the input image51,52. Figure 6 illustrates the CNN modelling structure for volcanic and glacial landforms.

Figure 6

CNN structure for volcanic and glacial landforms.

Full size image

Hardware and software

This study used the Keras Python Package based on the TensorFlow backend to construct and train the GEOBIA-CNN model for volcanic and glacial landforms. The specifications of the computer system employed were Intel Core i7-6700 K, VGA (GTX 1080), HDD (256 GB SSD + 1 TB SATA) and 32 GB memory. Python programming language based on Spyder software was utilized to implement all the prediction models. In addition to user-friendliness, modularity, and extensibility, Keras provides users with easy and fast prototyping capabilities.

Training step

We used 16 and 9 (Table 5) convolutional layers to train our CNN models for volcanic and glacial landforms, respectively (Table 6). There are several factors involved in each convolutional layer, including a pooling operation, multiple weights, and an activation function. Max-pooling was used with \(2 \times 2\) filters and a two-pixel stride to down-sample the feature maps in the encoder based on a maximum operator by taking the maximum of each \(4 \times 4\) matrix and putting it in the output. Landform detection in this study was done using a \(128 \times 128\) pixel input window. Totally, we applied a twenty-four-layer CNN model for landform detection separately. We fed the twenty-four -layer depth CNN separately with all input window sizes of the training sample patches using nineteen variables (Table 5). In this regard, the input sample patch had \(a \times a \times 19\) units, where \(a \times a\) denotes the size of one layer of sample patches (\(128 \times 128\)), and 19 is all number of the layers required for the analysis. Several convolutions were performed on input using different filters \(\left( {2 \times 2} \right)\) resulting in distinct feature maps. All these feature maps are stacked together to form the convolution layer. By stacking all features along the depth dimension, we generated the final landform outputs volume of \(128 \times 128 \times 19\) by the network by using 24 different filters (one filter per convolutional layer).

Table 6 Characteristics of employed GEOBIA-CNN models for volcanic and glacial landforms.
Full size table

Layers of convolution are applied to the valid portions of the image (without any kind of padding) and they are associated with a convolution combined with an activation function to introduce nonlinearity53. There are different activation functions (e.g., sigmoid, softmax, tanh, hyperbolic tangent, ReLU and Leaky ReLU), which are required for the forward propagation and its derivative for backpropagation54. Sigmoid, tanh, hyperbolic tangent and Softmax are typically used in normal neural networks. Rectified Linear Unit (ReLu), on the other hand, is commonly used in CNN algorithms due to their superior performance55. Thus, ReLu function was employed in this study to train landform models. In Eq. (1), the Rectified Linear Unit (ReLU) parameters are defined as follows:

$$ f\left( x \right) = \left\{ {\begin{array}{*{20}c} x & {if\,x > 0} \\ 0 & {if\,x \le 0} \\ \end{array} } \right. = {\text{max}}\left( {x, 0} \right) $$
(1)

Loss/cost function

Since training is an iterative process, the loss/cost function is necessary to quantify how good the current state of the network (with specific sets of weights) is. This function is based on the principle of increasing forecast accuracy and reducing errors in the network in order to optimize output at the lowest cost possible56. There are several loss/cost functions for problem-solving in classification, including Mean Squared Error (MSE), Cross-Entropy, and Mean Absolute Error (MAE), and subsequently used Cross-Entropy loss (log loss)57.

Since landform classification is a binary classification (0 = No Landform and 1 = Landform), Cross-Entropy is employed in this study to measure the performance of a classification model58,59,60. It is given by the following equation:

$$ L\left( {y, \hat{y}} \right) = - \frac{1}{N}\mathop \sum \limits_{i = 1}^{N} \left( {y_{i} \log \left( {\hat{y}_{i} } \right)} \right) + \left( {1 - y_{i} } \right){\text{log}}\left( {1 - \hat{y}_{i} } \right)) $$
(2)

where \(N\) represents the number of sample datasets, \(y_{i}\) denotes the actual output of ample \(i\), which is equals to 0 or 1, \(\hat{y}_{i}\) presents the forecasted possibility sample \(i\) having output 1, and \(y_{i} ,\hat{y}_{i}\) are the vectors of actual outputs and forecasted possibilities.

Optimization

This study used ADAM to optimize the results of landform-based models. This function can replace the SGD algorithm and take advantage of AdaGrad and RMSprop, which have better performance in sparse gradients and unstable conditions, respectively61,62. Equations (3) and (4) are defined the ADAM optimizer:

$$ m_{t}^{\left( j \right)} = \beta_{1} m_{t - 1}^{\left( j \right)} + \left( {1 - \beta_{1} } \right)g_{t}^{\left( j \right)} $$
(3)
$$ v_{t}^{\left( j \right)} = \beta_{2} v_{t - 1}^{\left( j \right)} + \left( {1 - \beta_{2} } \right)(g_{t}^{\left( j \right)} )^{2} $$
(4)

where \(\beta_{1}\) and \(\beta_{2}\) are commonly chosen to be 0.9 and 0.999, respectively. The first and second moments are then bias-corrected:

$$ \hat{m}_{t}^{\left( j \right)} = \frac{{m_{t}^{\left( j \right)} }}{{1 - \beta_{1}^{t} }},\quad \hat{v}_{t}^{\left( j \right)} = \frac{{v_{t}^{\left( j \right)} }}{{1 - \beta_{2}^{t} }} $$
(5)

And used to weight the update:

$$ w_{t + 1}^{\left( j \right)} = w_{t}^{\left( j \right)} - \frac{\alpha }{{\sqrt {\hat{v}_{t}^{\left( j \right)} + \varepsilon } }} \hat{m}_{t}^{\left( j \right)} $$
(6)

where \(\alpha\) is the initial learning rate, which the default value for it is 0.001.

Accuracy assessment

Fuzzy synthetic evaluation

GEOBIA uses fuzzy decision rules and membership values as the basis of object-based classification, which makes it a "soft classifier" approach. Due to the segmentation process, scale regulation, and fuzzy decision rules, it is difficult to assign true or false labels to objects in a binary mode63. As a consequence, we applied Fuzzy Synthetic Evaluation (FSE) for the accuracy assessment of the classification results. Two groups of data, including control point data and the respective rate obtained for each point are used in the FSE to calibrate the overall and per-class accuracy of classified maps using GOBIA according to two steps64. The first step is to compute the classification confidence or magnitude of error for each class using the Difference fuzzy function. To obtain the single accuracy value, the second step weights the Difference function categories. Following these two steps, the degree of confidence in the classification can be calculated based on the ratio of matches between sample and reference data, based on their respective interpretation confidence ratings (ICR), for which default values have been suggested65. A combination of GPS data, control points from Google Earth, high-resolution aerial photographs, and geomorphology maps (scale of 1/25,000) was incorporated in our research as reference datasets.

The FSE operates by assuming that \(N\) is the landform classes in the classified map, labeled \(C_{1}\) to \(C_{N}\), organized as a set of \({\Omega } = \left\{ {C_{1} {\text{ to }}C_{N} { }} \right\}\), and that each piece sample observation is associated with one class and only one Difference category \(D = \left\{ {VHCC, .., VHE} \right\}\). For the FSE appliance, let \(P_{N, d}\) to be equal the proportion of observations from the map class \(C_{N}\) in the Difference category \(d\), and \(W_{d}\) equal the assigned weight. Equation (7) can be used to calculate the estimated accuracy \(P_{m}\) for map class \(C_{m}\):

$$ P_{N} = \sum d P_{N, d} \times W_{d} { } $$
(7)

Table 7 displays Different categories of confidence in classification and magnitude of errors. The results of validation analysis using the FSE approach for volcanic and glacial landform classification are shown in Table 8. As shown in Table 8, GEOBIA demonstrates satisfactory performance for landform-based object extraction, with FSEs of 0.9204, 0.917, 0.934, 0.933, 0.921, 0.904, 0.918, and 0.909 estimated for dacite lava, caldera, andesite lava, volcanic cone, volcanic tuff, glacial circus, glacial valley, and suspended valley, respectively.

Table 7 The Difference function and its respective default values.
Full size table
Table 8 Sample observation proportions for Difference categories and validation of object-based features for volcanic and glacial landforms.
Full size table

Quantitative methods

To evaluate our CNN segmentation models for volcanic and glacial landforms, we used seven evaluation indexes, including intersection over union (IOU) values, recall (RC), precision (PC), specificity (SP), F-measure (FM), accuracy (ACC), and kappa (KP), which are defined as:

$$ IOU = \frac{AO \cap EO}{{AO \cup EO}} = \frac{TP}{{TP + FP + FN}} $$
(8)
$$ Recall = \frac{TP}{{TP + FN}} $$
(9)
$$ Specificity = \frac{TN}{{TN + FP}} $$
(10)
$$ Precision = \frac{TP}{{TP + FP}} $$
(11)
$$ F{\text{-}}measure = 2 \times \frac{Precision \times Recall}{{Precision + Recall}} $$
(12)
$$ Accuracy = \frac{TP + TN}{{TP + TN + FN + FP}} $$
(13)
$$ Kappa = \frac{{TP + TN - TP_{expected} - TN_{expected} }}{{TP + TN + FN + FP - TP_{expected} - TN_{expected} }} $$
(14)
$$ TP_{expected} = \frac{{\left( {TP + FP} \right) \times \left( {TP + FN} \right)}}{TP + TN + FN + FP} $$
(15)
$$ TN_{expected} = \frac{{\left( {TN + FN} \right) \times \left( {TN + FP} \right)}}{TP + TN + FN + FP} $$

where \(AO\) is actual output; \(EO\) is on behalf of expected result; \(TP\), \(FP\), \(FN\), and \(TN\) are true positive, false positive, false negative, and true negative, respectively.

Table 9 illustrates the results of CNN segmentation models for volcanic and glacial landforms. As we see in Table 9, CNN segmentation models performed well with ACC of > 0.9600 for volcanic and glacial landforms. Based on Table 9, CNN segmentation models demonstrate the highest performance of these approaches for volcanic and glacial landforms mapping, so that the ACC 0.9685, 0.9780, 0.9614, 0.9767, 0.9675, 0.9718, 0.9600, and 0.9778 were estimated for dacite lava, caldera, andesite lava, volcanic cone, volcanic tuff, glacial circus, glacial valley, and suspended valley, respectively. In addition, CNN segmentation models were performed well with KP of > 0.9000, FM of > 0.9200, SP of > 0.9700, PC of > 0.9500, RC of > 0.8600, and IOU of > 0.8500 for volcanic and glacial landforms mapping (Table 9). We also used loss and accuracy operations in Python based Spyder software to evaluate the accuracy of the integrated GEOBIA-CNN classification results for volcanic and glacial landforms mapping (Table 10). According to Table 10, CNN segmentation models achieved an accuracy of > 0.9600.

Table 9 Results of evaluation indexes for volcanic and glacial landforms.
Full size table
Table 10 Results of loss and accuracy operations estimated using the Python based Spyder software for volcanic and glacial landforms.
Full size table

K-fold cross validation

K-fold cross validation is one of the validation methods for learning-based classification approaches. The results of classification can be validated by randomly assigning the initial dataset to different groups. Here, one set is used for validation and the other K-1 set is used for training65. This study employed fivefold cross validation to validate the results of the landform classification. The data is divided into five sets, and one set is used for validation and the other four for training. All five sets should be processed in the same way. The results of the fivefold cross validation are given in Table 11. As we see from Table 11, the integrated GEOBIA-CNN performed well (Cross validation accuracy > 0.9400) for volcanic and glacial landforms mapping.

Table 11 Fivefold cross validation accuracy for volcanic and glacial landforms.
Full size table

Location and geomorphological features of the study area

Sahand volcano is located in the north-west of Iran (Fig. 7). Geologically, Sahand volcano is very complex. The Paleocene and Miocene geologic deposits make up the majority of Sahand volcano. The Quaternary volcanic structures of Sahand volcano are aligned roughly parallel to the northwest-southeast trend of the Tabriz fault. There are 17 peaks higher than 3000 m in this volcanic complex, including Sahand's highest peak at 3707 m. Sahand's volcanic deposits cover an area of approximately 3000 km2, making it one of the most extensive post-collisional volcanic systems spanning eastern Anatolia, Armenia, and northwestern Iran66.

Figure 7