Spatial discontinuity often leads to poor accuracy when a single model is used for surface modeling of soil properties in complex geomorphic areas. Here we present a method of adaptive surface modeling that combines secondary variables to improve prediction accuracy in the interpolation of soil properties (ASM-SP). Using various secondary variables and multiple base interpolation models, ASM-SP was applied to interpolate soil K^{+} in a typical complex geomorphic area (Qinghai Lake Basin, China). Five methods, namely inverse distance weighting (IDW), ordinary kriging (OK), and OK combined with different secondary variables (OK-Landuse, OK-Geology, and OK-Soil), were used for comparison to validate the proposed method. The mean error (ME), mean absolute error (MAE), root mean square error (RMSE), mean relative error (MRE), and accuracy (AC) were used as evaluation indicators. The results showed that: (1) The OK interpolation result is spatially smooth with a weak ‘bull’s-eye’ effect, whereas IDW produces a relatively strong ‘bull’s-eye’ effect. Both have obvious deficiencies in depicting the spatial variability of soil K^{+}. (2) The methods incorporating different secondary variables (ASM-SP, OK-Landuse, OK-Geology, and OK-Soil) had lower estimation bias. Compared with IDW, OK, OK-Landuse, OK-Geology, and OK-Soil, the accuracy of ASM-SP increased by 13.63%, 10.85%, 9.98%, 8.32%, and 7.66%, respectively. Furthermore, ASM-SP was more stable, with lower ME, MAE, RMSE, and MRE values. (3) ASM-SP captures more detail than the other methods at abrupt boundaries, so its results are more consistent with the actual distribution of the secondary variables. In conclusion, ASM-SP can not only account for the nonlinear relationship between secondary variables and soil properties but also adaptively combine the advantages of multiple models, which makes the spatial interpolation of soil K^{+} more reasonable.

Scientific management and utilization of soil resources are predicated on a correct understanding of continuous changes in regional soil properties. Spatial interpolation is the main method used to evaluate continuous changes in soil properties [

In recent years, machine learning methods such as artificial neural networks (ANN), random forest (RF), and support vector machines (SVM) have been applied to data mining and spatial interpolation and have demonstrated good predictive accuracy. For example, ANN and SVM have been applied to daily minimum air temperature and rainfall data [

In addition, a range of studies have demonstrated that interpolation accuracy and mapping quality can be effectively improved by the use of secondary variables as supplementary information [

To address the problems of global models and secondary variables that have long affected spatial interpolation, this study aimed to improve the prediction accuracy of single interpolation models in areas with complex landforms, using soil K^{+} as an example. We applied analysis of variance (ANOVA) to select secondary variables closely related to the spatial variation of soil K^{+}, integrated these secondary variables, and constructed a series of soil property interpolation models. To deal with the discontinuity and spatial variation of soil properties in areas with complex landforms, we constructed error surfaces that enable adaptive partitioning of the interpolation surfaces and screening of suitable base interpolation models. We then optimized the screened base interpolation models and built a multi-model integrated interpolation method, in which different combinations of interpolation models are selected for different areas, to achieve high-precision simulation of soil properties. Finally, we evaluated the performance of IDW, OK, OK-Landuse, OK-Geology, OK-Soil, and ASM-SP and compared their predictive capabilities using soil K^{+} maps.

The study area (36°38′–37°29′ N, 99°52′–100°50′ E) is located in the southeast part of the Qinghai Lake Basin, on the Tibetan Plateau, China (^{2}, with an altitude ranging between 3043 and 4516 m, and is characterized by complex landforms, including mountains, hills, tablelands, and plains. Agriculture and animal husbandry are widely practiced in the area.

The study area is characterized by 6 soil types (^{+} in secondary variables using 110 training samples (

Field sampling of surface soil (0–30 cm) was carried out at 110 sampling points in September 2013 to supplement the map data. The mean distance between soil sampling locations was approximately 6.74 km. The sampling sites were designed to cover the whole area and to include different landscapes. To ensure a rational distribution of sampling points across the different geo-environments, a spatially stratified sampling strategy based on landscape types was applied [^{+} of the samples was determined by sodium hydroxide fusion [

The secondary variables were compiled in ArcGIS 10.2 and resampled to a resolution of 30 m. Because the study area covers a comparatively wide range of landscape types and the number of samples was relatively small, the spatial distribution of samples was uneven (

IDW is a deterministic method for multivariate interpolation using a known scattered set of points. Values assigned to unknown points are calculated as a weighted average of the values available at known points. Weights are usually inversely proportional to a power of the distance [
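A minimal NumPy sketch of IDW as described above. The power parameter p = 2 is an assumption (a common default), since the setting used in the paper is not recoverable here, and the function name `idw` is hypothetical. If a target location coincides with a sample, the sample value is returned, so the method honours the data.

```python
import numpy as np

def idw(xy_known, z_known, xy_new, power=2.0):
    """Inverse distance weighting (hypothetical helper).

    Each estimate is a weighted mean of the known values, with weights
    proportional to 1 / distance**power (power=2 is an assumed default)."""
    xy_known = np.asarray(xy_known, dtype=float)
    z_known = np.asarray(z_known, dtype=float)
    xy_new = np.atleast_2d(np.asarray(xy_new, dtype=float))
    # Pairwise distances between targets and samples, shape (n_new, n_known)
    d = np.linalg.norm(xy_new[:, None, :] - xy_known[None, :, :], axis=2)
    out = np.empty(len(xy_new))
    for i, row in enumerate(d):
        hit = row == 0.0
        if hit.any():
            # Target coincides with a sample: return that sample value
            out[i] = z_known[hit][0]
        else:
            w = 1.0 / row**power
            out[i] = np.sum(w * z_known) / np.sum(w)
    return out
```

For two samples with values 1 and 3, a point halfway between them receives 2.0, while a point a quarter of the way from the first receives 1.2, reflecting the heavier weight of the nearer sample.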

Kriging interpolation is regarded as the best linear unbiased estimation method [
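For reference, the ordinary kriging estimator and system can be written in their textbook form as follows. The notation (sample locations $x_i$, target $x_0$, weights $\lambda_i$, Lagrange multiplier $\mu$, semivariogram $\gamma$) is assumed here, since the paper's own equation is not recoverable from this text. The constraint that the weights sum to one makes the estimator unbiased, and minimizing the estimation variance under that constraint yields the linear system:

```latex
\hat{Z}(x_0) = \sum_{i=1}^{n} \lambda_i Z(x_i), \qquad \sum_{i=1}^{n} \lambda_i = 1
\\
\sum_{j=1}^{n} \lambda_j \, \gamma(x_i, x_j) + \mu = \gamma(x_i, x_0), \qquad i = 1, \dots, n
\\
\sigma^2_{\mathrm{OK}}(x_0) = \sum_{i=1}^{n} \lambda_i \, \gamma(x_i, x_0) + \mu
```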

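As a kind of geostatistical model [, OK with a secondary variable treats the soil K^{+} value Z(x_{mn},y_{mn}) at location (x_{mn},y_{mn}) as the sum of a mean component m(x_{mn},y_{mn}), given by the mean soil K^{+} of the secondary-variable class containing that location, and a residual component r(x_{mn},y_{mn}) of soil K^{+}:

$$Z(x_{mn},y_{mn}) = m(x_{mn},y_{mn}) + r(x_{mn},y_{mn})$$

The residuals were interpolated by OK, and the resulting residual surface was added to the soil K^{+} means of the relevant secondary variable to give the final interpolated values of OK with a secondary variable for soil K^{+}; that is, the mean was modified with surface modeling of residuals. See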

A series of interpolation surfaces of soil properties was generated from the base interpolation models, and the simulation error of each model was calculated at the soil sampling points. An error surface, derived from these point errors by linear interpolation, was used to determine whether the error of each raster cell exceeded a threshold value. Raster cells with errors below the threshold were clustered, over multiple iterations, to determine the spatial range of applicability of each interpolation model. The individual steps are shown in
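The error-surface and threshold steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the point errors of one base model are taken as absolute errors supplied by the caller (how they were computed is not stated here), SciPy's `griddata` provides the linear interpolation, and connected-component labelling (`scipy.ndimage.label`) stands in for the clustering of below-threshold cells. The function names `error_surface` and `applicable_regions` are hypothetical.

```python
import numpy as np
from scipy import ndimage
from scipy.interpolate import griddata

def error_surface(xy_samples, abs_errors, grid_x, grid_y):
    """Linearly interpolate one base model's point errors onto a raster grid.

    Cells outside the convex hull of the samples are returned as NaN."""
    return griddata(np.asarray(xy_samples, float), np.asarray(abs_errors, float),
                    (grid_x, grid_y), method="linear")

def applicable_regions(err_grid, threshold):
    """Flag cells whose interpolated error is below the threshold and group
    them into connected clusters (assumed stand-in for the paper's clustering)."""
    below = np.nan_to_num(err_grid, nan=np.inf) < threshold  # NaN cells never qualify
    labels, n_regions = ndimage.label(below)
    return labels, n_regions
```

Repeating this for every base model gives one map of applicable regions per model; the threshold value itself is a user choice that the text does not fix.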

Independent validation was used to assess interpolation accuracy. The soil K^{+} samples were randomly split into two groups: 90 points were used for interpolation and the remaining 20 for validation.
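A random 90/20 split of the 110 samples can be reproduced as below. The random seed and the function name `split_samples` are hypothetical; the paper does not state how its random split was generated.

```python
import numpy as np

def split_samples(n_samples=110, n_validation=20, seed=0):
    """Randomly split sample indices into interpolation and validation sets.

    seed=0 is an arbitrary, assumed value used only for reproducibility."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    return idx[n_validation:], idx[:n_validation]  # (interpolation, validation)
```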

We assessed the accuracy of the different interpolation methods by comparing the mean error (ME), mean absolute error (MAE), mean relative error (MRE), root mean square error (RMSE), and accuracy (AC) of predicted and measured values. The specific equations used are as follows:
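The sketch below uses common textbook definitions of ME, MAE, RMSE, and MRE (error = predicted minus observed, MRE expressed in percent relative to the observed values). The exact formulas used in the paper, including the one for AC, are not reproduced in this text, so these may differ from the authors' definitions; the function name `error_metrics` is hypothetical.

```python
import numpy as np

def error_metrics(observed, predicted):
    """Common definitions of ME, MAE, RMSE, and MRE (MRE in %)."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    err = pred - obs                       # positive = overestimation
    return {
        "ME": err.mean(),                  # bias
        "MAE": np.abs(err).mean(),         # average error magnitude
        "RMSE": np.sqrt((err ** 2).mean()),  # penalizes large errors
        "MRE": 100.0 * np.mean(np.abs(err) / np.abs(obs)),
    }
```

For observations [2.0, 2.0] and predictions [2.2, 1.8], ME is 0 (the errors cancel), whereas MAE and RMSE are both 0.2 and MRE is 10%, which is why ME alone is not a sufficient accuracy measure.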

The semi-variogram model used to analyze spatial correlation was selected based on the fitted nugget, sill, and range values. Several candidate models were considered, including exponential, spherical, Bessel, circular, and Gaussian; the exponential and K-Bessel models were selected for the OK and base interpolation models because they best fitted the data and residuals (
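A minimal sketch of how an exponential semi-variogram could be fitted: the classical (Matheron) empirical semi-variogram is computed in distance bins and the model is fitted by non-linear least squares with SciPy's `curve_fit`. The binning scheme, starting values, and function names are assumptions for illustration; the paper's fits were presumably produced in ArcGIS, conventions for the range parameter differ between software, and the K-Bessel model is not shown.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.spatial.distance import pdist

def empirical_semivariogram(xy, z, n_bins=10):
    """Half the mean squared difference of z for point pairs in each lag bin."""
    h = pdist(np.asarray(xy, float))                               # pair distances
    g = 0.5 * pdist(np.asarray(z, float)[:, None], metric="sqeuclidean")
    edges = np.linspace(0.0, h.max(), n_bins + 1)
    which = np.digitize(h, edges[1:-1])                            # bin index 0..n_bins-1
    lags, gammas = [], []
    for b in range(n_bins):
        in_bin = which == b
        if in_bin.any():                                           # skip empty bins
            lags.append(h[in_bin].mean())
            gammas.append(g[in_bin].mean())
    return np.array(lags), np.array(gammas)

def exponential_model(h, nugget, sill, rng):
    """Exponential model; rng is the practical range (~95% of the sill)."""
    return nugget + (sill - nugget) * (1.0 - np.exp(-3.0 * h / rng))

def fit_exponential(lags, gammas):
    """Least-squares fit returning (nugget, sill, range), all constrained >= 0."""
    p0 = [gammas.min(), gammas.max(), lags.max() / 2.0]
    params, _ = curve_fit(exponential_model, lags, gammas, p0=p0,
                          bounds=(0.0, np.inf))
    return params
```

Comparing candidate models, as described above, amounts to fitting each one to the same empirical points and keeping the one with the smallest residual error.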

The residuals remained well spatially correlated after removal of the local mean within the different secondary variables (
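One way to quantify this is the nugget/sill ratio. The snippet below recomputes it from the nugget and sill values in the semi-variogram table (the ratios match the values in that table's last row) and classifies spatial dependence with a commonly used convention: below 25% strong, 25–75% moderate, above 75% weak. The thresholds are a literature convention assumed here, not a criterion stated in the paper, and `spatial_dependence` is a hypothetical helper.

```python
# Nugget and sill values copied from the semi-variogram table of this paper
params = {
    "Residual of OK-Landuse": (0.0204, 0.4842),
    "Residual of OK-Soil": (0.03124, 0.5043),
    "Residual of OK-Geology": (0.1866, 0.4783),
    "OK": (0.2483, 0.6012),
}

def spatial_dependence(nugget, sill):
    """Return the nugget/sill ratio and a dependence class
    (<0.25 strong, 0.25-0.75 moderate, >0.75 weak; assumed convention)."""
    ratio = nugget / sill
    if ratio < 0.25:
        return ratio, "strong"
    if ratio <= 0.75:
        return ratio, "moderate"
    return ratio, "weak"

for name, (c0, sill) in params.items():
    ratio, cls = spatial_dependence(c0, sill)
    print(f"{name}: nugget/sill = {ratio:.4f} ({cls})")
```

Under this convention the land-use and soil residuals show strong spatial dependence, while the geology residuals and the raw data for OK show moderate dependence.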

The secondary variables used for each method were analyzed by ANOVA. The soil K^{+} data were grouped into the classes of each secondary variable so that soil K^{+} could be compared across classes. For example, for soil type, the soil K^{+} data were grouped into five classes (alpine meadow soil, chestnut soil, flow sandy soil, meadow marsh soil, and semi-fixed sandy soil) with 32, 54, 10, 6, and 8 samples, respectively. The between- and within-group variances of soil K^{+} were determined by ANOVA using SPSS 21.0 for Windows.
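As a consistency check, the one-way ANOVA F statistic is the ratio of the between-group to the within-group mean square, and its p-value is the upper tail of the F distribution. The sketch below recomputes F and p from the sums of squares and degrees of freedom reported in the ANOVA table (the rows with the smaller degrees of freedom are the between-group terms). With raw per-class data one would instead call `scipy.stats.f_oneway(*groups)`; the helper name `anova_f` is hypothetical.

```python
from scipy import stats

def anova_f(ss_between, df_between, ss_within, df_within):
    """One-way ANOVA F statistic and p-value from sums of squares and df."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    f = ms_between / ms_within
    p = stats.f.sf(f, df_between, df_within)  # upper-tail probability
    return f, p

# Soil type and grassland type rows of the ANOVA table in this paper
f_soil, p_soil = anova_f(0.722, 4, 4.371, 106)
f_grass, p_grass = anova_f(0.934, 16, 4.159, 94)
print(f"soil type: F = {f_soil:.3f}, p = {p_soil:.3f}")
print(f"grassland type: F = {f_grass:.3f}, p = {p_grass:.3f}")
```

The recomputed F values agree with the reported 4.378 and 1.319, confirming that soil type is significant at the 0.01 level while grassland type is not significant.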

The ANOVA results comparing the influence of the different secondary variables on soil K^{+} are summarized in the ANOVA table. Geology type, soil type, and land use type all have a significant effect on soil K^{+}, with significance at the 0.01 level. However, grassland type is only weakly related to soil K^{+} (significance level of 0.2). This is mainly due to the high degree of fragmentation of the grassland type map and the limited number of sample points for some grassland subtypes, some of which have just 1 or 2 sampling points (

ASM-SP was constructed in three steps: first, a number of base interpolation models were produced; second, the base interpolation models were adaptively partitioned; and third, the base interpolation models were combined by selecting, for each raster cell, the model with the minimum error. OK-Landuse, OK-Soil, and OK-Geology were used as the base interpolation models. Adaptive partitioning was conducted on the base interpolation models using the method described in

The mean soil K^{+} was calculated for each geological factor to obtain the mean surface; soil K^{+} was correlated with the secondary variables, based on measured values of soil K^{+} (

The mean soil K^{+} was subtracted from the measured values to calculate the residuals of soil K^{+}. The residuals were then interpolated by OK to obtain the residual surface r(x_{mn},y_{mn}).

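Finally, the residual surface r(x_{mn},y_{mn}) was added to the mean surface to obtain the interpolation surface of soil K^{+} that integrates the secondary variables, which is the base interpolation surface to be integrated.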
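The three steps above (class means, residuals, OK of residuals, then mean plus residual surface) can be sketched in NumPy as below. This is an illustration under assumptions, not the authors' code: an exponential semi-variogram with user-supplied nugget, sill, and practical range is used for the residuals (the paper used K-Bessel for two of the three residual models), class labels at target cells are assumed to come from the secondary-variable map, and all function names are hypothetical.

```python
import numpy as np

def exp_variogram(h, nugget, sill, rng):
    """Exponential semi-variogram with gamma(0) = 0, so kriging honours the data."""
    g = nugget + (sill - nugget) * (1.0 - np.exp(-3.0 * h / rng))
    return np.where(h > 0, g, 0.0)

def ordinary_kriging(xy, z, xy_new, nugget, sill, rng):
    """Solve the ordinary kriging system for every target location."""
    xy = np.asarray(xy, float)
    xy_new = np.atleast_2d(np.asarray(xy_new, float))
    n = len(xy)
    d = np.linalg.norm(xy[:, None] - xy[None, :], axis=2)
    a = np.ones((n + 1, n + 1))          # last row/column: unbiasedness constraint
    a[:n, :n] = exp_variogram(d, nugget, sill, rng)
    a[n, n] = 0.0
    d0 = np.linalg.norm(xy_new[:, None] - xy[None, :], axis=2)   # (m, n)
    b = np.ones((n + 1, len(xy_new)))
    b[:n] = exp_variogram(d0, nugget, sill, rng).T
    w = np.linalg.solve(a, b)            # n weights + Lagrange multiplier per target
    return w[:n].T @ np.asarray(z, float)

def ok_with_secondary(xy, z, cls, xy_new, cls_new, nugget, sill, rng):
    """Class mean of the secondary variable plus OK-interpolated residuals."""
    z = np.asarray(z, float)
    cls, cls_new = np.asarray(cls), np.asarray(cls_new)
    means = {c: z[cls == c].mean() for c in np.unique(cls)}
    resid = z - np.array([means[c] for c in cls])               # residuals of K+
    r_new = ordinary_kriging(xy, resid, xy_new, nugget, sill, rng)
    return np.array([means[c] for c in cls_new]) + r_new        # mean + residual
```

Because the kriging weights sum to one and gamma(0) = 0, the surface reproduces the measured value exactly at a sampling point and falls back towards the class mean away from the samples.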

Based on the method for constructing error surfaces outlined in

In the raster cell optimization step, the interpolated value with the minimum error in each raster cell was selected as the optimal raster cell to be integrated, and the optimal cells were mosaicked to form the ASM-SP surface.
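The per-cell selection described above, which corresponds to the raster cell optimization figure (model rasters ‘a’ and ‘b’, error rasters ‘c’ and ‘d’, mosaic ‘e’), reduces to an argmin over stacked error surfaces. A minimal NumPy sketch, assuming all base-model predictions and error surfaces share the same grid; the name `mosaic_min_error` is hypothetical.

```python
import numpy as np

def mosaic_min_error(predictions, errors):
    """Select, cell by cell, the base model with the smallest error.

    predictions, errors: array-likes of shape (n_models, rows, cols).
    Returns the mosaicked surface and the index of the chosen model per cell."""
    predictions = np.asarray(predictions, float)
    errors = np.asarray(errors, float)
    best = np.argmin(errors, axis=0)                                # (rows, cols)
    mosaic = np.take_along_axis(predictions, best[None], axis=0)[0]
    return mosaic, best
```

The returned `best` array also gives the regional distribution of the selected base models, i.e., which model was used where.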

The accuracy of ASM-SP in simulating the spatial variation of soil K^{+} was evaluated by comparing six interpolation methods: OK-Landuse, OK-Geology, OK-Soil, IDW, OK, and ASM-SP. Five evaluation indices (ME, MAE, RMSE, MRE, and AC) were used to independently validate the models (^{+} boundaries as they vary with the changing geo-environment. Second, based on given accuracy thresholds, ASM-SP adaptively screens the optimal prediction area of each interpolation model and regroups the models in an optimized way. The other methods, OK-Landuse, OK-Geology, and OK-Soil, only consider the influence of secondary variables on the spatial variation of soil K^{+}, without further screening and optimizing the interpolation results; they are therefore less accurate than ASM-SP.

The predictive capabilities of the six interpolation methods in terms of the soil K^{+} maps are compared in ^{+} distribution, but the accuracy of small-scale variations is low. In addition, a relatively strong ‘bull’s-eye’ effect appears where sampling points are dense or sparse. The simulation surface of OK is smoother, and its interpolation range is intermediate. Owing to the smoothing effect of kriging, the range of variation in soil K^{+} is narrower than the true range, as has been found in other studies [^{+} and has a moderate interpolation range (1.31–2.38), and can give more details of the soil K^{+} distribution in different secondary variables, especially at abrupt boundaries. In contrast, the OK and IDW maps of soil K^{+} do not capture this discrete information. The method therefore adapts better to the spatial interpolation of soil properties in areas with complex landforms, which allows it to describe the patterns of spatial variation in soil properties in the study area more accurately.

Unlike more traditional spatial interpolation methods (e.g., IDW and OK), which fit a single interpolation model to the data set, ASM-SP uses a series of base interpolation models and constructs error surfaces to adaptively screen and regroup the interpolation models in an optimized way. Its interpolation accuracy is usually higher than that of a single interpolation model [

The sample data used to predict soil properties usually cannot provide complete information for any individual interpolation model, so assumptions must be made about different conditions. In other words, it is difficult for a single interpolation model to accurately describe the spatial variation of soil properties across the whole study area. For instance, for the sampling data of one soil property, several interpolation models may achieve similar accuracies, with none clearly optimal. The accuracy of spatial interpolation of soil properties can therefore be improved by effectively combining the advantages of multiple base interpolation models.

The sample data used to predict soil properties often cannot accurately express the patterns of spatial variation, but integrating multiple models can provide a better approximation than a single model. For example, the patterns of spatial variation in soil K^{+} in dry farmland differ greatly between areas with chernozem and areas with clay soils. Therefore, if land use type is the only secondary variable used in the spatial interpolation of soil K^{+} (e.g., in OK-Landuse), a relatively high prediction accuracy is usually unattainable. An effective solution is to integrate a series of spatial interpolation methods (e.g., OK-Landuse, OK-Soil, and OK-Geology) so that they jointly approximate the spatial variation.

Based on the above, the interpolation results of ASM-SP provide a better physical explanation of the spatial variation in soil properties. In addition, ASM-SP achieved higher simulation accuracy than OK, OK-Landuse, OK-Soil, and OK-Geology. ASM-SP is therefore more suitable for application in areas with complex landforms.

Different land uses, soil types, and geology all influence the spatial variation of soil properties. Previous research has also demonstrated that there is a relatively strong spatial correlation between secondary variables and the spatial variation of soil properties [

In this study, we compared spatial interpolation models that integrate secondary variables (e.g., ASM-SP) with models that do not incorporate any secondary variables (e.g., IDW and OK). The results indicated that an appropriate integration of secondary variables can effectively improve the spatial interpolation accuracy of soil properties. This supports the conclusion of Goovaerts (1999) that cokriging with secondary variables usually achieves better simulation results than OK. However, as pointed out by [

Affected by secondary variables, the spatial distribution of soil properties is subject to problems such as spatial discontinuity and variability. It is difficult for a single global interpolation model to fully explain the spatial non-stationarity of soil properties, especially in areas with complex landforms. Using soil K^{+} as a case study, we proposed an adaptive surface modeling method that combines secondary variables (ASM-SP). Compared with OK and with OK-Landuse, OK-Soil, and OK-Geology, which also combine secondary variables, ASM-SP depicts the spatial variation of soil properties in areas with complex landforms more accurately and reduces simulation errors more effectively, owing to its integration of multiple base interpolation models. In addition, because ASM-SP combines secondary variables and its simulation surface agrees better with geographical patterns, it provides more accurate and reasonable detail about the spatial variation of soil properties, which offers greater opportunity for physical explanation of their spatial variance characteristics. However, ASM-SP is based on error minimization surfaces and therefore carries a risk of over-fitting, which will be addressed in future work.

Accurate interpolation of soil properties in areas with complex landforms faces two main challenges. First, the relationship between soil properties at sampling points and the secondary variables is non-linear, so the fitting precision of conventional linear models is rather limited. Second, the selected interpolation model must have relatively high simulation accuracy and, preferably, provide the optimal interpolation. In reality, however, every interpolation model has advantages and disadvantages. Even if a global optimum interpolation model can be found through adequate data exploration and analysis, a single global model cannot explain the spatial non-stationarity of soil properties. A feasible solution is to combine secondary variables and integrate multiple models, so that different combinations of interpolation models can be selected for different areas. Soil K^{+} is fairly representative of soil properties that vary sharply over short horizontal distances, so ASM-SP should also be applicable to the interpolation of other soil properties (e.g., soil P, pH, Ca, Mg, and Zn). Previously, we verified the advantages of an ensemble learning algorithm in the serial integration of multiple models [

This study was supported by the National Natural Science Foundation of China (Grant No. 41601405). We are grateful to the Qinghai Environmental Monitoring Center for providing topsoil sampling approval. Thanks to the China Soil Investigation Office and the Bureau of Geological Exploration & Development of Qinghai Province for providing secondary datasets.

Conceived and designed the experiments: Liu Wei; performed the experiments: Yan Da-Peng and Wang Sheng-Li; analyzed the data: Liu Wei and Zhang Hai-Rong; contributed reagents/materials/analysis tools: Wang Sheng-Li; wrote the paper: Liu Wei and Zhang Hai-Rong.

The authors declare no conflict of interest.

Location of the study area, showing sample sites (circles) and elevation (shading).

Characteristics of the study area: (

Adaptive partitioning process (

Semi-variograms of soil K^{+} residuals for: (

Mean surface of soil K^{+} for different secondary variables: (

Residual surfaces r(x_{mn},y_{mn}) of soil K^{+} for different secondary variables: (

Error surfaces of base interpolation models: (

The raster cell optimization process (‘a’ and ‘b’ are raster interpolation results from two different models, ‘c’ and ‘d’ are the corresponding interpolation errors, and ‘e’ is the mosaic of optimal raster cells).

Regional distribution of optimized base interpolation models.

Comparison of soil K^{+} maps constructed using different interpolation methods: (

Descriptive statistical characteristics of soil K^{+} content in different secondary variables.

Secondary Variable | Subtype | Number | Mean | Standard Error | Area/km^{2} | Area Proportion/% |
---|---|---|---|---|---|---|
Soil | Alpine meadow soil | 32 | 1.98 | 0.14 | 420.47 | 20.81 |
Soil | Chestnut soil | 54 | 2.01 | 0.18 | 1360.14 | 67.31 |
Soil | Flow sandy soil | 10 | 1.72 | 0.12 | 144.76 | 7.16 |
Soil | Meadow marsh soil | 6 | 1.84 | 0.03 | 31.9 | 1.58 |
Soil | Semi-fixed sandy soil | 8 | 1.50 | 0.07 | 63.4 | 3.14 |
Geology | Alluvial terrace | 8 | 2.04 | 0.14 | 71.25 | 3.53 |
Geology | Denudate high terrace | 10 | 2.15 | 0.07 | 266.73 | 13.22 |
Geology | Diluvial plain | 13 | 2.10 | 0.13 | 515.24 | 25.53 |
Geology | Hilly | 3 | 2.14 | 0.05 | 3.76 | 0.19 |
Geology | Lacustrine plain | 20 | 1.94 | 0.17 | 333.33 | 16.52 |
Geology | Lake beach | 5 | 1.84 | 0.09 | 143.99 | 7.14 |
Geology | Large rolling alpine | 10 | 1.89 | 0.12 | 132.64 | 6.57 |
Geology | Middle rolling alpine | 4 | 1.91 | 0.10 | 5.63 | 0.28 |
Geology | Sand dune | 14 | 1.63 | 0.14 | 193.15 | 9.57 |
Geology | Small rolling alpine | 14 | 2.05 | 0.12 | 287.08 | 14.23 |
Geology | Valley plain | 9 | 1.96 | 0.08 | 65.22 | 3.23 |
Land use | Cropland | 10 | 2.14 | 0.08 | 77.16 | 3.83 |
Land use | Grassland | 41 | 1.99 | 0.13 | 1172.65 | 58.17 |
Land use | Meadowland | 25 | 2.02 | 0.12 | 417.44 | 20.71 |
Land use | Potential arable land | 16 | 1.88 | 0.14 | 229.32 | 11.38 |
Land use | Scrubland | 0 | 1.91 | 0.22 | 1.18 | 0.05 |
Land use | Swamp meadowland | 5 | 1.84 | 0.04 | 32.09 | 1.59 |
Land use | Unused land | 13 | 1.64 | 0.09 | 86.43 | 4.29 |
Grassland | Achnatherum splendens | 37 | 1.93 | 0.17 | 719.58 | 35.59 |
Grassland | Artemisia arenaria DC. | 2 | 1.49 | 0.07 | 31.83 | 1.57 |
Grassland | Blysmus sinocompressus | 5 | 1.93 | 0.14 | 30.00 | 1.48 |
Grassland | Bush cinquefoil | 18 | 2.06 | 0.14 | 517.19 | 25.58 |
Grassland | Coarse beak carex | 2 | 1.86 | 0.04 | 20.10 | 0.99 |
Grassland | Elymus nutans | 3 | 1.73 | 0.08 | 18.49 | 0.91 |
Grassland | Ephedra | 1 | 1.50 | 0 | 2.49 | 0.12 |
Grassland | Gravel | 4 | 1.68 | 0.10 | 135.38 | 6.70 |
Grassland | Iris ensata Thunb. | 1 | 1.96 | 0 | 34.47 | 1.70 |
Grassland | Leymus | 6 | 1.94 | 0.27 | 28.95 | 1.43 |
Grassland | Kobresia humilis | 4 | 2.02 | 0.08 | 28.30 | 1.40 |
Grassland | Koeleria tibetica | 4 | 1.80 | 0.10 | 24.45 | 1.21 |
Grassland | Kobresia capillifolia | 7 | 2.05 | 0.08 | 157.90 | 7.81 |
Grassland | Kobresia myosuroides | 3 | 2.16 | 0.06 | 82.99 | 4.11 |
Grassland | Salix oritrepha | 2 | 2.02 | 0.05 | 16.33 | 0.81 |
Grassland | Serpent grass | 2 | 1.94 | 0.08 | 4.29 | 0.21 |
Grassland | Stipa krylovii | 3 | 1.82 | 0.05 | 51.77 | 2.56 |
Grassland | Stipa purpurea | 5 | 2.14 | 0.07 | 111.51 | 5.52 |
Grassland | Water bai zhi | 1 | 2.08 | 0 | 5.64 | 0.28 |

Semi-variogram models.

Parameter | Residual of OK-Landuse | Residual of OK-Soil | Residual of OK-Geology | OK |
---|---|---|---|---|
Model | K-Bessel | K-Bessel | Exponential | Exponential |
Range/10 km | 1.1984 | 1.2169 | 1.1984 | 2.5058 |
Nugget | 0.0204 | 0.03124 | 0.1866 | 0.2483 |
Sill | 0.4842 | 0.5043 | 0.4783 | 0.6012 |
Nugget/Sill | 0.0421 | 0.0619 | 0.3901 | 0.4130 |

ANOVA for testing the significance of secondary variables on soil K^{+} variance.

Geo-Factors | Soil Property | Source of Variance | Degrees of Freedom | Sum of Squares | Mean Square | F | Significance |
---|---|---|---|---|---|---|---|
Geology type | Soil K^{+} | Between groups | 9 | 1.033 | 0.115 | 2.856 | 0.005 |
Geology type | Soil K^{+} | Within groups | 101 | 4.060 | 0.04 | | |
Geology type | Soil K^{+} | Total | 110 | 5.093 | | | |
Soil type | Soil K^{+} | Between groups | 4 | 0.722 | 0.181 | 4.378 | 0.003 |
Soil type | Soil K^{+} | Within groups | 106 | 4.371 | 0.041 | | |
Soil type | Soil K^{+} | Total | 110 | 5.093 | | | |
Land use type | Soil K^{+} | Between groups | 4 | 0.462 | 0.116 | 2.645 | 0.008 |
Land use type | Soil K^{+} | Within groups | 106 | 4.631 | 0.044 | | |
Land use type | Soil K^{+} | Total | 110 | 5.093 | | | |
Grassland type | Soil K^{+} | Between groups | 16 | 0.934 | 0.058 | 1.319 | 0.202 |
Grassland type | Soil K^{+} | Within groups | 94 | 4.159 | 0.044 | | |
Grassland type | Soil K^{+} | Total | 110 | 5.093 | | | |

Comparison of the accuracy of OK, OK-Landuse, OK-Geology, OK-Soil, inverse distance weighting (IDW), and ASM-SP interpolation.

Evaluation Index | OK-Landuse | OK-Geology | OK-Soil | IDW | OK | ASM-SP |
---|---|---|---|---|---|---|
ME | 0.0030 | −0.0037 | 0.0024 | 0.0072 | 0.0093 | 0.0017 |
MAE | 0.0294 | 0.0301 | 0.0236 | 0.0362 | 0.0314 | 0.0072 |
RMSE | 0.0742 | 0.0672 | 0.0815 | 0.1637 | 0.1067 | 0.0586 |
MRE | 95.91% | 96.57% | 95.87% | 96.04% | 95.34% | 89.69% |
AC | 0.9047 | 0.9186 | 0.9242 | 0.8756 | 0.8976 | 0.9903 |