spatial-kfold

A Python Package for Spatial Resampling Toward More Reliable Cross-Validation in Spatial Studies.

Source code

🔗 spatial-kfold

Motivations

During my MSc thesis, I developed a Python package aimed at making cross-validation for spatial data more reliable. There seems to be a misconception about which cross-validation strategy is appropriate when making spatial predictions and evaluating machine learning algorithms. Many researchers opt for random cross-validation or standard k-fold techniques, which ignore the spatial nature of the data and the presence of spatial autocorrelation (Bahn & McGill, 2012). Because nearby observations tend to be similar, a random split places near-duplicates of the test samples in the training set, which can lead to an overly optimistic assessment of the model’s performance and can hide overfitting. I created this package to address that problem and to provide a more reliable evaluation of model performance.

Description

spatial-kfold is a Python library for spatial resampling that makes cross-validation in spatial studies more robust. It offers spatial clustering and block resampling techniques, with user-friendly parameters to customize the resampling. It enables users to conduct a “Leave Region Out” cross-validation, which is useful for evaluating how well a model generalizes to new locations and for improving the reliability of feature selection (Meyer et al., 2019) and hyperparameter tuning (Schratz et al., 2019) in spatial studies.

Main Features

spatial-kfold allows you to conduct “Leave Region Out” cross-validation using two spatial resampling techniques:

  • 1- Spatial clustering with kmeans
  • 2- Spatial blocks
    • Random blocks
    • Continuous blocks
      • tb-lr : top-bottom, left-right
      • bt-rl : bottom-top, right-left
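As a rough illustration of the block options listed above, the following pure-Python sketch assigns points to folds through a regular grid of blocks. It is not the package's implementation: the function name `block_folds`, its parameters, and the exact reading of `tb-lr` and `bt-rl` (blocks numbered row by row, with fold ids cycling over that order) are assumptions made for this example.

```python
import random


def block_folds(points, nx, ny, n_folds, method="continuous",
                orientation="tb-lr", seed=0):
    """Assign each (x, y) point to a fold via an nx-by-ny grid of blocks.

    Hypothetical helper for illustration only, not the spatial-kfold API.
    """
    xs, ys = zip(*points)
    xmin, ymin = min(xs), min(ys)
    # Block width/height; fall back to 1.0 if all points share a coordinate.
    bw = (max(xs) - xmin) / nx or 1.0
    bh = (max(ys) - ymin) / ny or 1.0

    # Enumerate blocks as (col, row), with row 0 at the bottom of the extent.
    if orientation == "tb-lr":    # assumed: top row first, left to right
        order = [(c, r) for r in reversed(range(ny)) for c in range(nx)]
    elif orientation == "bt-rl":  # assumed: bottom row first, right to left
        order = [(c, r) for r in range(ny) for c in reversed(range(nx))]
    else:
        raise ValueError("orientation must be 'tb-lr' or 'bt-rl'")

    # Continuous blocks: fold ids cycle 1..n_folds along the block order.
    fold_ids = [i % n_folds + 1 for i in range(len(order))]
    if method == "random":
        # Random blocks: the same fold ids, shuffled across the blocks.
        random.Random(seed).shuffle(fold_ids)
    elif method != "continuous":
        raise ValueError("method must be 'continuous' or 'random'")
    block_to_fold = dict(zip(order, fold_ids))

    folds = []
    for x, y in points:
        # Clamp so points on the upper/right edge fall in the last block.
        c = min(int((x - xmin) / bw), nx - 1)
        r = min(int((y - ymin) / bh), ny - 1)
        folds.append(block_to_fold[(c, r)])
    return folds


pts = [(0, 0), (1, 0), (0, 1), (1, 1)]
print(block_folds(pts, nx=2, ny=2, n_folds=4))  # [3, 4, 1, 2]
```

Each point inherits the fold of the block it falls in, so holding out one fold removes whole regions from training rather than scattered samples.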

Installation

spatial-kfold can be installed from PyPI:

pip install spatial-kfold

Spatial resampling

Spatially Clustered Folds.
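To make the idea behind spatially clustered folds concrete, here is a minimal pure-Python sketch that groups point coordinates with a plain k-means (Lloyd's algorithm) and uses the cluster label as the fold id. It only illustrates the technique: `kmeans_folds` and its parameters are hypothetical names for this example, and the package's own clustering may differ in initialization and options.

```python
import random


def kmeans_folds(points, n_folds, n_iter=50, seed=0):
    """Cluster (x, y) points with k-means and return one fold id per point.

    Hypothetical helper for illustration only, not the spatial-kfold API.
    """
    rng = random.Random(seed)
    # Start from n_folds distinct data points as initial centers.
    centers = rng.sample(points, n_folds)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest center.
        labels = [
            min(range(n_folds),
                key=lambda k: (x - centers[k][0]) ** 2 + (y - centers[k][1]) ** 2)
            for x, y in points
        ]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for k in range(n_folds):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append((sum(m[0] for m in members) / len(members),
                                    sum(m[1] for m in members) / len(members)))
            else:
                new_centers.append(centers[k])  # keep an empty cluster's center
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return [lab + 1 for lab in labels]  # fold ids start at 1


# Two well-separated groups of sampling locations.
coords = [(0, 0), (0, 1), (1, 0), (100, 100), (100, 101), (101, 100)]
fold_ids = kmeans_folds(coords, n_folds=2)
print(fold_ids)
```

Fold ids produced this way can be passed as the `groups` argument of scikit-learn's `LeaveOneGroupOut` or `GroupKFold` splitters to run a “Leave Region Out” evaluation.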

Random and Spatial cross validation

k-fold vs spatial-kfold.
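The difference between the two strategies can be reproduced on toy data. The sketch below is an assumption-laden illustration, not a benchmark of the package: it uses a synthetic 12 x 12 grid with a smooth target (z = x + y), a 1-nearest-neighbour predictor, and compares a random 4-fold split with a spatial split in which each 6 x 6 quadrant is one fold.

```python
import random

# Toy data: a 12 x 12 grid whose target varies smoothly in space.
points = [(x, y) for x in range(12) for y in range(12)]
target = {p: p[0] + p[1] for p in points}


def nearest_neighbor_mae(fold_of):
    """Cross-validated mean absolute error of a 1-nearest-neighbour predictor."""
    errors = []
    for fold in sorted(set(fold_of.values())):
        train = [p for p in points if fold_of[p] != fold]
        test = [p for p in points if fold_of[p] == fold]
        for q in test:
            # Predict with the target of the closest training location.
            nearest = min(train,
                          key=lambda p: (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)
            errors.append(abs(target[nearest] - target[q]))
    return sum(errors) / len(errors)


# Random 4-fold: shuffle the points and deal them into folds.
shuffled = points[:]
random.Random(0).shuffle(shuffled)
random_folds = {p: i % 4 for i, p in enumerate(shuffled)}

# Spatial 4-fold: each 6 x 6 quadrant of the grid is held out in turn.
spatial_folds = {p: (p[0] // 6) * 2 + (p[1] // 6) for p in points}

random_mae = nearest_neighbor_mae(random_folds)
spatial_mae = nearest_neighbor_mae(spatial_folds)
print(f"random k-fold MAE:  {random_mae:.2f}")
print(f"spatial k-fold MAE: {spatial_mae:.2f}")
```

With the random split, almost every test point has a training neighbour one grid cell away, so the error looks small. With the spatial split, the model has to predict inside a region it has never seen, and the reported error is larger, which is closer to what can be expected when predicting at new locations.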
---
Background image source: Photo by Lukasz Szmigiel on Unsplash
---