Skip to content

spatial-kfold

License: GPL-3.0 pypi Downloads


spatial-kfold: A Python Package for Spatial Resampling Toward More Reliable Cross-Validation in Spatial Studies.

spatial-kfold is a python library for performing spatial resampling to ensure more robust cross-validation in spatial studies. It offers spatial clustering and block resampling technique with user-friendly parameters to customize the resampling. It enables users to conduct a "Leave Region Out" cross-validation, which can be useful for evaluating the model's generalization to new locations as well as improving the reliability of feature selection and hyperparameter tuning in spatial studies.

spatial-kfold can be integrated easily with scikit-learn's LeaveOneGroupOut cross-validation technique. This integration enables you to further leverage the resampled spatial data for performing feature selection and hyperparameter tuning.


Features

  • Conduct Leave Region Out using two spatial resampling techniques:
  • Spatial clustering with KMeans or BisectingKMeans
  • Spatial blocks

    • Random blocks
    • Continuous blocks

    • tb-lr (top-bottom, left-right)

    • bt-rl (bottom-top, right-left)
  • Integrates easily with scikit-learn’s LeaveOneGroupOut cross-validation to enable advanced feature selection and hyperparameter tuning.


Installation

You can install spatial-kfold directly from PyPI:

pip install spatial-kfold

Credits

This package was inspired by the following R packages:


Dependencies

This project relies on the following dependencies:


Citation

If you use My Package in your research or work, please cite it using the following entries:

  • MLA Style:

Ghariani, Walid. "spatial-kfold: A Python Package for Spatial Resampling Toward More Reliable Cross-Validation in Spatial Studies." 2023. GitHub, https://github.com/WalidGharianiEAGLE/spatial-kfold
- BibTex Style:

@Misc{spatial-kfold,
author = {Walid Ghariani},
title = {spatial-kfold: A Python Package for Spatial Resampling Toward More Reliable Cross-Validation in Spatial Studies},
howpublished = {GitHub},
year = {2023},
url = {https://github.com/WalidGharianiEAGLE/spatial-kfold}
}

Resources

A list of tutorials and resources mainly in R explaining the importance of spatial resampling and spatial cross validation


Bibliography

Meyer, H., Reudenbach, C., Wöllauer, S., Nauss, T. (2019): Importance of spatial predictor variable selection in machine learning applications - Moving from data reproduction to spatial prediction. Ecological Modelling. 411. https://doi.org/10.1016/j.ecolmodel.2019.108815

Schratz, Patrick, et al. "Hyperparameter tuning and performance assessment of statistical and machine-learning algorithms using spatial data." Ecological Modelling 406 (2019): 109-120. https://doi.org/10.1016/j.ecolmodel.2019.06.002

Schratz, Patrick, et al. "mlr3spatiotempcv: Spatiotemporal resampling methods for machine learning in R." arXiv preprint arXiv:2110.12674 (2021). https://arxiv.org/abs/2110.12674

Valavi, Roozbeh, et al. "blockCV: An r package for generating spatially or environmentally separated folds for k-fold cross-validation of species distribution models." Biorxiv (2018): 357798. https://doi.org/10.1101/357798