In statistics and machine learning, cross-validation is a technique for estimating how well a trained model generalizes by assessing its performance on data that was not used for training.
Cross-validation is often used to aid model selection. Typically, several candidate models are fit on the training dataset and then compared by their performance on the (appropriately named) validation dataset; the best model is then evaluated once against the held-out test dataset.
A common approach is k-fold cross-validation, in which the data is partitioned into k folds; each fold serves once as the validation set while the remaining k − 1 folds form the training set, and the k resulting scores are combined, typically by averaging.
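The k-fold procedure can be sketched in a few lines of plain Python. This is a minimal illustration, not a library implementation: `k_fold_cv`, `train_fn`, and `eval_fn` are hypothetical names chosen for this example, and the toy "model" is simply the mean of the training values, scored by mean absolute error.

```python
import statistics

def k_fold_cv(data, k, train_fn, eval_fn):
    """Estimate generalization performance by averaging scores over k folds."""
    folds = [data[i::k] for i in range(k)]  # k roughly equal splits
    scores = []
    for i in range(k):
        validation = folds[i]  # fold i is held out for validation
        # all remaining folds form the training set
        training = [x for j, fold in enumerate(folds) if j != i for x in fold]
        model = train_fn(training)
        scores.append(eval_fn(model, validation))
    return statistics.mean(scores)  # combine the k scores by averaging

# Toy example: the "model" is the training mean; the score is mean absolute error.
data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
score = k_fold_cv(
    data,
    k=3,
    train_fn=lambda train: statistics.mean(train),
    eval_fn=lambda m, val: statistics.mean(abs(x - m) for x in val),
)
print(round(score, 3))  # → 1.5
```

In practice one would use a tested library routine (for example, scikit-learn's `KFold`) rather than hand-rolling the splits, but the structure is the same: each data point appears in exactly one validation fold, so every point contributes to the final estimate.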