{"id":292,"date":"2024-12-01T07:02:40","date_gmt":"2024-12-01T07:02:40","guid":{"rendered":"https:\/\/mailitics.com\/index.php\/2024\/12\/01\/model-validation-techniques-explained-a-visual-guide-with-code-examples-eb13bbdc8f88\/"},"modified":"2024-12-01T07:02:40","modified_gmt":"2024-12-01T07:02:40","slug":"model-validation-techniques-explained-a-visual-guide-with-code-examples-eb13bbdc8f88","status":"publish","type":"post","link":"https:\/\/mailitics.com\/index.php\/2024\/12\/01\/model-validation-techniques-explained-a-visual-guide-with-code-examples-eb13bbdc8f88\/","title":{"rendered":"Model Validation Techniques, Explained: A Visual Guide with Code Examples"},"content":{"rendered":"<div>\n<h4>MODEL EVALUATION &amp; OPTIMIZATION<\/h4>\n<h4>12 must-know methods to <strong>validate your machine\u00a0learning<\/strong><br \/>\n<\/h4>\n<p>Every day, machines make millions of predictions\u200a\u2014\u200afrom detecting objects in photos to helping doctors find diseases. But before trusting these predictions, we need to know if they\u2019re any good. After all, no one would want to use a machine that\u2019s wrong most of the\u00a0time!<\/p>\n<p>This is where validation comes in. Validation methods test machine predictions to measure their reliability. While this might sound simple, different validation approaches exist, each designed to handle specific challenges in machine learning.<\/p>\n<p>Here, I\u2019ve organized these validation techniques\u200a\u2014\u200aall 12 of them\u200a\u2014\u200ain a tree structure, showing how they evolved from basic concepts into more specialized ones. 
And of course, we will use clear visuals and a consistent dataset to show what each method does differently and why method selection matters.<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2AXQDe622Tw9GCKJ8N4b0QeQ.png?ssl=1\"><figcaption>All visuals: Author-created using Canva Pro. Optimized for mobile; may appear oversized on\u00a0desktop.<\/figcaption><\/figure>\n<h3>What is Model Validation?<\/h3>\n<p>Model validation is the process of testing how well a machine learning model works with data it hasn\u2019t seen or used during training. Basically, we use existing data to check the model\u2019s performance instead of using new data. This helps us identify problems before deploying the model for real\u00a0use.<\/p>\n<p>There are several validation methods, and each method has specific strengths and addresses different validation challenges:<\/p>\n<ol>\n<li>Different validation methods can produce different results, so choosing the right method\u00a0matters.<\/li>\n<li>Some validation techniques work better with specific types of data and\u00a0models.<\/li>\n<li>Using incorrect validation methods can give misleading results about the model\u2019s true performance.<\/li>\n<\/ol>\n<p>Here is a tree diagram showing how these validation methods relate to each\u00a0other:<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2A5eu6sk1xVs3aPGRi6TQW8Q.png?ssl=1\"><figcaption>The tree diagram shows which validation methods are connected to each\u00a0other.<\/figcaption><\/figure>\n<p>Next, we\u2019ll look at each validation method more closely by showing exactly how they work. 
To make everything easier to understand, we\u2019ll walk through clear examples that show how these methods work with real\u00a0data.<\/p>\n<h3>\ud83d\udcca \ud83d\udcc8 Our Running\u00a0Example<\/h3>\n<p>We will use the same example throughout to help you understand each testing method. While this dataset may not be appropriate for some validation methods, for educational purposes, using this one example makes it easier to compare different methods and see how each one\u00a0works.<\/p>\n<h4>\ud83d\udcca The Golf Playing\u00a0Dataset<\/h4>\n<p>We\u2019ll work with this dataset that predicts whether someone will play golf based on weather conditions.<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2AwelFOPMREgLa27G37tI4Kg.png?ssl=1\"><figcaption>Columns: \u2018Outlook\u2019 (one-hot-encoded into 3 columns), \u2018Temperature\u2019 (in Fahrenheit), \u2018Humidity\u2019 (in %), \u2018Windy\u2019 (Yes\/No) and \u2018Play\u2019 (Yes\/No, target\u00a0feature)<\/figcaption><\/figure>\n<pre>import pandas as pd<br>import numpy as np<br><br># Load the dataset<br>dataset_dict = {<br>    'Outlook': ['sunny', 'sunny', 'overcast', 'rainy', 'rainy', 'rainy', 'overcast', <br>                'sunny', 'sunny', 'rainy', 'sunny', 'overcast', 'overcast', 'rainy',<br>                'sunny', 'overcast', 'rainy', 'sunny', 'sunny', 'rainy', 'overcast',<br>                'rainy', 'sunny', 'overcast', 'sunny', 'overcast', 'rainy', 'overcast'],<br>    'Temperature': [85.0, 80.0, 83.0, 70.0, 68.0, 65.0, 64.0, 72.0, 69.0, 75.0, 75.0,<br>                   72.0, 81.0, 71.0, 81.0, 74.0, 76.0, 78.0, 82.0, 67.0, 85.0, 73.0,<br>                   88.0, 77.0, 79.0, 80.0, 66.0, 84.0],<br>    'Humidity': [85.0, 90.0, 78.0, 96.0, 80.0, 70.0, 65.0, 95.0, 70.0, 80.0, 70.0,<br>                 90.0, 75.0, 80.0, 88.0, 92.0, 85.0, 75.0, 92.0, 90.0, 85.0, 88.0,<br>                 65.0, 70.0, 60.0, 95.0, 70.0, 78.0],<br>    
'Wind': [False, True, False, False, False, True, True, False, False, False, True,<br>             True, False, True, True, False, False, True, False, True, True, False,<br>             True, False, False, True, False, False],<br>    'Play': ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'No', 'Yes', 'Yes', 'Yes',<br>             'Yes', 'Yes', 'No', 'No', 'Yes', 'Yes', 'No', 'No', 'No', 'Yes', 'Yes',<br>             'Yes', 'Yes', 'Yes', 'Yes', 'No', 'Yes']<br>}<br><br>df = pd.DataFrame(dataset_dict)<br><br># Data preprocessing<br>df = pd.get_dummies(df, columns=['Outlook'], prefix='', prefix_sep='', dtype=int)<br>df['Wind'] = df['Wind'].astype(int)<br><br># Set the label<br>X, y = df.drop('Play', axis=1), df['Play']<\/pre>\n<h4>\ud83d\udcc8 Our Model\u00a0Choice<\/h4>\n<p>We will use a <a href=\"https:\/\/towardsdatascience.com\/decision-tree-classifier-explained-a-visual-guide-with-code-examples-for-beginners-7c863f06a71e\">decision tree classifier<\/a> for all our tests. We picked this model because we can easily draw the resulting model as a tree structure, with each branch showing different decisions. To keep things simple and focus on how we test the model, we will use the default scikit-learn parameters with a fixed random_state.<\/p>\n<p>Let\u2019s be clear about these two terms we\u2019ll use: The decision tree classifier is our <strong>learning algorithm<\/strong>\u200a\u2014\u200ait\u2019s the method that finds patterns in our data. When we feed data into this algorithm, it creates a <strong>model<\/strong> (in this case, a tree with clear branches showing different decisions). 
This model is what we\u2019ll actually use to make predictions.<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2Az3IEEu4FoNkZgIv1r-Mc2g.png?ssl=1\"><\/figure>\n<pre>from sklearn.tree import DecisionTreeClassifier, plot_tree<br>import matplotlib.pyplot as plt<br><br>dt = DecisionTreeClassifier(random_state=42)<\/pre>\n<p>Each time we split our data differently for validation, we\u2019ll get different models with different decision rules. Once our validation shows that our algorithm works reliably, we\u2019ll create one final model using all our data. This final model is the one we\u2019ll actually use to predict if someone will play golf or\u00a0not.<\/p>\n<p>With this setup ready, we can now focus on understanding how each validation method works and how it helps us make better predictions about golf playing based on weather conditions. Let\u2019s examine each validation method one at a\u00a0time.<\/p>\n<h3>Hold-out Methods<\/h3>\n<p>Hold-out methods are the most basic way to check how well our model works. In these methods, we basically save some of our data just for\u00a0testing.<\/p>\n<h4>Train-Test Split<\/h4>\n<p>This method is simple: we split our data into two parts. We use one part to train our model and the other part to test it. Before we split the data, we mix it up randomly so the order of our original data doesn\u2019t affect our\u00a0results.<\/p>\n<p>The sizes of the training and test sets depend on our total dataset size and are usually expressed as a ratio. 
To determine their size, you can follow this guideline:<\/p>\n<ul>\n<li>For small datasets (around 1,000\u201310,000 samples), use an 80:20\u00a0ratio.<\/li>\n<li>For medium datasets (around 10,000\u2013100,000 samples), use a 70:30\u00a0ratio.<\/li>\n<li>For large datasets (over 100,000 samples), use a 90:10\u00a0ratio.<\/li>\n<\/ul>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2Adb9E6hy6oNFb6lZ7EDGzGA.png?ssl=1\"><\/figure>\n<pre>from sklearn.model_selection import train_test_split<br><br>### Simple Train-Test Split ###<br># Split data<br>X_train, X_test, y_train, y_test = train_test_split(<br>    X, y, test_size=0.2, random_state=42<br>)<br><br># Train and evaluate<br>dt.fit(X_train, y_train)<br>test_accuracy = dt.score(X_test, y_test)<br><br># Plot<br>plt.figure(figsize=(5, 5), dpi=300)<br>plot_tree(dt, feature_names=X.columns, filled=True, rounded=True)<br>plt.title(f'Train-Test Split (Test Accuracy: {test_accuracy:.3f})')<br>plt.tight_layout()<\/pre>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2AtQS5_AvSM9nIi-pZA5VOhA.png?ssl=1\"><\/figure>\n<p>This method is easy to use, but it has a limitation: the results can change a lot depending on how we randomly split the data. This is why it helps to try several random_state values to check that the result is consistent. Also, if we don\u2019t have much data to start with, we might not have enough to properly train or test our\u00a0model.<\/p>\n<h4>Train-Validation-Test Split<\/h4>\n<p>This method splits our data into three parts. 
The middle part, called validation data, is used to tune the parameters of the model, and we aim for the lowest error\u00a0there.<\/p>\n<p>Since the validation results are consulted many times during this tuning process, our model might start doing too well on this validation data (which is what we want). This is why we keep a separate test set. We only use it once, at the very end, so it gives us an honest measure of how well our model\u00a0works.<\/p>\n<p>Here are typical ways to split your\u00a0data:<\/p>\n<ul>\n<li>For smaller datasets (1,000\u201310,000 samples), use a 60:20:20\u00a0ratio.<\/li>\n<li>For medium datasets (10,000\u2013100,000 samples), use a 70:15:15\u00a0ratio.<\/li>\n<li>For large datasets (&gt; 100,000 samples), use an 80:10:10\u00a0ratio.<\/li>\n<\/ul>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2AGAhDHk64pYnjCocw4j1scw.png?ssl=1\"><\/figure>\n<pre>### Train-Validation-Test Split ###<br># First split: separate test set<br>X_temp, X_test, y_temp, y_test = train_test_split(<br>    X, y, test_size=0.2, random_state=42<br>)<br><br># Second split: separate validation set<br>X_train, X_val, y_train, y_val = train_test_split(<br>    X_temp, y_temp, test_size=0.25, random_state=42<br>)<br><br># Train and evaluate<br>dt.fit(X_train, y_train)<br>val_accuracy = dt.score(X_val, y_val)<br>test_accuracy = dt.score(X_test, y_test)<br><br># Plot<br>plt.figure(figsize=(5, 5), dpi=300)<br>plot_tree(dt, feature_names=X.columns, filled=True, rounded=True)<br>plt.title(f'Train-Val-Test Split\\nValidation Accuracy: {val_accuracy:.3f}'<br>          f'\\nTest Accuracy: {test_accuracy:.3f}')<br>plt.tight_layout()<\/pre>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2AW-aFdthd7C3kuJtotB4Wew.png?ssl=1\"><\/figure>\n<p>Hold-out methods work differently 
depending on how much data you have. They work really well when you have lots of data (&gt; 100,000). But when you have less data (&lt; 1,000), this method may not be the best. With smaller datasets, you might need to use more advanced validation methods to get a better understanding of how well your model really\u00a0works.<\/p>\n<h4>\ud83d\udcca Moving to Cross-validation<\/h4>\n<p>We just learned that hold-out methods might not work very well with small datasets. This is exactly the challenge we currently face: we only have 28 days of data. Following the hold-out principle, we\u2019ll keep 14 days of data separate for our final test. This leaves us with 14 days to work with for trying other validation methods.<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2AXVJxvpcXTZ2gjdqKVVJC8A.png?ssl=1\"><\/figure>\n<pre># Initial train-test split<br>X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, shuffle=False)<\/pre>\n<p>In the next part, we\u2019ll see how cross-validation methods can take these 14 days and split them up multiple times in different ways. This gives us a better idea of how well our model is really working, even with such limited\u00a0data.<\/p>\n<h3>Cross Validation<\/h3>\n<p>Cross-validation changes how we think about testing our models. Instead of testing our model just once with one split of data, we test it many times using different splits of the same data. This helps us understand much better how well our model really\u00a0works.<\/p>\n<p>The main idea of cross-validation is to test our model multiple times, and each time the training and test datasets come from different parts of our data. This helps prevent bias from one really good (or really bad) split of the\u00a0data.<\/p>\n<p>Here\u2019s why this matters: say our model gets 95% accuracy when we test it one way, but only 75% when we test it another way using the same data. 
Which number shows how good our model really is? Cross-validation helps us answer this question by giving us many test results instead of just one. This gives us a clearer picture of how well our model actually performs.<\/p>\n<h4>K-Fold Methods<\/h4>\n<p><strong><em>Basic K-Fold Cross-Validation<\/em><br \/><\/strong><em>K<\/em>-fold cross-validation fixes a big problem with basic splitting: relying too much on just one way of splitting the data. Instead of splitting the data once, <em>K<\/em>-fold splits the data into <em>K<\/em> equal parts. Then it tests the model multiple times, using a different part for testing each time while using all other parts for training.<\/p>\n<p>The number we pick for<em> K<\/em> changes how we test our model. Most people use 5 or 10 for <em>K<\/em>, but this can change based on how much data we have and what we need for our project. Let\u2019s say we use <em>K <\/em>= 3. This means we split our data into three equal parts. We then train and test our model three different times. Each time, 2\/3 of the data is used for training and 1\/3 for testing, but we rotate which part is being used for testing. 
This way, every piece of data gets used for both training and\u00a0testing.<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2ALYHMeNK8CjdwDGIG4seDbQ.png?ssl=1\"><\/figure>\n<pre>from sklearn.model_selection import KFold, cross_val_score<br><br># Cross-validation strategy<br>cv = KFold(n_splits=3, shuffle=True, random_state=42)<br><br># Calculate cross-validation scores<br>scores = cross_val_score(dt, X_train, y_train, cv=cv)<br>print(f\"Validation accuracy: {scores.mean():.3f} \u00b1 {scores.std():.3f}\")<br><br># Plot trees for each split<br>plt.figure(figsize=(4, 3.5*cv.get_n_splits(X_train)))<br>for i, (train_idx, val_idx) in enumerate(cv.split(X_train, y_train)):<br>    # Train and visualize the tree for this split<br>    dt.fit(X_train.iloc[train_idx], y_train.iloc[train_idx])<br>    plt.subplot(cv.get_n_splits(X_train), 1, i+1)<br>    plot_tree(dt, feature_names=X_train.columns, impurity=False, filled=True, rounded=True)<br>    plt.title(f'Split {i+1} (Validation Accuracy: {scores[i]:.3f})\\nTrain indices: {train_idx}\\nValidation indices: {val_idx}')<br><br>plt.tight_layout()<\/pre>\n<p>Validation accuracy: 0.433 \u00b1\u00a00.047<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/390\/1%2AJ0cs8XqcLAwKgqZNdWgHXg.png?ssl=1\"><\/figure>\n<p>When we\u2019re done with all the rounds, we calculate the average performance from all <em>K<\/em> tests. This average gives us a more trustworthy measure of how well our model works. We can also learn about how stable our model is by looking at how much the results change between different rounds of\u00a0testing.<\/p>\n<p><strong><em>Stratified K-Fold<br \/><\/em><\/strong>Basic K-fold cross-validation usually works well, but it can run into problems when our data is unbalanced\u200a\u2014\u200ameaning we have a lot more of one type than others. 
For example, if we have 100 data points and 90 of them are type A while only 10 are type B, randomly splitting this data might give us pieces that don\u2019t have enough type B to test properly.<\/p>\n<p>Stratified K-fold fixes this by making sure each split has the same mix as our original data. If our full dataset has 10% type B, each split will also have about 10% type B. This makes our testing more reliable, especially when some types of data are much rarer than\u00a0others.<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2AFLXVAbVz4xPNeLZI30TyKw.png?ssl=1\"><\/figure>\n<pre>from sklearn.model_selection import StratifiedKFold, cross_val_score<br><br># Cross-validation strategy<br>cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)<br><br># Calculate cross-validation scores<br>scores = cross_val_score(dt, X_train, y_train, cv=cv)<br>print(f\"Validation accuracy: {scores.mean():.3f} \u00b1 {scores.std():.3f}\")<br><br># Plot trees for each split<br>plt.figure(figsize=(5, 4*cv.get_n_splits(X_train)))<br>for i, (train_idx, val_idx) in enumerate(cv.split(X_train, y_train)):<br>    # Train and visualize the tree for this split<br>    dt.fit(X_train.iloc[train_idx], y_train.iloc[train_idx])<br>    plt.subplot(cv.get_n_splits(X_train), 1, i+1)<br>    plot_tree(dt, feature_names=X_train.columns, impurity=False, filled=True, rounded=True)<br>    plt.title(f'Split {i+1} (Validation Accuracy: {scores[i]:.3f})\\nTrain indices: {train_idx}\\nValidation indices: {val_idx}')<br><br>plt.tight_layout()<\/pre>\n<p>Validation accuracy: 0.650 \u00b1\u00a00.071<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/390\/1%2AYYPVzen295lUd6GoLtDsGA.png?ssl=1\"><\/figure>\n<p>Keeping this balance helps in two ways. First, it makes sure each split properly represents what our data looks like. 
Second, it gives us more consistent test results. This means that if we test our model multiple times, we\u2019ll most likely get similar results each\u00a0time.<\/p>\n<p><strong><em>Repeated K-Fold<br \/><\/em><\/strong>Sometimes, even when we use K-fold validation, our test results can change a lot between different random splits. Repeated K-fold solves this by running the entire K-fold process multiple times, using different random splits each\u00a0time.<\/p>\n<p>For example, let\u2019s say we run 5-fold cross-validation three times. This means our model goes through training and testing 15 times in total. By testing so many times, we can better tell which differences in results come from random chance and which ones show how well our model really performs. The downside is that all this extra testing takes more time to complete.<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2ADYeue2cmeUd7_ADB2pMT3w.png?ssl=1\"><\/figure>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2ASt2Y8M_0WV2wlPQe9dXaSg.png?ssl=1\"><\/figure>\n<pre>from sklearn.model_selection import RepeatedKFold<br><br># Cross-validation strategy<br>n_splits = 3<br>cv = RepeatedKFold(n_splits=n_splits, n_repeats=2, random_state=42)<br><br># Calculate cross-validation scores<br>scores = cross_val_score(dt, X_train, y_train, cv=cv)<br>print(f\"Validation accuracy: {scores.mean():.3f} \u00b1 {scores.std():.3f}\")<br><br># Plot trees for each split<br>total_splits = cv.get_n_splits(X_train)  # Will be 6 (3 folds \u00d7 2 repetitions)<br>plt.figure(figsize=(5, 4*total_splits))<br>for i, (train_idx, val_idx) in enumerate(cv.split(X_train, y_train)):<br>   # Train and visualize the tree for this split<br>   dt.fit(X_train.iloc[train_idx], y_train.iloc[train_idx])<br>   <br>   # Calculate repetition and fold numbers<br>   repetition, fold 
= i \/\/ n_splits + 1, i % n_splits + 1<br>   <br>   plt.subplot(total_splits, 1, i+1)<br>   plot_tree(dt, feature_names=X_train.columns, impurity=False, filled=True, rounded=True)<br>   plt.title(f'Split {repetition}.{fold} (Validation Accuracy: {scores[i]:.3f})\\n'<br>            f'Train indices: {list(train_idx)}\\n'<br>            f'Validation indices: {list(val_idx)}')<br><br>plt.tight_layout()<\/pre>\n<p>Validation accuracy: 0.425 \u00b1\u00a00.107<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2AWj1km8JVawdGYC7IlN_O6w.png?ssl=1\"><\/figure>\n<p>When we look at repeated K-fold results, since we have many sets of test results, we can do more than just calculate the average\u200a\u2014\u200awe can also figure out how confident we are in our results. This gives us a better understanding of how reliable our model really\u00a0is.<\/p>\n<p><strong><em>Repeated Stratified K-Fold<br \/><\/em><\/strong>This method combines two things we just learned about: keeping class balance (stratification) and running multiple rounds of testing (repetition). It keeps the right mix of different types of data while testing many times. 
This works especially well when we have a small dataset that\u2019s uneven\u200a\u2014\u200awhere we have a lot more of one type of data than\u00a0others.<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2Avhc75uLkn6xl3rqy0JgPyQ.png?ssl=1\"><\/figure>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2A0edJeA2P33kksEJem-eKKA.png?ssl=1\"><\/figure>\n<pre>from sklearn.model_selection import RepeatedStratifiedKFold<br><br># Cross-validation strategy<br>n_splits = 3<br>cv = RepeatedStratifiedKFold(n_splits=n_splits, n_repeats=2, random_state=42)<br><br># Calculate cross-validation scores<br>scores = cross_val_score(dt, X_train, y_train, cv=cv)<br>print(f\"Validation accuracy: {scores.mean():.3f} \u00b1 {scores.std():.3f}\")<br><br># Plot trees for each split<br>total_splits = cv.get_n_splits(X_train)  # Will be 6 (3 folds \u00d7 2 repetitions)<br>plt.figure(figsize=(5, 4*total_splits))<br>for i, (train_idx, val_idx) in enumerate(cv.split(X_train, y_train)):<br>   # Train and visualize the tree for this split<br>   dt.fit(X_train.iloc[train_idx], y_train.iloc[train_idx])<br>   <br>   # Calculate repetition and fold numbers<br>   repetition, fold = i \/\/ n_splits + 1, i % n_splits + 1<br>   <br>   plt.subplot(total_splits, 1, i+1)<br>   plot_tree(dt, feature_names=X_train.columns, impurity=False, filled=True, rounded=True)<br>   plt.title(f'Split {repetition}.{fold} (Validation Accuracy: {scores[i]:.3f})\\n'<br>            f'Train indices: {list(train_idx)}\\n'<br>            f'Validation indices: {list(val_idx)}')<br><br>plt.tight_layout()<\/pre>\n<p>Validation accuracy: 0.542 \u00b1\u00a00.167<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2AxpyypKPfq33nwxh2bfWiDw.png?ssl=1\"><\/figure>\n<p>However, there\u2019s a 
trade-off: this method takes more time for our computer to run. Each time we repeat the whole process, it multiplies how long it takes to train our model. When deciding whether to use this method, we need to think about whether having more reliable results is worth the extra time it takes to run all these\u00a0tests.<\/p>\n<p><strong><em>Group K-Fold<br \/><\/em><\/strong>Sometimes our data naturally comes in groups that should stay together. Think about golf data where we have many measurements from the same golf course throughout the year. If we put some measurements from one golf course in training data and others in test data, we create a problem: our model would indirectly learn about the test data during training because it saw other measurements from the same\u00a0course.<\/p>\n<p>Group K-fold fixes this by keeping all data from the same group (like all measurements from one golf course) together in the same part when we split the data. This prevents our model from accidentally seeing information it shouldn\u2019t, which could make us think it performs better than it really\u00a0does.<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2A_4QyQrEB78TtIeo3x3Irbw.png?ssl=1\"><\/figure>\n<pre>from sklearn.model_selection import GroupKFold<br><br># Create groups <br>groups = ['Group 1', 'Group 4', 'Group 5', 'Group 3', 'Group 1', 'Group 2', 'Group 4', <br>          'Group 2', 'Group 6', 'Group 3', 'Group 6', 'Group 5', 'Group 1', 'Group 4', <br>          'Group 4', 'Group 3', 'Group 1', 'Group 5', 'Group 6', 'Group 2', 'Group 4', <br>          'Group 5', 'Group 1', 'Group 4', 'Group 5', 'Group 5', 'Group 2', 'Group 6']<br><br># Simple Train-Test Split<br>X_train, X_test, y_train, y_test, groups_train, groups_test = train_test_split(<br>    X, y, groups, test_size=0.5, shuffle=False<br>)<br><br># Cross-validation strategy<br>cv = GroupKFold(n_splits=3)<br><br># Calculate cross-validation scores<br>scores = cross_val_score(dt, X_train, 
y_train, cv=cv.split(X_train, y_train, groups=groups_train))<br>print(f\"Validation accuracy: {scores.mean():.3f} \u00b1 {scores.std():.3f}\")<br><br># Plot trees for each split<br>plt.figure(figsize=(4, 3.5*cv.get_n_splits(X_train)))<br>for i, (train_idx, val_idx) in enumerate(cv.split(X_train, y_train, groups=groups_train)):<br>   # Get the groups for this split<br>   train_groups = sorted(set(np.array(groups_train)[train_idx]))<br>   val_groups = sorted(set(np.array(groups_train)[val_idx]))<br>   <br>   # Train and visualize the tree for this split<br>   dt.fit(X_train.iloc[train_idx], y_train.iloc[train_idx])<br>   plt.subplot(cv.get_n_splits(X_train), 1, i+1)<br>   plot_tree(dt, feature_names=X_train.columns, impurity=False, filled=True, rounded=True)<br>   plt.title(f'Split {i+1} (Validation Accuracy: {scores[i]:.3f})\\n'<br>            f'Train indices: {train_idx} ({\", \".join(train_groups)})\\n'<br>            f'Validation indices: {val_idx} ({\", \".join(val_groups)})')<br><br>plt.tight_layout()<\/pre>\n<p>Validation accuracy: 0.417 \u00b1\u00a00.143<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/678\/1%2Akla7gGWR-wBDAiNwxiNQIw.png?ssl=1\"><\/figure>\n<p>This method can be important when working with data that naturally comes in groups, like multiple weather readings from the same golf course or data that was collected over time from the same location.<\/p>\n<p><strong><em>Time Series Split<br \/><\/em><\/strong>When we split data randomly in regular K-fold, we assume each piece of data doesn\u2019t affect the others. But this doesn\u2019t work well with data that changes over time, where what happened before affects what happens next. Time series split changes K-fold to work better with this kind of time-ordered data.<\/p>\n<p>Instead of splitting data randomly, time series split uses data in order, from past to future. 
The training data only includes information from times before the testing data. This matches how we use models in real life, where we use past data to predict what will happen\u00a0next.<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2AAB664tyTLR3ReIbLbqvfgA.png?ssl=1\"><\/figure>\n<pre>from sklearn.model_selection import TimeSeriesSplit, cross_val_score<br><br># Cross-validation strategy<br>cv = TimeSeriesSplit(n_splits=3)<br><br># Calculate cross-validation scores<br>scores = cross_val_score(dt, X_train, y_train, cv=cv)<br>print(f\"Validation accuracy: {scores.mean():.3f} \u00b1 {scores.std():.3f}\")<br><br># Plot trees for each split<br>plt.figure(figsize=(4, 3.5*cv.get_n_splits(X_train)))<br>for i, (train_idx, val_idx) in enumerate(cv.split(X_train, y_train)):<br>   # Train and visualize the tree for this split<br>   dt.fit(X_train.iloc[train_idx], y_train.iloc[train_idx])<br>   plt.subplot(cv.get_n_splits(X_train), 1, i+1)<br>   plot_tree(dt, feature_names=X_train.columns, impurity=False, filled=True, rounded=True)<br>   plt.title(f'Split {i+1} (Validation Accuracy: {scores[i]:.3f})\\n'<br>            f'Train indices: {train_idx}\\n'<br>            f'Validation indices: {val_idx}')<br><br>plt.tight_layout()<\/pre>\n<p>Validation accuracy: 0.556 \u00b1\u00a00.157<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/390\/1%2AkFX84_X763sGP0AO09S08w.png?ssl=1\"><\/figure>\n<p>For example, with <em>K<\/em>=3 and our golf data, we might train using weather data from January and February to predict March\u2019s golf playing patterns. Then we\u2019d train using January through March to predict April, and so on. 
By only going forward in time, this method gives us a more realistic idea of how well our model will work when predicting future golf playing patterns based on\u00a0weather.<\/p>\n<h4>Leave-Out Methods<\/h4>\n<p><strong><em>Leave-One-Out Cross-Validation (LOOCV)<br \/><\/em><\/strong>Leave-One-Out Cross-Validation (LOOCV) is the most thorough validation method. It uses just <em>one<\/em> sample for testing and all other samples for training. The validation is repeated until every single piece of data has been used for\u00a0testing.<\/p>\n<p>Let\u2019s say we have 100 days of golf weather data. LOOCV would train and test the model 100 times. Each time, it uses 99 days for training and 1 day for testing. This method removes any randomness in testing\u200a\u2014\u200aif you run LOOCV on the same data multiple times, you\u2019ll always get the same\u00a0results.<\/p>\n<p>However, LOOCV takes a lot of computing time. If you have <em>N<\/em> pieces of data, you need to train your model <em>N<\/em> times. With large datasets or complex models, this might take too long to be practical. 
Some simpler models, like linear ones, have shortcuts that make LOOCV faster, but this isn\u2019t true for all\u00a0models.<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2AknPVGG4PJoKXaBeHIZVzdg.png?ssl=1\"><\/figure>\n<pre>from sklearn.model_selection import LeaveOneOut<br><br># Cross-validation strategy<br>cv = LeaveOneOut()<br><br># Calculate cross-validation scores<br>scores = cross_val_score(dt, X_train, y_train, cv=cv)<br>print(f\"Validation accuracy: {scores.mean():.3f} \u00b1 {scores.std():.3f}\")<br><br># Plot trees for each split<br>plt.figure(figsize=(4, 3.5*cv.get_n_splits(X_train)))<br>for i, (train_idx, val_idx) in enumerate(cv.split(X_train, y_train)):<br>   # Train and visualize the tree for this split<br>   dt.fit(X_train.iloc[train_idx], y_train.iloc[train_idx])<br>   plt.subplot(cv.get_n_splits(X_train), 1, i+1)<br>   plot_tree(dt, feature_names=X_train.columns, impurity=False, filled=True, rounded=True)<br>   plt.title(f'Split {i+1} (Validation Accuracy: {scores[i]:.3f})\\n'<br>            f'Train indices: {train_idx}\\n'<br>            f'Validation indices: {val_idx}')<br><br>plt.tight_layout()<\/pre>\n<p>Validation accuracy: 0.429 \u00b1\u00a00.495<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2Agv73zWJr5O7POVugbKn93A.png?ssl=1\"><\/figure>\n<p>LOOCV works really well when we don\u2019t have much data and need to make the most of every piece we have. Since the result depends on every single data point, the results can change a lot if our data has noise or unusual values in\u00a0it.<\/p>\n<p><strong><em>Leave-P-Out Cross-Validation<br \/><\/em><\/strong>Leave-P-Out builds on the idea of Leave-One-Out, but instead of testing with just one piece of data, it tests with P pieces at a time. This creates a balance between Leave-One-Out and K-fold validation. 
The number we choose for P changes how we test the model and how long it\u00a0takes.<\/p>\n<p>The main problem with Leave-P-Out is how quickly the number of possible test combinations grows. For example, if we have 100 days of golf weather data and we want to test with 5 days at a time (P=5), there are more than 75 million different ways to choose those 5 days (75,287,520, to be exact). Testing all these combinations takes too much time when we have lots of data or when we use a larger number for\u00a0P.<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2AfyhKz9X4SYR9MfGk_MlqJQ.png?ssl=1\"><\/figure>\n<pre>from sklearn.model_selection import LeavePOut, cross_val_score<br><br># Cross-validation strategy<br>cv = LeavePOut(p=3)<br><br># Calculate cross-validation scores (using all splits for accuracy)<br>scores = cross_val_score(dt, X_train, y_train, cv=cv)<br>print(f\"Validation accuracy: {scores.mean():.3f} \u00b1 {scores.std():.3f}\")<br><br># Plot first 15 trees<br>n_trees = 15<br>plt.figure(figsize=(4, 3.5*n_trees))<br>for i, (train_idx, val_idx) in enumerate(cv.split(X_train, y_train)):<br>   if i &gt;= n_trees:<br>       break<br>       <br>   # Train and visualize the tree for this split<br>   dt.fit(X_train.iloc[train_idx], y_train.iloc[train_idx])<br>   plt.subplot(n_trees, 1, i+1)<br>   plot_tree(dt, feature_names=X_train.columns, impurity=False, filled=True, rounded=True)<br>   plt.title(f'Split {i+1} (Validation Accuracy: {scores[i]:.3f})\\n'<br>            f'Train indices: {train_idx}\\n'<br>            f'Validation indices: {val_idx}')<br><br>plt.tight_layout()<\/pre>\n<p>Validation accuracy: 0.441 \u00b1\u00a00.254<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2A12GiNBD3Lg9nlvB5LzI5tA.png?ssl=1\"><\/figure>\n<p>Because of these practical limits, Leave-P-Out is mostly used in special cases where we need very 
thorough testing and have a small enough dataset to make it work. It\u2019s especially useful in research projects where getting the most accurate test results matters more than how long the testing\u00a0takes.<\/p>\n<h4>Random Methods<\/h4>\n<p><strong><em>ShuffleSplit Cross-Validation<br \/><\/em><\/strong>ShuffleSplit works differently from other validation methods by using completely random splits. Instead of splitting data in an organized way like K-fold, or testing every possible combination like Leave-P-Out, ShuffleSplit creates random training and testing splits each\u00a0time.<\/p>\n<p>What makes ShuffleSplit different from K-fold is that the splits don\u2019t follow any pattern. In K-fold, each piece of data gets used exactly once for testing. But in ShuffleSplit, a single day of golf weather data might be used for testing several times, or might not be used for testing at all. This randomness gives us a different way to understand how well our model performs.<\/p>\n<p>ShuffleSplit works especially well with large datasets where K-fold might take too long to run. We can choose how many times we want to test, no matter how much data we have. We can also control how big each split should be. 
This lets us find a good balance between thorough testing and the time it takes to\u00a0run.<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2A2kCPiZFxmpSujpk0ctU_oA.png?ssl=1\"><\/figure>\n<pre>from sklearn.model_selection import ShuffleSplit, cross_val_score<br><br># Cross-validation strategy<br>cv = ShuffleSplit(n_splits=3, test_size=0.2, random_state=41)<br><br># Calculate cross-validation scores<br>scores = cross_val_score(dt, X_train, y_train, cv=cv)<br>print(f\"Validation accuracy: {scores.mean():.3f} \u00b1 {scores.std():.3f}\")<br><br># Plot trees for each split<br>plt.figure(figsize=(4, 3.5*cv.get_n_splits(X_train)))<br>for i, (train_idx, val_idx) in enumerate(cv.split(X_train, y_train)):<br>   # Train and visualize the tree for this split<br>   dt.fit(X_train.iloc[train_idx], y_train.iloc[train_idx])<br>   plt.subplot(cv.get_n_splits(X_train), 1, i+1)<br>   plot_tree(dt, feature_names=X_train.columns, impurity=False, filled=True, rounded=True)<br>   plt.title(f'Split {i+1} (Validation Accuracy: {scores[i]:.3f})\\n'<br>            f'Train indices: {train_idx}\\n'<br>            f'Validation indices: {val_idx}')<br><br>plt.tight_layout()<\/pre>\n<p>Validation accuracy: 0.333 \u00b1\u00a00.272<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/390\/1%2AGNQ-Ub_p62rq0q0PyKuU7g.png?ssl=1\"><\/figure>\n<p>Since ShuffleSplit can create as many random splits as we want, it\u2019s useful when we want to see how our model\u2019s performance changes with different random splits, or when we need more tests to be confident about our\u00a0results.<\/p>\n<p><strong><em>Stratified ShuffleSplit<br \/><\/em><\/strong>Stratified ShuffleSplit combines random splitting with keeping the right mix of different types of data. 
Like Stratified K-fold, it makes sure each split has about the same percentage of each type of data as the full\u00a0dataset.<\/p>\n<p>This method gives us the best of both worlds: the freedom of random splitting and the fairness of keeping data balanced. For example, if our golf dataset has 70% \u201cyes\u201d days and 30% \u201cno\u201d days for playing golf, each random split will try to keep this same 70\u201330 mix. This is especially useful when we have uneven data, where random splitting might accidentally create test sets that don\u2019t represent our data\u00a0well.<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2AKwlmsszsLhvrDzgeIkaCqA.png?ssl=1\"><\/figure>\n<pre>from sklearn.model_selection import StratifiedShuffleSplit, cross_val_score<br><br># Cross-validation strategy<br>cv = StratifiedShuffleSplit(n_splits=3, test_size=0.2, random_state=41)<br><br># Calculate cross-validation scores<br>scores = cross_val_score(dt, X_train, y_train, cv=cv)<br>print(f\"Validation accuracy: {scores.mean():.3f} \u00b1 {scores.std():.3f}\")<br><br># Plot trees for each split<br>plt.figure(figsize=(4, 3.5*cv.get_n_splits(X_train)))<br>for i, (train_idx, val_idx) in enumerate(cv.split(X_train, y_train)):<br>   # Train and visualize the tree for this split<br>   dt.fit(X_train.iloc[train_idx], y_train.iloc[train_idx])<br>   plt.subplot(cv.get_n_splits(X_train), 1, i+1)<br>   plot_tree(dt, feature_names=X_train.columns, impurity=False, filled=True, rounded=True)<br>   plt.title(f'Split {i+1} (Validation Accuracy: {scores[i]:.3f})\\n'<br>            f'Train indices: {train_idx}\\n'<br>            f'Validation indices: {val_idx}')<br><br>plt.tight_layout()<\/pre>\n<p>Validation accuracy: 0.556 \u00b1\u00a00.157<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" 
src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/484\/1%2Ar-0U28_1PfJE2AFyBwLIGw.png?ssl=1\"><\/figure>\n<p>However, trying to keep both the random nature of the splits and the right mix of data types can be tricky. The method sometimes has to make small compromises between being perfectly random and keeping perfect proportions. In real use, these small trade-offs rarely cause problems, and having balanced test sets usually matters more than having perfectly random\u00a0splits.<\/p>\n<h4>\ud83c\udf1f Validation Techniques Summarized &amp; Code\u00a0Summary<\/h4>\n<p>To summarize, model validation methods fall into two main categories: hold-out methods and cross-validation methods.<\/p>\n<p><strong>Hold-out Methods<br \/><\/strong>\u00b7 Train-Test Split: The simplest approach, dividing data into two parts<br \/>\u00b7 Train-Validation-Test Split: A three-way split for more complex model development<\/p>\n<p><strong>Cross-validation Methods<br \/><\/strong>Cross-validation methods make better use of available data through multiple rounds of validation:<\/p>\n<p><em>K-Fold Methods<br \/><\/em>Rather than a single split, these methods divide data into K parts:<br \/>\u00b7 Basic K-Fold: Rotates through different test sets<br \/>\u00b7 Stratified K-Fold: Maintains class balance across splits<br \/>\u00b7 Group K-Fold: Preserves data grouping<br \/>\u00b7 Time Series Split: Respects temporal order<br \/>\u00b7 Repeated K-Fold: Runs K-fold multiple times<br \/>\u00b7 Repeated Stratified K-Fold: Multiple runs with class balance<\/p>\n<p><em>Leave-Out Methods<br \/><\/em>These methods take validation to the extreme:<br \/>\u00b7 Leave-P-Out: Tests on P data points at a time<br \/>\u00b7 Leave-One-Out: Tests on single data\u00a0points<\/p>\n<p><em>Random Methods<br \/><\/em>These introduce controlled randomness:<br \/>\u00b7 ShuffleSplit: Creates random splits repeatedly<br \/>\u00b7 Stratified ShuffleSplit: Random splits with balanced\u00a0classes<\/p>\n<pre>import pandas as pd<br>import numpy as np<br>from sklearn.tree 
import DecisionTreeClassifier<br>from sklearn.model_selection import (<br>    # Hold-out methods<br>    train_test_split,<br>    # K-Fold methods <br>    KFold,                   # Basic k-fold<br>    StratifiedKFold,         # Maintains class balance<br>    GroupKFold,              # For grouped data<br>    TimeSeriesSplit,         # Temporal data<br>    RepeatedKFold,           # Multiple runs<br>    RepeatedStratifiedKFold, # Multiple runs with class balance<br>    # Leave-out methods<br>    LeaveOneOut,             # Single test point<br>    LeavePOut,               # P test points<br>    # Random methods<br>    ShuffleSplit,           # Random train-test splits<br>    StratifiedShuffleSplit, # Random splits with class balance<br>    cross_val_score         # Calculate validation score<br>)<br><br><br># Load the dataset<br>dataset_dict = {<br>    'Outlook': ['sunny', 'sunny', 'overcast', 'rainy', 'rainy', 'rainy', 'overcast', <br>                'sunny', 'sunny', 'rainy', 'sunny', 'overcast', 'overcast', 'rainy',<br>                'sunny', 'overcast', 'rainy', 'sunny', 'sunny', 'rainy', 'overcast',<br>                'rainy', 'sunny', 'overcast', 'sunny', 'overcast', 'rainy', 'overcast'],<br>    'Temperature': [85.0, 80.0, 83.0, 70.0, 68.0, 65.0, 64.0, 72.0, 69.0, 75.0, 75.0,<br>                   72.0, 81.0, 71.0, 81.0, 74.0, 76.0, 78.0, 82.0, 67.0, 85.0, 73.0,<br>                   88.0, 77.0, 79.0, 80.0, 66.0, 84.0],<br>    'Humidity': [85.0, 90.0, 78.0, 96.0, 80.0, 70.0, 65.0, 95.0, 70.0, 80.0, 70.0,<br>                 90.0, 75.0, 80.0, 88.0, 92.0, 85.0, 75.0, 92.0, 90.0, 85.0, 88.0,<br>                 65.0, 70.0, 60.0, 95.0, 70.0, 78.0],<br>    'Wind': [False, True, False, False, False, True, True, False, False, False, True,<br>             True, False, True, True, False, False, True, False, True, True, False,<br>             True, False, False, True, False, False],<br>    'Play': ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'No', 'Yes', 'Yes', 
'Yes',<br>             'Yes', 'Yes', 'No', 'No', 'Yes', 'Yes', 'No', 'No', 'No', 'Yes', 'Yes',<br>             'Yes', 'Yes', 'Yes', 'Yes', 'No', 'Yes']<br>}<br><br>df = pd.DataFrame(dataset_dict)<br><br># Data preprocessing<br>df = pd.get_dummies(df, columns=['Outlook'], prefix='', prefix_sep='', dtype=int)<br>df['Wind'] = df['Wind'].astype(int)<br><br># Set the label<br>X, y = df.drop('Play', axis=1), df['Play']<br><br>## Simple Train-Test Split<br>X_train, X_test, y_train, y_test = train_test_split(<br>    X, y, test_size=0.5, shuffle=False,<br>)<br><br>## Train-Validation-Test Split<br># First split: separate test set<br># X_temp, X_test, y_temp, y_test = train_test_split(<br>#    X, y, test_size=0.2, random_state=42<br># )<br># Second split: separate validation set<br># X_train, X_val, y_train, y_val = train_test_split(<br>#    X_temp, y_temp, test_size=0.25, random_state=42<br># )<br><br># Create model<br>dt = DecisionTreeClassifier(random_state=42)<br><br># Select validation method<br>#cv = KFold(n_splits=3, shuffle=True, random_state=42)<br>#cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)<br>#cv = GroupKFold(n_splits=3) # Requires groups parameter<br>#cv = TimeSeriesSplit(n_splits=3)<br>#cv = RepeatedKFold(n_splits=3, n_repeats=2, random_state=42)<br>#cv = RepeatedStratifiedKFold(n_splits=3, n_repeats=2, random_state=42)<br>cv = LeaveOneOut()<br>#cv = LeavePOut(p=3)<br>#cv = ShuffleSplit(n_splits=3, test_size=0.2, random_state=42)<br>#cv = StratifiedShuffleSplit(n_splits=3, test_size=0.3, random_state=42)<br><br># Calculate and print scores<br>scores = cross_val_score(dt, X_train, y_train, cv=cv)<br>print(f\"Validation accuracy: {scores.mean():.3f} \u00b1 {scores.std():.3f}\")<br><br># Final Fit &amp; Test<br>dt.fit(X_train, y_train)<br>test_accuracy = dt.score(X_test, y_test)<br>print(f\"Test accuracy: {test_accuracy:.3f}\")<\/pre>\n<p>Validation accuracy: 0.429 \u00b1 0.495<br \/>Test accuracy: 
0.714<\/p>\n<p><strong><em>Comment on the result above:<\/em><\/strong> The large gap between validation and test accuracy, along with the very high standard deviation in validation scores, suggests our model\u2019s performance is unstable. This inconsistency likely comes from using LeaveOneOut validation on our small weather dataset\u200a\u2014\u200atesting on single data points means each split scores either 0 or 1, so the scores vary dramatically. A different validation method using larger validation sets might give us more reliable\u00a0results.<\/p>\n<h3>Choosing the Right Validation Method<\/h3>\n<p>Choosing how to validate your model isn\u2019t simple\u200a\u2014\u200adifferent situations need different approaches. Understanding which method to use can mean the difference between getting reliable or misleading results. Here are some aspects that you should consider when choosing the validation method:<\/p>\n<h4>1. Dataset\u00a0Size<\/h4>\n<p>The size of your dataset strongly influences which validation method works best. Let\u2019s look at different sizes:<\/p>\n<p><strong><em>Large Datasets (More than 100,000 samples)<\/em><\/strong><br \/>When you have large datasets, the time needed for testing becomes one of the main considerations. Simple hold-out validation (splitting data once into training and testing) often works well because you have enough data for reliable testing. If you need to use cross-validation, using just 3 folds or using ShuffleSplit with fewer rounds can give good results without taking too long to\u00a0run.<\/p>\n<p><strong><em>Medium Datasets (1,000 to 100,000 samples)<\/em><\/strong><br \/>For medium-sized datasets, regular K-fold cross-validation works best. Using 5 or 10 folds gives a good balance between reliable results and reasonable computing time. 
This amount of data is usually enough to create representative splits but not so much that testing takes too\u00a0long.<\/p>\n<p><strong><em>Small Datasets (Less than 1,000 samples)<\/em><\/strong><br \/>Small datasets, like our example of 28 days of golf records, need more careful testing. Leave-One-Out Cross-Validation or Repeated K-fold with more folds can actually work well in this case. Even though these methods take longer to run, they help us get the most reliable results when we don\u2019t have much data to work\u00a0with.<\/p>\n<h4>2. Computational Resources<\/h4>\n<p>When choosing a validation method, we need to think about our computing resources. There\u2019s a three-way balance between dataset size, how complex our model is, and which validation method we\u00a0use:<\/p>\n<p><strong>Fast Training Models<\/strong><br \/>Simple models like decision trees, logistic regression, and linear SVM can use more thorough validation methods like Leave-One-Out Cross-Validation or Repeated Stratified K-fold because they train quickly. Since each training round takes just seconds or minutes, we can afford to run many validation iterations. Even running LOOCV with its N training rounds might be practical for these algorithms.<\/p>\n<p><strong>Resource-Heavy Models<\/strong><br \/>Deep neural networks, random forests with many trees, or gradient boosting models take much longer to train. When using these models, more intensive validation methods like Repeated K-fold or Leave-P-Out might not be practical. We might need to choose simpler methods like basic K-fold or ShuffleSplit to keep testing time reasonable.<\/p>\n<p><strong>Memory Considerations<\/strong><br \/>In scikit-learn, splitters such as K-fold and ShuffleSplit only generate index arrays, one split at a time, so the splitter itself adds little memory overhead. Memory pressure usually comes from the model and from training several folds in parallel. For large datasets with complex models (like deep neural networks that need lots of memory), simpler hold-out methods might be necessary. 
If we still need thorough validation with limited memory, we can train the folds one after another instead of in parallel, so that only one model is being fitted at\u00a0a time.<\/p>\n<p>When resources are limited, using a simpler validation method that we can run properly (like basic K-fold) is better than trying to run a more complex method (like Leave-P-Out) that we can\u2019t complete properly.<\/p>\n<h4>3. Class Distribution<\/h4>\n<p>Class imbalance strongly affects how we should validate our model. With unbalanced data, stratified validation methods become essential. Methods like Stratified K-fold and Stratified ShuffleSplit make sure each testing split has about the same mix of classes as our full dataset. Without these stratified methods, some test sets might end up with no samples of a particular class at all, making it impossible to properly test how well our model makes predictions for that class.<\/p>\n<h4>4. Time\u00a0Series<\/h4>\n<p>When working with data that changes over time, we need special validation approaches. Regular random splitting methods don\u2019t work well because time order matters. With time series data, we must use methods like Time Series Split that respect time\u00a0order.<\/p>\n<h4>5. Group Dependencies<\/h4>\n<p>Many datasets contain natural groups of related data. These connections in our data need special handling when we validate our models. When data points are related, we need to use methods like Group K-fold to keep samples from the same group out of both the training and validation sets, which would otherwise leak information and inflate the scores.<\/p>\n<h4>Practical Guidelines<\/h4>\n<p>This flowchart will help you select the most appropriate validation method for your data. 
The steps below outline a clear process for choosing the best validation approach, assuming you have sufficient computing resources.<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2AsJ4D7qnGYVXGIujxxzzKRw.png?ssl=1\"><\/figure>\n<h3>Final Remarks<\/h3>\n<p>Model validation is essential for building reliable machine learning models. After exploring many validation methods, from simple train-test splits to complex cross-validation approaches, we\u2019ve learned that there is always a suitable validation method for whatever data you\u00a0have.<\/p>\n<p>While machine learning keeps changing with new methods and tools, these basic rules of validation stay the same. When you understand these principles well, I believe you\u2019ll build models that people can trust and rely\u00a0on.<\/p>\n<h4>Further Reading<\/h4>\n<p>For a detailed explanation of the <a href=\"https:\/\/scikit-learn.org\/stable\/api\/sklearn.model_selection.html\">validation methods in <\/a><a href=\"https:\/\/scikit-learn.org\/stable\/api\/sklearn.model_selection.html\">scikit-learn<\/a>, readers can refer to the official documentation, which provides comprehensive information on its usage and parameters.<\/p>\n<h4>Technical Environment<\/h4>\n<p>This article uses Python 3.7 and scikit-learn 1.5. 
While the concepts discussed are generally applicable, specific code implementations may vary slightly with different versions.<\/p>\n<h4>About the Illustrations<\/h4>\n<p>Unless otherwise noted, all images are created by the author, incorporating licensed design elements from Canva\u00a0Pro.<\/p>\n<p><strong><em>See more Model Evaluation &amp; Optimization methods here:<\/em><\/strong><\/p>\n<p><a href=\"https:\/\/medium.com\/@samybaladram\/list\/331287896864\">Model Evaluation &amp; Optimization<\/a><\/p>\n<p><strong><em>You might also like:<\/em><\/strong><\/p>\n<ul>\n<li><a href=\"https:\/\/medium.com\/@samybaladram\/list\/b3586f0a772c\">Classification Algorithms<\/a><\/li>\n<li><a href=\"https:\/\/medium.com\/@samybaladram\/list\/673fc83cd7db\">Ensemble Learning<\/a><\/li>\n<\/ul>\n<hr>\n<p><a href=\"https:\/\/towardsdatascience.com\/model-validation-techniques-explained-a-visual-guide-with-code-examples-eb13bbdc8f88\">Model Validation Techniques, Explained: A Visual Guide with Code Examples<\/a> was originally published in <a href=\"https:\/\/towardsdatascience.com\/\">Towards Data Science<\/a> on Medium, where people are continuing the conversation by 
highlighting and responding to this story.<\/p>\n<\/div>\n<p> \t<BR><br \/>\n <BR><\/BR><br \/>\n    Samy Baladram<br \/>\n \t<BR><br \/>\n<BR><\/BR><br \/>\n<a href=\"https:\/\/medium.com\/m\/global-identity-2?redirectUrl=https%3A%2F%2Ftowardsdatascience.com%2Fmodel-validation-techniques-explained-a-visual-guide-with-code-examples-eb13bbdc8f88\">Go to original source<\/a><br \/>\n \t<BR><br \/>\n <BR><\/BR><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Model Validation Techniques, Explained: A Visual Guide with Code Examples MODEL EVALUATION &amp; OPTIMIZATION 12 must-know methods to validate your machine\u00a0learning Every day, machines make millions of predictions\u200a\u2014\u200afrom detecting objects in photos to helping doctors find diseases. But before trusting these predictions, we need to know if they\u2019re any good. After all, no one would [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[62,275,70,158,274,273],"tags":[272,103,276],"class_list":["post-292","post","type-post","status-publish","format-standard","hentry","category-aimldsaimlds","category-crossvalidation","category-machine-learning","category-tips-and-tricks","category-train-test-split","category-validation","tag-methods","tag-model","tag-validation"],"_links":{"self":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/292"}],"collection":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/comments?post=292"}],"version-history":[{"count":0,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/292\/revisions
"}],"wp:attachment":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/media?parent=292"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/categories?post=292"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/tags?post=292"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}