{"id":2264,"date":"2025-03-07T07:00:50","date_gmt":"2025-03-07T07:00:50","guid":{"rendered":"https:\/\/mailitics.com\/index.php\/2025\/03\/07\/how-to-spot-and-prevent-model-drift-before-it-impacts-your-business\/"},"modified":"2025-03-07T07:00:50","modified_gmt":"2025-03-07T07:00:50","slug":"how-to-spot-and-prevent-model-drift-before-it-impacts-your-business","status":"publish","type":"post","link":"https:\/\/mailitics.com\/index.php\/2025\/03\/07\/how-to-spot-and-prevent-model-drift-before-it-impacts-your-business\/","title":{"rendered":"How to Spot and Prevent Model Drift Before it Impacts Your Business"},"content":{"rendered":"<div>\n<p class=\"wp-block-paragraph\">Despite the AI hype, many tech companies still rely heavily on machine learning to power critical applications, from personalized recommendations to fraud detection.<\/p>\n<p class=\"wp-block-paragraph\">I\u2019ve seen firsthand how undetected drift can result in significant costs: missed fraud cases, lost revenue, and suboptimal business outcomes, to name just a few. So it\u2019s crucial to have robust monitoring in place if your company has deployed, or plans to deploy, machine learning models into production.<\/p>\n<p class=\"wp-block-paragraph\">Undetected <a href=\"https:\/\/towardsdatascience.com\/tag\/model-drift\/\" title=\"Model Drift\">model drift<\/a> can lead to significant financial losses, operational inefficiencies, and even damage to a company\u2019s reputation. 
To mitigate these risks, it\u2019s important to have effective model monitoring, which involves:<\/p>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">Tracking model performance<\/li>\n<li class=\"wp-block-list-item\">Monitoring feature distributions<\/li>\n<li class=\"wp-block-list-item\">Detecting both univariate and multivariate drift<\/li>\n<\/ul>\n<p class=\"wp-block-paragraph\">A well-implemented monitoring system can help identify issues early, saving considerable time, money, and resources.<\/p>\n<p class=\"wp-block-paragraph\">In this comprehensive guide, I\u2019ll provide a framework for thinking about and implementing effective <a href=\"https:\/\/towardsdatascience.com\/tag\/model-monitoring\/\" title=\"Model Monitoring\">model monitoring<\/a>, helping you stay ahead of potential issues and ensure the stability and reliability of your models in production.<\/p>\n<h2 class=\"wp-block-heading\">What\u2019s the difference between feature drift and score drift?<\/h2>\n<p class=\"wp-block-paragraph\">Score drift refers to a gradual change in the distribution of model scores. If left unchecked, this could lead to a <strong>decline in model performance<\/strong>, making the model less accurate over time.<\/p>\n<p class=\"wp-block-paragraph\">Feature drift, on the other hand, occurs when the distribution of one or more input features changes. 
These changes can invalidate the relationships the model learned during training and ultimately lead to inaccurate predictions.<\/p>\n<h2 class=\"wp-block-heading\">Simulating score shifts<\/h2>\n<p class=\"wp-block-paragraph\">To model real-world fraud detection challenges, I created a synthetic dataset with five financial transaction features.<\/p>\n<p class=\"wp-block-paragraph\">The <strong>reference dataset<\/strong> represents the original distribution, while the <strong>production dataset<\/strong> introduces shifts that simulate more <strong>high-value transactions without PIN verification on newer accounts<\/strong>, a pattern consistent with rising fraud.<\/p>\n<p class=\"wp-block-paragraph\">Each feature follows a different underlying distribution:<\/p>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">\n<strong>Transaction Amount:<\/strong> Log-normal distribution (right-skewed with a long tail)<\/li>\n<li class=\"wp-block-list-item\">\n<strong>Account Age (months):<\/strong> Clipped normal distribution between 0 and 60 (assuming a 5-year-old company)<\/li>\n<li class=\"wp-block-list-item\">\n<strong>Time Since Last Transaction:<\/strong> Exponential distribution<\/li>\n<li class=\"wp-block-list-item\">\n<strong>Transaction Count:<\/strong> Poisson distribution<\/li>\n<li class=\"wp-block-list-item\">\n<strong>Entered PIN:<\/strong> Binomial distribution with a single trial (i.e., Bernoulli)<\/li>\n<\/ul>\n<p class=\"wp-block-paragraph\">To approximate model scores, I randomly assigned weights to these features and applied a sigmoid function to map the weighted sum to a score between 0 and 1. 
This mimics how a logistic regression fraud model generates risk scores.<\/p>\n<p class=\"wp-block-paragraph\">As shown in the plot below:<\/p>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">\n<strong>Drifted features:<\/strong> <em>Transaction Amount, Account Age, Transaction Count, and Entered PIN<\/em> all experienced shifts in distribution, scale, or relationships.<\/li>\n<\/ul>\n<figure class=\"wp-block-image size-full\"><img data-recalc-dims=\"1\" data-dominant-color=\"e1dedf\" data-has-transparency=\"true\" style=\"--dominant-color: #e1dedf;\" fetchpriority=\"high\" decoding=\"async\" width=\"768\" height=\"863\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/03\/Model-drift-1.png?resize=768%2C863&#038;ssl=1\" alt=\"\" class=\"wp-image-598830 has-transparency\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/03\/Model-drift-1.png 768w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/03\/Model-drift-1-267x300.png 267w\" sizes=\"(max-width: 768px) 100vw, 768px\"><figcaption class=\"wp-element-caption\">Distribution of drifted features (image by author)<\/figcaption><\/figure>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">\n<strong>Stable feature:<\/strong> <em>Time Since Last Transaction<\/em> remained unchanged.<\/li>\n<\/ul>\n<figure class=\"wp-block-image size-full\"><img data-recalc-dims=\"1\" loading=\"lazy\" data-dominant-color=\"e0dfe1\" data-has-transparency=\"true\" style=\"--dominant-color: #e0dfe1;\" decoding=\"async\" width=\"564\" height=\"646\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/03\/Model-drift-2.png?resize=564%2C646&#038;ssl=1\" alt=\"\" class=\"wp-image-598831 has-transparency\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/03\/Model-drift-2.png 564w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/03\/Model-drift-2-262x300.png 262w\" sizes=\"(max-width: 564px) 
100vw, 564px\"><figcaption class=\"wp-element-caption\">Distribution of stable feature (image by author)<\/figcaption><\/figure>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">\n<strong>Drifted scores:<\/strong> As a result of the drifted features, the distribution of model scores has also changed.<\/li>\n<\/ul>\n<figure class=\"wp-block-image size-full\"><img data-recalc-dims=\"1\" loading=\"lazy\" data-dominant-color=\"e1dfe1\" data-has-transparency=\"true\" style=\"--dominant-color: #e1dfe1;\" decoding=\"async\" width=\"564\" height=\"646\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/03\/Model-drift-3.png?resize=564%2C646&#038;ssl=1\" alt=\"\" class=\"wp-image-598832 has-transparency\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/03\/Model-drift-3.png 564w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/03\/Model-drift-3-262x300.png 262w\" sizes=\"(max-width: 564px) 100vw, 564px\"><figcaption class=\"wp-element-caption\">Distribution of model scores (image by author)<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\">This setup allows us to analyze how feature drift impacts model scores in production.<\/p>\n<h3 class=\"wp-block-heading\">Detecting model score drift using PSI<\/h3>\n<p class=\"wp-block-paragraph\">To monitor model scores, I used the population stability index (PSI) to measure how much the model score distribution has shifted over time.<\/p>\n<p class=\"wp-block-paragraph\">PSI works by binning continuous model scores and comparing the proportion of scores in each bin between the reference and production datasets. 
For each bin, it multiplies the difference in proportions by the natural log of their ratio, then sums these terms into a single statistic that quantifies the drift.<\/p>\n<p class=\"wp-block-paragraph\"><strong><a href=\"https:\/\/towardsdatascience.com\/tag\/python\/\" title=\"Python\">Python<\/a> implementation:<\/strong><\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">import numpy as np\n\n# Define function to calculate PSI given two datasets\ndef calculate_psi(reference, production, bins=10):\n    # Discretize scores into equal-width bins over [0, 1]\n    min_val, max_val = 0, 1\n    bin_edges = np.linspace(min_val, max_val, bins + 1)\n\n    # Calculate proportions in each bin\n    ref_counts, _ = np.histogram(reference, bins=bin_edges)\n    prod_counts, _ = np.histogram(production, bins=bin_edges)\n\n    ref_proportions = ref_counts \/ len(reference)\n    prod_proportions = prod_counts \/ len(production)\n\n    # Clip empty bins to a small value to avoid division by zero and log(0)\n    ref_proportions = np.clip(ref_proportions, 1e-8, 1)\n    prod_proportions = np.clip(prod_proportions, 1e-8, 1)\n\n    # Sum the PSI contributions of all bins\n    psi = np.sum((ref_proportions - prod_proportions) * np.log(ref_proportions \/ prod_proportions))\n\n    return psi\n\n# Calculate PSI\npsi_value = calculate_psi(ref_data['model_score'], prod_data['model_score'], bins=10)\nprint(f\"PSI Value: {psi_value}\")<\/code><\/pre>\n<p class=\"wp-block-paragraph\">Below is a summary of how to interpret PSI values:<\/p>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">\n<strong>PSI &lt; 0.1<\/strong>: No drift, or very minor drift (distributions are almost identical).\n<\/li>\n<li class=\"wp-block-list-item\">\n<strong>0.1 \u2264 PSI &lt; 0.25<\/strong>: Some drift. The distributions are somewhat different.\n<\/li>\n<li class=\"wp-block-list-item\">\n<strong>0.25 \u2264 PSI &lt; 0.5<\/strong>: Moderate drift. A noticeable shift between the reference and production distributions.\n<\/li>\n<li class=\"wp-block-list-item\">\n<strong>PSI \u2265 0.5: <\/strong>Significant drift. 
There is a large shift, indicating that the distribution in production has changed substantially from the reference data.<\/li>\n<\/ul>\n<figure class=\"wp-block-image size-full\"><img data-recalc-dims=\"1\" data-dominant-color=\"e2d5dd\" data-has-transparency=\"true\" style=\"--dominant-color: #e2d5dd;\" loading=\"lazy\" decoding=\"async\" width=\"848\" height=\"544\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/03\/Model-drift-4.png?resize=848%2C544&#038;ssl=1\" alt=\"\" class=\"wp-image-598833 has-transparency\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/03\/Model-drift-4.png 848w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/03\/Model-drift-4-300x192.png 300w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/03\/Model-drift-4-768x493.png 768w\" sizes=\"auto, (max-width: 848px) 100vw, 848px\"><figcaption class=\"wp-element-caption\">Histogram of model score distributions (image by author)<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\">The <strong>PSI value of 0.6374<\/strong> suggests significant drift between our reference and production datasets. This aligns with the histogram of model score distributions, which visually confirms the shift towards higher scores in production, <strong>indicating an increase in risky transactions.<\/strong><\/p>\n<h2 class=\"wp-block-heading\">Detecting feature drift<\/h2>\n<p class=\"wp-block-paragraph\"><strong>Kolmogorov-Smirnov test for numeric features<\/strong><\/p>\n<p class=\"wp-block-paragraph\">The Kolmogorov-Smirnov (K-S) test is my preferred method for detecting drift in numeric features because it is <strong>non-parametric<\/strong>, meaning it doesn\u2019t assume the data follow any particular distribution (such as a normal distribution).<\/p>\n<p class=\"wp-block-paragraph\">The test compares a feature\u2019s distribution in the reference and production datasets by measuring the maximum absolute difference between their empirical cumulative distribution functions (ECDFs). 
The resulting K-S statistic ranges from 0 to 1:<\/p>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">0 indicates no difference between the two empirical distributions.<\/li>\n<li class=\"wp-block-list-item\">Values closer to 1 indicate a greater shift.<\/li>\n<\/ul>\n<p class=\"wp-block-paragraph\"><strong>Python implementation:<\/strong><\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">import pandas as pd\nfrom scipy.stats import ks_2samp\n\n# Collect one row of results per feature\nrows = []\n\n# Loop through all numeric features and perform the two-sample K-S test\nfor col in numeric_cols:\n    ks_stat, p_value = ks_2samp(ref_data[col], prod_data[col])\n    drift_detected = p_value &lt; 0.05\n\n    # Store results for this feature\n    rows.append({\n        'Feature': col,\n        'KS Statistic': ks_stat,\n        'p-value': p_value,\n        'Drift Detected': drift_detected\n    })\n\n# Build the results dataframe in one step (avoids repeated pd.concat calls)\nks_results = pd.DataFrame(rows)<\/code><\/pre>\n<p class=\"wp-block-paragraph\">Below are ECDF charts of the four numeric features in our dataset:<\/p>\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" data-dominant-color=\"e8e7e8\" data-has-transparency=\"true\" style=\"--dominant-color: #e8e7e8;\" loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"681\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/03\/Model-drift-5-1024x681.png?resize=1024%2C681&#038;ssl=1\" alt=\"\" class=\"wp-image-598834 has-transparency\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/03\/Model-drift-5-1024x681.png 1024w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/03\/Model-drift-5-300x200.png 300w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/03\/Model-drift-5-768x511.png 768w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/03\/Model-drift-5.png 1150w\" 
sizes=\"auto, (max-width: 1024px) 100vw, 1024px\"><figcaption class=\"wp-element-caption\">ECDFs of four numeric features (image by author)<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\">Let\u2019s look at the account age feature as an example: the x-axis represents account age (0-50 months), while the y-axis shows the ECDF for both the reference and production datasets. The production dataset skews towards newer accounts: its ECDF rises more steeply at low account ages, meaning a larger proportion of its observations have lower account ages.<\/p>\n<p class=\"wp-block-paragraph\"><strong>Chi-Square test for categorical features<\/strong><\/p>\n<p class=\"wp-block-paragraph\">To detect shifts in categorical and boolean features, I like to use the Chi-Square test.<\/p>\n<p class=\"wp-block-paragraph\">This test compares the frequency distribution of a categorical feature in the reference and production datasets, and returns two values:<\/p>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">\n<strong>Chi-Square statistic:<\/strong> A higher value indicates a greater shift between the reference and production datasets.<\/li>\n<li class=\"wp-block-list-item\">\n<strong>P-value:<\/strong> A p-value below 0.05 suggests that the difference between the reference and production datasets is statistically significant, indicating potential feature drift.<\/li>\n<\/ul>\n<p class=\"wp-block-paragraph\"><strong>Python implementation:<\/strong><\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">import numpy as np\nimport pandas as pd\nfrom scipy.stats import chi2_contingency\n\n# Collect one row of results per feature\nrows = []\n\nfor col in categorical_cols:\n    # Get raw value counts for both reference and production datasets\n    ref_counts = ref_data[col].value_counts()\n    prod_counts = prod_data[col].value_counts()\n\n    # Ensure all categories are represented in both (sorted for a deterministic column order)\n    all_categories = sorted(set(ref_counts.index).union(prod_counts.index))\n    ref_counts = ref_counts.reindex(all_categories, fill_value=0)\n    prod_counts = prod_counts.reindex(all_categories, fill_value=0)\n\n    # Create a 2 x k contingency table: rows = datasets, columns = categories\n    contingency_table = np.array([ref_counts.values, prod_counts.values])\n\n    # Perform Chi-Square test of homogeneity\n    chi2_stat, p_value, _, _ = chi2_contingency(contingency_table)\n    drift_detected = p_value &lt; 0.05\n\n    # Store results for this feature\n    rows.append({\n        'Feature': col,\n        'Chi-Square Statistic': chi2_stat,\n        'p-value': p_value,\n        'Drift Detected': drift_detected\n    })\n\n# Build the results dataframe in one step (avoids repeated pd.concat calls)\nchi2_results = pd.DataFrame(rows)<\/code><\/pre>\n<p class=\"wp-block-paragraph\">The Chi-Square statistic of 57.31 with a p-value of 3.72e-14 indicates a statistically significant shift in our categorical feature, <code>Entered PIN<\/code>. This finding aligns with the histogram below, which visually illustrates the shift:<\/p>\n<figure class=\"wp-block-image size-full\"><img data-recalc-dims=\"1\" data-dominant-color=\"e1e0e2\" data-has-transparency=\"true\" style=\"--dominant-color: #e1e0e2;\" loading=\"lazy\" decoding=\"async\" width=\"375\" height=\"433\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/03\/Model-drift-6.png?resize=375%2C433&#038;ssl=1\" alt=\"\" class=\"wp-image-598835 has-transparency\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/03\/Model-drift-6.png 375w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/03\/Model-drift-6-260x300.png 260w\" sizes=\"auto, (max-width: 375px) 100vw, 375px\"><figcaption class=\"wp-element-caption\">Distribution of categorical feature (image by author)<\/figcaption><\/figure>\n<h2 class=\"wp-block-heading\">Detecting multivariate shifts<\/h2>\n<p class=\"wp-block-paragraph\"><strong><a 
href=\"https:\/\/towardsdatascience.com\/tag\/spearman-correlation\/\" title=\"Spearman Correlation\">Spearman Correlation<\/a> for shifts in pairwise interactions<\/strong><\/p>\n<p class=\"wp-block-paragraph\">In addition to monitoring individual feature shifts, it\u2019s important to track <strong>shifts in relationships or interactions between features<\/strong>, known as multivariate shifts. Even if the distributions of individual features remain stable, multivariate shifts can signal meaningful differences in the data.<\/p>\n<p class=\"wp-block-paragraph\">By default, the pandas <code>.corr()<\/code> method computes Pearson correlation, which only captures linear relationships between variables. However, <strong>relationships between features are often non-linear<\/strong> yet still follow a consistent trend.<\/p>\n<p class=\"wp-block-paragraph\">To capture this, we use <strong>Spearman correlation<\/strong> to measure <strong>monotonic relationships<\/strong> between features. Because it is the Pearson correlation computed on the ranks of the values, it captures whether features <strong>change together<\/strong> in a consistent direction, even if their relationship isn\u2019t strictly linear.<\/p>\n<p class=\"wp-block-paragraph\">To assess shifts in feature relationships, we compare:<\/p>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">\n<strong>Reference correlation<\/strong> (<code>ref_corr<\/code>): Captures historical feature relationships in the reference dataset.<\/li>\n<li class=\"wp-block-list-item\">\n<strong>Production correlation<\/strong> (<code>prod_corr<\/code>): Captures new feature relationships in production.<\/li>\n<li class=\"wp-block-list-item\">\n<strong>Absolute difference in correlation<\/strong>: Measures how much feature relationships have shifted between the reference and production datasets. 
<strong>Higher values indicate larger shifts.<\/strong>\n<\/li>\n<\/ul>\n<p class=\"wp-block-paragraph\"><strong>Python implementation:<\/strong><\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\"># Calculate Spearman (rank-based) correlation matrices\nref_corr = ref_data.corr(method='spearman')\nprod_corr = prod_data.corr(method='spearman')\n\n# Calculate element-wise absolute difference in correlation\ncorr_diff = (ref_corr - prod_corr).abs()<\/code><\/pre>\n<p class=\"wp-block-paragraph\"><strong>Example: Change in correlation<\/strong><\/p>\n<p class=\"wp-block-paragraph\">Now, let\u2019s look at the correlation between <code>transaction_amount<\/code> and <code>account_age_in_months<\/code>:<\/p>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">In <code>ref_corr<\/code>, the correlation is 0.00095, indicating essentially no relationship between the two features.<\/li>\n<li class=\"wp-block-list-item\">In <code>prod_corr<\/code>, the correlation is -0.0325, indicating a very weak negative correlation.<\/li>\n<li class=\"wp-block-list-item\">The absolute difference in the Spearman correlation is 0.0335, a small but noticeable shift.<\/li>\n<\/ul>\n<p class=\"wp-block-paragraph\">The absolute difference in correlation indicates a <strong>shift<\/strong> in the relationship between <code>transaction_amount<\/code> and <code>account_age_in_months<\/code>.<\/p>\n<p class=\"wp-block-paragraph\">There used to be essentially no relationship between these two features, but the production dataset now shows a very weak negative correlation: newer accounts tend to have slightly higher transaction amounts. 
This matches the shift simulated in the production data.<\/p>\n<p class=\"wp-block-paragraph\"><strong>Autoencoder for complex, high-dimensional multivariate shifts<\/strong><\/p>\n<p class=\"wp-block-paragraph\">In addition to monitoring pairwise interactions, we can also look for shifts that span many dimensions of the data at once.<\/p>\n<p class=\"wp-block-paragraph\">Autoencoders are powerful tools for detecting <strong>high-dimensional multivariate shifts<\/strong>, where multiple features collectively change in ways that may not be apparent from looking at individual feature distributions or pairwise correlations.<\/p>\n<p class=\"wp-block-paragraph\">An autoencoder is a neural network that learns a compressed representation of data through two components:<\/p>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">\n<strong>Encoder:<\/strong> Compresses input data into a lower-dimensional representation.<\/li>\n<li class=\"wp-block-list-item\">\n<strong>Decoder:<\/strong> Reconstructs the original input from the compressed representation.<\/li>\n<\/ul>\n<p class=\"wp-block-paragraph\">To detect shifts, we train the autoencoder on reference data only, then compare the <strong>reconstructed output<\/strong> to the <strong>original input<\/strong> and compute the <strong>reconstruction loss<\/strong>.<\/p>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">\n<strong>Low reconstruction loss<\/strong> \u2192 The autoencoder successfully reconstructs the data, meaning the new observations resemble the data it was trained on.<\/li>\n<li class=\"wp-block-list-item\">\n<strong>High reconstruction loss<\/strong> \u2192 The production data deviates significantly from the learned patterns, indicating potential drift.<\/li>\n<\/ul>\n<p class=\"wp-block-paragraph\">Unlike traditional drift metrics that focus on <strong>individual features or pairwise relationships<\/strong>, autoencoders capture <strong>complex, non-linear dependencies<\/strong> across multiple variables simultaneously.<\/p>\n<p class=\"wp-block-paragraph\"><strong>Python 
implementation:<\/strong><\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">import numpy as np\nfrom sklearn.preprocessing import StandardScaler\n# Assumes the Keras API bundled with TensorFlow 2.x\nfrom tensorflow.keras.layers import Input, Dense\nfrom tensorflow.keras.models import Model\n\nref_features = ref_data[numeric_cols + categorical_cols]\nprod_features = prod_data[numeric_cols + categorical_cols]\n\n# Normalize the data (fit the scaler on the reference data only)\nscaler = StandardScaler()\nref_scaled = scaler.fit_transform(ref_features)\nprod_scaled = scaler.transform(prod_features)\n\n# Split reference data into train and validation (shuffles ref_scaled in place)\nnp.random.shuffle(ref_scaled)\ntrain_size = int(0.8 * len(ref_scaled))\ntrain_data = ref_scaled[:train_size]\nval_data = ref_scaled[train_size:]\n\n# Build autoencoder\ninput_dim = ref_features.shape[1]\nencoding_dim = 3\n# Input layer\ninput_layer = Input(shape=(input_dim,))\n# Encoder: compress the inputs to encoding_dim dimensions\nencoded = Dense(8, activation=\"relu\")(input_layer)\nencoded = Dense(encoding_dim, activation=\"relu\")(encoded)\n# Decoder: reconstruct the original inputs\ndecoded = Dense(8, activation=\"relu\")(encoded)\ndecoded = Dense(input_dim, activation=\"linear\")(decoded)\n# Autoencoder\nautoencoder = Model(input_layer, decoded)\nautoencoder.compile(optimizer=\"adam\", loss=\"mse\")\n\n# Train autoencoder to reproduce its own (reference) inputs\nhistory = autoencoder.fit(\n    train_data, train_data,\n    epochs=50,\n    batch_size=64,\n    shuffle=True,\n    validation_data=(val_data, val_data),\n    verbose=0\n)\n\n# Calculate per-row reconstruction error (MSE across features)\n# Note: ref_scaled includes the training rows, so ref_mse may be slightly optimistic;\n# val_data alone would give an out-of-sample reference baseline\nref_pred = autoencoder.predict(ref_scaled, verbose=0)\nprod_pred = autoencoder.predict(prod_scaled, verbose=0)\n\nref_mse = np.mean(np.power(ref_scaled - ref_pred, 2), axis=1)\nprod_mse = np.mean(np.power(prod_scaled - prod_pred, 2), axis=1)<\/code><\/pre>\n<p class=\"wp-block-paragraph\">The charts below show the distribution of reconstruction loss for both datasets.<\/p>\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" data-dominant-color=\"e1e1e3\" data-has-transparency=\"true\" style=\"--dominant-color: #e1e1e3;\" loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"418\" 
src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/03\/Model-drift-7-1024x418.png?resize=1024%2C418&#038;ssl=1\" alt=\"\" class=\"wp-image-598836 has-transparency\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/03\/Model-drift-7-1024x418.png 1024w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/03\/Model-drift-7-300x123.png 300w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/03\/Model-drift-7-768x314.png 768w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/03\/Model-drift-7.png 1146w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\"><figcaption class=\"wp-element-caption\">Distribution of reconstruction loss between actuals and predictions (image by author)<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\">The production dataset has a higher mean reconstruction error than the reference dataset, indicating a shift in the overall data. This is consistent with the production dataset containing more high-value transactions from newer accounts.<\/p>\n<h2 class=\"wp-block-heading\">Summarizing<\/h2>\n<p class=\"wp-block-paragraph\">Model monitoring is an essential, yet often overlooked, responsibility for data scientists and machine learning engineers.<\/p>\n<p class=\"wp-block-paragraph\">All of the statistical methods led to the same conclusion, consistent with the shifts introduced in the data: they detected <strong>a trend in production towards newer accounts making higher-value transactions<\/strong>. 
This shift resulted in higher model scores, signaling an increase in potential fraud.<\/p>\n<p class=\"wp-block-paragraph\">In this post, I covered techniques for detecting drift on three different levels:<\/p>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">\n<strong>Model score drift:<\/strong> Using the <strong>Population Stability Index<\/strong> (PSI)<\/li>\n<li class=\"wp-block-list-item\">\n<strong>Individual feature drift:<\/strong> Using the <strong>Kolmogorov-Smirnov test<\/strong> for numeric features and the <strong>Chi-Square test<\/strong> for categorical features<\/li>\n<li class=\"wp-block-list-item\">\n<strong>Multivariate drift:<\/strong> Using <strong>Spearman correlation<\/strong> for pairwise interactions and <strong>autoencoders<\/strong> for high-dimensional, multivariate shifts<\/li>\n<\/ul>\n<p class=\"wp-block-paragraph\">These are just a few of the techniques I rely on for comprehensive monitoring. There are plenty of other equally valid statistical methods that can also detect drift effectively.<\/p>\n<p class=\"wp-block-paragraph\">Detected shifts often point to underlying issues that warrant further investigation. The root cause could be as serious as a data collection bug, or as benign as a seasonal time change, such as a daylight saving time adjustment.<\/p>\n<p class=\"wp-block-paragraph\">There are also excellent Python packages, such as <a href=\"http:\/\/evidently.ai\/\">evidently.ai<\/a>, that automate many of these comparisons. 
However, I believe there\u2019s significant value in deeply understanding the statistical techniques behind drift detection, rather than relying solely on these tools.<\/p>\n<p class=\"wp-block-paragraph\">What\u2019s the model monitoring process like at places you\u2019ve worked?<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity is-style-dotted\">\n<p class=\"wp-block-paragraph\"><strong>Want to build your AI skills?<\/strong><\/p>\n<p class=\"wp-block-paragraph\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/s.w.org\/images\/core\/emoji\/15.0.3\/72x72\/1f449-1f3fb.png?ssl=1\" alt=\"\ud83d\udc49\ud83c\udffb\" class=\"wp-smiley\" style=\"height: 1em; max-height: 1em;\"> I run the <a href=\"http:\/\/aiweekender.substack.com\/\"><strong>AI Weekender<\/strong><\/a> and write weekly blog posts on data science, AI weekend projects, and career advice for professionals in data.<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity is-style-dotted\">\n<h3 class=\"wp-block-heading\">Resources<\/h3>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">Jupyter Notebook: <a href=\"https:\/\/colab.research.google.com\/drive\/1qQFKjg3wLWmj2z4w6_U_xsqRREaB2sBP?authuser=3#scrollTo=EdpoxjNY_CUX\">https:\/\/colab.research.google.com\/drive\/1qQFKjg3wLWmj2z4w6_U_xsqRREaB2sBP?authuser=3#scrollTo=EdpoxjNY_CUX<\/a>\n<\/li>\n<\/ul>\n<p>The post <a href=\"https:\/\/towardsdatascience.com\/how-to-spot-and-prevent-model-drift-before-it-impacts-your-business\/\">How to Spot and Prevent Model Drift Before it Impacts Your Business<\/a> appeared first on <a href=\"https:\/\/towardsdatascience.com\/\">Towards Data Science<\/a>.<\/p>\n<\/div>\n<p>Claudia Ng<br \/>\n<a href=\"https:\/\/towardsdatascience.com\/how-to-spot-and-prevent-model-drift-before-it-impacts-your-business\/\">Go to original source<\/a>
<BR><\/BR><\/p>\n","protected":false},"excerpt":{"rendered":"<p>How to Spot and Prevent Model Drift Before it Impacts Your Business Despite the AI hype, many tech companies still rely heavily on machine learning to power critical applications, from personalized recommendations to fraud detection.\u00a0 I\u2019ve seen firsthand how undetected drifts can result in significant costs \u2014 missed fraud detection, lost revenue, and suboptimal business [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[62,70,1951,1952,157,1953,238],"tags":[738,103,1954],"class_list":["post-2264","post","type-post","status-publish","format-standard","hentry","category-aimldsaimlds","category-machine-learning","category-model-drift","category-model-monitoring","category-python","category-spearman-correlation","category-statistics","tag-drift","tag-model","tag-monitoring"],"_links":{"self":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/2264"}],"collection":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/comments?post=2264"}],"version-history":[{"count":0,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/2264\/revisions"}],"wp:attachment":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/media?parent=2264"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/categories?post=2264"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/tags?post=2264"}],"curies":[{"name":"wp","href":"h
ttps:\/\/api.w.org\/{rel}","templated":true}]}}