{"id":3719,"date":"2025-05-10T07:02:46","date_gmt":"2025-05-10T07:02:46","guid":{"rendered":"https:\/\/mailitics.com\/index.php\/2025\/05\/10\/time-series-forecasting-made-simple-part-2-customizing-baseline-models\/"},"modified":"2025-05-10T07:02:46","modified_gmt":"2025-05-10T07:02:46","slug":"time-series-forecasting-made-simple-part-2-customizing-baseline-models","status":"publish","type":"post","link":"https:\/\/mailitics.com\/index.php\/2025\/05\/10\/time-series-forecasting-made-simple-part-2-customizing-baseline-models\/","title":{"rendered":"Time Series Forecasting Made Simple (Part 2): Customizing Baseline Models"},"content":{"rendered":"<div>\n<p class=\"wp-block-paragraph\">Thank you for the kind response to Part 1. It\u2019s been encouraging to see so many readers interested in time series forecasting.<\/p>\n<p class=\"wp-block-paragraph\">In <a href=\"https:\/\/towardsdatascience.com\/time-series-forecasting-made-simple-part-1-decomposition-baseline-models\/\"><strong>Part 1 of this series<\/strong><\/a>, we broke down time series data into trend, seasonality, and noise, discussed when to use additive versus multiplicative models, and built a Seasonal Naive baseline forecast using daily temperature data. We evaluated its performance using MAPE (Mean Absolute Percentage Error), which came out to 28.23%.<\/p>\n<p class=\"wp-block-paragraph\">While the Seasonal Naive model captured the broad seasonal pattern, we also saw that it may not be the best fit for this dataset, as it doesn\u2019t account for subtle shifts in seasonality or long-term trends. 
This highlights the need to go beyond basic baselines and customize forecasting models to better reflect the underlying data for improved accuracy.<\/p>\n<p class=\"wp-block-paragraph\">When we applied the Seasonal Naive baseline model, we didn\u2019t account for the trend or use any mathematical formulas, we simply predicted each value based on the same day from the previous year.<\/p>\n<p class=\"wp-block-paragraph\">First, let\u2019s take a look at the table below, which outlines some common baseline models and when to use each one.<\/p>\n<figure class=\"wp-block-image size-full\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/05\/baseline-model-table.png?ssl=1\" alt=\"\" class=\"wp-image-602928\"><figcaption class=\"wp-element-caption\"><strong>Table:<\/strong> Common baseline forecasting models, their descriptions, and when to use each based on data patterns.<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\">These are some of the most commonly used baseline models across various industries.<\/p>\n<p class=\"wp-block-paragraph\">But what if the data shows both <strong>trend and seasonality<\/strong>? In such cases, these simple baseline models might not be enough. As we saw in <strong>Part 1<\/strong>, the <strong>Seasonal Naive model<\/strong> struggled to fully capture the patterns in the data, resulting in a MAPE of <strong>28.23%<\/strong>.<\/p>\n<p class=\"wp-block-paragraph\">So, should we jump straight to <strong>ARIMA<\/strong> or another complex forecasting model?<\/p>\n<p class=\"wp-block-paragraph\"><strong>Not necessarily.<\/strong><\/p>\n<p class=\"wp-block-paragraph\">Before reaching for advanced tools, we can first build our baseline model based on the structure of the data. 
This helps us build a stronger benchmark\u200a\u2014\u200aand often, it\u2019s enough to decide whether a more sophisticated model is even needed.<\/p>\n<p class=\"wp-block-paragraph\">Now that we have examined the structure of the data, which clearly includes both trend and seasonality, we can build a baseline model that takes both components into account.<\/p>\n<p class=\"wp-block-paragraph\">In Part 1, we used the <strong>seasonal decompose<\/strong> method in Python to visualize the trend and seasonality in our data. Now, we\u2019ll take this a step further by actually extracting the trend and seasonal components from that decomposition and using them to build a <strong>baseline forecast<\/strong>.<\/p>\n<figure class=\"wp-block-image size-full\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/05\/decomposition.png?ssl=1\" alt=\"\" class=\"wp-image-602932\"><figcaption class=\"wp-element-caption\">Decomposition of daily temperatures showing trend, seasonal cycles and random fluctuations.<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\">But before we get started, let\u2019s see how the seasonal decompose method figures out the trend and seasonality in our data.<\/p>\n<p class=\"wp-block-paragraph\">Before using the built-in function, let\u2019s take a small sample from our temperature data and manually go through how the seasonal_decompose method separates trend, seasonality and residuals.<\/p>\n<p class=\"wp-block-paragraph\">This will help us understand what\u2019s really happening behind the scenes.<\/p>\n<figure class=\"wp-block-image aligncenter size-full\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/05\/sample-temperatures-data.png?ssl=1\" alt=\"\" class=\"wp-image-602933\"><figcaption class=\"wp-element-caption\">Sample from Temperatures Data<\/figcaption><\/figure>\n<p 
class=\"wp-block-paragraph\">Here, we consider a 14-day sample from the temperature dataset to better understand how decomposition works step by step.<\/p>\n<p class=\"wp-block-paragraph\">We already know that this dataset follows an additive structure, which means each observed value is made up of three parts:<\/p>\n<p class=\"wp-block-paragraph\">Observed Value = Trend + Seasonality + Residual.<\/p>\n<p class=\"wp-block-paragraph\">First, let\u2019s look at how the trend is calculated for this sample.<br \/>We\u2019ll use a 3-day centered moving average, which means each value is averaged with its immediate neighbors, one on each side. This helps smooth out day-to-day variations in the data.<\/p>\n<p class=\"wp-block-paragraph\">For example, to calculate the trend for January 2, 1981, we average the readings for January 1, 2, and 3:<br \/>Trend = (20.7 + 17.9 + 18.8) \/ 3<br \/>         = 19.13<\/p>\n<p class=\"wp-block-paragraph\">In the same way, we calculate the trend component for every day in the sample that has a neighbor on both sides.<\/p>\n<figure class=\"wp-block-image aligncenter size-full\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/05\/trend-expl-in-blog-part2-1.png?ssl=1\" alt=\"\" class=\"wp-image-603090\"><\/figure>\n<p class=\"wp-block-paragraph\">Here\u2019s the table showing the 3-day centered moving average trend values for each day in our 14-day sample.<\/p>\n<p class=\"wp-block-paragraph\">As we can see, the trend values for the first and last dates are \u2018NaN\u2019 because there aren\u2019t enough neighboring values to calculate a centered average at those points.<\/p>\n<p class=\"wp-block-paragraph\">We\u2019ll revisit those missing values once we finish computing the seasonality and residual components.<\/p>\n<p class=\"wp-block-paragraph\">Before we dive into seasonality, there\u2019s something we said earlier that we should come back to. 
We mentioned that using a 3-day centered moving average helps in smoothing out day to day variations in the data \u2014 but what does that really mean?<br \/>Let\u2019s look at a quick example to make it clearer.<\/p>\n<p class=\"wp-block-paragraph\">We\u2019ve already discussed that the trend reflects the overall direction the data is moving in.<\/p>\n<p class=\"wp-block-paragraph\">Temperatures are generally higher in summer and lower in winter, that\u2019s the broad seasonal pattern we expect.<\/p>\n<p class=\"wp-block-paragraph\">But even within summer, temperatures don\u2019t stay exactly the same every day. Some days might be slightly cooler or warmer than others. These are natural daily fluctuations, not signs of sudden climate shifts.<\/p>\n<p class=\"wp-block-paragraph\">The moving average helps us smooth out these short-term ups and downs so we can focus on the bigger picture, the underlying trend across time.<\/p>\n<p class=\"wp-block-paragraph\">Since we\u2019re working with a small sample here, the trend may not stand out clearly just yet.<\/p>\n<p class=\"wp-block-paragraph\">But if you look at the full decomposition plot above, you can see how the trend captures the overall direction the data is moving in, gradually rising, falling or staying steady over time.<\/p>\n<p class=\"wp-block-paragraph\">Now that we\u2019ve calculated the trend, it\u2019s time to move on to the next component: seasonality.<\/p>\n<p class=\"wp-block-paragraph\">We know that in an additive model:<br \/>Observed Value = Trend + Seasonality + Residual<\/p>\n<p class=\"wp-block-paragraph\">To isolate seasonality, we start by subtracting the trend from the observed values:<br \/>Observed Value \u2013 Trend = Seasonality + Residual<\/p>\n<p class=\"wp-block-paragraph\">The result is known as the detrended series \u2014 a combination of the seasonal pattern and any remaining random noise.<\/p>\n<p class=\"wp-block-paragraph\">Let\u2019s take January 2, 1981 as an example.<\/p>\n<p 
class=\"wp-block-paragraph\">Observed temperature: 17.9\u00b0C<\/p>\n<p class=\"wp-block-paragraph\">Trend: 19.13\u00b0C<\/p>\n<p class=\"wp-block-paragraph\">So, the detrended value is:<\/p>\n<p class=\"wp-block-paragraph\">Detrended = 17.9 \u2013 19.13 = -1.23<\/p>\n<p class=\"wp-block-paragraph\">In the same way, we calculate the detrended values for all the dates in our sample.<\/p>\n<figure class=\"wp-block-image aligncenter size-full\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/05\/detrend.png?ssl=1\" alt=\"\" class=\"wp-image-603224\"><\/figure>\n<p class=\"wp-block-paragraph\">The table above shows the detrended values for each date in our 14-day sample.<\/p>\n<p class=\"wp-block-paragraph\">Since we\u2019re working with 14 consecutive days, we\u2019ll assume a weekly seasonality and assign a Day Index (from 1 to 7) to each date based on its position in that 7-day cycle.<\/p>\n<figure class=\"wp-block-image aligncenter size-full\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/05\/detrend-1.png?ssl=1\" alt=\"\" class=\"wp-image-603228\"><\/figure>\n<p class=\"wp-block-paragraph\">Now, to estimate seasonality, we take the average of the detrended values that share the same Day Index.<\/p>\n<p class=\"has-text-align-left wp-block-paragraph\">Let\u2019s calculate the seasonality for January 2, 1981. The Day Index for this date is 2, and the other date in our sample with the same index is January 9, 1981. To estimate the seasonal effect for this index, we take the average of the detrended values from both days. 
This seasonal effect will then be assigned to every date with Index 2 in our cycle.<\/p>\n<p class=\"wp-block-paragraph\">for January 2, 1981: Detrended value = -1.2 and<br \/>for January 9, 1981: Detrended value = 2.1<\/p>\n<p class=\"wp-block-paragraph\">Average of both values = (-1.2 + 2.1)\/2 <br \/>                                    = 0.45<\/p>\n<p class=\"wp-block-paragraph\">So, 0.45 is the estimated seasonality for all dates with Index 2.<br \/>We repeat this process for each index to calculate the full set of seasonality components.<\/p>\n<figure class=\"wp-block-image aligncenter size-full\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/05\/detrend-2.png?ssl=1\" alt=\"\" class=\"wp-image-603260\"><\/figure>\n<p class=\"wp-block-paragraph\">Here are the values of seasonality for all the dates and these seasonal values reflect the recurring pattern across the week. For example, days with Index 2 tend to be around 0.45<sup>o<\/sup>C warmer than the trend on average, while days with Index 4 tend to be 1.05<sup>o<\/sup>C cooler.<\/p>\n<p class=\"wp-block-paragraph\"><strong>Note<\/strong>: When we say that days with Index 2 tend to be around +0.45\u00b0C warmer than the trend on average, we mean that dates like Jan 2 and Jan 9 tend to be about 0.45\u00b0C above their own trend value, not compared to the overall dataset trend, but to the local trend specific to each day.<\/p>\n<p class=\"wp-block-paragraph\">Now that we\u2019ve calculated the seasonal components for each day, you might notice something interesting: even the dates where the trend (and therefore detrended value) was missing, like the first and last dates in our sample \u2014 still received a seasonality value.<\/p>\n<p class=\"wp-block-paragraph\">This is because seasonality is assigned based on the Day Index, which follows a repeating cycle (like 1 to 7 in our weekly example).<br \/>So, if January 1 has a 
missing trend but shares the same index as, say, January 8, it inherits the same seasonal effect that was calculated using valid data from that index group.<\/p>\n<p class=\"wp-block-paragraph\">In other words, seasonality doesn\u2019t depend on the availability of trend for that specific day, but rather on the pattern observed across all days with the same position in the cycle.<\/p>\n<p class=\"wp-block-paragraph\">Now we calculate the residual, based on the additive decomposition structure we know that:<br \/>Observed Value = Trend + Seasonality + Residual<br \/>\u2026which means:<br \/>Residual = Observed Value \u2013 Trend \u2013 Seasonality<\/p>\n<p class=\"wp-block-paragraph\">You might be wondering, if the detrended values we used to calculate seasonality already had residuals in them, how can we separate them now? The answer comes from averaging. When we group the detrended values by their seasonal position, like Day Index, the random noise tends to cancel itself out. What we\u2019re left with is the repeating seasonal signal. In small datasets this might not be very noticeable, but in larger datasets, the effect is much more clear. And now, with both trend and seasonality removed, what remains is the residual.<\/p>\n<figure class=\"wp-block-image aligncenter size-full\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/05\/detrend-3.png?ssl=1\" alt=\"\" class=\"wp-image-603261\"><\/figure>\n<p class=\"wp-block-paragraph\">We can observe that residuals are not calculated for the first and last dates, since the trend wasn\u2019t available there due to the centered moving average.<\/p>\n<p class=\"wp-block-paragraph\">Let\u2019s take a look at the final decomposition table for our 14-day sample. 
This brings together the observed temperatures, the extracted trend and seasonality components, and the resulting residuals.<\/p>\n<figure class=\"wp-block-image aligncenter size-full\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/05\/detrend-4.png?ssl=1\" alt=\"\" class=\"wp-image-603262\"><\/figure>\n<p class=\"wp-block-paragraph\">Now that we\u2019ve calculated the trend, seasonality, and residuals for our sample, let\u2019s come back to the missing values we mentioned earlier. If you look at the decomposition plot for the full dataset, titled <strong>\u201cDecomposition of daily temperatures showing trend, seasonal cycles, and random fluctuations\u201d<\/strong>, you\u2019ll notice that the trend line doesn\u2019t appear right at the beginning of the series. The same applies to residuals. This happens because calculating the trend requires enough data before and after each point, so the first few and last few values don\u2019t have a defined trend. That\u2019s also why we see missing residuals at the edges. But in large datasets, these missing values make up only a small portion and don\u2019t affect the overall interpretation. You can still clearly see the trend and patterns over time. 
In our small 14-day sample, these gaps feel more noticeable, but in real-world time series data, this is completely normal and expected.<\/p>\n<p class=\"wp-block-paragraph\">Now that we\u2019ve understood how seasonal_decompose works, let\u2019s take a quick look at the code we used to apply it to the temperature data and extract the trend and seasonality components.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">import pandas as pd\nimport matplotlib.pyplot as plt\nfrom statsmodels.tsa.seasonal import seasonal_decompose\n\n# Load the dataset\ndf = pd.read_csv(\"minimum daily temperatures data.csv\")\n\n# Convert 'Date' to datetime and set as index\ndf['Date'] = pd.to_datetime(df['Date'], dayfirst=True)\ndf.set_index('Date', inplace=True)\n\n# Set a regular daily frequency and fill missing values using forward fill\n# (assign the result back instead of using the deprecated fillna(method=...) with inplace=True)\ndf = df.asfreq('D')\ndf['Temp'] = df['Temp'].ffill()\n\n# Decompose the daily series (365-day seasonality for yearly patterns)\ndecomposition = seasonal_decompose(df['Temp'], model='additive', period=365)\n\n# Plot the decomposed components\ndecomposition.plot()\nplt.suptitle('Decomposition of Daily Minimum Temperatures (Daily)', fontsize=14)\nplt.tight_layout()\nplt.show()<\/code><\/pre>\n<p class=\"wp-block-paragraph\">Let\u2019s focus on this part of the code:<\/p>\n<p class=\"wp-block-paragraph\"><code>decomposition = seasonal_decompose(df['Temp'], model='additive', period=365)<\/code><\/p>\n<p class=\"wp-block-paragraph\">In this line, we\u2019re telling the function what data to use (<code>df['Temp']<\/code>), which model to apply (<code>additive<\/code>), and the seasonal period to consider (<code>365<\/code>), which matches the yearly cycle in our daily temperature data.<\/p>\n<p class=\"wp-block-paragraph\">Here, we set <code>period=365<\/code> based on the structure of the data. 
This means the trend is calculated using a 365-day centered moving average, which takes 182 values before and after each point. The seasonality is calculated using a 365-day seasonal index, where all January 1st values across years are grouped and averaged, all January 2nd values are grouped, and so on.<\/p>\n<p class=\"wp-block-paragraph\">When using <code>seasonal_decompose<\/code> in Python, we simply provide the <code>period<\/code>, and the function uses that value to determine how both the trend and seasonality should be calculated.<\/p>\n<p class=\"wp-block-paragraph\">In our earlier 14-day sample, we used a 3-day centered average just to make the math more understandable \u2014 but the underlying logic remains the same.<\/p>\n<p class=\"wp-block-paragraph\">Now that we\u2019ve explored how <code>seasonal_decompose<\/code> works and understood how it separates a time series into trend, seasonality, and residuals, we\u2019re ready to build a baseline forecasting model.<br \/>This model will be constructed by simply adding the extracted trend and seasonality components, essentially assuming that the residual (or noise) is zero.<\/p>\n<p class=\"wp-block-paragraph\">Once we generate these baseline forecasts, we\u2019ll evaluate how well they perform by comparing them to the actual observed values using MAPE (Mean Absolute Percentage Error).<\/p>\n<p class=\"wp-block-paragraph\">Here, we\u2019re ignoring the residuals because we\u2019re building a simple baseline model that serves as a benchmark. 
The goal is to test whether more advanced algorithms are truly necessary.<br \/>We\u2019re primarily interested in seeing how much of the variation in the data can be explained using just the trend and seasonality components.<\/p>\n<p class=\"wp-block-paragraph\">Now we\u2019ll build a baseline forecast by extracting the trend and seasonality components using Python\u2019s <code>seasonal_decompose<\/code>.<\/p>\n<p class=\"wp-block-paragraph\"><strong>Code:<\/strong><\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">import pandas as pd\nimport matplotlib.pyplot as plt\nfrom statsmodels.tsa.seasonal import seasonal_decompose\nfrom sklearn.metrics import mean_absolute_percentage_error\n\n# Load the dataset\ndf = pd.read_csv(\"minimum daily temperatures data.csv\")\n\n# Convert 'Date' to datetime and set as index\ndf['Date'] = pd.to_datetime(df['Date'], dayfirst=True)\ndf.set_index('Date', inplace=True)\n\n# Set a regular daily frequency and fill missing values using forward fill\n# (assign the result back instead of using the deprecated fillna(method=...) 
with inplace=True)\ndf = df.asfreq('D')\ndf['Temp'] = df['Temp'].ffill()\n\n# Split into training (all years except final) and testing (final year)\n# .copy() gives independent frames, so adding columns later does not raise SettingWithCopyWarning\ntrain = df[df.index.year &lt; df.index.year.max()].copy()\ntest = df[df.index.year == df.index.year.max()].copy()\n\n# Decompose training data only\ndecomposition = seasonal_decompose(train['Temp'], model='additive', period=365)\n\n# Extract components\ntrend = decomposition.trend\nseasonal = decomposition.seasonal\n\n# Use last full year of seasonal values from training to repeat for test\nseasonal_values = seasonal[-365:].values\nseasonal_test = pd.Series(seasonal_values[:len(test)], index=test.index)\n\n# Extend last valid trend value as constant across the test period\ntrend_last = trend.dropna().iloc[-1]\ntrend_test = pd.Series(trend_last, index=test.index)\n\n# Create baseline forecast\nbaseline_forecast = trend_test + seasonal_test\n\n# Evaluate using MAPE\nactual = test['Temp']\nmask = actual &gt; 1e-3  # avoid division 
errors on near-zero values\nmape = mean_absolute_percentage_error(actual[mask], baseline_forecast[mask])\nprint(f\"MAPE for Baseline Model on Final Year: {mape:.2%}\")\n\n# Plot actual vs. forecast\nplt.figure(figsize=(12, 5))\nplt.plot(actual.index, actual, label='Actual', linewidth=2)\nplt.plot(actual.index, baseline_forecast, label='Baseline Forecast', linestyle='--')\nplt.title('Baseline Forecast vs. Actual (Final Year)')\nplt.xlabel('Date')\nplt.ylabel('Temperature (\u00b0C)')\nplt.legend()\nplt.tight_layout()\nplt.show()\n\n\nMAPE for Baseline Model on Final Year: 21.21%\n<\/code><\/pre>\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" height=\"424\" width=\"1024\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/05\/baseline-vs-actual-final-year-1024x424.png?resize=1024%2C424&#038;ssl=1\" alt=\"\" class=\"wp-image-603381\"><\/figure>\n<p class=\"wp-block-paragraph\">In the code above, we first split the data by using the first 9 years as the training set and the final year as the test set.<\/p>\n<p class=\"wp-block-paragraph\">We then applied <code>seasonal_decompose<\/code> to the training data to extract the trend and seasonality components.<\/p>\n<p class=\"wp-block-paragraph\">Since the seasonal pattern repeats every year, we took the last 365 seasonal values and applied them to the test period.<\/p>\n<p class=\"wp-block-paragraph\">For the trend, we assumed it remains constant and used the last observed trend value from the training set across all dates in the test year.<\/p>\n<p class=\"wp-block-paragraph\">Finally, we added the trend and seasonality components to build the baseline forecast, compared it with the actual values from the test set, and evaluated the model using Mean Absolute Percentage Error (MAPE).<\/p>\n<p class=\"wp-block-paragraph\">We got a MAPE of 21.21% with our baseline model. 
In Part 1, the seasonal naive approach gave us 28.23%, so we\u2019ve improved by about 7 percentage points.<\/p>\n<p class=\"wp-block-paragraph\">What we\u2019ve built here is not a custom baseline model; it\u2019s a <strong>standard decomposition-based baseline<\/strong>.<\/p>\n<p class=\"wp-block-paragraph\">Let\u2019s now see how we can come up with our own custom baseline for this temperature data.<\/p>\n<p class=\"wp-block-paragraph\">The idea is to average the training temperatures for each day of the year and use those averages to forecast the temperatures for the final year.<\/p>\n<p class=\"wp-block-paragraph\">You might be wondering how we even come up with that idea for a custom baseline in the first place. Honestly, it starts by simply looking at the data. If we can spot a pattern, like a seasonal trend or something that repeats over time, we can build a simple rule around it.<\/p>\n<p class=\"wp-block-paragraph\">That\u2019s really what a custom baseline is about: using what we understand from the data to make a reasonable prediction. 
And often, even small, intuitive ideas can work surprisingly well.<\/p>\n<p class=\"wp-block-paragraph\">Now let\u2019s use Python to calculate the average temperature for each day of the year.<\/p>\n<p class=\"wp-block-paragraph\"><strong>Code:<\/strong><\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\"># Create a new column 'day_of_year' representing which day (1 to 365) each date falls on\ntrain[\"day_of_year\"] = train.index.dayofyear\ntest[\"day_of_year\"] = test.index.dayofyear\n\n# Group the training data by 'day_of_year' and calculate the mean temperature for each day (averaged across all years)\ndaily_avg = train.groupby(\"day_of_year\")[\"Temp\"].mean()\n\n# Use the learned seasonal pattern to forecast test data by mapping test days to the corresponding daily average\nday_avg_forecast = test[\"day_of_year\"].map(daily_avg)\n\n# Evaluate the performance of this seasonal baseline forecast using Mean Absolute Percentage Error (MAPE)\nmape_day_avg = mean_absolute_percentage_error(test[\"Temp\"], day_avg_forecast)\nround(mape_day_avg * 100, 2)<\/code><\/pre>\n<p class=\"wp-block-paragraph\">To build this custom baseline, we looked at how the temperature typically behaves on each day of the year, averaging across all the training years. Then, we used those daily averages to make predictions for the test set. It\u2019s a simple way to capture the seasonal pattern that tends to repeat every year.<\/p>\n<p class=\"wp-block-paragraph\">This custom baseline gave us a MAPE of 21.17%, which shows how well it captures the seasonal trend in the data.<\/p>\n<p class=\"wp-block-paragraph\">Now, let\u2019s see if we can build another custom baseline that captures patterns in the data more effectively and serves as a stronger benchmark.<\/p>\n<p class=\"wp-block-paragraph\">Now that we\u2019ve used the day-of-year average method for our first custom baseline, you might start wondering what happens in leap years. 
If we simply number the days of the year and average by that number, every date after February 29 shifts by one position in a leap year, so March 1 of a leap year (day 61) ends up averaged with March 2 of the ordinary years.<\/p>\n<p class=\"wp-block-paragraph\">You might be wondering if a single date really matters. In time series analysis, every moment counts. It may not feel that important right now since we\u2019re working with a simple dataset, but in real-world situations, small details like this can have a big impact. Many industries pay close attention to these patterns, and even a one-day difference can affect decisions. That\u2019s why we\u2019re starting with a simple dataset, to help us understand these ideas clearly before applying them to more complex problems.<\/p>\n<p class=\"wp-block-paragraph\">Now let\u2019s build a custom baseline using calendar-day averages by looking at how the temperature usually behaves on each (month, day) across years.<\/p>\n<p class=\"wp-block-paragraph\">It\u2019s a simple way to capture the seasonal rhythm of the year based on the actual calendar.<\/p>\n<p class=\"wp-block-paragraph\"><strong>Code:<\/strong><\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">import numpy as np  # needed for np.nan below\n\n# Extract the 'month' and 'day' from the datetime index in both training and test sets\ntrain[\"month\"] = train.index.month\ntrain[\"day\"] = train.index.day\ntest[\"month\"] = test.index.month\ntest[\"day\"] = test.index.day\n\n\n# Group the training data by each (month, day) pair and calculate the average temperature for each calendar day\ncalendar_day_avg = train.groupby([\"month\", \"day\"])[\"Temp\"].mean()\n\n\n# Forecast test values by mapping each test row's (month, day) to the average from training data\ncalendar_day_forecast = test.apply(\n    lambda row: calendar_day_avg.get((row[\"month\"], row[\"day\"]), np.nan), axis=1\n)\n\n# Evaluate the forecast using Mean Absolute Percentage Error (MAPE)\nmape_calendar_day = mean_absolute_percentage_error(test[\"Temp\"], calendar_day_forecast)<\/code><\/pre>\n<p 
class=\"wp-block-paragraph\">Using this method, we achieved a MAPE of 21.09%.<\/p>\n<p class=\"wp-block-paragraph\">Now let\u2019s see if we can combine two methods to build a more refined custom baseline. We have already created a calendar-based month-day average baseline. This time we will blend it with the previous day\u2019s actual temperature. The forecasted value will be based 70 percent on the calendar day average and 30 percent on the previous day\u2019s temperature, creating a more balanced and adaptive prediction.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\"># Create a column with the previous day's temperature \ndf[\"Prev_Temp\"] = df[\"Temp\"].shift(1)\n\n# Add the previous day's temperature to the test set\ntest[\"Prev_Temp\"] = df.loc[test.index, \"Prev_Temp\"]\n\n# Create a blended forecast by combining calendar-day average and previous day's temperature\n# 70% weight to seasonal calendar-day average, 30% to previous day temperature\n\nblended_forecast = 0.7 * calendar_day_forecast.values + 0.3 * test[\"Prev_Temp\"].values\n\n# Handle missing values by replacing NaNs with the average of calendar-day forecasts\nblended_forecast = np.nan_to_num(blended_forecast, nan=np.nanmean(calendar_day_forecast))\n\n# Evaluate the forecast using MAPE\nmape_blended = mean_absolute_percentage_error(test[\"Temp\"], blended_forecast)\n<\/code><\/pre>\n<p class=\"wp-block-paragraph\">We can call this a blended custom baseline model. 
Using this approach, we achieved a MAPE of 18.73%.<\/p>\n<p class=\"wp-block-paragraph\">Let\u2019s take a moment to summarize what we\u2019ve applied to this dataset so far using a simple table.<\/p>\n<figure class=\"wp-block-image aligncenter size-full\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/05\/final-table-for-blog-part2-1.png?ssl=1\" alt=\"\" class=\"wp-image-603639\"><\/figure>\n<p class=\"wp-block-paragraph\">In Part 1, we used the seasonal naive method as our baseline. In this blog, we explored how the <code>seasonal_decompose<\/code> function in Python works and built a baseline model by extracting its trend and seasonality components. We then created our first custom baseline using a simple idea based on the day of the year and later improved it by using calendar day averages. Finally, we built a blended custom baseline by combining the calendar average with the previous day\u2019s temperature, which led to even better forecasting results.<\/p>\n<p class=\"wp-block-paragraph\">In this blog, we used a simple daily temperature dataset to understand how <a href=\"https:\/\/towardsdatascience.com\/tag\/custom-baseline-models\/\" title=\"custom baseline models\">custom baseline models<\/a> work. Since it\u2019s a univariate dataset, it contains only a time column and a target variable. However, real-world time series data is often much more complex and typically multivariate, with multiple influencing factors. Before we explore how to build custom baselines for such complex datasets, we need to understand another important decomposition method called STL decomposition. We also need a solid grasp of univariate forecasting models like ARIMA and SARIMA. 
These models are essential because they form the foundation for understanding and building more advanced multivariate time series models.<\/p>\n<p class=\"wp-block-paragraph\">In Part 1, I mentioned that we would explore the foundations of ARIMA in this part as well. However, as I\u2019m also learning and wanted to keep things focused and digestible, I wasn\u2019t able to fit everything into one blog. To make the learning process smoother, we\u2019ll take it one topic at a time. <\/p>\n<p class=\"wp-block-paragraph\">In Part 3, we\u2019ll explore STL decomposition and continue building on what we\u2019ve learned so far.<\/p>\n<p class=\"wp-block-paragraph\"><strong>Dataset and License<\/strong><br \/>The dataset used in this article \u2014 <em>\u201cDaily Minimum Temperatures in Melbourne\u201d<\/em> \u2014 is available on <a href=\"https:\/\/www.kaggle.com\/datasets\/samfaraday\/daily-minimum-temperatures-in-me\">Kaggle<\/a> and is shared under the <strong>Community Data License Agreement \u2013 Permissive, Version 1.0 (CDLA-Permissive 1.0)<\/strong>.<br \/>This is an open license that permits commercial use with proper attribution. 
You can read the full license<a href=\"https:\/\/cdla.dev\/permissive-1-0\/\"> here<\/a>.<\/p>\n<p class=\"wp-block-paragraph\">I hope you found this part helpful and easy to follow.<br \/>Thanks for reading and see you in Part 3!<\/p>\n<p>The post <a href=\"https:\/\/towardsdatascience.com\/time-series-forecasting-made-simple-part-2-customizing-baseline-models\/\">Time Series Forecasting Made Simple (Part 2): Customizing Baseline Models<\/a> appeared first on <a href=\"https:\/\/towardsdatascience.com\/\">Towards Data Science<\/a>.<\/p>\n<\/div>\n<p> \t<BR><br \/>\n <BR><\/BR><br \/>\n    Nikhil Dasari<br \/>\n \t<BR><br \/>\n<BR><\/BR><br \/>\n<a href=\"https:\/\/towardsdatascience.com\/time-series-forecasting-made-simple-part-2-customizing-baseline-models\/\">Go to original source<\/a><br \/>\n \t<BR><br \/>\n <BR><\/BR><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Time Series Forecasting Made Simple (Part 2): Customizing Baseline Models Thank you for the kind response to Part 1, it\u2019s been encouraging to see so many readers interested in time series forecasting. 
In Part 1 of this series, we broke down time series data into trend, seasonality, and noise, discussed when to use additive versus [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[62,2633,211,83,157,1615,353],"tags":[2634,84,73],"class_list":["post-3719","post","type-post","status-publish","format-standard","hentry","category-aimldsaimlds","category-custom-baseline-models","category-data-analysis","category-data-science","category-python","category-seasonality","category-time-series-forecasting","tag-baseline","tag-data","tag-models"],"_links":{"self":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/3719"}],"collection":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/comments?post=3719"}],"version-history":[{"count":0,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/3719\/revisions"}],"wp:attachment":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/media?parent=3719"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/categories?post=3719"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/tags?post=3719"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}