{"id":3873,"date":"2025-05-16T07:02:29","date_gmt":"2025-05-16T07:02:29","guid":{"rendered":"https:\/\/mailitics.com\/index.php\/2025\/05\/16\/understanding-random-forest-using-python-scikit-learn\/"},"modified":"2025-05-16T07:02:29","modified_gmt":"2025-05-16T07:02:29","slug":"understanding-random-forest-using-python-scikit-learn","status":"publish","type":"post","link":"https:\/\/mailitics.com\/index.php\/2025\/05\/16\/understanding-random-forest-using-python-scikit-learn\/","title":{"rendered":"Understanding Random Forest using Python (scikit-learn)"},"content":{"rendered":"<p>Understanding Random Forest using Python (scikit-learn)<\/p>\n<div>\n<p class=\"wp-block-paragraph\">Decision trees are a popular supervised learning algorithm: they can be used for both regression and classification, and they are easy to interpret. However, decision trees aren\u2019t the most performant algorithm and are prone to overfitting, because small variations in the training data can result in a completely different tree. This is why people often turn to ensemble models like Bagged Trees and Random Forests. These consist of multiple decision trees trained on bootstrapped data and aggregated to achieve better predictive performance than any single tree could offer. 
This tutorial includes the following:\u00a0<\/p>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">What is Bagging<\/li>\n<li class=\"wp-block-list-item\">What Makes Random Forests Different<\/li>\n<li class=\"wp-block-list-item\">Training and Tuning a Random Forest using Scikit-Learn<\/li>\n<li class=\"wp-block-list-item\">Calculating and Interpreting Feature Importance<\/li>\n<li class=\"wp-block-list-item\">Visualizing Individual Decision Trees in a Random Forest<\/li>\n<\/ul>\n<p class=\"wp-block-paragraph\">As always, the code used in this tutorial is available on my <a href=\"https:\/\/github.com\/mGalarnyk\/Python_Tutorials\/blob\/master\/Sklearn\/CART\/Random_Forest\/RandomForestUsingPython.ipynb\">GitHub<\/a>. A <a href=\"https:\/\/youtu.be\/R9tJeEgHyeo\">video version<\/a> of this tutorial is also available on my YouTube channel for those who prefer to follow along visually. With that, let\u2019s get started!<\/p>\n<h2 class=\"wp-block-heading\">What is Bagging (Bootstrap Aggregating)<\/h2>\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" height=\"467\" width=\"1024\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/05\/Bagging-1-1024x467.png?resize=1024%2C467&#038;ssl=1\" alt=\"\" class=\"wp-image-604068\"><figcaption class=\"wp-element-caption\"><strong>B<\/strong>ootstrap + <strong>agg<\/strong>rega<strong>ting<\/strong> = Bagging. Image by Michael Galarnyk.<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\">Random forests can be categorized as bagging algorithms (<strong>b<\/strong>ootstrap <strong>agg<\/strong>regat<strong>ing)<\/strong>. Bagging consists of two steps:<\/p>\n<p class=\"wp-block-paragraph\">1.) Bootstrap sampling: Create multiple training sets by randomly drawing samples with replacement from the original dataset. 
These new training sets, called bootstrapped datasets, typically contain the same number of rows as the original dataset, but individual rows may appear multiple times or not at all. On average, each bootstrapped dataset contains about 63.2% of the unique rows from the original data. The remaining ~36.8% of rows are left out and can be used for out-of-bag (OOB) evaluation. For more on this concept, see my <a href=\"https:\/\/towardsdatascience.com\/understanding-sampling-with-and-without-replacement-python-7aff8f47ebe4\/\">sampling with and without replacement blog post<\/a>.<\/p>\n<p class=\"wp-block-paragraph\">2.) Aggregating predictions: Each bootstrapped dataset is used to train a different decision tree model. The final prediction is made by combining the outputs of all individual trees. For classification, this is typically done through majority voting. For regression, predictions are averaged.<\/p>\n<p class=\"wp-block-paragraph\">Training each tree on a different bootstrapped sample introduces variation across trees. While this doesn\u2019t fully eliminate correlation\u2014especially when certain features dominate\u2014it helps reduce overfitting when combined with aggregation. 
Averaging the predictions of many such trees reduces the overall <strong>variance<\/strong> of the ensemble, improving generalization.<\/p>\n<h2 class=\"wp-block-heading\">What Makes Random Forests Different<\/h2>\n<figure class=\"wp-block-image\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/05\/AD_4nXdxSzrJ8EbM32JIq4zBNQK2SAIpsB3p_hv6sJxyq6XaCKt6B7pz_4XpiPWiwsTzQh4iwUQZZSeU6b4svfHSZ2IwjgwPCdQ0pfqS4ojSGUnRuiD8TIV3W4Lk2dHMyVFTxh4QCjRztA.png?ssl=1\" alt=\"\" class=\"wp-image-604088\"><figcaption class=\"wp-element-caption\">In contrast to some other bagged trees algorithms, for each decision tree in random forests, only a subset of features is randomly selected at each decision node and the best split feature from the subset is used. Image by Michael Galarnyk.<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\">Suppose there\u2019s a single strong feature in your dataset. In <a href=\"https:\/\/youtu.be\/urb2wRxnGz4?si=voTNstvcYQMLdlNJ\">bagged trees<\/a>, each tree may repeatedly split on that feature, leading to correlated trees and less benefit from aggregation. Random Forests reduce this issue by introducing further randomness. Specifically, they change how splits are selected during training:<\/p>\n<p class=\"wp-block-paragraph\">1). Create N bootstrapped datasets. Note that while bootstrapping is commonly used in Random Forests, it is not strictly necessary, because step 2 (random feature selection) already introduces diversity among the trees.<\/p>\n<p class=\"wp-block-paragraph\">2). For each tree, at each node, a random subset of features is selected as candidates, and the best split is chosen from that subset. In scikit-learn, this is controlled by the <code>max_features<\/code> parameter, which defaults to <code>'sqrt'<\/code> for classifiers and <code>1.0<\/code> for regressors (all features, which is equivalent to bagged trees).<\/p>\n<p class=\"wp-block-paragraph\">3). 
Aggregating predictions: vote for classification and average for regression.<\/p>\n<p class=\"wp-block-paragraph\">Note: Random Forests use <a href=\"https:\/\/towardsdatascience.com\/understanding-sampling-with-and-without-replacement-python-7aff8f47ebe4\/\">sampling with replacement for bootstrapped datasets and sampling without replacement<\/a> for selecting a subset of features.\u00a0<\/p>\n<figure class=\"wp-block-image\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/05\/AD_4nXeJs7z-C1yKyMmpFLDOoA3O8Ac0CnoKVDq0WqRy0db2vMGKGSXhcRhXnFoW33Obn3e_lsaO_moUGCmlZpkNPqsLi5k_ffjkfifB026LEavIoP60kci8rzaeIh6n7Su5zvMXZ-qO2-2zdpBnoFjSGugRob1r.png?ssl=1\" alt=\"\" class=\"wp-image-604090\"><figcaption class=\"wp-element-caption\">Sampling with replacement procedure. Image by Michael Galarnyk<\/figcaption><\/figure>\n<h3 class=\"wp-block-heading\">Out-of-Bag (OOB) Score<\/h3>\n<p class=\"wp-block-paragraph\">Because ~36.8% of training data is excluded from any given tree, you can use this holdout portion to evaluate that tree\u2019s predictions. Scikit-learn allows this via the oob_score=True parameter, providing an efficient way to estimate generalization error. You\u2019ll see this parameter used in the training example later in the tutorial.<\/p>\n<h2 class=\"wp-block-heading\">Training and Tuning a Random Forest in Scikit-Learn<\/h2>\n<p class=\"wp-block-paragraph\">Random Forests remain a strong baseline for tabular data thanks to their simplicity, interpretability, and ability to <a href=\"https:\/\/www.anyscale.com\/blog\/how-to-speed-up-scikit-learn-model-training\">parallelize<\/a> since each tree is trained independently. 
This section demonstrates how to load data, <a href=\"https:\/\/youtu.be\/rCevxk3jeKs?si=SCzxap0-l3vBSrvM\">perform a train test split<\/a>, train a baseline model, tune hyperparameters using grid search, and evaluate the final model on the test set.<\/p>\n<h3 class=\"wp-block-heading\">Step 1: Train a Baseline Model<\/h3>\n<p class=\"wp-block-paragraph\">Before tuning, it\u2019s good practice to train a baseline model using reasonable defaults. This gives you an initial sense of performance and lets you validate generalization using the out-of-bag (OOB) score, which is built into bagging-based models like Random Forests. Because the OOB score is computed from the training data alone, the test set stays reserved for final evaluation after tuning. This example uses the House Sales in King County dataset (CC0 1.0 Universal License), which contains property sales from the Seattle area between May 2014 and May 2015.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\"># Import libraries\n\n# Some imports are only used later in the tutorial\nimport matplotlib.pyplot as plt\n\nimport numpy as np\n\nimport pandas as pd\n\n# Dataset: Breast Cancer Wisconsin (Diagnostic)\n# Source: UCI Machine Learning Repository\n# License: CC BY 4.0\nfrom sklearn.datasets import load_breast_cancer\n\nfrom sklearn.ensemble import RandomForestClassifier\n\nfrom sklearn.ensemble import RandomForestRegressor\n\nfrom sklearn.inspection import permutation_importance\n\nfrom sklearn.model_selection import GridSearchCV, train_test_split\n\nfrom sklearn import tree\n\n# Load dataset\n# Dataset: House Sales in King County (May 2014\u2013May 2015)\n# License: CC0 1.0 Universal\nurl = 'https:\/\/raw.githubusercontent.com\/mGalarnyk\/Tutorial_Data\/master\/King_County\/kingCountyHouseData.csv'\n\ndf = pd.read_csv(url)\n\ncolumns = 
['bedrooms',\n\n           'bathrooms',\n\n           'sqft_living',\n\n           'sqft_lot',\n\n           'floors',\n\n           'waterfront',\n\n           'view',\n\n           'condition',\n\n           'grade',\n\n           'sqft_above',\n\n           'sqft_basement',\n\n           'yr_built',\n\n           'yr_renovated',\n\n           'lat',\n\n           'long',\n\n           'sqft_living15',\n\n           'sqft_lot15',\n\n           'price']\n\ndf = df[columns]\n\n# Define features and target\n\nX = df.drop(columns='price')\n\ny = df['price']\n\n# Train\/test split\n\nX_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)\n\n# Train baseline Random Forest\n\nreg = RandomForestRegressor(\n\n    n_estimators=100,       # number of trees\n\n    max_features=1\/3,       # fraction of features considered at each split\n\n    oob_score=True,         # enables out-of-bag evaluation\n\n    random_state=0\n\n)\n\nreg.fit(X_train, y_train)\n\n# Evaluate baseline performance using OOB score\n\nprint(f\"Baseline OOB score: {reg.oob_score_:.3f}\")<\/code><\/pre>\n<figure class=\"wp-block-image\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/05\/AD_4nXeu5QApkjaUIPs5Y2N-vC2-rdPTNz0zLSNamD3MW39ZnlIuChnGgp2_JAOiTIgaIf2-zeZAmtFSZjy-mQCkuGFOuj5wGCOQWdr2PEo-UH9pXztZPMSObGrnqtOrtCdzaClTIEvWgA.png?ssl=1\" alt=\"\" class=\"wp-image-604081\"><\/figure>\n<h3 class=\"wp-block-heading\">Step 2: Tune Hyperparameters with Grid Search<\/h3>\n<p class=\"wp-block-paragraph\">While the baseline model gives a strong starting point, performance can often be improved by tuning key hyperparameters. Grid search cross-validation, as implemented by <code>GridSearchCV<\/code>, systematically explores combinations of hyperparameters and uses cross-validation to evaluate each one, selecting the configuration with the highest validation performance. The most commonly tuned hyperparameters include:<\/p>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">\n<code>n_estimators<\/code>: The number of decision trees in the forest. More trees can improve accuracy but increase training time.<\/li>\n<li class=\"wp-block-list-item\">\n<code>max_features<\/code>: The number of features to consider when looking for the best split. Lower values reduce correlation between trees.<\/li>\n<li class=\"wp-block-list-item\">\n<code>max_depth<\/code>: The maximum depth of each tree. 
Shallower trees are faster but may underfit.<\/li>\n<li class=\"wp-block-list-item\">\n<code>min_samples_split<\/code>: The minimum number of samples required to split an internal node. Higher values can reduce overfitting.<\/li>\n<li class=\"wp-block-list-item\">\n<code>min_samples_leaf<\/code>: The minimum number of samples required to be at a leaf node. Higher values help limit tree size.<\/li>\n<li class=\"wp-block-list-item\">\n<code>bootstrap<\/code>: Whether bootstrap samples are used when building trees. If <code>False<\/code>, the whole dataset is used to build each tree.<\/li>\n<\/ul>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">param_grid = {\n\n    'n_estimators': [100],\n\n    'max_features': ['sqrt', 'log2', None],\n\n    'max_depth': [None, 5, 10, 20],\n\n    'min_samples_split': [2, 5],\n\n    'min_samples_leaf': [1, 2]\n\n}\n\n# Initialize model\n\nrf = RandomForestRegressor(random_state=0, oob_score=True)\n\ngrid_search = GridSearchCV(\n\n    estimator=rf,\n\n    param_grid=param_grid,\n\n    cv=5,             # 5-fold cross-validation\n\n    scoring='r2',     # evaluation metric\n\n    n_jobs=-1         # use all available CPU cores\n\n)\n\ngrid_search.fit(X_train, y_train)\n\nprint(f\"Best parameters: {grid_search.best_params_}\")\n\nprint(f\"Best R^2 score: {grid_search.best_score_:.3f}\")<\/code><\/pre>\n<figure class=\"wp-block-image\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/05\/AD_4nXeT2kvJd2Q7J-uuTQVChf2dfS0XbfScKqsDbx2RGmcSpIZUGpqdFhyh3MOWh_ymGALMo8VJs0vjlLvKz54S9EQOlYAb6HHBYg7b7qoyv3WFEcMkutWmkZcJl3DKClEzZdeFmHCXlw.png?ssl=1\" alt=\"\" class=\"wp-image-604082\"><\/figure>\n<h3 
class=\"wp-block-heading\">Step 3: Evaluate Final Model on Test Set<\/h3>\n<p class=\"wp-block-paragraph\">Now that we\u2019ve selected the best-performing model based on cross-validation, we can evaluate it on the held-out test set to estimate its generalization performance.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\"># Evaluate final model on test set\n\nbest_model = grid_search.best_estimator_\n\nprint(f\"Test R^2 score (final model): {best_model.score(X_test, y_test):.3f}\")<\/code><\/pre>\n<figure class=\"wp-block-image\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/05\/AD_4nXdk4jzZD6S1_6CN05h6ZZF3zTbFqbsf4p2cU0eJRojCsWdZyAREvMyy9nReNOBWO-y04fmWNr0pn__1GAIAwVMA-5rUnBrFrR4j2U_n-a1aWrobhYTNoXiqHSECDUd5-JW1ODQxvw.png?ssl=1\" alt=\"\" class=\"wp-image-604080\"><\/figure>\n<h2 class=\"wp-block-heading\">Calculating Random Forest Feature Importance<\/h2>\n<p class=\"wp-block-paragraph\">One of the key advantages of Random Forests is their interpretability \u2014 something that large language models (LLMs) often lack. While LLMs are powerful, they typically function as black boxes and can <a href=\"https:\/\/youtu.be\/2v18R02mq8I?si=oeJadtZT3ytFmTE8\">exhibit biases that are difficult to identify<\/a>. In contrast, scikit-learn supports two main methods for measuring feature importance in Random Forests: Mean Decrease in Impurity and Permutation Importance.<\/p>\n<p class=\"wp-block-paragraph\">1). Mean Decrease in Impurity (MDI): Also known as Gini importance, this method calculates the total reduction in impurity brought by each feature across all trees. This is fast and built into the model via <code>reg.feature_importances_<\/code>. 
However, impurity-based feature importances can be misleading, especially for features with high cardinality (many unique values), as these features are more likely to be chosen simply because they provide more potential split points.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">importances = reg.feature_importances_\n\nfeature_names = X.columns\n\nsorted_idx = np.argsort(importances)[::-1]\n\nfor i in sorted_idx:\n\n    print(f\"{feature_names[i]}: {importances[i]:.3f}\")<\/code><\/pre>\n<figure class=\"wp-block-image\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/05\/AD_4nXcBgUsFONPKRmPEk7yvNJYTC7NAo-UbcMKBdONU1gXN3prQ02toUp-v5oujpWNtg-x_LRfkZ9wFR-TlP8spyOBlZ6N6mL-YnPe6R4aoOH8fHaGrai9WHVkrbWBU62kpwpSJhG0jkg.png?ssl=1\" alt=\"\" class=\"wp-image-604083\"><\/figure>\n<p class=\"wp-block-paragraph\">2). Permutation Importance: This method assesses the decrease in model performance when a single feature\u2019s values are randomly shuffled. Unlike MDI, it can be computed on held-out data and is not biased toward high-cardinality features, though importance may be split across strongly correlated features. 
It is generally more reliable but also more computationally expensive.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\"># Perform permutation importance on the test set\n\nperm_importance = permutation_importance(reg, X_test, y_test, n_repeats=10, random_state=0)\n\nsorted_idx = perm_importance.importances_mean.argsort()[::-1]\n\nfor i in sorted_idx:\n\n    print(f\"{X.columns[i]}: {perm_importance.importances_mean[i]:.3f}\")<\/code><\/pre>\n<p class=\"wp-block-paragraph\"><img fetchpriority=\"high\" decoding=\"async\" width=\"214\" height=\"376\" src=\"https:\/\/lh7-rt.googleusercontent.com\/docsz\/AD_4nXcgRkkAgBLSiSTRFuX8q0TlMe1I5SsCWSCfRMkXiAA_zlwqT6Okf5wRCknY1M4ps4v-kNDXl0_FgkVd4egBc93nec58VBEN1s0cCFVG5D0FYiIFbEFgVu7xOnqaBNdML7sHiyD0?key=MJNz7w8xYykgqioGAsk1Pg\"><\/p>\n<p class=\"wp-block-paragraph\">Note that the geographic features <code>lat<\/code> and <code>long<\/code> are also useful for visualization, as the plot below shows. It\u2019s likely that companies like Zillow leverage location information extensively in their valuation models.<\/p>\n<figure class=\"wp-block-image\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/05\/AD_4nXc-_FsaMfuDfXO9-KqJM6Pfm-zpZOOz_fw79YsgPDb10gwG8b3mYYVieN-6c4agITiaNkaatp1TUrQIPsyOyuR5kf9KHULeAZAD0pgg8Ae2oJ4K16Vw79F9CscBTDcqF7r7Rmr.png?ssl=1\" alt=\"\" class=\"wp-image-604089\"><figcaption class=\"wp-element-caption\">Housing Price percentile for King County. Image by Michael Galarnyk.<\/figcaption><\/figure>\n<h2 class=\"wp-block-heading\">Visualizing Individual Decision Trees in a Random Forest<\/h2>\n<p class=\"wp-block-paragraph\">A Random Forest consists of multiple decision trees, one for each estimator specified via the <code>n_estimators<\/code> parameter. After training the model, you can access these individual trees through the <code>.estimators_<\/code> attribute. 
Visualizing a few of these trees can help illustrate how differently each one splits the data due to bootstrapped training samples and random feature selection at each split. While the earlier example used a <code>RandomForestRegressor<\/code>, here we demonstrate this visualization using a <code>RandomForestClassifier<\/code> trained on the Breast Cancer Wisconsin dataset (CC BY 4.0 license) to highlight Random Forests\u2019 versatility for both regression and classification tasks. <a href=\"https:\/\/www.youtube.com\/embed\/X8UeOrsUKQ4\">This short video<\/a> demonstrates what 100 trained estimators from this dataset look like.<\/p>\n<h3 class=\"wp-block-heading\">Fit a Random Forest Model using Scikit-Learn<\/h3>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\"># Load the Breast Cancer (Diagnostic) Dataset\n\ndata = load_breast_cancer()\n\ndf = pd.DataFrame(data.data, columns=data.feature_names)\n\ndf['target'] = data.target\n\n# Arrange Data into Features Matrix and Target Vector\n\nX = df.loc[:, df.columns != 'target']\n\ny = df.loc[:, 'target'].values\n\n# Split the data into training and testing sets\n\nX_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)\n\n# Random Forests in `scikit-learn` (with N = 100)\n\nrf = RandomForestClassifier(n_estimators=100,\n\n                            random_state=0)\n\nrf.fit(X_train, y_train)<\/code><\/pre>\n<h3 class=\"wp-block-heading\">Plotting Individual Estimators (decision trees) from a Random Forest using Matplotlib<\/h3>\n<p class=\"wp-block-paragraph\">You can now view all the individual trees from the fitted model.<\/p>\n<p class=\"wp-block-paragraph\"><code>rf.estimators_<\/code><\/p>\n<figure class=\"wp-block-image\"><img data-recalc-dims=\"1\" decoding=\"async\" 
src=\"https:\/\/i0.wp.com\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/05\/AD_4nXe_h1wnJBIjZjFmZ6kRzXsjF24Pn38jzEq2DzUZakIZeAUJpoWhiHzI10rf6dWYCBIDQCPG6qbaQXxD9RyQOy_z4i9djyr7KeCQug0fvCeQADgHDxh5quO3cAeNu0iPPLPncd3srQ.png?ssl=1\" alt=\"\" class=\"wp-image-604085\"><\/figure>\n<p class=\"wp-block-paragraph\">The code below visualizes the first decision tree.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\"># Plain lists are accepted by plot_tree across scikit-learn versions\n\nfn = list(data.feature_names)\n\ncn = list(data.target_names)\n\nfig, axes = plt.subplots(nrows=1, ncols=1, figsize=(4, 4), dpi=800)\n\ntree.plot_tree(rf.estimators_[0],\n\n               feature_names=fn,\n\n               class_names=cn,\n\n               filled=True);\n\nfig.savefig('rf_individualtree.png')<\/code><\/pre>\n<figure class=\"wp-block-image\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/05\/AD_4nXfw0lZAueYuvVSRYfB4PnIF8zrOM4lYuNOfV5JyCVABwerUNrbJReCEfg_QQfDfQO1YQgUnDPwKKHFt0YRkbrT3LxH4eZi5LnPtSYHJAw-_NW1E4ovQH4WZsADs8-f6KzWQlmg9ng.png?ssl=1\" alt=\"\" class=\"wp-image-604086\"><\/figure>\n<p class=\"wp-block-paragraph\">Although plotting many trees can be difficult to interpret, you may wish to explore the variety across estimators. 
The following example shows how to visualize the first five decision trees in the forest:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\"># This may not be the best way to view each estimator, as each plot is small\n\nfig, axes = plt.subplots(nrows=1, ncols=5, figsize=(10, 2), dpi=3000)\n\nfor index in range(5):\n\n    tree.plot_tree(rf.estimators_[index],\n\n                   feature_names=fn,\n\n                   class_names=cn,\n\n                   filled=True,\n\n                   ax=axes[index])\n\n    axes[index].set_title(f'Estimator: {index}', fontsize=11)\n\nfig.savefig('rf_5trees.png')<\/code><\/pre>\n<figure class=\"wp-block-image\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/05\/AD_4nXfVbyCIL1rejZJJCN9Wt_I697LyTTCRHqIJ102GLq0hnQlL7B-QEprQPTP4-jvfWtMYMvnjsiMFIRdxyHWjjxlPEyyM90FW19OyFSZGjwfq1pScUbaPzaAn-NRm9onGWfI6QHeHoA.png?ssl=1\" alt=\"\" class=\"wp-image-604087\"><\/figure>\n<h2 class=\"wp-block-heading\">Conclusion<\/h2>\n<p class=\"wp-block-paragraph\">Random forests consist of multiple decision trees trained on bootstrapped data in order to achieve better predictive performance than could be obtained from any of the individual decision trees. 
If you have questions or thoughts on the tutorial, feel free to reach out through <a href=\"https:\/\/youtu.be\/R9tJeEgHyeo?si=_TD53gsapwTk3VLk\">YouTube<\/a> or <a href=\"https:\/\/twitter.com\/GalarnykMichael\">X<\/a>.<\/p>\n<p>The post <a href=\"https:\/\/towardsdatascience.com\/understanding-random-forest-using-python-scikit-learn\/\">Understanding Random Forest using Python (scikit-learn)<\/a> appeared first on <a href=\"https:\/\/towardsdatascience.com\/\">Towards Data Science<\/a>.<\/p>\n<\/div>\n<p> \t<BR><br \/>\n <BR><\/BR><br \/>\n    Michael Galarnyk<br \/>\n \t<BR><br \/>\n<BR><\/BR><br \/>\n<a href=\"https:\/\/towardsdatascience.com\/understanding-random-forest-using-python-scikit-learn\/\">Go to original source<\/a><br \/>\n \t<BR><br \/>\n <BR><\/BR><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Understanding Random Forest using Python (scikit-learn) Decision trees are a popular supervised learning algorithm with benefits that include being able to be used for both regression and classification as well as being easy to interpret. 
However, decision trees aren\u2019t the most performant algorithm and are prone to overfitting due to small variations in the training [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[62,83,82,67,70,157,604],"tags":[480,902,1211],"class_list":["post-3873","post","type-post","status-publish","format-standard","hentry","category-aimldsaimlds","category-data-science","category-data-visualization","category-deep-dives","category-machine-learning","category-python","category-random-forest","tag-decision","tag-random","tag-trees"],"_links":{"self":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/3873"}],"collection":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/comments?post=3873"}],"version-history":[{"count":0,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/3873\/revisions"}],"wp:attachment":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/media?parent=3873"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/categories?post=3873"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/tags?post=3873"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}