{"id":1477,"date":"2025-01-28T07:03:22","date_gmt":"2025-01-28T07:03:22","guid":{"rendered":"https:\/\/mailitics.com\/index.php\/2025\/01\/28\/build-a-decision-tree-in-polars-from-scratch-d48892926ecf\/"},"modified":"2025-01-28T07:03:22","modified_gmt":"2025-01-28T07:03:22","slug":"build-a-decision-tree-in-polars-from-scratch-d48892926ecf","status":"publish","type":"post","link":"https:\/\/mailitics.com\/index.php\/2025\/01\/28\/build-a-decision-tree-in-polars-from-scratch-d48892926ecf\/","title":{"rendered":"Build a Decision Tree in Polars from Scratch"},"content":{"rendered":"<p>    Build a Decision Tree in Polars from Scratch<br \/>\n \t<BR><br \/>\n<BR><\/BR><br \/>\n    <!-- no image --><br \/>\n \t<BR><br \/>\n<BR><\/BR><\/p>\n<div>\n<h4>Explore decision trees with polars\u00a0backend<\/h4>\n<figure><img decoding=\"async\" alt=\"\" src=\"https:\/\/cdn-images-1.medium.com\/max\/1024\/0*x7Y4ENTz6DUj9ozZ\"><figcaption>Photo by <a href=\"https:\/\/unsplash.com\/@leolaub?utm_source=medium&amp;utm_medium=referral\">Leonard Laub<\/a> on\u00a0<a href=\"https:\/\/unsplash.com\/?utm_source=medium&amp;utm_medium=referral\">Unsplash<\/a><\/figcaption><\/figure>\n<p>Decision tree algorithms have always fascinated me. They are easy to implement and achieve good results on various classification and regression tasks. Combined with boosting, decision trees are still state-of-the-art in many applications.<\/p>\n<p>Frameworks such as sklearn, lightgbm, xgboost and catboost have done a very good job until today. However, in the past few months, I have been missing support for arrow datasets. While lightgbm has recently added support for that, it is still missing in most other frameworks. The arrow data format could be a perfect match for decision trees since it has a columnar structure optimized for efficient data processing. Pandas already added support for that and also polars uses the advantages.<\/p>\n<p>Polars has shown some significant performance advantages over most other data frameworks. It uses the data efficiently and avoids copying the data unnecessarily. It also provides a streaming engine that allows the processing of larger data than memory. This is why I decided to use polars as a backend for building a decision tree from\u00a0scratch.<\/p>\n<p>The goal is to explore the advantages of using polars for decision trees in terms of memory and runtime. And, of course, learning more about polars, efficiently defining expressions, and the streaming engine.<\/p>\n<p>The code for the implementation can be found in this <a href=\"https:\/\/github.com\/tocab\/efficient-trees\">repository<\/a>.<\/p>\n<h4>Code overview<\/h4>\n<p>To get a first overview of the code, I will show the structure of the DecisionTreeClassifier first:<\/p>\n<pre>import pickle<br>from typing import Iterable, List, Union<br><br>import polars as pl<br><br><br>class DecisionTreeClassifier:<br><br>    def __init__(self, streaming=False, max_depth=None, categorical_columns=None):<br>        ...<br><br>    def save_model(self, path: str) -&gt; None:<br>        ...<br><br>    def load_model(self, path: str) -&gt; None:<br>        ...<br><br>    def apply_categorical_mappings(self, data: Union[pl.DataFrame, pl.LazyFrame]) -&gt; Union[pl.DataFrame, pl.LazyFrame]:<br>        ...<br><br>    def fit(self, data: Union[pl.DataFrame, pl.LazyFrame], target_name: str) -&gt; None:<br>        ...<br><br>    def predict_many(self, data: Union[pl.DataFrame, pl.LazyFrame]) -&gt; List[Union[int, float]]:<br>        ...<br><br>    def predict(self, data: Iterable[dict]):<br>        ...<br><br>    def get_majority_class(self, df: Union[pl.DataFrame, pl.LazyFrame], target_name: str) -&gt; str:<br>        ...<br><br>    def _build_tree(<br>        self,<br>        data: Union[pl.DataFrame, pl.LazyFrame],<br>        feature_names: list[str],<br>        target_name: str,<br>        unique_targets: list[int],<br>        depth: int,<br>    ) -&gt; dict:<br>        ...<\/pre>\n<p>The first important thing can be seen in the imports. It was important for me to keep the import section clean and with as few dependencies as possible. This was successful with only having dependencies to polars, pickle, and\u00a0typing.<\/p>\n<p>The init method allows to define if the polars streaming engine should be used. Also, the max_depth of the tree can be set here. Another feature in the definition of categorical columns. These are handled in a different way than numerical features using a target encoding.<\/p>\n<p>It is possible to save and load the decision tree model. It is represented as a nested dict and can be saved to disk as a pickled\u00a0file.<\/p>\n<p>The polars magic happens in the fit() and build_tree() methods. These accept both LazyFrames and DataFrames to have support for in-memory processing and streaming.<\/p>\n<p>There are two prediction methods available, predict() and predict_many(). The predict() method can be used on a small example size, and the data needs to be provided as a dict. If we have a big test set, it is more efficient to use the predict_many() method. Here, the data can be provided as a polars DataFrame or LazyFrame.<\/p>\n<h3>Fitting the\u00a0tree<\/h3>\n<p>To train the decision tree classifier, the fit() method needs to be\u00a0used.<\/p>\n<pre>def fit(self, data: Union[pl.DataFrame, pl.LazyFrame], target_name: str) -&gt; None:<br>    \"\"\"<br>    Fit method to train the decision tree.<br><br>    :param data: Polars DataFrame or LazyFrame containing the training data.<br>    :param target_name: Name of the target column<br>    \"\"\"<br>    columns = data.collect_schema().names()<br>    feature_names = [col for col in columns if col != target_name]<br><br>    # Shrink dtypes<br>    data = data.select(pl.all().shrink_dtype()).with_columns(<br>        pl.col(target_name).cast(pl.UInt64).shrink_dtype().alias(target_name)<br>    )<br><br>    # Prepare categorical columns with target encoding<br>    if self.categorical_columns:<br>        categorical_mappings = {}<br>        for categorical_column in self.categorical_columns:<br>            categorical_mappings[categorical_column] = {<br>                value: index<br>                for index, value in enumerate(<br>                    data.lazy()<br>                    .group_by(categorical_column)<br>                    .agg(pl.col(target_name).mean().alias(\"avg\"))<br>                    .sort(\"avg\")<br>                    .collect(streaming=self.streaming)[categorical_column]<br>                )<br>            }<br><br>        self.categorical_mappings = categorical_mappings<br>        data = self.apply_categorical_mappings(data)<br><br>    unique_targets = data.select(target_name).unique()<br>    if isinstance(unique_targets, pl.LazyFrame):<br>        unique_targets = unique_targets.collect(streaming=self.streaming)<br>    unique_targets = unique_targets[target_name].to_list()<br><br>    self.tree = self._build_tree(data, feature_names, target_name, unique_targets, depth=0)<\/pre>\n<p>It receives a polars LazyFrame or DataFrame that contains all features and the target column. To identify the target column, the target_name needs to be provided.<\/p>\n<p>Polars provides a convenient way to optimize the memory usage of the\u00a0data.<\/p>\n<pre>data.select(pl.all().shrink_dtype())<\/pre>\n<p>With that, all columns are selected and evaluated. It will convert the dtype to the smallest possible\u00a0value.<\/p>\n<h4>The categorical encoding<\/h4>\n<p>To encode categorical values, a target encoding is used. For that, all instances of a categorical feature will be aggregated, and the average target value will be calculated. Then, the instances are sorted by the average target value, and a rank is assigned. This rank will be used as the representation of the feature\u00a0value.<\/p>\n<pre>(<br>  data.lazy()<br>    .group_by(categorical_column)<br>    .agg(pl.col(target_name).mean().alias(\"avg\"))<br>    .sort(\"avg\")<br>    .collect(streaming=self.streaming)[categorical_column]<br>)<\/pre>\n<p>Since it is possible to provide polars DataFrames and LazyFrames, I use data.lazy() first. If the given data is a DataFrame, it will be converted to a LazyFrame. If it is already a LazyFrame, it only returns self. With that trick, it is possible to ensure that the data is processed in the same way for LazyFrames and DataFrames and that the collect() method can be used, which is only available for LazyFrames.<\/p>\n<p>To illustrate the outcome of the calculations in the different steps of the fitting process, I apply it to a dataset for heart disease prediction. It can be found on <a href=\"https:\/\/www.kaggle.com\/datasets\/colewelkins\/cardiovascular-disease\">Kaggle<\/a> and is published under the Database Contents\u00a0License.<\/p>\n<p>Here is an example of the categorical feature representation for the glucose\u00a0levels:<\/p>\n<pre>\u250c\u2500\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510<br>\u2502 rank \u2506 gluc \u2506 avg      \u2502<br>\u2502 ---  \u2506 ---  \u2506 ---      \u2502<br>\u2502 u32  \u2506 i8   \u2506 f64      \u2502<br>\u255e\u2550\u2550\u2550\u2550\u2550\u2550\u256a\u2550\u2550\u2550\u2550\u2550\u2550\u256a\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2561<br>\u2502 0    \u2506 1    \u2506 0.476139 \u2502<br>\u2502 1    \u2506 2    \u2506 0.586319 \u2502<br>\u2502 2    \u2506 3    \u2506 0.620972 \u2502<br>\u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518<\/pre>\n<p>For each of the glucose levels, the probability of having a heart disease is calculated. This is sorted and then ranked so that each of the levels is mapped to a rank\u00a0value.<\/p>\n<h4>Getting the target\u00a0values<\/h4>\n<p>As the last part of the fit() method, the unique target values are determined.<\/p>\n<pre>unique_targets = data.select(target_name).unique()<br>if isinstance(unique_targets, pl.LazyFrame):<br>    unique_targets = unique_targets.collect(streaming=self.streaming)<br>unique_targets = unique_targets[target_name].to_list()<br><br>self.tree = self._build_tree(data, feature_names, target_name, unique_targets, depth=0)<\/pre>\n<p>This serves as the last preparation before calling the _build_tree() method recursively.<\/p>\n<h3>Building the\u00a0tree<\/h3>\n<p>After the data is prepared in the fit() method, the _build_tree() method is called. This is done recursively until a stopping criterion is met, e.g., the max depth of the tree is reached. The first call is executed from the fit() method with a depth of\u00a0zero.<\/p>\n<pre>def _build_tree(<br>        self,<br>        data: Union[pl.DataFrame, pl.LazyFrame],<br>        feature_names: list[str],<br>        target_name: str,<br>        unique_targets: list[int],<br>        depth: int,<br>) -&gt; dict:<br>    \"\"\"<br>    Builds the decision tree recursively.<br>    If max_depth is reached, returns a leaf node with the majority class.<br>    Otherwise, finds the best split and creates internal nodes for left and right children.<br><br>    :param data: The dataframe to evaluate.<br>    :param feature_names: Name of the feature columns.<br>    :param target_name: Name of the target column.<br>    :param unique_targets: unique target values.<br>    :param depth: The current depth of the tree.<br><br>    :return: A dictionary representing the node.<br>    \"\"\"<br>    if self.max_depth is not None and depth &gt;= self.max_depth:<br>        return {\"type\": \"leaf\", \"value\": self.get_majority_class(data, target_name)}<br><br>    # Make data lazy here to avoid that it is evaluated in each loop iteration.<br>    data = data.lazy()<br><br>    # Evaluate entropy per feature:<br>    information_gain_dfs = []<br>    for feature_name in feature_names:<br>        feature_data = data.select([feature_name, target_name]).filter(pl.col(feature_name).is_not_null())<br>        feature_data = feature_data.rename({feature_name: \"feature_value\"})<br><br>        # No streaming (yet)<br>        information_gain_df = (<br>            feature_data.group_by(\"feature_value\")<br>            .agg(<br>                [<br>                    pl.col(target_name)<br>                .filter(pl.col(target_name) == target_value)<br>                .len()<br>                .alias(f\"class_{target_value}_count\")<br>                    for target_value in unique_targets<br>                ]<br>                + [pl.col(target_name).len().alias(\"count_examples\")]<br>            )<br>            .sort(\"feature_value\")<br>            .select(<br>                [<br>                    pl.col(f\"class_{target_value}_count\").cum_sum().alias(f\"cum_sum_class_{target_value}_count\")<br>                    for target_value in unique_targets<br>                ]<br>                + [<br>                    pl.col(f\"class_{target_value}_count\").sum().alias(f\"sum_class_{target_value}_count\")<br>                    for target_value in unique_targets<br>                ]<br>                + [<br>                    pl.col(\"count_examples\").cum_sum().alias(\"cum_sum_count_examples\"),<br>                    pl.col(\"count_examples\").sum().alias(\"sum_count_examples\"),<br>                ]<br>                + [<br>                    # From previous select<br>                    pl.col(\"feature_value\"),<br>                ]<br>            )<br>            .filter(<br>                # At least one example available<br>                pl.col(\"sum_count_examples\")<br>                &gt; pl.col(\"cum_sum_count_examples\")<br>            )<br>            .select(<br>                [<br>                    (pl.col(f\"cum_sum_class_{target_value}_count\") \/ pl.col(\"cum_sum_count_examples\")).alias(<br>                        f\"left_proportion_class_{target_value}\"<br>                    )<br>                    for target_value in unique_targets<br>                ]<br>                + [<br>                    (<br>                            (pl.col(f\"sum_class_{target_value}_count\") - pl.col(f\"cum_sum_class_{target_value}_count\"))<br>                            \/ (pl.col(\"sum_count_examples\") - pl.col(\"cum_sum_count_examples\"))<br>                    ).alias(f\"right_proportion_class_{target_value}\")<br>                    for target_value in unique_targets<br>                ]<br>                + [<br>                    (pl.col(f\"sum_class_{target_value}_count\") \/ pl.col(\"sum_count_examples\")).alias(<br>                        f\"parent_proportion_class_{target_value}\"<br>                    )<br>                    for target_value in unique_targets<br>                ]<br>                + [<br>                    # From previous select<br>                    pl.col(\"cum_sum_count_examples\"),<br>                    pl.col(\"sum_count_examples\"),<br>                    pl.col(\"feature_value\"),<br>                ]<br>            )<br>            .select(<br>                (<br>                        -1<br>                        * pl.sum_horizontal(<br>                    [<br>                        (<br>                                pl.col(f\"left_proportion_class_{target_value}\")<br>                                * pl.col(f\"left_proportion_class_{target_value}\").log(base=2)<br>                        ).fill_nan(0.0)<br>                        for target_value in unique_targets<br>                    ]<br>                )<br>                ).alias(\"left_entropy\"),<br>                (<br>                        -1<br>                        * pl.sum_horizontal(<br>                    [<br>                        (<br>                                pl.col(f\"right_proportion_class_{target_value}\")<br>                                * pl.col(f\"right_proportion_class_{target_value}\").log(base=2)<br>                        ).fill_nan(0.0)<br>                        for target_value in unique_targets<br>                    ]<br>                )<br>                ).alias(\"right_entropy\"),<br>                (<br>                        -1<br>                        * pl.sum_horizontal(<br>                    [<br>                        (<br>                                pl.col(f\"parent_proportion_class_{target_value}\")<br>                                * pl.col(f\"parent_proportion_class_{target_value}\").log(base=2)<br>                        ).fill_nan(0.0)<br>                        for target_value in unique_targets<br>                    ]<br>                )<br>                ).alias(\"parent_entropy\"),<br>                # From previous select<br>                pl.col(\"cum_sum_count_examples\"),<br>                pl.col(\"sum_count_examples\"),<br>                pl.col(\"feature_value\"),<br>            )<br>            .select(<br>                (<br>                        pl.col(\"cum_sum_count_examples\") \/ pl.col(\"sum_count_examples\") * pl.col(\"left_entropy\")<br>                        + (pl.col(\"sum_count_examples\") - pl.col(\"cum_sum_count_examples\"))<br>                        \/ pl.col(\"sum_count_examples\")<br>                        * pl.col(\"right_entropy\")<br>                ).alias(\"child_entropy\"),<br>                # From previous select<br>                pl.col(\"parent_entropy\"),<br>                pl.col(\"feature_value\"),<br>            )<br>            .select(<br>                (pl.col(\"parent_entropy\") - pl.col(\"child_entropy\")).alias(\"information_gain\"),<br>                # From previous select<br>                pl.col(\"parent_entropy\"),<br>                pl.col(\"feature_value\"),<br>            )<br>            .filter(pl.col(\"information_gain\").is_not_nan())<br>            .sort(\"information_gain\", descending=True)<br>            .head(1)<br>            .with_columns(feature=pl.lit(feature_name))<br>        )<br>        information_gain_dfs.append(information_gain_df)<br><br>    if isinstance(information_gain_dfs[0], pl.LazyFrame):<br>        information_gain_dfs = pl.collect_all(information_gain_dfs, streaming=self.streaming)<br><br>    information_gain_dfs = pl.concat(information_gain_dfs, how=\"vertical_relaxed\").sort(<br>        \"information_gain\", descending=True<br>    )<br><br>    information_gain = 0<br>    if len(information_gain_dfs) &gt; 0:<br>        best_params = information_gain_dfs.row(0, named=True)<br>        information_gain = best_params[\"information_gain\"]<br><br>    if information_gain &gt; 0:<br>        left_mask = data.select(filter=pl.col(best_params[\"feature\"]) &lt;= best_params[\"feature_value\"])<br>        if isinstance(left_mask, pl.LazyFrame):<br>            left_mask = left_mask.collect(streaming=self.streaming)<br>        left_mask = left_mask[\"filter\"]<br><br>        # Split data<br>        left_df = data.filter(left_mask)<br>        right_df = data.filter(~left_mask)<br><br>        left_subtree = self._build_tree(left_df, feature_names, target_name, unique_targets, depth + 1)<br>        right_subtree = self._build_tree(right_df, feature_names, target_name, unique_targets, depth + 1)<br><br>        if isinstance(data, pl.LazyFrame):<br>            target_distribution = (<br>                data.select(target_name)<br>                .collect(streaming=self.streaming)[target_name]<br>                .value_counts()<br>                .sort(target_name)[\"count\"]<br>                .to_list()<br>            )<br>        else:<br>            target_distribution = data[target_name].value_counts().sort(target_name)[\"count\"].to_list()<br><br>        return {<br>            \"type\": \"node\",<br>            \"feature\": best_params[\"feature\"],<br>            \"threshold\": best_params[\"feature_value\"],<br>            \"information_gain\": best_params[\"information_gain\"],<br>            \"entropy\": best_params[\"parent_entropy\"],<br>            \"target_distribution\": target_distribution,<br>            \"left\": left_subtree,<br>            \"right\": right_subtree,<br>        }<br>    else:<br>        return {\"type\": \"leaf\", \"value\": self.get_majority_class(data, target_name)}<\/pre>\n<p>This method is the heart of building the trees and I will explain it step by step. First, when entering the method, it is checked if the max depth stopping criterion is\u00a0met.<\/p>\n<pre>if self.max_depth is not None and depth &gt;= self.max_depth:<br>    return {\"type\": \"leaf\", \"value\": self.get_majority_class(data, target_name)}<\/pre>\n<p>If the current depth is equal to or greater than the max_depth, a node of the type leaf will be returned. The value of the leaf corresponds to the majority class of the data. This is calculated as\u00a0follows:<\/p>\n<pre>def get_majority_class(self, df: Union[pl.DataFrame, pl.LazyFrame], target_name: str) -&gt; str:<br>    \"\"\"<br>    Returns the majority class of a dataframe.<br><br>    :param df: The dataframe to evaluate.<br>    :param target_name: Name of the target column.<br><br>    :return: majority class.<br>    \"\"\"<br>    majority_class = df.group_by(target_name).len().filter(pl.col(\"len\") == pl.col(\"len\").max()).select(target_name)<br>    if isinstance(majority_class, pl.LazyFrame):<br>        majority_class = majority_class.collect(streaming=self.streaming)<br>    return majority_class[target_name][0]<\/pre>\n<p>To get the majority class, the count of rows per target is determined by grouping over the target column and aggregating with len(). The target instance, which is present in most of the rows, is returned as the majority\u00a0class.<\/p>\n<h4>Information Gain as Splitting Criteria<\/h4>\n<p>To find a good split of the data, the information gain is\u00a0used.<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2AwvX6aHlQ4uHbvaFwN04DiA.png?ssl=1\"><figcaption>Equation 1\u200a\u2014\u200aCalculation of information gain. Image by\u00a0author.<\/figcaption><\/figure>\n<p>To get the information gain, the parent entropy and child entropy need to be calculated.<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2Axvu-vWjfq-_l14EQV2ZhHA.png?ssl=1\"><figcaption>Equation 2\u200a\u2014\u200aCalculation of entropy. Image by\u00a0author.<\/figcaption><\/figure>\n<p>A good explanation of the interpretation of information gain can be found\u00a0<a href=\"https:\/\/victorzhou.com\/blog\/information-gain\/\">here<\/a>.<\/p>\n<h4>Calculating The Information Gain in\u00a0Polars<\/h4>\n<p>The information gain is calculated for each feature value that is present in a feature\u00a0column.<\/p>\n<pre>information_gain_df = (<br>    feature_data.group_by(\"feature_value\")<br>    .agg(<br>        [<br>         pl.col(target_name)<br>        .filter(pl.col(target_name) == target_value)<br>        .len()<br>        .alias(f\"class_{target_value}_count\")<br>            for target_value in unique_targets<br>        ]<br>        + [pl.col(target_name).len().alias(\"count_examples\")]<br>    )<br>    .sort(\"feature_value\")<\/pre>\n<p>The feature values are grouped, and the count of each of the target values is assigned to it. Additionally, the total count of rows for that feature value is saved as count_examples. In the last step, the data is sorted by feature_value. This is needed to calculate the splits in the next\u00a0step.<\/p>\n<p>For the heart disease dataset, after the first calculation step, the data looks like\u00a0this:<\/p>\n<pre>\u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510<br>\u2502 feature_value \u2506 class_0_count \u2506 class_1_count \u2506 count_examples \u2502<br>\u2502 ---           \u2506 ---           \u2506 ---           \u2506 ---            \u2502<br>\u2502 i8            \u2506 u32           \u2506 u32           \u2506 u32            \u2502<br>\u255e\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u256a\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u256a\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u256a\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2561<br>\u2502 29            \u2506 2             \u2506 0             \u2506 2              \u2502<br>\u2502 30            \u2506 1             \u2506 0             \u2506 1              \u2502<br>\u2502 39            \u2506 1068          \u2506 331           \u2506 1399           \u2502<br>\u2502 40            \u2506 975           \u2506 263           \u2506 1238           \u2502<br>\u2502 41            \u2506 1052          \u2506 438           \u2506 1490           \u2502<br>\u2502 \u2026             \u2506 \u2026             \u2506 \u2026             \u2506 \u2026              \u2502<br>\u2502 60            \u2506 1054          \u2506 1460          \u2506 2514           \u2502<br>\u2502 61            \u2506 695           \u2506 1408          \u2506 2103           \u2502<br>\u2502 62            \u2506 566           \u2506 1125          \u2506 1691           \u2502<br>\u2502 63            \u2506 572           \u2506 1517          \u2506 2089           \u2502<br>\u2502 64            \u2506 479           \u2506 1217          \u2506 1696           \u2502<br>\u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518<\/pre>\n<p>Here, the feature age_years is processed. Class 0 stands for \u201cno heart disease,\u201d and class 1 stands for \u201cheart disease.\u201d The data is sorted by the age of years feature, and the columns contain the count of class 0, class 1, and the total count of examples with the respective feature\u00a0value.<\/p>\n<p>In the next step, the cumulative sum over the count of classes is calculated for each feature\u00a0value.<\/p>\n<pre>.select(<br>    [<br>        pl.col(f\"class_{target_value}_count\").cum_sum().alias(f\"cum_sum_class_{target_value}_count\")<br>        for target_value in unique_targets<br>    ]<br>    + [<br>        pl.col(f\"class_{target_value}_count\").sum().alias(f\"sum_class_{target_value}_count\")<br>        for target_value in unique_targets<br>    ]<br>    + [<br>        pl.col(\"count_examples\").cum_sum().alias(\"cum_sum_count_examples\"),<br>        pl.col(\"count_examples\").sum().alias(\"sum_count_examples\"),<br>    ]<br>    + [<br>        # From previous select<br>        pl.col(\"feature_value\"),<br>    ]<br>)<br>.filter(<br>    # At least one example available<br>    pl.col(\"sum_count_examples\")<br>    &gt; pl.col(\"cum_sum_count_examples\")<br>)<\/pre>\n<p>The intuition behind it is that when a split is executed over a specific feature value, it includes the count of target values from smaller feature values. To be able to calculate the proportion, the total sum of the target values is calculated. The same procedure is repeated for count_examples, where the cumulative sum and the total sum are calculated as\u00a0well.<\/p>\n<p>After the calculation, the data looks like\u00a0this:<\/p>\n<pre>\u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510<br>\u2502 cum_sum_clas \u2506 cum_sum_cla \u2506 sum_class_0 \u2506 sum_class_1 \u2506 cum_sum_cou \u2506 sum_count_e \u2506 feature_val \u2502<br>\u2502 s_0_count    \u2506 ss_1_count  \u2506 _count      \u2506 _count      \u2506 nt_examples \u2506 xamples     \u2506 ue          \u2502<br>\u2502 ---          \u2506 ---         \u2506 ---         \u2506 ---         \u2506 ---         \u2506 ---         \u2506 ---         \u2502<br>\u2502 u32          \u2506 u32         \u2506 u32         \u2506 u32         \u2506 u32         \u2506 u32         \u2506 i8          \u2502<br>\u255e\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u256a\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u256a\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u256a\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u256a\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u256a\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u256a\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2561<br>\u2502 3            \u2506 0           \u2506 27717       \u2506 26847       \u2506 3           \u2506 54564       \u2506 29          \u2502<br>\u2502 4            \u2506 0           \u2506 27717       \u2506 26847       \u2506 4           \u2506 54564       \u2506 30          \u2502<br>\u2502 1097         \u2506 324         \u2506 27717       \u2506 26847       \u2506 1421        \u2506 54564       \u2506 39          \u2502<br>\u2502 2090         \u2506 595         \u2506 27717       \u2506 26847       \u2506 2685        \u2506 54564       \u2506 40          \u2502<br>\u2502 3155         \u2506 1025        \u2506 27717       \u2506 26847       \u2506 4180        \u2506 54564       \u2506 41          \u2502<br>\u2502 \u2026            \u2506 \u2026           \u2506 \u2026           \u2506 \u2026           \u2506 \u2026           \u2506 \u2026           \u2506 \u2026           \u2502<br>\u2502 24302        \u2506 20162       \u2506 27717       \u2506 26847       \u2506 44464       \u2506 54564       \u2506 59          \u2502<br>\u2502 25356        \u2506 21581       \u2506 27717       \u2506 26847       \u2506 46937       \u2506 54564       \u2506 60          \u2502<br>\u2502 26046        \u2506 23020       \u2506 27717       \u2506 26847       \u2506 49066       \u2506 54564       \u2506 61          \u2502<br>\u2502 26615        \u2506 24131       \u2506 27717       \u2506 26847       \u2506 50746       \u2506 54564       \u2506 62          \u2502<br>\u2502 27216        \u2506 25652       \u2506 27717       \u2506 26847       \u2506 52868       \u2506 54564       \u2506 63          \u2502<br>\u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518<\/pre>\n<p>In the next step, the proportions are calculated for each feature\u00a0value.<\/p>\n<pre>.select(<br>    [<br>        (pl.col(f\"cum_sum_class_{target_value}_count\") \/ pl.col(\"cum_sum_count_examples\")).alias(<br>            f\"left_proportion_class_{target_value}\"<br>        )<br>        for target_value in unique_targets<br>    ]<br>    + [<br>        (<br>                (pl.col(f\"sum_class_{target_value}_count\") - pl.col(f\"cum_sum_class_{target_value}_count\"))<br>                \/ (pl.col(\"sum_count_examples\") - pl.col(\"cum_sum_count_examples\"))<br>        ).alias(f\"right_proportion_class_{target_value}\")<br>        for target_value in unique_targets<br>    ]<br>    + [<br>        (pl.col(f\"sum_class_{target_value}_count\") \/ pl.col(\"sum_count_examples\")).alias(<br>            f\"parent_proportion_class_{target_value}\"<br>        )<br>        for target_value in unique_targets<br>    ]<br>    + [<br>        # From previous select<br>        pl.col(\"cum_sum_count_examples\"),<br>        pl.col(\"sum_count_examples\"),<br>        pl.col(\"feature_value\"),<br>    ]<br>)<\/pre>\n<p>To calculate the proportions, the results from the previous step can be used. For the left proportion, the cumulative sum of each target value is divided by the cumulative sum of the example count. For the right proportion, we need to know how many examples we have on the right side for each target value. That is calculated by subtracting the total sum for the target value from the cumulative sum of the target value. The same calculation is used to determine the total count of examples on the right side by subtracting the sum of the example count from the cumulative sum of the example count. Additionally, the parent proportion is calculated. This is done by dividing the sum of the target values counts by the total count of examples.<\/p>\n<p>This is the result data after this\u00a0step:<\/p>\n<pre>\u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510<br>\u2502 left_prop \u2506 left_prop \u2506 right_pro \u2506 right_pro \u2506 \u2026 \u2506 parent_pr \u2506 cum_sum_c \u2506 sum_count \u2506 feature_ \u2502<br>\u2502 ortion_cl \u2506 ortion_cl \u2506 portion_c \u2506 portion_c \u2506   \u2506 oportion_ \u2506 ount_exam \u2506 _examples \u2506 value    \u2502<br>\u2502 ass_0     \u2506 ass_1     \u2506 lass_0    \u2506 lass_1    \u2506   \u2506 class_1   \u2506 ples      \u2506 ---       \u2506 ---      \u2502<br>\u2502 ---       \u2506 ---       \u2506 ---       \u2506 ---       \u2506   \u2506 ---       \u2506 ---       \u2506 u32       \u2506 i8       \u2502<br>\u2502 f64       \u2506 f64       \u2506 f64       \u2506 f64       \u2506   \u2506 f64       \u2506 u32       \u2506           \u2506          \u2502<br>\u255e\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u256a\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u256a\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u256a\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u256a\u2550\u2550\u2550\u256a\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u256a\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u256a\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u256a\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2561<br>\u2502 1.0       \u2506 0.0       \u2506 0.506259  \u2506 0.493741  \u2506 \u2026 \u2506 0.493714  \u2506 3         \u2506 54564     \u2506 29       \u2502<br>\u2502 1.0       \u2506 0.0       \u2506 0.50625   \u2506 0.49375   \u2506 \u2026 \u2506 0.493714  \u2506 4         \u2506 54564     \u2506 30       \u2502<br>\u2502 0.754902  \u2506 0.245098  \u2506 0.499605  \u2506 0.500395  \u2506 \u2026 \u2506 0.493714  \u2506 1428      \u2506 54564     \u2506 39       \u2502<br>\u2502 0.765596  \u2506 0.234404  \u2506 0.492739  \u2506 0.507261  \u2506 \u2026 \u2506 0.493714  \u2506 2709      \u2506 54564     \u2506 40       \u2502<br>\u2502 0.741679  \u2506 0.258321  \u2506 0.486929  \u2506 0.513071  \u2506 \u2026 \u2506 0.493714  \u2506 4146      \u2506 54564     \u2506 41       \u2502<br>\u2502 \u2026         \u2506 \u2026         \u2506 \u2026         \u2506 \u2026         \u2506 \u2026 \u2506 \u2026         \u2506 \u2026         \u2506 \u2026         \u2506 \u2026        \u2502<br>\u2502 0.545735  \u2506 0.454265  \u2506 0.333563  \u2506 0.666437  \u2506 \u2026 \u2506 0.493714  \u2506 44419     \u2506 54564     \u2506 59       \u2502<br>\u2502 0.539065  \u2506 0.460935  \u2506 0.305025  \u2506 0.694975  \u2506 \u2026 \u2506 0.493714  \u2506 46922     \u2506 54564     \u2506 60       \u2502<br>\u2502 0.529725  \u2506 0.470275  \u2506 0.297071  \u2506 0.702929  \u2506 \u2026 \u2506 0.493714  \u2506 49067     \u2506 54564     \u2506 61       \u2502<br>\u2502 0.523006  \u2506 0.476994  \u2506 0.282551  \u2506 0.717449  \u2506 \u2026 \u2506 0.493714  \u2506 50770     \u2506 54564     \u2506 62       \u2502<br>\u2502 0.513063  \u2506 0.486937  \u2506 0.296188  \u2506 0.703812  \u2506 \u2026 \u2506 0.493714  \u2506 52859     \u2506 54564     \u2506 63       \u2502<br>\u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518<\/pre>\n<p>Now that the proportions are available, the entropy can be calculated.<\/p>\n<pre>.select(<br>    (<br>            -1<br>            * pl.sum_horizontal(<br>        [<br>            (<br>                    pl.col(f\"left_proportion_class_{target_value}\")<br>                    * pl.col(f\"left_proportion_class_{target_value}\").log(base=2)<br>            ).fill_nan(0.0)<br>            for target_value in unique_targets<br>        ]<br>    )<br>    ).alias(\"left_entropy\"),<br>    (<br>            -1<br>            * pl.sum_horizontal(<br>        [<br>            (<br>                    pl.col(f\"right_proportion_class_{target_value}\")<br>                    * pl.col(f\"right_proportion_class_{target_value}\").log(base=2)<br>            ).fill_nan(0.0)<br>            for target_value in unique_targets<br>        ]<br>    )<br>    ).alias(\"right_entropy\"),<br>    (<br>            -1<br>            * pl.sum_horizontal(<br>        [<br>            (<br>                    pl.col(f\"parent_proportion_class_{target_value}\")<br>                    * pl.col(f\"parent_proportion_class_{target_value}\").log(base=2)<br>            ).fill_nan(0.0)<br>            for target_value in unique_targets<br>        ]<br>    )<br>    ).alias(\"parent_entropy\"),<br>    # From previous select<br>    pl.col(\"cum_sum_count_examples\"),<br>    pl.col(\"sum_count_examples\"),<br>    pl.col(\"feature_value\"),<br>)<\/pre>\n<p>For the calculation of the entropy, Equation 2 is used. The left entropy is calculated using the left proportion, and the right entropy uses the right proportion. For the parent entropy, the parent proportion is used. In this implementation, pl.sum_horizontal() is used to calculate the sum of the proportions to make use of possible optimizations from polars. This can also be replaced with the python-native sum()\u00a0method.<\/p>\n<p>The data with the entropy values look as\u00a0follows:<\/p>\n<pre>\u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510<br>\u2502 left_entropy \u2506 right_entropy \u2506 parent_entropy \u2506 cum_sum_count_e \u2506 sum_count_exam \u2506 feature_value \u2502<br>\u2502 ---          \u2506 ---           \u2506 ---            \u2506 xamples         \u2506 ples           \u2506 ---           \u2502<br>\u2502 f64          \u2506 f64           \u2506 f64            \u2506 ---             \u2506 ---            \u2506 i8            \u2502<br>\u2502              \u2506               \u2506                \u2506 u32             \u2506 u32            \u2506               \u2502<br>\u255e\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u256a\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u256a\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u256a\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u256a\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u256a\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2561<br>\u2502 -0.0         \u2506 0.999854      \u2506 0.999853       \u2506 3               \u2506 54564          \u2506 29            \u2502<br>\u2502 -0.0         \u2506 0.999854      \u2506 0.999853       \u2506 4               \u2506 54564          \u2506 30            \u2502<br>\u2502 0.783817     \u2506 1.0           \u2506 0.999853       \u2506 1427            \u2506 54564          \u2506 39            \u2502<br>\u2502 0.767101     \u2506 0.999866      \u2506 0.999853       \u2506 2694            \u2506 54564          \u2506 40            \u2502<br>\u2502 0.808516     \u2506 0.999503      \u2506 0.999853       \u2506 4177            \u2506 54564          \u2506 41            \u2502<br>\u2502 \u2026            \u2506 \u2026             \u2506 \u2026              \u2506 \u2026               \u2506 \u2026              \u2506 \u2026             \u2502<br>\u2502 0.993752     \u2506 0.918461      \u2506 0.999853       \u2506 44483           \u2506 54564          \u2506 59            \u2502<br>\u2502 0.995485     \u2506 0.890397      \u2506 0.999853       \u2506 46944           \u2506 54564          \u2506 60            \u2502<br>\u2502 0.997367     \u2506 0.880977      \u2506 0.999853       \u2506 49106           \u2506 54564          \u2506 61            \u2502<br>\u2502 0.99837      \u2506 0.859431      \u2506 0.999853       \u2506 50800           \u2506 54564          \u2506 62            \u2502<br>\u2502 0.999436     \u2506 0.872346      \u2506 0.999853       \u2506 52877           \u2506 54564          \u2506 63            \u2502<br>\u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518<\/pre>\n<p>Almost there! The final step is missing, which is calculating the child entropy and using that to get the information gain.<\/p>\n<pre>.select(<br>    (<br>        pl.col(\"cum_sum_count_examples\") \/ pl.col(\"sum_count_examples\") * pl.col(\"left_entropy\")<br>        + (pl.col(\"sum_count_examples\") - pl.col(\"cum_sum_count_examples\"))<br>        \/ pl.col(\"sum_count_examples\")<br>        * pl.col(\"right_entropy\")<br>    ).alias(\"child_entropy\"),<br>    # From previous select<br>    pl.col(\"parent_entropy\"),<br>    pl.col(\"feature_value\"),<br>)<br>.select(<br>    (pl.col(\"parent_entropy\") - pl.col(\"child_entropy\")).alias(\"information_gain\"),<br>    # From previous select<br>    pl.col(\"parent_entropy\"),<br>    pl.col(\"feature_value\"),<br>)<br>.filter(pl.col(\"information_gain\").is_not_nan())<br>.sort(\"information_gain\", descending=True)<br>.head(1)<br>.with_columns(feature=pl.lit(feature_name))<br>)<br>information_gain_dfs.append(information_gain_df)<\/pre>\n<p>For the child entropy, the left and right entropy are weighted by the count of examples for the feature values. The sum of both weighted entropy values is used as child entropy. To calculate the information gain, we simply need to subtract the child entropy from the parent entropy, as can be seen in Equation 1. The best feature value is determined by sorting the data by information gain and selecting the first row. It is appended to a list that gathers all the best feature values from all features.<\/p>\n<p>Before applying\u00a0.head(1), the data looks as\u00a0follows:<\/p>\n<pre>\u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510<br>\u2502 information_gain \u2506 parent_entropy \u2506 feature_value \u2502<br>\u2502 ---              \u2506 ---            \u2506 ---           \u2502<br>\u2502 f64              \u2506 f64            \u2506 i8            \u2502<br>\u255e\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u256a\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u256a\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2561<br>\u2502 0.028388         \u2506 0.999928       \u2506 54            \u2502<br>\u2502 0.027719         \u2506 0.999928       \u2506 52            \u2502<br>\u2502 0.027283         \u2506 0.999928       \u2506 53            \u2502<br>\u2502 0.026826         \u2506 0.999928       \u2506 50            \u2502<br>\u2502 0.026812         \u2506 0.999928       \u2506 51            \u2502<br>\u2502 \u2026                \u2506 \u2026              \u2506 \u2026             \u2502<br>\u2502 0.010928         \u2506 0.999928       \u2506 62            \u2502<br>\u2502 0.005872         \u2506 0.999928       \u2506 39            \u2502<br>\u2502 0.004155         \u2506 0.999928       \u2506 63            \u2502<br>\u2502 0.000072         \u2506 0.999928       \u2506 30            \u2502<br>\u2502 0.000054         \u2506 0.999928       \u2506 29            \u2502<br>\u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518<\/pre>\n<p>Here, it can be seen that the age feature value of 54 has the highest information gain. This feature value will be collected for the age feature and needs to compete against the other features.<\/p>\n<h4>Selecting Best Split and Define Sub\u00a0Trees<\/h4>\n<p>To select the best split, the highest information gain needs to be found across all features.<\/p>\n<pre>if isinstance(information_gain_dfs[0], pl.LazyFrame):<br>    information_gain_dfs = pl.collect_all(information_gain_dfs, streaming=self.streaming)<br><br>information_gain_dfs = pl.concat(information_gain_dfs, how=\"vertical_relaxed\").sort(<br>    \"information_gain\", descending=True<br>)<\/pre>\n<p>For that, the pl.collect_all() method is used on information_gain_dfs. This evaluates all LazyFrames in parallel, which makes the processing very efficient. The result is a list of polars DataFrames, which are concatenated and sorted by information gain.<\/p>\n<p>For the heart disease example, the data looks like\u00a0this:<\/p>\n<pre>\u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510<br>\u2502 information_gain \u2506 parent_entropy \u2506 feature_value \u2506 feature     \u2502<br>\u2502 ---              \u2506 ---            \u2506 ---           \u2506 ---         \u2502<br>\u2502 f64              \u2506 f64            \u2506 f64           \u2506 str         \u2502<br>\u255e\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u256a\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u256a\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u256a\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2561<br>\u2502 0.138032         \u2506 0.999909       \u2506 129.0         \u2506 ap_hi       \u2502<br>\u2502 0.09087          \u2506 0.999909       \u2506 85.0          \u2506 ap_lo       \u2502<br>\u2502 0.029966         \u2506 0.999909       \u2506 0.0           \u2506 cholesterol \u2502<br>\u2502 0.028388         \u2506 0.999909       \u2506 54.0          \u2506 age_years   \u2502<br>\u2502 0.01968          \u2506 0.999909       \u2506 27.435041     \u2506 bmi         \u2502<br>\u2502 \u2026                \u2506 \u2026              \u2506 \u2026             \u2506 \u2026           \u2502<br>\u2502 0.000851         \u2506 0.999909       \u2506 0.0           \u2506 active      \u2502<br>\u2502 0.000351         \u2506 0.999909       \u2506 156.0         \u2506 height      \u2502<br>\u2502 0.000223         \u2506 0.999909       \u2506 0.0           \u2506 smoke       \u2502<br>\u2502 0.000098         \u2506 0.999909       \u2506 0.0           \u2506 alco        \u2502<br>\u2502 0.000031         \u2506 0.999909       \u2506 0.0           \u2506 gender      \u2502<br>\u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518<\/pre>\n<p>Out of all features, the ap_hi (Systolic blood pressure) feature value of 129 results in the best information gain and thus will be selected for the first\u00a0split.<\/p>\n<pre>information_gain = 0<br>if len(information_gain_dfs) &gt; 0:<br>    best_params = information_gain_dfs.row(0, named=True)<br>    information_gain = best_params[\"information_gain\"]<\/pre>\n<p>In some cases, information_gain_dfs might be empty, for example, when all splits result in having only examples on the left or right side. If this is the case, the information gain is zero. Otherwise, we get the feature value with the highest information gain.<\/p>\n<pre>if information_gain &gt; 0:<br>    left_mask = data.select(filter=pl.col(best_params[\"feature\"]) &lt;= best_params[\"feature_value\"])<br>    if isinstance(left_mask, pl.LazyFrame):<br>        left_mask = left_mask.collect(streaming=self.streaming)<br>    left_mask = left_mask[\"filter\"]<br><br>    # Split data<br>    left_df = data.filter(left_mask)<br>    right_df = data.filter(~left_mask)<br><br>    left_subtree = self._build_tree(left_df, feature_names, target_name, unique_targets, depth + 1)<br>    right_subtree = self._build_tree(right_df, feature_names, target_name, unique_targets, depth + 1)<br><br>    if isinstance(data, pl.LazyFrame):<br>        target_distribution = (<br>            data.select(target_name)<br>            .collect(streaming=self.streaming)[target_name]<br>            .value_counts()<br>            .sort(target_name)[\"count\"]<br>            .to_list()<br>        )<br>    else:<br>        target_distribution = data[target_name].value_counts().sort(target_name)[\"count\"].to_list()<br><br>    return {<br>        \"type\": \"node\",<br>        \"feature\": best_params[\"feature\"],<br>        \"threshold\": best_params[\"feature_value\"],<br>        \"information_gain\": best_params[\"information_gain\"],<br>        \"entropy\": best_params[\"parent_entropy\"],<br>        \"target_distribution\": target_distribution,<br>        \"left\": left_subtree,<br>        \"right\": right_subtree,<br>    }<br>else:<br>    return {\"type\": \"leaf\", \"value\": self.get_majority_class(data, target_name)}<\/pre>\n<p>When the information gain is greater than zero, the sub-trees are defined. For that, the left mask is defined using the feature value that resulted in the best information gain. The mask is applied to the parent data to get the left data frame. The negation of the left mask is used to define the right data frame. Both left and right data frames are used to call the _build_tree() method again with an increased depth+1. As the last step, the target distribution is calculated. This is used as additional information on the node and will be visible when plotting the tree along with the other information.<\/p>\n<p>When information gain is zero, a leaf instance will be returned. This contains the majority class of the given\u00a0data.<\/p>\n<h3>Make predictions<\/h3>\n<p>It is possible to make predictions in two different ways. If the input data is small, the predict() method can be\u00a0used.<\/p>\n<pre>def predict(self, data: Iterable[dict]):<br>    def _predict_sample(node, sample):<br>        if node[\"type\"] == \"leaf\":<br>            return node[\"value\"]<br>        if sample[node[\"feature\"]] &lt;= node[\"threshold\"]:<br>            return _predict_sample(node[\"left\"], sample)<br>        else:<br>            return _predict_sample(node[\"right\"], sample)<br><br>    predictions = [_predict_sample(self.tree, sample) for sample in data]<br>    return predictions<\/pre>\n<p>Here, the data can be provided as an iterable of dicts. Each dict contains the feature names as keys and the feature values as values. By using the _predict_sample() method, the path in the tree is followed until a leaf node is reached. This contains the class that is assigned to the respective example.<\/p>\n<pre>def predict_many(self, data: Union[pl.DataFrame, pl.LazyFrame]) -&gt; List[Union[int, float]]:<br>    \"\"\"<br>    Predict method.<br><br>    :param data: Polars DataFrame or LazyFrame.<br>    :return: List of predicted target values.<br>    \"\"\"<br>    if self.categorical_mappings:<br>        data = self.apply_categorical_mappings(data)<br><br>    def _predict_many(node, temp_data):<br>        if node[\"type\"] == \"node\":<br>            left = _predict_many(node[\"left\"], temp_data.filter(pl.col(node[\"feature\"]) &lt;= node[\"threshold\"]))<br>            right = _predict_many(node[\"right\"], temp_data.filter(pl.col(node[\"feature\"]) &gt; node[\"threshold\"]))<br>            return pl.concat([left, right], how=\"diagonal_relaxed\")<br>        else:<br>            return temp_data.select(pl.col(\"temp_prediction_index\"), pl.lit(node[\"value\"]).alias(\"prediction\"))<br><br>    data = data.with_row_index(\"temp_prediction_index\")<br>    predictions = _predict_many(self.tree, data).sort(\"temp_prediction_index\").select(pl.col(\"prediction\"))<br><br>    # Convert predictions to a list<br>    if isinstance(predictions, pl.LazyFrame):<br>        # Despite the execution plans says there is no streaming, using streaming here significantly<br>        # increases the performance and decreases the memory food print.<br>        predictions = predictions.collect(streaming=True)<br><br>    predictions = predictions[\"prediction\"].to_list()<br>    return predictions<\/pre>\n<p>If a big example set should be predicted, it is more efficient to use the predict_many() method. This makes use of the advantages that polars provides in terms of parallel processing and memory efficiency.<\/p>\n<p>The data can be provided as a polars DataFrame or LazyFrame. Similarly to the _build_tree() method in the training process, a _predict_many() method is called recursively. All examples in the data are filtered into sub-trees until the leaf node is reached. Examples that went the same path to the leaf node get the same prediction value assigned. At the end of the process, all sub-frames of examples are concatenated again. Since the order can not be preserved with that, a temporary prediction index is set at the beginning of the process. When all predictions are done, the original order is restored with sorting by that\u00a0index.<\/p>\n<h3>Using the classifier on a\u00a0dataset<\/h3>\n<p>A usage example for the decision tree classifier can be found <a href=\"https:\/\/github.com\/tocab\/efficient-trees\/blob\/main\/examples\/heart_disease.py\">here<\/a>. The decision tree is trained on a heart disease dataset. A train and test set is defined to test the performance of the implementation. After the training, the tree is plotted and saved to a\u00a0file.<\/p>\n<p>With a max depth of four, the resulting tree looks as\u00a0follows:<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2A8wEPWI1OzNJwVKqctm5Rlw.png?ssl=1\"><figcaption>Decision tree for heart disease dataset. Image by\u00a0author.<\/figcaption><\/figure>\n<p>It achieves a train and test accuracy of 73% on the given\u00a0data.<\/p>\n<h3>Runtime comparison<\/h3>\n<p>One goal of using polars as a backend for decision trees is to explore the runtime and memory usage and compare it to other frameworks. For that, I created a memory profiling script that can be found\u00a0<a href=\"https:\/\/github.com\/tocab\/efficient-trees\/blob\/main\/examples\/memory_profiling.py\">here<\/a>.<\/p>\n<p>The script compares this implementation, which is called \u201cefficient-trees\u201d against sklearn and lightgbm. For efficient-trees, the lazy streaming variant and non-lazy in-memory variant are\u00a0tested.<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/999\/1%2AWqrg3C7I2D6tPqtebZBKqw.png?ssl=1\"><figcaption>Comparison of runtime and memory usage. Image by\u00a0author.<\/figcaption><\/figure>\n<p>In the graph, it can be seen that lightgbm is the fastest and most memory-efficient framework. Since it introduced the possibility of using arrow datasets a while ago, the data can be processed efficiently. However, since the whole dataset still needs to be loaded and can\u2019t be streamed, there are still potential scaling\u00a0issues.<\/p>\n<p>The next best framework is efficient-trees without and with streaming. While efficient-trees without streaming has a better runtime, the streaming variant uses less\u00a0memory.<\/p>\n<p>The sklearn implementation achieves the worst results in terms of memory usage and runtime. Since the data needs to be provided as a numpy array, the memory usage grows a lot. The runtime can be explained by using only one CPU core. Support for multi-threading or multi-processing doesn\u2019t exist\u00a0yet.<\/p>\n<h3>Deep dive: Streaming in\u00a0polars<\/h3>\n<p>As can be seen in the comparison of the frameworks, the possibility of streaming the data instead of having it in memory makes a difference to all other frameworks. However, the streaming engine is still considered an experimental feature, and not all operations are compatible with streaming yet.<\/p>\n<p>To get a better understanding of what happens in the background, a look into the execution plan is useful. Let\u2019s jump back into the training process and get the execution plan for the following operation:<\/p>\n<pre>def fit(self, data: Union[pl.DataFrame, pl.LazyFrame], target_name: str) -&gt; None:<br>    \"\"\"<br>    Fit method to train the decision tree.<br><br>    :param data: Polars DataFrame or LazyFrame containing the training data.<br>    :param target_name: Name of the target column<br>    \"\"\"<br>    columns = data.collect_schema().names()<br>    feature_names = [col for col in columns if col != target_name]<br><br>    # Shrink dtypes<br>    data = data.select(pl.all().shrink_dtype()).with_columns(<br>        pl.col(target_name).cast(pl.UInt64).shrink_dtype().alias(target_name)<br>    )<\/pre>\n<p>The execution plan for data can be created with the following command:<\/p>\n<pre>data.explain(streaming=True)<\/pre>\n<p>This returns the execution plan for the LazyFrame.<\/p>\n<pre> WITH_COLUMNS:<br> [col(\"cardio\").strict_cast(UInt64).shrink_dtype().alias(\"cardio\")] <br>   SELECT [col(\"gender\").shrink_dtype(), col(\"height\").shrink_dtype(), col(\"weight\").shrink_dtype(), col(\"ap_hi\").shrink_dtype(), col(\"ap_lo\").shrink_dtype(), col(\"cholesterol\").shrink_dtype(), col(\"gluc\").shrink_dtype(), col(\"smoke\").shrink_dtype(), col(\"alco\").shrink_dtype(), col(\"active\").shrink_dtype(), col(\"cardio\").shrink_dtype(), col(\"age_years\").shrink_dtype(), col(\"bmi\").shrink_dtype()] FROM<br>    STREAMING:<br>      DF [\"gender\", \"height\", \"weight\", \"ap_hi\"]; PROJECT 13\/13 COLUMNS; SELECTION: None<\/pre>\n<p>The keyword that is important here is STREAMING. It can be seen that the initial dataset loading happens in the streaming mode, but when shrinking the dtypes, the whole dataset needs to be loaded into memory. Since the dtype shrinking is not a necessary part, I remove it temporarily to explore until what operation streaming is supported.<\/p>\n<p>The next problematic operation is assigning the categorical features.<\/p>\n<pre>def apply_categorical_mappings(self, data: Union[pl.DataFrame, pl.LazyFrame]) -&gt; Union[pl.DataFrame, pl.LazyFrame]:<br>    \"\"\"<br>    Apply categorical mappings on input frame.<br><br>    :param data: Polars DataFrame or LazyFrame with categorical columns.<br><br>    :return: Polars DataFrame or LazyFrame with mapped categorical columns<br>    \"\"\"<br>    return data.with_columns(<br>        [pl.col(col).replace(self.categorical_mappings[col]).cast(pl.UInt32) for col in self.categorical_columns]<br>    )<\/pre>\n<p>The replace expression doesn\u2019t support the streaming mode. Even after removing the cast, streaming is not used which can be seen in the execution plan.<\/p>\n<pre> WITH_COLUMNS:<br> [col(\"gender\").replace([Series, Series]), col(\"cholesterol\").replace([Series, Series]), col(\"gluc\").replace([Series, Series]), col(\"smoke\").replace([Series, Series]), col(\"alco\").replace([Series, Series]), col(\"active\").replace([Series, Series])] <br>  STREAMING:<br>    DF [\"gender\", \"height\", \"weight\", \"ap_hi\"]; PROJECT *\/13 COLUMNS; SELECTION: None<\/pre>\n<p>Moving on, I also remove the support for categorical features. What happens next is the calculation of the information gain.<\/p>\n<pre>information_gain_df = (<br>    feature_data.group_by(\"feature_value\")<br>    .agg(<br>        [<br>            pl.col(target_name)<br>            .filter(pl.col(target_name) == target_value)<br>            .len()<br>            .alias(f\"class_{target_value}_count\")<br>            for target_value in unique_targets<br>        ]<br>        + [pl.col(target_name).len().alias(\"count_examples\")]<br>    )<br>    .sort(\"feature_value\")<br>)<\/pre>\n<p>Unfortunately, already in the first part of calculating, the streaming mode is not supported anymore. Here, using pl.col().filter() prevents us from streaming the\u00a0data.<\/p>\n<pre>SORT BY [col(\"feature_value\")]<br>  AGGREGATE<br>   [col(\"cardio\").filter([(col(\"cardio\")) == (1)]).count().alias(\"class_1_count\"), col(\"cardio\").filter([(col(\"cardio\")) == (0)]).count().alias(\"class_0_count\"), col(\"cardio\").count().alias(\"count_examples\")] BY [col(\"feature_value\")] FROM<br>    STREAMING:<br>      RENAME<br>        simple \u03c0 2\/2 [\"gender\", \"cardio\"]<br>          DF [\"gender\", \"height\", \"weight\", \"ap_hi\"]; PROJECT 2\/13 COLUMNS; SELECTION: col(\"gender\").is_not_null()<\/pre>\n<p>Since this is not so easy to change, I will stop the exploration here. It can be concluded that in the decision tree implementation with polars backend, the full potential of streaming can\u2019t be used yet since important operators are still missing streaming support. Since the streaming mode is under active development, it might be possible to run most of the operators or even the whole calculation of the decision tree in the streaming mode in the\u00a0future.<\/p>\n<h3>Conclusion<\/h3>\n<p>In this blog post, I presented my custom implementation of a decision tree using polars as a backend. I showed implementation details and compared it to other decision tree frameworks. The comparison shows that this implementation can outperform sklearn in terms of runtime and memory usage. But there are still other frameworks like lightgbm that provide a better runtime and more efficient processing. There is a lot of potential in the streaming mode when using polars backend. Currently, some operators prevent an end-to-end streaming approach due to a lack of streaming support, but this is under active development. When polars makes progress with that, it is worth revisiting this implementation and comparing it to other frameworks again.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/medium.com\/_\/stat?event=post.clientViewed&amp;referrerSource=full_rss&amp;postId=d48892926ecf\" width=\"1\" height=\"1\" alt=\"\"><\/p>\n<hr>\n<p><a href=\"https:\/\/towardsdatascience.com\/build-a-decision-tree-in-polars-from-scratch-d48892926ecf\">Build a Decision Tree in Polars from Scratch<\/a> was originally published in <a href=\"https:\/\/towardsdatascience.com\/\">Towards Data Science<\/a> on Medium, where people are continuing the conversation by highlighting and responding to this story.<\/p>\n<\/div>\n<p> \t<BR><br \/>\n <BR><\/BR><br \/>\n    Tobias Cabanski<br \/>\n \t<BR><br \/>\n<BR><\/BR><br \/>\n<a href=\"https:\/\/medium.com\/m\/global-identity-2?redirectUrl=https%3A%2F%2Ftowardsdatascience.com%2Fbuild-a-decision-tree-in-polars-from-scratch-d48892926ecf\">Go to original source<\/a><br \/>\n \t<BR><br \/>\n <BR><\/BR><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Build a Decision Tree in Polars from Scratch Explore decision trees with polars\u00a0backend Photo by Leonard Laub on\u00a0Unsplash Decision tree algorithms have always fascinated me. They are easy to implement and achieve good results on various classification and regression tasks. Combined with boosting, decision trees are still state-of-the-art in many applications. Frameworks such as sklearn, [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[62,83,1058,166,1255,1495],"tags":[84,1496,1497],"class_list":["post-1477","post","type-post","status-publish","format-standard","hentry","category-aimldsaimlds","category-data-science","category-decision-tree","category-hands-on-tutorials","category-lightgbm","category-polars-dataframe","tag-data","tag-pl","tag-polars"],"_links":{"self":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/1477"}],"collection":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/comments?post=1477"}],"version-history":[{"count":0,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/1477\/revisions"}],"wp:attachment":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/media?parent=1477"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/categories?post=1477"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/tags?post=1477"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}