Marginal and Conditional Importance Measures from Machine Learning Models and Their Relationship with Conditional Average Treatment Effect

arXiv:2501.16988v1 Announce Type: new
Abstract: Interpreting black-box machine learning models is challenging due to their strong dependence on data and inherently non-parametric nature. This paper reintroduces the concept of importance through the “Marginal Variable Importance Metric” (MVIM), a model-agnostic measure of predictor importance based on the true conditional expectation function. MVIM evaluates predictors’ influence on continuous or discrete outcomes. A permutation-based approach, inspired by Breiman (2001) and Fisher et al. (2019), is proposed to estimate MVIM. The MVIM estimator is biased when predictors are highly correlated, as black-box models struggle to extrapolate in low-probability regions. To address this, we investigate the bias-variance decomposition of MVIM to understand the source and pattern of the bias under high correlation. A Conditional Variable Importance Metric (CVIM), adapted from Strobl et al. (2008), is introduced to reduce this bias. Both MVIM and CVIM exhibit a quadratic relationship with the conditional average treatment effect (CATE).
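
The permutation idea mentioned in the abstract can be illustrated with a minimal sketch. This is a generic permutation importance in the style of Breiman (2001), not the paper's MVIM estimator, whose exact definition is not given in the abstract. The function name `permutation_importance`, the toy data, and the use of ordinary least squares as a stand-in for a black-box model are all assumptions made for illustration.

```python
import numpy as np

def permutation_importance(predict, X, y, col, n_repeats=20, seed=0):
    """Mean increase in squared-error loss after permuting column `col`.

    Hypothetical helper for illustration; not the paper's MVIM estimator.
    """
    rng = np.random.default_rng(seed)
    base_loss = np.mean((y - predict(X)) ** 2)
    increases = []
    for _ in range(n_repeats):
        X_perm = X.copy()
        # Break the link between this predictor and the outcome.
        X_perm[:, col] = rng.permutation(X_perm[:, col])
        increases.append(np.mean((y - predict(X_perm)) ** 2) - base_loss)
    return float(np.mean(increases))

# Assumed toy data: y depends on x0 only; x1 is independent noise.
rng = np.random.default_rng(42)
X = rng.normal(size=(2000, 2))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=2000)

# Ordinary least squares as a stand-in for a fitted black-box model.
design = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)

def predict(Z):
    return np.column_stack([np.ones(len(Z)), Z]) @ coef

imp_x0 = permutation_importance(predict, X, y, col=0)
imp_x1 = permutation_importance(predict, X, y, col=1)
print(f"importance of x0: {imp_x0:.3f}")
print(f"importance of x1: {imp_x1:.3f}")
```

In this linear toy setup with independent predictors, the expected loss increase for x0 is roughly twice the squared coefficient times the variance of x0, so the importance grows with the square of the effect size. With strongly correlated predictors, permuting one column produces combinations of values that rarely occur in the training data, which is the extrapolation problem the abstract describes.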

Mohammad Kaviul Anam Khan, Olli Saarela, Rafal Kustra
