Category: econ.EM

  • Partial Identification under Missing Data Using Weak Shadow Variables from Pretrained Models

    arXiv:2602.16061v1 Announce Type: new Abstract: Estimating population quantities such as mean outcomes from user feedback is fundamental to platform evaluation and social science, yet feedback is often missing not at random (MNAR): users with stronger opinions are more likely to respond, so standard…

  • Robust X-Learner: Breaking the Curse of Imbalance and Heavy Tails via Robust Cross-Imputation

    arXiv:2601.15360v1 Announce Type: new Abstract: Estimating Heterogeneous Treatment Effects (HTE) in industrial applications such as AdTech and healthcare presents a dual challenge: extreme class imbalance and heavy-tailed outcome distributions. While the X-Learner framework effectively addresses imbalance through cross-imputation, we demonstrate that it…

  • Stochastic Deep Learning: A Probabilistic Framework for Modeling Uncertainty in Structured Temporal Data

    arXiv:2601.05227v1 Announce Type: new Abstract: I propose a novel framework that integrates stochastic differential equations (SDEs) with deep generative models to improve uncertainty quantification in machine learning applications involving structured and temporal data. This approach, termed Stochastic Latent Differential Inference (SLDI), embeds…

  • Detecting and Mitigating Treatment Leakage in Text-Based Causal Inference: Distillation and Sensitivity Analysis

    arXiv:2601.02400v1 Announce Type: cross Abstract: Text-based causal inference increasingly employs textual data as proxies for unobserved confounders, yet this approach introduces a previously undertheorized source of bias: treatment leakage. Treatment leakage occurs when text intended to capture confounding information also contains signals…

  • Sharp Structure-Agnostic Lower Bounds for General Functional Estimation

    arXiv:2512.17341v1 Announce Type: new Abstract: The design of efficient nonparametric estimators has long been a central problem in statistics, machine learning, and decision making. Classical optimal procedures often rely on strong structural assumptions, which can be misspecified in practice and complicate deployment. This limitation has sparked growing…

  • Empirical Likelihood for Random Forests and Ensembles

    arXiv:2511.13934v1 Announce Type: new Abstract: We develop an empirical likelihood (EL) framework for random forests and related ensemble methods, providing a likelihood-based approach to quantify their statistical uncertainty. Exploiting the incomplete $U$-statistic structure inherent in ensemble predictions, we construct an EL statistic that is asymptotically chi-squared when subsampling…

  • Testing Most Influential Sets

    arXiv:2510.20372v1 Announce Type: new Abstract: Small subsets of data with disproportionate influence on model outcomes can have dramatic impacts on conclusions, with a few data points sometimes overturning key findings. While recent work has developed methods to identify these most influential sets, no formal theory exists to determine when their influence…

  • Beating the Winner’s Curse via Inference-Aware Policy Optimization

    arXiv:2510.18161v1 Announce Type: new Abstract: There has been a surge of recent interest in automatically learning policies to target treatment decisions based on rich individual covariates. A common approach is to train a machine learning model to predict counterfactual outcomes, and then select the policy that optimizes…

  • From Reviews to Actionable Insights: An LLM-Based Approach for Attribute and Feature Extraction

    arXiv:2510.16551v1 Announce Type: new Abstract: This research proposes a systematic, large language model (LLM) approach for extracting product and service attributes, features, and associated sentiments from customer reviews. Grounded in marketing theory, the framework distinguishes perceptual attributes from actionable features, producing interpretable…

  • Differentially Private Two-Stage Gradient Descent for Instrumental Variable Regression

    arXiv:2509.22794v1 Announce Type: new Abstract: We study instrumental variable regression (IVaR) under differential privacy constraints. Classical IVaR methods (like two-stage least squares regression) rely on solving moment equations that directly use sensitive covariates and instruments, creating significant risks of privacy leakage and posing challenges in designing…

  • Galerkin-ARIMA: A Two-Stage Polynomial Regression Framework for Fast Rolling One-Step-Ahead Forecasting

    arXiv:2507.07469v1 Announce Type: new Abstract: Time-series models like ARIMA remain widely used for forecasting but are limited by linear assumptions and high computational cost on large and complex datasets. We propose Galerkin-ARIMA, which generalizes the AR component of ARIMA and replaces it with a flexible…

  • It’s Hard to Be Normal: The Impact of Noise on Structure-agnostic Estimation

    arXiv:2507.02275v1 Announce Type: new Abstract: Structure-agnostic causal inference studies how well one can estimate a treatment effect given black-box machine learning estimates of nuisance functions (like the impact of confounders on treatment and outcomes). Here, we find that the answer depends in a…

  • Adaptive stable distribution and Hurst exponent by method of moments moving estimator for nonstationary time series

    arXiv:2506.05354v1 Announce Type: cross Abstract: Nonstationarity of real-life time series requires model adaptation. Classical approaches like ARMA-ARCH assume some arbitrarily chosen dependence type. To avoid their bias, we focus on a novel, more agnostic approach: moving…

  • Covariate-Adjusted Deep Causal Learning for Heterogeneous Panel Data Models

    arXiv:2505.20536v1 Announce Type: new Abstract: This paper studies the task of estimating heterogeneous treatment effects in causal panel data models, in the presence of covariate effects. We propose a novel Covariate-Adjusted Deep Causal Learning (CoDEAL) for panel data models, that employs flexible model structures and powerful…

  • Double Machine Learning for Causal Inference under Shared-State Interference

    arXiv:2504.08836v1 Announce Type: new Abstract: Researchers and practitioners often wish to measure treatment effects in settings where units interact via markets and recommendation systems. In these settings, units are affected by certain shared states, like prices, algorithmic recommendations or social signals. We formalize this structure, calling…

  • Online Multivariate Regularized Distributional Regression for High-dimensional Probabilistic Electricity Price Forecasting

    arXiv:2504.02518v1 Announce Type: new Abstract: Probabilistic electricity price forecasting (PEPF) is a key task for market participants in short-term electricity markets. The increasing availability of high-frequency data and the need for real-time decision-making in energy markets require online estimation methods for efficient model updating.…

  • An Artificial Trend Index for Private Consumption Using Google Trends

    arXiv:2503.21981v1 Announce Type: cross Abstract: In recent years, the use of databases that analyze trends, sentiments or news to make economic projections or create indicators has gained significant popularity, particularly with the Google Trends platform. This article explores the potential of Google search data to…

  • SNPL: Simultaneous Policy Learning and Evaluation for Safe Multi-Objective Policy Improvement

    arXiv:2503.12760v1 Announce Type: new Abstract: To design effective digital interventions, experimenters face the challenge of learning decision policies that balance multiple objectives using offline data. Often, they aim to develop policies that maximize goal outcomes, while ensuring there are no undesirable changes in guardrail…

  • Modifying Final Splits of Classification Tree for Fine-tuning Subpopulation Target in Policy Making

    arXiv:2502.15072v1 Announce Type: new Abstract: Policymakers often use Classification and Regression Trees (CART) to partition populations based on binary outcomes and target subpopulations whose probability of the binary event exceeds a threshold. However, classic CART and knowledge distillation method whose student model…

  • Semiparametric Bayesian Difference-in-Differences

    arXiv:2412.04605v1 Announce Type: cross Abstract: This paper studies semiparametric Bayesian inference for the average treatment effect on the treated (ATT) within the difference-in-differences research design. We propose two new Bayesian methods with frequentist validity. The first one places a standard Gaussian process prior on the conditional mean function of the control group.…

  • Selective Reviews of Bandit Problems in AI via a Statistical View

    arXiv:2412.02251v1 Announce Type: new Abstract: Reinforcement Learning (RL) is a widely researched area in artificial intelligence that focuses on teaching agents decision-making through interactions with their environment. A key subset includes stochastic multi-armed bandit (MAB) and continuum-armed bandit (SCAB) problems, which model sequential decision-making…

  • When Is Heterogeneity Actionable for Personalization?

    arXiv:2411.16552v1 Announce Type: cross Abstract: Targeting and personalization policies can be used to improve outcomes beyond the uniform policy that assigns the best performing treatment in an A/B test to everyone. Personalization relies on the presence of heterogeneity of treatment effects, yet, as we show in this paper, heterogeneity…