Tag: dolce

DOLCE: Decomposing Off-Policy Evaluation/Learning into Lagged and Current Effects

arXiv:2505.00961v1. Abstract: Off-policy evaluation (OPE) and off-policy learning (OPL) for contextual bandit policies leverage historical data to evaluate and optimize a target policy. Most existing OPE/OPL methods, based on importance weighting or imputation, assume common support between the target and logging policies. When this assumption…

May 5, 2025
