Quantifying Overfitting along the Regularization Path for Two-Part-Code MDL in Supervised Classification

arXiv:2503.02110v1 Announce Type: new
Abstract: We provide a complete characterization of the entire regularization curve of a modified two-part-code Minimum Description Length (MDL) learning rule for binary classification, based on an arbitrary prior or description language. \citet{GL} previously established that the MDL rule with a penalty parameter of $\lambda=1$ is not asymptotically consistent from an agnostic PAC (frequentist worst-case) perspective, suggesting that it under-regularizes. Driven by interest in understanding how benign or catastrophic under-regularization and overfitting might be, we obtain a precise quantitative description of the worst-case limiting error as a function of the regularization parameter $\lambda$ and the noise level (or approximation error), significantly tightening the analysis of \citeauthor{GL} for $\lambda=1$ and extending it to all other choices of $\lambda$.

Xiaohan Zhu, Nathan Srebro
