Optimal Convergence Rates of Deep Neural Network Classifiers

arXiv:2506.14899v1 Announce Type: new
Abstract: In this paper, we study the binary classification problem on $[0,1]^d$ under the Tsybakov noise condition (with exponent $s in [0,infty]$) and the compositional assumption. This assumption requires the conditional class probability function of the data distribution to be the composition of $q+1$ vector-valued multivariate functions, where each component function is either a maximum value function or a H”{o}lder-$beta$ smooth function that depends only on $d_*$ of its input variables. Notably, $d_*$ can be significantly smaller than the input dimension $d$. We prove that, under these conditions, the optimal convergence rate for the excess 0-1 risk of classifiers is $$ left( frac{1}{n} right)^{frac{betacdot(1wedgebeta)^q}{{frac{d_*}{s+1}+(1+frac{1}{s+1})cdotbetacdot(1wedgebeta)^q}}};;;, $$ which is independent of the input dimension $d$. Additionally, we demonstrate that ReLU deep neural networks (DNNs) trained with hinge loss can achieve this optimal convergence rate up to a logarithmic factor. This result provides theoretical justification for the excellent performance of ReLU DNNs in practical classification tasks, particularly in high-dimensional settings. The technique used to establish these results extends the oracle inequality presented in our previous work. The generalized approach is of independent interest.

Zihan Zhang, Lei Shi, Ding-Xuan Zhou

Go to original source