Tag: networks
-
The Inductive Bias of Convolutional Neural Networks: Locality and Weight Sharing Reshape Implicit Regularization
arXiv:2603.04807v1 Announce Type: new Abstract: We study how architectural inductive bias reshapes the implicit regularization induced by the edge-of-stability phenomenon in gradient descent. Prior work has established that for fully connected networks, the strength of this regularization is governed solely by…
-
Beyond NNGP: Large Deviations and Feature Learning in Bayesian Neural Networks
arXiv:2602.22925v1 Announce Type: new Abstract: We study wide Bayesian neural networks, focusing on the rare but statistically dominant fluctuations that govern posterior concentration, beyond Gaussian-process limits. Large-deviation theory provides explicit variational objectives (rate functions) on predictors, yielding an emerging notion of complexity and feature learning directly…
-
Deep networks learn to parse uniform-depth context-free languages from local statistics
arXiv:2602.06065v1 Announce Type: new Abstract: Understanding how the structure of language can be learned from sentences alone is a central question in both cognitive science and machine learning. Studies of the internal representations of Large Language Models (LLMs) support their ability to parse text…
-
Inheritance Between Feedforward and Convolutional Networks via Model Projection
arXiv:2602.06245v1 Announce Type: new Abstract: Techniques for feedforward networks (FFNs) and convolutional networks (CNNs) are frequently reused across families, but the relationship between the underlying model classes is rarely made explicit. We introduce a unified node-level formalization with tensor-valued activations and show that generalized feedforward networks…
-
Singular Bayesian Neural Networks
arXiv:2602.00387v1 Announce Type: new Abstract: Bayesian neural networks promise calibrated uncertainty but require $O(mn)$ parameters for standard mean-field Gaussian posteriors. We argue this cost is often unnecessary, particularly when weight matrices exhibit fast singular value decay. By parameterizing weights as $W = AB^{\top}$ with $A \in \mathbb{R}^{m \times r}$, $B \in…
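To make the parameter savings concrete, here is a minimal Python sketch. The excerpt cuts off before giving the shape of B, so it assumes W is m × n and hence B is n × r (the only shape for which AB^⊤ is defined), and it assumes, hypothetically, that the variational posterior keeps one mean and one scale per entry of A and B. Neither assumption is claimed to be the paper's exact parameterization.

```python
import random

def mean_field_param_count(m: int, n: int) -> int:
    """One mean and one scale per entry of a dense m x n weight matrix."""
    return 2 * m * n

def low_rank_param_count(m: int, n: int, r: int) -> int:
    """One mean and one scale per entry of A (m x r) and B (n x r), assuming a
    (hypothetical) mean-field Gaussian over the factors of W = A B^T."""
    return 2 * r * (m + n)

def sample_low_rank_weight(mu_a, sigma_a, mu_b, sigma_b, rng=None):
    """Sample A and B entrywise from their Gaussians and return W = A B^T (m x n)."""
    rng = rng or random.Random()
    a = [[rng.gauss(mu, s) for mu, s in zip(mr, sr)] for mr, sr in zip(mu_a, sigma_a)]
    b = [[rng.gauss(mu, s) for mu, s in zip(mr, sr)] for mr, sr in zip(mu_b, sigma_b)]
    # W[i][j] = sum_k A[i][k] * B[j][k], so rank(W) <= r by construction.
    return [[sum(x * y for x, y in zip(row_a, row_b)) for row_b in b] for row_a in a]

print(mean_field_param_count(1024, 1024))   # 2097152
print(low_rank_param_count(1024, 1024, 16))  # 65536
```

For m = n = 1024 and r = 16 this is 65,536 versus 2,097,152 variational parameters, a 32× reduction. In general the factorized form is cheaper exactly when r < mn/(m + n), which is why fast singular value decay (small usable r) matters.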
-
On the Possibility of Small Networks for Physics-Informed Learning
A new kind of hyperparameter study. The post On the Possibility of Small Networks for Physics-Informed Learning appeared first on Towards Data Science. Conor Rowan
-
Structural Dimension Reduction in Bayesian Networks
arXiv:2601.08236v1 Announce Type: new Abstract: This work introduces a novel technique, named structural dimension reduction, to collapse a Bayesian network onto a minimal, localized one while ensuring that probabilistic inferences between the original and reduced networks remain consistent. To this end, we propose a new combinatorial structure in…
-
Aligned explanations in neural networks
arXiv:2601.04378v1 Announce Type: cross Abstract: Feature attribution is the dominant paradigm for explaining deep neural networks. However, most existing methods only loosely reflect the model’s prediction-making process, thereby merely white-painting the black box. We argue that explanatory alignment is a key aspect of trustworthiness in prediction tasks: explanations must be…
-
Neural Networks on Symmetric Spaces of Noncompact Type
arXiv:2601.01097v1 Announce Type: new Abstract: Recent works have demonstrated the promising performance of neural networks on hyperbolic spaces and symmetric positive definite (SPD) manifolds. These spaces belong to a family of Riemannian manifolds referred to as symmetric spaces of noncompact type. In this paper, we propose a novel…
-
A Teacher-Student Perspective on the Dynamics of Learning Near the Optimal Point
arXiv:2512.15606v1 Announce Type: new Abstract: Near an optimal learning point of a neural network, the learning performance of gradient descent dynamics is dictated by the Hessian matrix of the loss function with respect to the network parameters. We characterize the Hessian eigenspectrum for…
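Why the Hessian dictates the dynamics can be seen from the standard second-order linearization (a textbook argument, not the paper's teacher-student analysis). Near a minimizer θ*, with Hessian H = ∇²L(θ*) having eigenpairs (λ_i, v_i) and learning rate η:

```latex
L(\theta) \approx L(\theta^\ast) + \tfrac{1}{2}(\theta - \theta^\ast)^\top H (\theta - \theta^\ast),
\qquad
\theta_{t+1} - \theta^\ast \approx (I - \eta H)(\theta_t - \theta^\ast),
\qquad
v_i^\top(\theta_t - \theta^\ast) \approx (1 - \eta\lambda_i)^t \, v_i^\top(\theta_0 - \theta^\ast).
```

Directions with small λ_i decay slowly, and any direction with λ_i > 2/η grows, so the eigenspectrum of H sets both the speed and the stability of learning near the optimum.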
-
Variational Estimators for Node Popularity Models
arXiv:2511.17783v1 Announce Type: new Abstract: Node popularity is recognized as a key factor in modeling real-world networks, capturing heterogeneity in connectivity across communities. This concept is equally important in bipartite networks, where nodes in different partitions may exhibit varying popularity patterns, motivating models such as the Two-Way Node Popularity…
-
Neural Networks Learn Generic Multi-Index Models Near Information-Theoretic Limit
arXiv:2511.15120v1 Announce Type: new Abstract: In deep learning, a central issue is to understand how neural networks efficiently learn high-dimensional features. To this end, we explore the gradient descent learning of a general Gaussian Multi-index model $f(\boldsymbol{x})=g(\boldsymbol{U}\boldsymbol{x})$ with hidden subspace $\boldsymbol{U}\in \mathbb{R}^{r\times d}$, which is the…
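A minimal Python sketch of a Gaussian multi-index model, in which the target depends on x ∈ R^d only through the r-dimensional projection Ux. The link g(z) = z₁z₂, the dimensions d = 4 and r = 2, and the coordinate-aligned U are hypothetical choices for illustration only.

```python
import random

def project(U, x):
    """Compute U x for U given as r rows of length d."""
    return [sum(u_j * x_j for u_j, x_j in zip(row, x)) for row in U]

def multi_index_target(U, x, g):
    """f(x) = g(U x): the target sees x only through its projection onto the rows of U."""
    return g(project(U, x))

def sample_gaussian_inputs(n, d, rng):
    """Draw n inputs x ~ N(0, I_d)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]

def g(z):
    """Hypothetical link function on the r = 2 projected coordinates."""
    return z[0] * z[1]

# Hypothetical hidden subspace: d = 4 ambient dimensions, r = 2 relevant directions.
U = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0]]

rng = random.Random(0)
X = sample_gaussian_inputs(5, 4, rng)
y = [multi_index_target(U, x, g) for x in X]
```

Because f is unchanged by any perturbation orthogonal to the rows of U, learning it amounts to identifying the hidden r-dimensional subspace together with the link g.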
-
Siegel Neural Networks
arXiv:2511.09577v1 Announce Type: new Abstract: Riemannian symmetric spaces (RSS) such as hyperbolic spaces and symmetric positive definite (SPD) manifolds have become popular spaces for representation learning. In this paper, we propose a novel approach for building discriminative neural networks on Siegel spaces, a family of RSS that is largely unexplored in machine…
-
Accuracy estimation of neural networks by extreme value theory
arXiv:2511.00490v1 Announce Type: new Abstract: Neural networks are able to approximate any continuous function on a compact set. However, it is not obvious how to quantify the error of the neural network, i.e., the remaining bias between the function and the neural network. Here, we propose…
-
Distributionally robust approximation property of neural networks
arXiv:2510.09177v1 Announce Type: new Abstract: The universal approximation property uniformly with respect to weakly compact families of measures is established for several classes of neural networks. To that end, we prove that these neural networks are dense in Orlicz spaces, thereby extending classical universal approximation theorems even beyond…
-
PyTorch Explained: From Automatic Differentiation to Training Custom Neural Networks
Deep learning is shaping our world as we speak. In fact, it has been slowly revolutionizing software since the early 2010s. In 2025, PyTorch is at the forefront of this revolution, emerging as one of the most important libraries to train neural networks. Whether you…
-
Tree-like Pairwise Interaction Networks
arXiv:2508.15678v1 Announce Type: new Abstract: Modeling feature interactions in tabular data remains a key challenge in predictive modeling, for example, as used for insurance pricing. This paper proposes the Tree-like Pairwise Interaction Network (PIN), a novel neural network architecture that explicitly captures pairwise feature interactions through a shared feed-forward neural network…
-
From Sublinear to Linear: Fast Convergence in Deep Networks via Locally Polyak-Lojasiewicz Regions
arXiv:2507.21429v1 Announce Type: new Abstract: The convergence of gradient descent (GD) on the non-convex loss landscapes of deep neural networks (DNNs) presents a fundamental theoretical challenge. While recent work has established that GD converges to a stationary point at a sublinear rate…
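The mechanism behind a linear rate is the standard Polyak-Lojasiewicz argument, sketched here under a global PL inequality rather than the paper's local regions (an assumption made for illustration). If L is β-smooth with minimum value L*, satisfies ½‖∇L(θ)‖² ≥ μ(L(θ) − L*), and GD uses step size 1/β, then

```latex
L(\theta_{k+1}) \le L(\theta_k) - \frac{1}{2\beta}\,\|\nabla L(\theta_k)\|^2
\le L(\theta_k) - \frac{\mu}{\beta}\bigl(L(\theta_k) - L^\ast\bigr),
\qquad\text{so}\qquad
L(\theta_k) - L^\ast \le \Bigl(1 - \frac{\mu}{\beta}\Bigr)^{k}\bigl(L(\theta_0) - L^\ast\bigr).
```

The first inequality is the descent lemma for β-smooth functions; the second applies the PL inequality. The optimality gap therefore shrinks geometrically, in contrast to the sublinear rates available without such a condition.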
-
Accelerating Hamiltonian Monte Carlo for Bayesian Inference in Neural Networks and Neural Operators
arXiv:2507.14652v1 Announce Type: new Abstract: Hamiltonian Monte Carlo (HMC) is a powerful and accurate method to sample from the posterior distribution in Bayesian inference. However, HMC techniques are computationally demanding for Bayesian neural networks due to the high dimensionality of the network’s…
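For orientation, a minimal Python sketch of plain HMC with a leapfrog integrator. It targets a one-dimensional standard Gaussian in place of a neural-network posterior, and the step size and trajectory length are illustrative assumptions; the paper's acceleration technique is not shown. Each proposal costs n_steps + 1 gradient evaluations, which is where the expense comes from when every gradient is a pass through a large network.

```python
import math
import random

def leapfrog(q, p, grad_u, step, n_steps):
    """Integrate Hamiltonian dynamics for H(q, p) = U(q) + p**2 / 2."""
    p = p - 0.5 * step * grad_u(q)        # initial half step for momentum
    for i in range(n_steps):
        q = q + step * p                   # full step for position
        if i != n_steps - 1:
            p = p - step * grad_u(q)       # full step for momentum
    p = p - 0.5 * step * grad_u(q)        # final half step for momentum
    return q, -p                           # negate momentum for reversibility

def hmc(u, grad_u, q0, n_samples, step=0.2, n_steps=10, rng=None):
    """Plain HMC for a 1-D target with density proportional to exp(-U(q))."""
    rng = rng or random.Random()
    q, samples = q0, []
    for _ in range(n_samples):
        p0 = rng.gauss(0.0, 1.0)           # resample momentum
        q_new, p_new = leapfrog(q, p0, grad_u, step, n_steps)
        current_h = u(q) + 0.5 * p0 ** 2
        proposed_h = u(q_new) + 0.5 * p_new ** 2
        # Metropolis correction removes the integrator's discretization bias.
        if rng.random() < math.exp(min(0.0, current_h - proposed_h)):
            q = q_new
        samples.append(q)
    return samples

# Illustrative target: standard Gaussian, U(q) = q**2 / 2.
def u(q):
    return 0.5 * q * q

def grad_u(q):
    return q

draws = hmc(u, grad_u, q0=0.0, n_samples=5000, rng=random.Random(0))
```

The Metropolis step keeps the target distribution exact even though leapfrog only approximately conserves energy; smaller steps raise acceptance but cost more gradients per unit of trajectory.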
-
Posterior Contraction for Sparse Neural Networks in Besov Spaces with Intrinsic Dimensionality
arXiv:2506.19144v1 Announce Type: new Abstract: This work establishes that sparse Bayesian neural networks achieve optimal posterior contraction rates over anisotropic Besov spaces and their hierarchical compositions. These structures reflect the intrinsic dimensionality of the underlying function, thereby mitigating the curse of dimensionality. Our…
-
Near-optimal estimates for the $\ell^p$-Lipschitz constants of deep random ReLU neural networks
arXiv:2506.19695v1 Announce Type: new Abstract: This paper studies the $\ell^p$-Lipschitz constants of ReLU neural networks $\Phi: \mathbb{R}^d \to \mathbb{R}$ with random parameters for $p \in [1,\infty]$. The distribution of the weights follows a variant of the He initialization and the biases are drawn…
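The excerpt does not give the exact He variant or the bias distribution, so this Python sketch assumes standard He-normal weights N(0, 2/fan_in) and zero biases. It relies on the fact that a ReLU network is piecewise linear, so its ℓ^p-Lipschitz constant equals the largest ℓ^q norm (1/p + 1/q = 1) of its input gradient over the linear pieces. Sampling points therefore yields only a lower bound, not the paper's near-optimal estimates.

```python
import math
import random

def he_network(widths, rng):
    """Random ReLU net with widths [d, n_1, ..., 1]. Assumed: weights ~ N(0, 2/fan_in),
    zero biases (not necessarily the paper's variant)."""
    layers = []
    for fan_in, fan_out in zip(widths[:-1], widths[1:]):
        std = math.sqrt(2.0 / fan_in)
        W = [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]
        layers.append((W, [0.0] * fan_out))
    return layers

def forward_and_masks(layers, x):
    """Evaluate Phi(x) and record the ReLU activation pattern of each hidden layer."""
    h, masks = list(x), []
    for k, (W, b) in enumerate(layers):
        z = [sum(w * v for w, v in zip(row, h)) + bk for row, bk in zip(W, b)]
        if k < len(layers) - 1:
            mask = [1.0 if zi > 0 else 0.0 for zi in z]
            masks.append(mask)
            h = [zi * mi for zi, mi in zip(z, mask)]
        else:
            h = z
    return h[0], masks

def input_gradient(layers, x):
    """Gradient of Phi at x: the row vector w_L^T D_{L-1} W_{L-1} ... D_1 W_1."""
    _, masks = forward_and_masks(layers, x)
    g = list(layers[-1][0][0])
    for (W, _), mask in zip(reversed(layers[:-1]), reversed(masks)):
        g = [gi * mi for gi, mi in zip(g, mask)]
        g = [sum(g[i] * W[i][j] for i in range(len(W))) for j in range(len(W[0]))]
    return g

def dual_norm(v, p):
    """l^q norm of v with 1/p + 1/q = 1."""
    if p == 1:
        return max(abs(t) for t in v)
    if math.isinf(p):
        return sum(abs(t) for t in v)
    q = p / (p - 1.0)
    return sum(abs(t) ** q for t in v) ** (1.0 / q)

def lipschitz_lower_bound(layers, p, n_points, rng):
    """Largest dual norm of gradients at random points: a lower bound on Lip_p(Phi)."""
    d = len(layers[0][0][0])
    return max(dual_norm(input_gradient(layers, [rng.gauss(0.0, 1.0) for _ in range(d)]), p)
               for _ in range(n_points))

rng = random.Random(0)
net = he_network([8, 32, 32, 1], rng)
print(lipschitz_lower_bound(net, p=2, n_points=200, rng=rng))
```

With zero biases the network is positively homogeneous, so Φ(x) equals ∇Φ(x)·x on each linear piece, which gives a convenient sanity check on the gradient.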
-
Fast Bayesian Optimization of Function Networks with Partial Evaluations
arXiv:2506.11456v1 Announce Type: new Abstract: Bayesian optimization of function networks (BOFN) is a framework for optimizing expensive-to-evaluate objective functions structured as networks, where some nodes’ outputs serve as inputs for others. Many real-world applications, such as manufacturing and drug discovery, involve function networks with additional properties…
-
Global Minimizers of $\ell^p$-Regularized Objectives Yield the Sparsest ReLU Neural Networks
arXiv:2505.21791v1 Announce Type: new Abstract: Overparameterized neural networks can interpolate a given dataset in many different ways, prompting the fundamental question: which among these solutions should we prefer, and what explicit regularization strategies will provably yield these solutions? This paper addresses the challenge of…
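The excerpt stops before stating the range of p, so as a hedged illustration assume 0 < p < 1. This Python snippet compares a sparse and a spread-out weight vector with the same ℓ^1 norm: the ℓ^1 penalty cannot tell them apart, while the ℓ^p penalty with p = 0.5 strictly prefers the sparse one. It shows a generic property of the penalty, not the paper's result about ReLU networks.

```python
def lp_penalty(weights, p):
    """Sum of |w_i|**p; for 0 < p < 1 this is the p-th power of the l^p quasi-norm."""
    return sum(abs(w) ** p for w in weights)

sparse = [1.0, 0.0, 0.0, 0.0]
spread = [0.25, 0.25, 0.25, 0.25]

l1_tie = lp_penalty(sparse, 1.0) == lp_penalty(spread, 1.0)      # both equal 1.0
sparse_wins = lp_penalty(sparse, 0.5) < lp_penalty(spread, 0.5)  # 1.0 < 2.0
print(l1_tie, sparse_wins)
```

For p < 1 the map t ↦ t^p is strictly concave on t ≥ 0 with value 0 at 0, so splitting mass across coordinates always increases the penalty; this is the usual intuition for why such penalties favor sparse solutions.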
-
Humble your Overconfident Networks: Unlearning Overfitting via Sequential Monte Carlo Tempered Deep Ensembles
arXiv:2505.11671v1 Announce Type: new Abstract: Sequential Monte Carlo (SMC) methods offer a principled approach to Bayesian uncertainty quantification but are traditionally limited by the need for full-batch gradient evaluations. We introduce a scalable variant by incorporating Stochastic Gradient Hamiltonian Monte Carlo (SGHMC)…
-
LatticeVision: Image to Image Networks for Modeling Non-Stationary Spatial Data
arXiv:2505.09803v1 Announce Type: new Abstract: In many scientific and industrial applications, we are given a handful of instances (a ‘small ensemble’) of a spatially distributed quantity (a ‘field’) but would like to acquire many more. For example, a large ensemble of global temperature sensitivity fields…
-
On the expressivity of deep Heaviside networks
arXiv:2505.00110v1 Announce Type: new Abstract: We show that deep Heaviside networks (DHNs) have limited expressiveness but that this can be overcome by including either skip connections or neurons with linear activation. We provide lower and upper bounds for the Vapnik-Chervonenkis (VC) dimensions and approximation rates of these network…
-
Privacy-Preserving Transfer Learning for Community Detection using Locally Distributed Multiple Networks
arXiv:2504.00890v1 Announce Type: new Abstract: This paper develops a new spectral clustering-based method called TransNet for transfer learning in community detection of network data. Our goal is to improve the clustering performance of the target network using auxiliary source networks, which are heterogeneous, privacy-preserved,…
-
Improving Equivariant Networks with Probabilistic Symmetry Breaking
arXiv:2503.21985v1 Announce Type: cross Abstract: Equivariance encodes known symmetries into neural networks, often enhancing generalization. However, equivariant networks cannot break symmetries: the output of an equivariant network must, by definition, have at least the same self-symmetries as the input. This poses an important problem, both (1) for prediction…
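A small Python check of the claim that equivariant maps cannot break symmetries, using a hypothetical permutation-equivariant layer y_i = a·x_i + b·Σ_j x_j + c (a DeepSets-style layer chosen for illustration, not the paper's architecture). The input [1, 1, 3] is unchanged by swapping its first two entries, so any permutation-equivariant output must be unchanged by that swap too, i.e. y_0 = y_1 for every choice of parameters.

```python
def equivariant_layer(x, a, b, c):
    """Permutation-equivariant map: permuting x permutes the output the same way."""
    total = sum(x)
    return [a * xi + b * total + c for xi in x]

def swap(v, i, j):
    """Apply the transposition (i j) to a copy of a list."""
    v = list(v)
    v[i], v[j] = v[j], v[i]
    return v

x = [1.0, 1.0, 3.0]                         # symmetric under swapping entries 0 and 1
y = equivariant_layer(x, a=0.7, b=-1.3, c=0.2)
print(y[0] == y[1])                          # the symmetry survives in the output
```

No choice of a, b, c can make y_0 differ from y_1 here; that forced tie is the limitation the title's probabilistic symmetry breaking is meant to address.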
-
Interpretability of Graph Neural Networks to Assert Effects of Global Change Drivers on Ecological Networks
arXiv:2503.15107v1 Announce Type: new Abstract: Pollinators play a crucial role in plant reproduction, whether in natural ecosystems or in human-modified landscapes. Global change drivers, including climate change or land use modifications, can alter plant-pollinator interactions. To assert the potential influence…
-
Explainable Bayesian deep learning through input-skip Latent Binary Bayesian Neural Networks
arXiv:2503.10496v1 Announce Type: new Abstract: Modeling natural phenomena with artificial neural networks (ANNs) often provides highly accurate predictions. However, ANNs often suffer from over-parameterization, complicating interpretation and raising uncertainty issues. Bayesian neural networks (BNNs) address the latter by representing weights as probability distributions, allowing…
-
Exploring specialization and sensitivity of convolutional neural networks in the context of simultaneous image augmentations
arXiv:2503.03283v1 Announce Type: new Abstract: Drawing parallels with the way biological networks are studied, we adapt the treatment–control paradigm to explainable artificial intelligence research and enrich it through multi-parametric input alterations. In this study, we propose a framework for investigating…
-
Learning Dynamics of Deep Linear Networks Beyond the Edge of Stability
arXiv:2502.20531v1 Announce Type: new Abstract: Deep neural networks trained using gradient descent with a fixed learning rate $\eta$ often operate in the regime of “edge of stability” (EOS), where the largest eigenvalue of the Hessian equilibrates about the stability threshold $2/\eta$. In this work,…
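Why 2/η is the threshold: on a quadratic with curvature λ, one GD step multiplies the error by (1 − ηλ), which shrinks it only when 0 < λ < 2/η. The Python sketch below checks this numerically; it is the textbook one-dimensional stability argument, not the paper's deep-linear analysis, and the learning rate and curvatures are illustrative values.

```python
def gd_on_quadratic(curvature, lr, x0=1.0, steps=50):
    """Run GD on L(x) = curvature * x**2 / 2, whose Hessian is the constant `curvature`."""
    x = x0
    for _ in range(steps):
        x -= lr * curvature * x  # gradient of curvature * x**2 / 2 is curvature * x
    return x

def is_stable(curvature, lr):
    """GD contracts iff |1 - lr * curvature| < 1, i.e. 0 < curvature < 2 / lr."""
    return abs(1.0 - lr * curvature) < 1.0

lr = 0.1                             # stability threshold 2 / lr = 20
below = gd_on_quadratic(19.0, lr)    # factor -0.9 per step: oscillates and shrinks
above = gd_on_quadratic(21.0, lr)    # factor -1.1 per step: oscillates and grows
print(below, above)
```

In the EOS regime described above, training sits near this boundary: the top Hessian eigenvalue hovers around 2/η rather than settling well below it.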
-
Networks with Finite VC Dimension: Pro and Contra
arXiv:2502.02679v1 Announce Type: new Abstract: Approximation and learning of classifiers of large data sets by neural networks in terms of high-dimensional geometry and statistical learning theory are investigated. The influence of the VC dimension of sets of input-output functions of networks on approximation capabilities is compared with…
-
Covariate Dependent Mixture of Bayesian Networks
arXiv:2501.05745v1 Announce Type: new Abstract: Learning the structure of Bayesian networks from data provides insights into underlying processes and the causal relationships that generate the data, but its usefulness depends on the homogeneity of the data population, a condition often violated in real-world applications. In such cases, using a…
-
A Visual Understanding of Neural Networks
The math behind neural networks, visually explained. Continue reading on Towards Data Science. Reza Bagheri
-
Deep Networks are Reproducing Kernel Chains
arXiv:2501.03697v1 Announce Type: cross Abstract: Identifying an appropriate function space for deep neural networks remains a key open question. While shallow neural networks are naturally associated with Reproducing Kernel Banach Spaces (RKBS), deep networks present unique challenges. In this work, we extend RKBS to chain RKBS (cRKBS), a new…
-
Neural Networks Perform Sufficient Dimension Reduction
arXiv:2412.19033v1 Announce Type: new Abstract: This paper investigates the connection between neural networks and sufficient dimension reduction (SDR), demonstrating that neural networks inherently perform SDR in regression tasks under appropriate rank regularizations. Specifically, the weights in the first layer span the central mean subspace. We establish the statistical consistency…
-
Representation learning of dynamic networks
arXiv:2412.11065v1 Announce Type: new Abstract: This study presents a novel representation learning model tailored for dynamic networks, which describe the continuously evolving relationships among individuals within a population. The problem is encapsulated in the dimension reduction topic of functional data analysis. With dynamic networks represented as matrix-valued functions, our objective…
-
Learning Networks from Wide-Sense Stationary Stochastic Processes
arXiv:2412.03768v1 Announce Type: new Abstract: Complex networked systems driven by latent inputs are common in fields like neuroscience, finance, and engineering. A key inference problem here is to learn edge connectivity from node outputs (potentials). We focus on systems governed by steady-state linear conservation laws: $X_t = {L^{\ast}}Y_{t}$,…
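The excerpt does not say what L* is. As a hedged illustration, assume it is a weighted graph Laplacian, as in Kirchhoff's current law for a resistor network (node potentials Y_t, net injected currents X_t). This Python sketch builds such a Laplacian from a hypothetical edge list and computes the injections for one time step.

```python
def laplacian(n_nodes, edges):
    """Weighted graph Laplacian L = D - A from (i, j, weight) edges, as nested lists."""
    L = [[0.0] * n_nodes for _ in range(n_nodes)]
    for i, j, w in edges:
        L[i][i] += w
        L[j][j] += w
        L[i][j] -= w
        L[j][i] -= w
    return L

def net_injections(L, y):
    """Compute X = L Y for one time step (Kirchhoff-style conservation law)."""
    return [sum(l_ij * y_j for l_ij, y_j in zip(row, y)) for row in L]

# Hypothetical 4-node network; edge weights play the role of conductances.
edges = [(0, 1, 2.0), (1, 2, 1.0), (2, 3, 0.5), (0, 3, 1.5)]
L_star = laplacian(4, edges)
y_t = [1.0, 0.4, -0.3, 0.2]            # hypothetical node potentials
x_t = net_injections(L_star, y_t)      # net injection at each node
print(x_t)
```

Under this assumption, injections always sum to zero and adding a constant to every potential leaves X_t unchanged; learning edge connectivity then means recovering which off-diagonal entries of L* are nonzero.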