Tag: networks

The Inductive Bias of Convolutional Neural Networks: Locality and Weight Sharing Reshape Implicit Regularization

arXiv:2603.04807v1 Announce Type: new Abstract: We study how architectural inductive bias reshapes the implicit regularization induced by the edge-of-stability phenomenon in gradient descent. Prior work has established that for fully connected networks, the strength of this regularization is governed solely by…

March 6, 2026
Beyond NNGP: Large Deviations and Feature Learning in Bayesian Neural Networks

arXiv:2602.22925v1 Announce Type: new Abstract: We study wide Bayesian neural networks focusing on the rare but statistically dominant fluctuations that govern posterior concentration, beyond Gaussian-process limits. Large-deviation theory provides explicit variational objectives (rate functions) on predictors, providing an emerging notion of complexity and feature learning directly…

February 27, 2026
Deep networks learn to parse uniform-depth context-free languages from local statistics

arXiv:2602.06065v1 Announce Type: new Abstract: Understanding how the structure of language can be learned from sentences alone is a central question in both cognitive science and machine learning. Studies of the internal representations of Large Language Models (LLMs) support their ability to parse text…

February 9, 2026
Inheritance Between Feedforward and Convolutional Networks via Model Projection

arXiv:2602.06245v1 Announce Type: new Abstract: Techniques for feedforward networks (FFNs) and convolutional networks (CNNs) are frequently reused across families, but the relationship between the underlying model classes is rarely made explicit. We introduce a unified node-level formalization with tensor-valued activations and show that generalized feedforward networks…

February 9, 2026
Singular Bayesian Neural Networks

arXiv:2602.00387v1 Announce Type: new Abstract: Bayesian neural networks promise calibrated uncertainty but require $O(mn)$ parameters for standard mean-field Gaussian posteriors. We argue this cost is often unnecessary, particularly when weight matrices exhibit fast singular value decay. By parameterizing weights as $W = AB^{\top}$ with $A \in \mathbb{R}^{m \times r}$, $B \in…

February 3, 2026
On the Possibility of Small Networks for Physics-Informed Learning

A new kind of hyperparameter study. By Conor Rowan. Originally published on Towards Data Science.

January 31, 2026
Structural Dimension Reduction in Bayesian Networks

arXiv:2601.08236v1 Announce Type: new Abstract: This work introduces a novel technique, named structural dimension reduction, to collapse a Bayesian network onto a minimum and localized one while ensuring that probabilistic inferences between the original and reduced networks remain consistent. To this end, we propose a new combinatorial structure in…

January 14, 2026
Aligned explanations in neural networks

arXiv:2601.04378v1 Announce Type: cross Abstract: Feature attribution is the dominant paradigm for explaining deep neural networks. However, most existing methods only loosely reflect the model’s prediction-making process, thereby merely white-painting the black box. We argue that explanatory alignment is a key aspect of trustworthiness in prediction tasks: explanations must be…

January 9, 2026
Neural Networks on Symmetric Spaces of Noncompact Type

arXiv:2601.01097v1 Announce Type: new Abstract: Recent works have demonstrated promising performances of neural networks on hyperbolic spaces and symmetric positive definite (SPD) manifolds. These spaces belong to a family of Riemannian manifolds referred to as symmetric spaces of noncompact type. In this paper, we propose a novel…

January 6, 2026
A Teacher-Student Perspective on the Dynamics of Learning Near the Optimal Point

arXiv:2512.15606v1 Announce Type: new Abstract: Near an optimal learning point of a neural network, the learning performance of gradient descent dynamics is dictated by the Hessian matrix of the loss function with respect to the network parameters. We characterize the Hessian eigenspectrum for…

December 18, 2025
Variational Estimators for Node Popularity Models

arXiv:2511.17783v1 Announce Type: new Abstract: Node popularity is recognized as a key factor in modeling real-world networks, capturing heterogeneity in connectivity across communities. This concept is equally important in bipartite networks, where nodes in different partitions may exhibit varying popularity patterns, motivating models such as the Two-Way Node Popularity…

November 25, 2025
Neural Networks Learn Generic Multi-Index Models Near Information-Theoretic Limit

arXiv:2511.15120v1 Announce Type: new Abstract: In deep learning, a central issue is to understand how neural networks efficiently learn high-dimensional features. To this end, we explore the gradient descent learning of a general Gaussian Multi-index model $f(\boldsymbol{x})=g(\boldsymbol{U}\boldsymbol{x})$ with hidden subspace $\boldsymbol{U}\in \mathbb{R}^{r\times d}$, which is the…

November 20, 2025
Siegel Neural Networks

arXiv:2511.09577v1 Announce Type: new Abstract: Riemannian symmetric spaces (RSS) such as hyperbolic spaces and symmetric positive definite (SPD) manifolds have become popular spaces for representation learning. In this paper, we propose a novel approach for building discriminative neural networks on Siegel spaces, a family of RSS that is largely unexplored in machine…

November 14, 2025
Accuracy estimation of neural networks by extreme value theory

arXiv:2511.00490v1 Announce Type: new Abstract: Neural networks are able to approximate any continuous function on a compact set. However, it is not obvious how to quantify the error of the neural network, i.e., the remaining bias between the function and the neural network. Here, we propose…

November 4, 2025
Distributionally robust approximation property of neural networks

arXiv:2510.09177v1 Announce Type: new Abstract: The universal approximation property uniformly with respect to weakly compact families of measures is established for several classes of neural networks. To that end, we prove that these neural networks are dense in Orlicz spaces, thereby extending classical universal approximation theorems even beyond…

October 13, 2025
PyTorch Explained: From Automatic Differentiation to Training Custom Neural Networks

Deep learning is shaping our world as we speak. In fact, it has been slowly revolutionizing software since the early 2010s. In 2025, PyTorch is at the forefront of this revolution, emerging as one of the most important libraries to train neural networks. Whether you…

September 25, 2025
Tree-like Pairwise Interaction Networks

arXiv:2508.15678v1 Announce Type: new Abstract: Modeling feature interactions in tabular data remains a key challenge in predictive modeling, for example, as used for insurance pricing. This paper proposes the Tree-like Pairwise Interaction Network (PIN), a novel neural network architecture that explicitly captures pairwise feature interactions through a shared feed-forward neural network…

August 22, 2025
From Sublinear to Linear: Fast Convergence in Deep Networks via Locally Polyak-Lojasiewicz Regions

arXiv:2507.21429v1 Announce Type: new Abstract: The convergence of gradient descent (GD) on the non-convex loss landscapes of deep neural networks (DNNs) presents a fundamental theoretical challenge. While recent work has established that GD converges to a stationary point at a sublinear rate…

July 30, 2025
Accelerating Hamiltonian Monte Carlo for Bayesian Inference in Neural Networks and Neural Operators

arXiv:2507.14652v1 Announce Type: new Abstract: Hamiltonian Monte Carlo (HMC) is a powerful and accurate method to sample from the posterior distribution in Bayesian inference. However, HMC techniques are computationally demanding for Bayesian neural networks due to the high dimensionality of the network’s…

July 22, 2025
Posterior Contraction for Sparse Neural Networks in Besov Spaces with Intrinsic Dimensionality

arXiv:2506.19144v1 Announce Type: new Abstract: This work establishes that sparse Bayesian neural networks achieve optimal posterior contraction rates over anisotropic Besov spaces and their hierarchical compositions. These structures reflect the intrinsic dimensionality of the underlying function, thereby mitigating the curse of dimensionality. Our…

June 25, 2025
Near-optimal estimates for the $\ell^p$-Lipschitz constants of deep random ReLU neural networks

arXiv:2506.19695v1 Announce Type: new Abstract: This paper studies the $\ell^p$-Lipschitz constants of ReLU neural networks $\Phi: \mathbb{R}^d \to \mathbb{R}$ with random parameters for $p \in [1,\infty]$. The distribution of the weights follows a variant of the He initialization and the biases are drawn…

June 25, 2025
Fast Bayesian Optimization of Function Networks with Partial Evaluations

arXiv:2506.11456v1 Announce Type: new Abstract: Bayesian optimization of function networks (BOFN) is a framework for optimizing expensive-to-evaluate objective functions structured as networks, where some nodes’ outputs serve as inputs for others. Many real-world applications, such as manufacturing and drug discovery, involve function networks with additional properties…

June 16, 2025
Global Minimizers of $\ell^p$-Regularized Objectives Yield the Sparsest ReLU Neural Networks

arXiv:2505.21791v1 Announce Type: new Abstract: Overparameterized neural networks can interpolate a given dataset in many different ways, prompting the fundamental question: which among these solutions should we prefer, and what explicit regularization strategies will provably yield these solutions? This paper addresses the challenge of…

May 29, 2025
Humble your Overconfident Networks: Unlearning Overfitting via Sequential Monte Carlo Tempered Deep Ensembles

arXiv:2505.11671v1 Announce Type: new Abstract: Sequential Monte Carlo (SMC) methods offer a principled approach to Bayesian uncertainty quantification but are traditionally limited by the need for full-batch gradient evaluations. We introduce a scalable variant by incorporating Stochastic Gradient Hamiltonian Monte Carlo (SGHMC)…

May 20, 2025
LatticeVision: Image to Image Networks for Modeling Non-Stationary Spatial Data

arXiv:2505.09803v1 Announce Type: new Abstract: In many scientific and industrial applications, we are given a handful of instances (a ‘small ensemble’) of a spatially distributed quantity (a ‘field’) but would like to acquire many more. For example, a large ensemble of global temperature sensitivity fields…

May 16, 2025
On the expressivity of deep Heaviside networks

arXiv:2505.00110v1 Announce Type: new Abstract: We show that deep Heaviside networks (DHNs) have limited expressiveness but that this can be overcome by including either skip connections or neurons with linear activation. We provide lower and upper bounds for the Vapnik-Chervonenkis (VC) dimensions and approximation rates of these network…

May 2, 2025
Privacy-Preserving Transfer Learning for Community Detection using Locally Distributed Multiple Networks

arXiv:2504.00890v1 Announce Type: new Abstract: This paper develops a new spectral clustering-based method called TransNet for transfer learning in community detection of network data. Our goal is to improve the clustering performance of the target network using auxiliary source networks, which are heterogeneous, privacy-preserved,…

April 2, 2025
Improving Equivariant Networks with Probabilistic Symmetry Breaking

arXiv:2503.21985v1 Announce Type: cross Abstract: Equivariance encodes known symmetries into neural networks, often enhancing generalization. However, equivariant networks cannot break symmetries: the output of an equivariant network must, by definition, have at least the same self-symmetries as the input. This poses an important problem, both (1) for prediction…

March 31, 2025
Interpretability of Graph Neural Networks to Assert Effects of Global Change Drivers on Ecological Networks

arXiv:2503.15107v1 Announce Type: new Abstract: Pollinators play a crucial role for plant reproduction, either in natural ecosystem or in human-modified landscape. Global change drivers, including climate change or land use modifications, can alter the plant-pollinator interactions. To assert the potential influence…

March 20, 2025
Explainable Bayesian deep learning through input-skip Latent Binary Bayesian Neural Networks

arXiv:2503.10496v1 Announce Type: new Abstract: Modeling natural phenomena with artificial neural networks (ANNs) often provides highly accurate predictions. However, ANNs often suffer from over-parameterization, complicating interpretation and raising uncertainty issues. Bayesian neural networks (BNNs) address the latter by representing weights as probability distributions, allowing…

March 14, 2025
Exploring specialization and sensitivity of convolutional neural networks in the context of simultaneous image augmentations

arXiv:2503.03283v1 Announce Type: new Abstract: Drawing parallels with the way biological networks are studied, we adapt the treatment–control paradigm to explainable artificial intelligence research and enrich it through multi-parametric input alterations. In this study, we propose a framework for investigating…

March 6, 2025
Learning Dynamics of Deep Linear Networks Beyond the Edge of Stability

arXiv:2502.20531v1 Announce Type: new Abstract: Deep neural networks trained using gradient descent with a fixed learning rate $\eta$ often operate in the regime of “edge of stability” (EOS), where the largest eigenvalue of the Hessian equilibrates about the stability threshold $2/\eta$. In this work,…

March 3, 2025
Networks with Finite VC Dimension: Pro and Contra

arXiv:2502.02679v1 Announce Type: new Abstract: Approximation and learning of classifiers of large data sets by neural networks in terms of high-dimensional geometry and statistical learning theory are investigated. The influence of the VC dimension of sets of input-output functions of networks on approximation capabilities is compared with…

February 6, 2025
Covariate Dependent Mixture of Bayesian Networks

arXiv:2501.05745v1 Announce Type: new Abstract: Learning the structure of Bayesian networks from data provides insights into underlying processes and the causal relationships that generate the data, but its usefulness depends on the homogeneity of the data population, a condition often violated in real-world applications. In such cases, using a…

January 13, 2025
A Visual Understanding of Neural Networks

The math behind neural networks, visually explained. By Reza Bagheri. Originally published on Towards Data Science.

January 12, 2025
Deep Networks are Reproducing Kernel Chains

arXiv:2501.03697v1 Announce Type: cross Abstract: Identifying an appropriate function space for deep neural networks remains a key open question. While shallow neural networks are naturally associated with Reproducing Kernel Banach Spaces (RKBS), deep networks present unique challenges. In this work, we extend RKBS to chain RKBS (cRKBS), a new…

January 8, 2025
Neural Networks Perform Sufficient Dimension Reduction

arXiv:2412.19033v1 Announce Type: new Abstract: This paper investigates the connection between neural networks and sufficient dimension reduction (SDR), demonstrating that neural networks inherently perform SDR in regression tasks under appropriate rank regularizations. Specifically, the weights in the first layer span the central mean subspace. We establish the statistical consistency…

December 30, 2024
Representation learning of dynamic networks

arXiv:2412.11065v1 Announce Type: new Abstract: This study presents a novel representation learning model tailored for dynamic networks, which describes the continuously evolving relationships among individuals within a population. The problem is encapsulated in the dimension reduction topic of functional data analysis. With dynamic networks represented as matrix-valued functions, our objective…

December 17, 2024
Learning Networks from Wide-Sense Stationary Stochastic Processes

arXiv:2412.03768v1 Announce Type: new Abstract: Complex networked systems driven by latent inputs are common in fields like neuroscience, finance, and engineering. A key inference problem here is to learn edge connectivity from node outputs (potentials). We focus on systems governed by steady-state linear conservation laws: $X_t = {L^{\ast}}Y_{t}$,…

December 6, 2024