Tag: regression

Nonparametric Distribution Regression Re-calibration

arXiv:2602.13362v1 Announce Type: new Abstract: A key challenge in probabilistic regression is ensuring that predictive distributions accurately reflect true empirical uncertainty. Minimizing overall prediction error often encourages models to prioritize informativeness over calibration, producing narrow but overconfident predictions. However, in safety-critical settings, trustworthy uncertainty estimates are often more valuable than narrow…

February 17, 2026
Linear Regression with Unknown Truncation Beyond Gaussian Features

arXiv:2602.12534v1 Announce Type: new Abstract: In truncated linear regression, samples $(x,y)$ are shown only when the outcome $y$ falls inside a certain survival set $S^\star$ and the goal is to estimate the unknown $d$-dimensional regressor $w^\star$. This problem has a long history of study in Statistics and…

February 16, 2026
Efficient and Minimax-optimal In-context Nonparametric Regression with Transformers

arXiv:2601.15014v1 Announce Type: new Abstract: We study in-context learning for nonparametric regression with $\alpha$-Hölder smooth regression functions, for some $\alpha>0$. We prove that, with $n$ in-context examples and $d$-dimensional regression covariates, a pretrained transformer with $\Theta(\log n)$ parameters and $\Omega\bigl(n^{2\alpha/(2\alpha+d)}\log^3 n\bigr)$ pretraining sequences can achieve the minimax-optimal…

January 22, 2026
The Machine Learning “Advent Calendar” Day 14: Softmax Regression in Excel

Softmax Regression is simply Logistic Regression extended to multiple classes. By computing one linear score per class and normalizing them with Softmax, we obtain multiclass probabilities without changing the core logic. The loss, the gradients, and the optimization remain the same. Only the number…
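
As a rough illustration of the idea in this teaser (one linear score per class, normalized with softmax), here is a minimal pure-Python sketch. The weights, biases, and input are made-up values for illustration and are not taken from the linked article, which works in Excel.

```python
import math

def softmax(scores):
    """Turn a list of real-valued scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def class_probabilities(x, weights, biases):
    """Compute one linear score per class, then normalize with softmax."""
    scores = [
        sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
        for w, b in zip(weights, biases)
    ]
    return softmax(scores)

# Hypothetical 3-class model on 2 features (illustrative values only).
weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
biases = [0.0, 0.0, 0.0]
probs = class_probabilities([2.0, 1.0], weights, biases)
```

With only two classes and the second score fixed at zero, `softmax([z, 0.0])[0]` equals the logistic function `1 / (1 + exp(-z))`, which is the sense in which softmax regression extends logistic regression.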

December 15, 2025
The Machine Learning “Advent Calendar” Day 13: LASSO and Ridge Regression in Excel

Ridge and Lasso regression are often perceived as more complex versions of linear regression. In reality, the prediction model remains exactly the same. What changes is the training objective. By adding a penalty on the coefficients, regularization forces the model to choose…
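
To make the point about the training objective concrete, here is a small sketch under simplifying assumptions that are not from the linked article: a single feature, no intercept, and the objectives `sum((y - w*x)^2) + lam * w^2` (ridge) and `sum((y - w*x)^2) + lam * |w|` (lasso). In that special case both penalized coefficients have closed forms, and the prediction is `w * x` in every case.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t, returning 0.0 when |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def ridge_coef(x, y, lam):
    """Minimizer of sum((y - w*x)^2) + lam * w^2 for one feature, no intercept."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return sxy / (sxx + lam)

def lasso_coef(x, y, lam):
    """Minimizer of sum((y - w*x)^2) + lam * |w| for one feature, no intercept."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return soft_threshold(sxy, lam / 2.0) / sxx

# Toy data generated by y = 2x, so the unpenalized coefficient is 2.
x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]
w_ols = ridge_coef(x, y, 0.0)     # no penalty: ordinary least squares
w_ridge = ridge_coef(x, y, 14.0)  # shrunk toward zero, but not to zero
w_lasso = lasso_coef(x, y, 56.0)  # a large enough penalty gives exactly zero
```

The ridge penalty shrinks the coefficient smoothly, while the lasso penalty can set it exactly to zero. That difference is why lasso is often used for variable selection.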

December 14, 2025
The Machine Learning “Advent Calendar” Day 12: Logistic Regression in Excel

In this article, we rebuild Logistic Regression step by step directly in Excel. Starting from a binary dataset, we explore why linear regression struggles as a classifier, how the logistic function fixes these issues, and how log-loss naturally appears from the likelihood. With a…

December 13, 2025
The Machine Learning “Advent Calendar” Day 11: Linear Regression in Excel

Linear Regression looks simple, but it introduces the core ideas of modern machine learning: loss functions, optimization, gradients, scaling, and interpretation. In this article, we rebuild Linear Regression in Excel, compare the closed-form solution with Gradient Descent, and see how the coefficients evolve step…

December 12, 2025
PyTorch Tutorial for Beginners: Build a Multiple Regression Model from Scratch

Hands-on PyTorch: Building a 3-layer neural network for multiple regression. First published on Towards Data Science by Gustavo Santos.

November 20, 2025
Splat Regression Models

arXiv:2511.14042v1 Announce Type: new Abstract: We introduce a highly expressive class of function approximators called Splat Regression Models. Model outputs are mixtures of heterogeneous and anisotropic bump functions, termed splats, each weighted by an output vector. The power of splat modeling lies in its ability to locally adjust the scale and direction…

November 19, 2025
Neural Local Wasserstein Regression

arXiv:2511.10824v1 Announce Type: new Abstract: We study the estimation problem of distribution-on-distribution regression, where both predictors and responses are probability measures. Existing approaches typically rely on a global optimal transport map or tangent-space linearization, which can be restrictive in approximation capacity and distort geometry in multivariate underlying domains. In this paper,…

November 17, 2025
How to Decide Between Regression and Time Series Models for “Forecasting”?

Hi everyone, I’m trying to understand intuitively when it makes sense to use a time series model like SARIMAX versus a simpler approach like linear regression, especially in cases of weak autocorrelation. For example, in wind power generation forecasting, energy output mainly depends on…

November 10, 2025
Using latent representations to link disjoint longitudinal data for mixed-effects regression

arXiv:2510.25531v1 Announce Type: new Abstract: Many rare diseases offer limited established treatment options, leading patients to switch therapies when new medications emerge. To analyze the impact of such treatment switches within the low sample size limitations of rare disease trials, it is important to…

October 30, 2025
Multiple Linear Regression Explained Simply (Part 1)

The math behind fitting a plane instead of a line. First published on Towards Data Science by Nikhil Dasari.

October 24, 2025
Calibrated Principal Component Regression

arXiv:2510.19020v1 Announce Type: new Abstract: We propose a new method for statistical inference in generalized linear models. In the overparameterized regime, Principal Component Regression (PCR) reduces variance by projecting high-dimensional data to a low-dimensional principal subspace before fitting. However, PCR incurs truncation bias whenever the true regression vector has mass outside…

October 23, 2025
Differentially Private Two-Stage Gradient Descent for Instrumental Variable Regression

arXiv:2509.22794v1 Announce Type: new Abstract: We study instrumental variable regression (IVaR) under differential privacy constraints. Classical IVaR methods (like two-stage least squares regression) rely on solving moment equations that directly use sensitive covariates and instruments, creating significant risks of privacy leakage and posing challenges in designing…

September 30, 2025
Convex Regression with a Penalty

arXiv:2509.19788v1 Announce Type: new Abstract: A common way to estimate an unknown convex regression function $f_0: \Omega \subset \mathbb{R}^d \rightarrow \mathbb{R}$ from a set of $n$ noisy observations is to fit a convex function that minimizes the sum of squared errors. However, this estimator is known for its tendency to…

September 25, 2025
Risk Comparisons in Linear Regression: Implicit Regularization Dominates Explicit Regularization

arXiv:2509.17251v1 Announce Type: new Abstract: Existing theory suggests that for linear regression problems categorized by capacity and source conditions, gradient descent (GD) is always minimax optimal, while both ridge regression and online stochastic gradient descent (SGD) are polynomially suboptimal for certain categories of such problems.…

September 23, 2025
Discovering equations from data: symbolic regression in dynamical systems

arXiv:2508.20257v1 Announce Type: cross Abstract: The process of discovering equations from data lies at the heart of physics and in many other areas of research, including mathematical ecology and epidemiology. Recently, machine learning methods known as symbolic regression have automated this process. As several methods are…

August 29, 2025
Stepwise Selection Made Simple: Improve Your Regression Models in Python

Dimensionality reduction in linear regression: classical stepwise methods and a Python application on real-world data. First published on Towards Data Science by JUNIOR JUMBONG.

August 29, 2025
Negative binomial regression and inference using a pre-trained transformer

arXiv:2508.04111v1 Announce Type: new Abstract: Negative binomial regression is essential for analyzing over-dispersed count data in comparative studies, but parameter estimation becomes computationally challenging in large screens requiring millions of comparisons. We investigate using a pre-trained transformer to produce estimates of negative binomial regression parameters…

August 7, 2025
Bayesian symbolic regression: Automated equation discovery from a physicists’ perspective

arXiv:2507.19540v1 Announce Type: new Abstract: Symbolic regression automates the process of learning closed-form mathematical models from data. Standard approaches to symbolic regression, as well as newer deep learning approaches, rely on heuristic model selection criteria, heuristic regularization, and heuristic exploration of model space. Here, we…

July 29, 2025
Meta Optimality for Demographic Parity Constrained Regression via Post-Processing

arXiv:2506.13947v1 Announce Type: new Abstract: We address the regression problem under the constraint of demographic parity, a commonly used fairness definition. Recent studies have revealed fair minimax optimal regression algorithms, the most accurate algorithms that adhere to the fairness constraint. However, these analyses are tightly coupled…

June 18, 2025
Exploring the Proportional Odds Model for Ordinal Logistic Regression

Understanding and Implementing Brant’s Tests in Ordinal Logistic Regression with Python. First published on Towards Data Science by JUNIOR JUMBONG.

June 12, 2025
Multiple Linear Regression Analysis

Implementation of multiple linear regression on real data: assumption checks, model evaluation, and interpretation of results using Python. First published on Towards Data Science by JUNIOR JUMBONG.

May 23, 2025
Nash: Neural Adaptive Shrinkage for Structured High-Dimensional Regression

arXiv:2505.11143v1 Announce Type: new Abstract: Sparse linear regression is a fundamental tool in data analysis. However, traditional approaches often fall short when covariates exhibit structure or arise from heterogeneous sources. In biomedical applications, covariates may stem from distinct modalities or be structured according to an underlying graph.…

May 19, 2025
Risk Bounds For Distributional Regression

arXiv:2505.09075v1 Announce Type: new Abstract: This work examines risk bounds for nonparametric distributional regression estimators. For convex-constrained distributional regression, general upper bounds are established for the continuous ranked probability score (CRPS) and the worst-case mean squared error (MSE) across the domain. These theoretical results are applied to isotonic and trend…

May 15, 2025
Normalizing Flow Regression for Bayesian Inference with Offline Likelihood Evaluations

arXiv:2504.11554v1 Announce Type: new Abstract: Bayesian inference with computationally expensive likelihood evaluations remains a significant challenge in many scientific domains. We propose normalizing flow regression (NFR), a novel offline inference method for approximating posterior distributions. Unlike traditional surrogate approaches that require additional sampling or inference…

April 17, 2025
When Predictors Collide: Mastering VIF in Multicollinear Regression

In regression models, the independent variables should be independent of each other, or at most weakly dependent; that is, they should not be strongly correlated. When such a dependency does exist, it is called multicollinearity, and it leads to unstable models and results that are difficult to interpret. The…
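
As a hedged illustration of the quantity named in the title, the variance inflation factor of predictor j is conventionally defined as 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the other predictors. The sketch below covers only the two-predictor special case, where R_j^2 reduces to the squared Pearson correlation. The data and function names are made up for illustration and are not from the linked article.

```python
def pearson_r(a, b):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / (var_a * var_b) ** 0.5

def vif_two_predictors(x1, x2):
    """VIF shared by both predictors when the model has exactly two of them."""
    r2 = pearson_r(x1, x2) ** 2
    if r2 >= 1.0:
        return float("inf")  # perfectly collinear predictors
    return 1.0 / (1.0 - r2)

# Illustrative data: the two predictors have correlation 0.8.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 5.0]
vif = vif_two_predictors(x1, x2)
```

A VIF of 1 means the predictor is uncorrelated with the other one; the value grows without bound as the correlation approaches plus or minus 1. With more than two predictors, each R_j^2 has to come from a full auxiliary regression instead.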

April 17, 2025
Differentially Private Geodesic and Linear Regression

arXiv:2504.11304v1 Announce Type: new Abstract: In statistical applications it has become increasingly common to encounter data structures that live on non-linear spaces such as manifolds. Classical linear regression, one of the most fundamental methodologies of statistical learning, captures the relationship between an independent variable and a response variable which…

April 16, 2025
On the Robustness of Kernel Ridge Regression Using the Cauchy Loss Function

arXiv:2503.20120v1 Announce Type: new Abstract: Robust regression aims to develop methods for estimating an unknown regression function in the presence of outliers, heavy-tailed distributions, or contaminated data, which can severely impact performance. Most existing theoretical results in robust regression assume that the noise…

March 27, 2025
Linear Regression in Time Series: Sources of Spurious Regression

It’s pretty clear that most of our work will be automated by AI in the future. This will be possible because many researchers and professionals are working hard to make their work available online. These contributions not only help us understand fundamental concepts but…

March 11, 2025
Task Shift: From Classification to Regression in Overparameterized Linear Models

arXiv:2502.13285v1 Announce Type: new Abstract: Modern machine learning methods have recently demonstrated remarkable capability to generalize under task shift, where latent knowledge is transferred to a different, often more difficult, task under a similar data distribution. We investigate this phenomenon in an overparameterized linear regression…

February 20, 2025
Theoretical and Practical Analysis of Fréchet Regression via Comparison Geometry

arXiv:2502.01995v1 Announce Type: new Abstract: Fréchet regression extends classical regression methods to non-Euclidean metric spaces, enabling the analysis of data relationships on complex structures such as manifolds and graphs. This work establishes a rigorous theoretical analysis for Fréchet regression through the lens of comparison geometry…

February 5, 2025
How to Use Pre-Trained Language Models for Regression

Why and how to convert mT5 into a regression metric for numerical prediction. First published on Towards Data Science by Aden Haussmann.

January 19, 2025
Optimality and Adaptivity of Deep Neural Features for Instrumental Variable Regression

arXiv:2501.04898v1 Announce Type: new Abstract: We provide a convergence analysis of deep feature instrumental variable (DFIV) regression (Xu et al., 2021), a nonparametric approach to IV regression using data-adaptive features learned by deep neural networks in two stages. We prove that the DFIV algorithm…

January 10, 2025
Class-Balance Bias in Regularized Regression

arXiv:2501.03821v1 Announce Type: new Abstract: Regularized models are often sensitive to the scales of the features in the data and it has therefore become standard practice to normalize (center and scale) the features before fitting the model. But there are many different ways to normalize the features and the choice…

January 8, 2025
Mastering the Basics: How Linear Regression Unlocks the Secrets of Complex Models

A full explanation of linear regression and how it learns. Just like Mr. Miyagi taught young Daniel LaRusso karate through repetitive simple chores, which ultimately transformed him into the Karate Kid, mastering foundational algorithms like linear regression…

January 5, 2025
Introduction to the Finite Normal Mixtures in Regression with R

How to make linear regression flexible enough for non-linear data. Linear regression is usually considered not flexible enough to handle nonlinear data. From a theoretical viewpoint, it is not capable of dealing with such data. However, we…

December 28, 2024
Fréchet regression for multi-label feature selection with implicit regularization

arXiv:2412.18247v1 Announce Type: new Abstract: Fréchet regression extends linear regression to model complex responses in metric spaces, making it particularly relevant for multi-label regression, where each instance can have multiple associated labels. However, variable selection within this framework remains underexplored. In this paper, we propose…

December 25, 2024
Asymptotics of Linear Regression with Linearly Dependent Data

arXiv:2412.03702v1 Announce Type: new Abstract: In this paper we study the asymptotics of linear regression in settings where the covariates exhibit a linear dependency structure, departing from the standard assumption of independence. We model the covariates using stochastic processes with spatio-temporal covariance and analyze the performance of…

December 6, 2024
Intrinsic Wrapped Gaussian Process Regression Modeling for Manifold-valued Response Variable

arXiv:2411.18989v1 Announce Type: new Abstract: In this paper, we propose a novel intrinsic wrapped Gaussian process regression model for a response variable measured on a Riemannian manifold. We apply the parallel transport operator to define an intrinsic covariance structure addressing a critical aspect of constructing a well…

December 2, 2024