Tag: methods

Time-uniform conformal and PAC prediction

arXiv:2602.06297v1. Abstract: Given that machine learning algorithms are increasingly being deployed to aid in high-stakes decision-making, uncertainty quantification methods that wrap around these black-box models, such as conformal prediction, have received much attention in recent years. In sequential settings, where data are observed/generated in a…

February 9, 2026
Boosting methods for interval-censored data with regression and classification

arXiv:2601.17973v1. Abstract: Boosting has garnered significant interest across both machine learning and statistical communities. Traditional boosting algorithms, designed for fully observed random samples, often struggle with real-world problems, particularly with interval-censored data. This type of data is common in survival analysis and time-to-event…

January 27, 2026
Clustering Approaches for Mixed-Type Data: A Comparative Study

arXiv:2511.19755v1. Abstract: Clustering is widely used in unsupervised learning to find homogeneous groups of observations within a dataset. However, clustering mixed-type data remains a challenge, as few existing approaches are suited for this task. This study presents the state of the art of these approaches and compares…

November 26, 2025
Certainty in Uncertainty: Reasoning over Uncertain Knowledge Graphs with Statistical Guarantees

arXiv:2510.24754v1. Abstract: Uncertain knowledge graph embedding (UnKGE) methods learn vector representations that capture both structural and uncertainty information to predict scores of unseen triples. However, existing methods produce only point estimates, without quantifying predictive uncertainty, limiting their reliability in high-stakes applications where…

October 30, 2025
Fast kernel methods: Sobolev, physics-informed, and additive models

arXiv:2509.02649v1. Abstract: Kernel methods are powerful tools in statistical learning, but their cubic complexity in the sample size n limits their use on large-scale datasets. In this work, we introduce a scalable framework for kernel regression with O(n log n) complexity, fully leveraging GPU…

September 4, 2025
Comparing Model-agnostic Feature Selection Methods through Relative Efficiency

arXiv:2508.14268v1. Abstract: Feature selection and importance estimation in a model-agnostic setting is an ongoing challenge of significant interest. Wrapper methods are commonly used because they are typically model-agnostic, even though they are computationally intensive. In this paper, we focus on feature selection methods related…

August 21, 2025
A Survey of Dimension Estimation Methods

arXiv:2507.13887v1. Abstract: It is a standard assumption that datasets in high dimension have an internal structure which means that they in fact lie on, or near, subsets of a lower dimension. In many instances it is important to understand the real dimension of the data, hence…

July 21, 2025
Random feature approximation for general spectral methods

arXiv:2506.16283v1. Abstract: Random feature approximation is arguably one of the most widely used techniques for kernel methods in large-scale learning algorithms. In this work, we analyze the generalization properties of random feature methods, extending previous results for Tikhonov regularization to a broad class of spectral…

June 23, 2025
A Practical Starters’ Guide to Causal Structure Learning with Bayesian Methods in Python

Learn causal structures and make inferences with Bayesian methods: Python tutorial. By Erdogan Taskesen. Originally published on Towards Data Science.

June 17, 2025
Data Balancing Strategies: A Survey of Resampling and Augmentation Methods

arXiv:2505.13518v1. Abstract: Imbalanced data poses a significant obstacle in machine learning, as an unequal distribution of class labels often results in skewed predictions and diminished model accuracy. To mitigate this problem, various resampling strategies have been developed, encompassing both oversampling and undersampling…

May 21, 2025
Categorical and geometric methods in statistical, manifold, and machine learning

arXiv:2505.03862v1. Abstract: We present and discuss applications of the category of probabilistic morphisms, initially developed in \cite{Le2023}, as well as some geometric methods to several classes of problems in statistical, machine and manifold learning which shall be, along with many other topics,…

May 8, 2025
Are You Sure Your Posterior Makes Sense?

This article is co-authored by Felipe Bandeira, Giselle Fretta, Thu Than, and Elbion Redenica. We also thank Prof. Carl Scheffler for his support. Introduction: Parameter estimation has for decades been one of the most important topics in statistics. While frequentist approaches, such as Maximum Likelihood Estimation, used to…

April 12, 2025
Modifying Final Splits of Classification Tree for Fine-tuning Subpopulation Target in Policy Making

arXiv:2502.15072v1. Abstract: Policymakers often use Classification and Regression Trees (CART) to partition populations based on binary outcomes and target subpopulations whose probability of the binary event exceeds a threshold. However, classic CART and knowledge distillation method whose student model…

February 24, 2025
Unraveling Spatially Variable Genes: A Statistical Perspective on Spatial Transcriptomics

The article was written by Guanao Yan, a Ph.D. student of Statistics and Data Science at UCLA. Guanao is the first author of the Nature Communications review article [1]. Spatially resolved transcriptomics (SRT) is revolutionizing genomics by enabling the high-throughput measurement of gene expression while…

February 21, 2025
Online Covariance Matrix Estimation in Sketched Newton Methods

arXiv:2502.07114v1. Abstract: Given the ubiquity of streaming data, online algorithms have been widely used for parameter estimation, with second-order methods particularly standing out for their efficiency and robustness. In this paper, we study an online sketched Newton method that leverages a randomized sketching technique…

February 12, 2025
Online Covariance Estimation in Nonsmooth Stochastic Approximation

arXiv:2502.05305v1. Abstract: We consider applying stochastic approximation (SA) methods to solve nonsmooth variational inclusion problems. Existing studies have shown that the averaged iterates of SA methods exhibit asymptotic normality, with an optimal limiting covariance matrix in the local minimax sense of Hájek and Le Cam.…

February 11, 2025
Different thresholding methods on Nearest Shrunken Centroid algorithm

arXiv:2501.00632v1. Abstract: This article considers the impact of different thresholding methods on the Nearest Shrunken Centroid algorithm, which is popularly referred to as the Prediction Analysis of Microarrays (PAM) for high-dimensional classification. PAM uses soft thresholding to achieve high computational efficiency and high classification accuracy…

January 3, 2025
The Return of Pseudosciences in Artificial Intelligence: Have Machine Learning and Deep Learning Forgotten Lessons from Statistics and History?

arXiv:2411.18656v1. Abstract: In today’s world, AI programs powered by Machine Learning are ubiquitous, and have achieved seemingly exceptional performance across a broad range of tasks, from medical diagnosis and credit rating in banking,…

December 2, 2024
Model Validation Techniques, Explained: A Visual Guide with Code Examples

Model Evaluation & Optimization. 12 must-know methods to validate your machine learning. Every day, machines make millions of predictions, from detecting objects in photos to helping doctors find diseases. But before trusting these predictions, we need to know if they’re any good. After all, no one would…

December 1, 2024
Dunder Methods: The Hidden Gems of Python

Real-world examples of how actively using special methods can simplify coding and improve readability. Dunder methods, though possibly a basic topic in Python, are something I have often noticed being understood only superficially, even by people who have been coding for quite some time. Disclaimer: This is a forgivable…

December 1, 2024