Tag: clustering

Learning Order Forest for Qualitative-Attribute Data Clustering

arXiv:2603.03387v1. Abstract: Clustering is a fundamental approach to understanding data patterns, wherein the intuitive Euclidean distance space is commonly adopted. However, this is not the case for implicit cluster distributions reflected by qualitative attribute values, e.g., the nominal values of attributes like symptoms, marital status,…

March 5, 2026
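
The abstract's point is that nominal values such as symptoms or marital status have no natural Euclidean geometry. A common baseline for such data (used, for example, by k-modes) is the simple matching, or Hamming, dissimilarity sketched below. It is not the paper's order-forest method: it treats every pair of distinct values as equally far apart and ignores any order or similarity among them. The function name and example records are hypothetical.

```python
def simple_matching_distance(a, b):
    """Fraction of attributes whose nominal values differ.

    Baseline only, not the Learning Order Forest method: any two distinct
    values count as a full mismatch, with no notion of order between them.
    """
    if len(a) != len(b):
        raise ValueError("records must have the same number of attributes")
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return mismatches / len(a)

# Hypothetical records: (symptom, marital status, smoking status)
p1 = ("fever", "married", "smoker")
p2 = ("cough", "married", "smoker")
p3 = ("fever", "single", "non-smoker")
print(simple_matching_distance(p1, p2))  # 1 of 3 attributes differ
print(simple_matching_distance(p1, p3))  # 2 of 3 attributes differ
```
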
Empirical Cumulative Distribution Function Clustering for LLM-based Agent System Analysis

arXiv:2602.16131v1. Abstract: Large language models (LLMs) are increasingly used as agents to solve complex tasks such as question answering (QA), scientific debate, and software development. A standard evaluation procedure aggregates multiple responses from LLM agents into a single final answer, often via…

February 19, 2026
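
The excerpt stops before describing the method, so the sketch below shows only the ingredient named in the title: an empirical cumulative distribution function (ECDF) built from a set of per-response scores, plus one possible way to compare two ECDFs, the sup-norm (two-sample Kolmogorov-Smirnov) distance. Which distance the paper actually uses is not stated in the excerpt; the function names and score lists are hypothetical.

```python
import bisect

def ecdf(sample):
    """Return the ECDF F(t) = (number of sample values <= t) / n."""
    xs = sorted(sample)
    n = len(xs)
    def F(t):
        return bisect.bisect_right(xs, t) / n
    return F

def ks_distance(a, b):
    """Sup-norm distance between the ECDFs of samples a and b.

    Both ECDFs are right-continuous step functions that only jump at observed
    values, so the supremum is attained at one of those values.
    """
    Fa, Fb = ecdf(a), ecdf(b)
    points = sorted(set(a) | set(b))
    return max(abs(Fa(t) - Fb(t)) for t in points)

# Hypothetical per-response scores from two agent configurations
agent_a = [0.2, 0.4, 0.4, 0.9]
agent_b = [0.1, 0.3, 0.8, 0.9]
print(ks_distance(agent_a, agent_b))  # 0.25
```

A matrix of such pairwise distances could then be fed to any distance-based clustering algorithm; that pairing is an illustration, not a claim about the paper's pipeline.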
Efficient Clustering in Stochastic Bandits

arXiv:2601.09162v1 (cross-list). Abstract: We study the Bandit Clustering (BC) problem under the fixed confidence setting, where the objective is to group a collection of data sequences (arms) into clusters through sequential sampling from adaptively selected arms at each time step while ensuring a fixed error probability at the…

January 15, 2026
Clustering Approaches for Mixed-Type Data: A Comparative Study

arXiv:2511.19755v1. Abstract: Clustering is widely used in unsupervised learning to find homogeneous groups of observations within a dataset. However, clustering mixed-type data remains a challenge, as few existing approaches are suited for this task. This study presents the state of the art of these approaches and compares…

November 26, 2025
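
A classic baseline for measuring dissimilarity between mixed-type records is Gower's dissimilarity: numeric attributes contribute a range-normalized absolute difference, categorical attributes contribute 0 or 1, and the contributions are averaged. Whether it is among the approaches compared in the study is not stated in the excerpt. This is a minimal sketch with equal attribute weights and no missing values; the column layout and example records are hypothetical.

```python
def gower_distance(a, b, is_numeric, ranges):
    """Gower dissimilarity between two mixed-type records (equal weights).

    Numeric attribute: |x - y| / range of that column over the dataset.
    Categorical attribute: 0 if the values match, 1 otherwise.
    """
    total = 0.0
    for x, y, numeric, r in zip(a, b, is_numeric, ranges):
        if numeric:
            total += abs(x - y) / r if r > 0 else 0.0
        else:
            total += 0.0 if x == y else 1.0
    return total / len(a)

# Hypothetical columns: age (numeric), marital status (categorical), income (numeric)
is_numeric = [True, False, True]
ranges = [50.0, None, 100.0]  # column ranges over the dataset; None for categorical
r1 = (30.0, "married", 40.0)
r2 = (40.0, "single", 60.0)
print(gower_distance(r1, r2, is_numeric, ranges))  # about 0.467
```
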
Convex Clustering Redefined: Robust Learning with the Median of Means Estimator

arXiv:2511.14784v1. Abstract: Clustering approaches that utilize convex loss functions have recently attracted growing interest in the formation of compact data clusters. Although classical methods like k-means and its wide family of variants are still widely used, all of them require the…

November 20, 2025
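
The median-of-means (MoM) estimator named in the title works as follows: shuffle the sample, split it into blocks, average each block, and report the median of the block means. A few gross outliers can corrupt only the blocks they land in, so the estimate stays stable as long as fewer than half the blocks are contaminated. How the paper embeds MoM into convex clustering is not in the excerpt; this sketch shows the estimator alone, and `n_blocks` and `seed` are hypothetical parameter names.

```python
import random
import statistics

def median_of_means(values, n_blocks, seed=0):
    """Median-of-means estimate of the mean of `values`."""
    if not 1 <= n_blocks <= len(values):
        raise ValueError("n_blocks must be between 1 and len(values)")
    shuffled = list(values)
    random.Random(seed).shuffle(shuffled)
    # Split into n_blocks blocks whose sizes differ by at most one.
    blocks = [shuffled[i::n_blocks] for i in range(n_blocks)]
    return statistics.median(statistics.fmean(b) for b in blocks)

data = [1.0] * 20 + [1000.0]                 # one gross outlier
print(statistics.fmean(data))                # plain mean, pulled up to about 48.6
print(median_of_means(data, n_blocks=5))     # 1.0
```

With five blocks, the single outlier contaminates one block mean, and the median of the five block means ignores it.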
Benchmarking of Clustering Validity Measures Revisited

arXiv:2511.05983v1. Abstract: Validation plays a crucial role in the clustering process. Many different internal validity indexes exist for the purpose of determining the best clustering solution(s) from a given collection of candidates, e.g., as produced by different algorithms or different algorithm hyper-parameters. In this study, we…

November 11, 2025
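
To make "internal validity index" concrete, the sketch below computes one widely used example, the mean silhouette width. For each point it compares the mean distance to its own cluster (cohesion, a) with the mean distance to the nearest other cluster (separation, b) as (b - a) / max(a, b). The excerpt does not list which indexes the study benchmarks, so the choice of silhouette here is illustrative. It is an O(n²) sketch with Euclidean distance, and the example points are hypothetical.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette width over all points, using Euclidean distance."""
    n = len(points)
    if not 2 <= len(set(labels)) <= n - 1:
        raise ValueError("need between 2 and n-1 clusters")
    scores = []
    for i in range(n):
        # Sum and count of distances from point i to every other point, per cluster.
        sums, counts = {}, {}
        for j in range(n):
            if j == i:
                continue
            d = math.dist(points[i], points[j])
            sums[labels[j]] = sums.get(labels[j], 0.0) + d
            counts[labels[j]] = counts.get(labels[j], 0) + 1
        own = labels[i]
        if own not in counts:  # singleton cluster: silhouette is defined as 0
            scores.append(0.0)
            continue
        a = sums[own] / counts[own]                              # cohesion
        b = min(sums[c] / counts[c] for c in sums if c != own)   # separation
        m = max(a, b)
        scores.append(0.0 if m == 0 else (b - a) / m)
    return sum(scores) / n

pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(silhouette_score(pts, [0, 0, 1, 1]))  # about 0.90: well-separated split
print(silhouette_score(pts, [0, 1, 0, 1]))  # about -0.45: poor split
```

Selecting among candidate clusterings then amounts to keeping the candidate with the highest index value.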
A New Framework for Convex Clustering in Kernel Spaces: Finite Sample Bounds, Consistency and Performance Insights

arXiv:2511.05159v1. Abstract: Convex clustering is a well-regarded clustering method, resembling the centroid-based approach of Lloyd’s $k$-means, without requiring a predefined cluster count. It starts with each data point as its own centroid and iteratively merges them.…

November 10, 2025
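
For reference, the standard (Euclidean) convex clustering objective assigns each point $x_i$ its own centroid $u_i$ and penalizes differences between centroids, with a tuning parameter $\lambda \ge 0$ and nonnegative weights $w_{ij}$. The paper's kernel-space formulation is not shown in the excerpt; this is only the standard objective.

```latex
$$
\min_{u_1,\dots,u_n}\; \frac{1}{2}\sum_{i=1}^{n} \lVert x_i - u_i \rVert_2^2
\;+\; \lambda \sum_{i<j} w_{ij}\, \lVert u_i - u_j \rVert_2
$$
```

At $\lambda = 0$ the minimizer is $u_i = x_i$, so every point is its own centroid. As $\lambda$ grows, the fusion penalty pulls centroids together until some coincide; points that share a centroid form a cluster. Tracing this path is what "merging" means here, and the number of clusters is read off the solution rather than fixed in advance.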
Topology of Currencies: Persistent Homology for FX Co-movements: A Comparative Clustering Study

arXiv:2510.19306v1. Abstract: This study investigates whether Topological Data Analysis (TDA) can provide additional insights beyond traditional statistical methods in clustering currency behaviours. We focus on the foreign exchange (FX) market, which is a complex system often exhibiting non-linear and high-dimensional…

October 23, 2025
Reliable data clustering with Bayesian community detection

arXiv:2510.15013v1. Abstract: From neuroscience and genomics to systems biology and ecology, researchers rely on clustering similarity data to uncover modular structure. Yet widely used clustering methods, such as hierarchical clustering, k-means, and WGCNA, lack principled model selection, leaving them susceptible to noise. A common workaround…

October 20, 2025
Causal Clustering for Conditional Average Treatment Effects Estimation and Subgroup Discovery

arXiv:2509.05775v1. Abstract: Estimating heterogeneous treatment effects is critical in domains such as personalized medicine, resource allocation, and policy evaluation. A central challenge lies in identifying subpopulations that respond differently to interventions, thereby enabling more targeted and effective decision-making. While clustering methods…

September 9, 2025
Stellar Flare Detection and Prediction Using Clustering and Machine Learning

Combining unsupervised clustering with supervised learning to detect and predict stellar flares. By Diksha Sen Chaudhury, Towards Data Science.

August 6, 2025
funOCLUST: Clustering Functional Data with Outliers

arXiv:2508.00110v1. Abstract: Functional data present unique challenges for clustering due to their infinite-dimensional nature and potential sensitivity to outliers. An extension of the OCLUST algorithm to the functional setting is proposed to address these issues. The approach leverages the OCLUST framework, creating a robust method to…

August 4, 2025
Perfect Clustering in Very Sparse Diverse Multiplex Networks

arXiv:2507.19423v1. Abstract: The paper studies the DIverse MultiPLEx Signed Generalized Random Dot Product Graph (DIMPLE-SGRDPG) network model (Pensky, 2024), where all layers of the network have the same collection of nodes. In addition, all layers can be partitioned into groups such that the layers…

July 28, 2025
Robust Multi-Manifold Clustering via Simplex Paths

arXiv:2507.10710v1. Abstract: This article introduces a novel, geometric approach for multi-manifold clustering (MMC), i.e. for clustering a collection of potentially intersecting, d-dimensional manifolds into the individual manifold components. We first compute a locality graph on d-simplices, using the dihedral angle between adjacent simplices as the…

July 16, 2025
GOLFS: Feature Selection via Combining Both Global and Local Information for High Dimensional Clustering

arXiv:2507.10956v1. Abstract: It is important to identify the discriminative features for high dimensional clustering. However, due to the lack of cluster labels, the regularization methods developed for supervised feature selection cannot be directly applied. To learn the…

July 16, 2025
LILI clustering algorithm: Limit Inferior Leaf Interval Integrated into Causal Forest for Causal Interference

arXiv:2507.03271v1. Abstract: Causal forest methods are powerful tools in causal inference. Similar to the traditional random forest in machine learning, a causal forest considers each causal tree independently. However, this independence increases the likelihood that classification errors in one…

July 8, 2025
A Tutorial on Discriminative Clustering and Mutual Information

arXiv:2505.04484v1. Abstract: To cluster data is to separate samples into distinctive groups that should ideally have some cohesive properties. Today, numerous clustering algorithms exist, and their differences lie essentially in what can be perceived as “cohesive properties”. Therefore, hypotheses on the nature of clusters…

May 8, 2025
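
A common objective in discriminative clustering is the mutual information between inputs X and cluster assignments Y, estimated from a model's soft assignments p(y | x_i) as I(X;Y) ≈ H(Y) - H(Y|X): the entropy of the average assignment minus the average per-sample entropy. It rewards assignments that are confident per sample yet balanced across clusters. The excerpt does not say which estimator the tutorial uses, so treat this as a sketch of the general idea; the function names and probability tables are hypothetical.

```python
import math

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(q * math.log(q) for q in p if q > 0)

def mutual_information(probs):
    """Estimate I(X;Y) = H(Y) - H(Y|X) from soft assignments.

    probs[i][k] is a model's probability that sample i belongs to cluster k.
    """
    n, k = len(probs), len(probs[0])
    marginal = [sum(row[c] for row in probs) / n for c in range(k)]  # p(y)
    conditional = sum(entropy(row) for row in probs) / n             # H(Y|X)
    return entropy(marginal) - conditional

confident = [[1.0, 0.0], [0.0, 1.0]]   # sure and balanced: I = log 2
uncertain = [[0.5, 0.5], [0.5, 0.5]]   # every sample undecided: I = 0
collapsed = [[1.0, 0.0], [1.0, 0.0]]   # all samples in one cluster: I = 0
for name, p in [("confident", confident), ("uncertain", uncertain), ("collapsed", collapsed)]:
    print(name, mutual_information(p))
```

The collapsed case shows why the H(Y) term matters: confident assignments alone would be maximized by putting everything in one cluster.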
Statistical Inference for Clustering-based Anomaly Detection

arXiv:2504.18633v1. Abstract: Unsupervised anomaly detection (AD) is a fundamental problem in machine learning and statistics. A popular approach to unsupervised AD is clustering-based detection. However, this method lacks the ability to guarantee the reliability of the detected anomalies. In this paper, we propose SI-CLAD (Statistical Inference…

April 29, 2025
Hierarchical clustering with maximum density paths and mixture models

arXiv:2503.15582v1. Abstract: Hierarchical clustering is an effective and interpretable technique for analyzing structure in data, offering a nuanced understanding by revealing insights at multiple scales and resolutions. It is particularly helpful in settings where the exact number of clusters is unknown, and provides…

March 21, 2025
Clustering Items through Bandit Feedback: Finding the Right Feature out of Many

arXiv:2503.11209v1. Abstract: We study the problem of clustering a set of items based on bandit feedback. Each of the $n$ items is characterized by a feature vector, with a possibly large dimension $d$. The items are partitioned into two unknown…

March 17, 2025
Deep Matrix Factorization with Adaptive Weights for Multi-View Clustering

arXiv:2412.02292v1. Abstract: Recently, deep matrix factorization has been established as a powerful model for unsupervised tasks, achieving promising results, especially for multi-view clustering. However, existing methods often lack effective feature selection mechanisms and rely on empirical hyperparameter selection. To address these issues, we…

December 4, 2024
3D Clustering with Graph Theory: The Complete Guide

Python Tutorial for Euclidean Clustering of 3D Point Clouds with Graph Theory. Fundamental concepts and sequential workflow for… By Florent Poux, Ph.D., Towards Data Science.

December 3, 2024
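
Euclidean clustering of a point cloud can be framed as a graph problem: link any two points closer than a radius, then take each connected component of that graph as one cluster. The sketch below is a generic illustration of that idea, not the tutorial's code. It uses a brute-force O(n²) neighbour search, where a real point-cloud pipeline would use a spatial index such as a k-d tree. The names `radius` and `min_size` and the example cloud are hypothetical.

```python
import math
from collections import deque

def euclidean_clusters(points, radius, min_size=1):
    """Label 3D points by connected components of the radius graph.

    Points within `radius` of each other are linked; each connected component
    is a cluster. Components smaller than `min_size` are labelled -1 (noise).
    """
    n = len(points)
    labels = [None] * n
    next_label = 0
    for start in range(n):
        if labels[start] is not None:
            continue
        # Breadth-first search over the radius graph starting from `start`.
        component, queue = [start], deque([start])
        labels[start] = next_label
        while queue:
            i = queue.popleft()
            for j in range(n):
                if labels[j] is None and math.dist(points[i], points[j]) <= radius:
                    labels[j] = next_label
                    component.append(j)
                    queue.append(j)
        if len(component) < min_size:
            for k in component:
                labels[k] = -1       # too small: mark as noise
        else:
            next_label += 1
    return labels

cloud = [(0, 0, 0), (0.1, 0, 0), (0.2, 0, 0), (5, 5, 5), (5.1, 5, 5), (20, 20, 20)]
print(euclidean_clusters(cloud, radius=0.15, min_size=2))  # [0, 0, 0, 1, 1, -1]
```
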
Graph Max Shift: A Hill-Climbing Method for Graph Clustering

arXiv:2411.18794v1. Abstract: We present a method for graph clustering that is analogous to gradient ascent methods previously proposed for clustering points in space. We show that, when applied to a random geometric graph with data iid from some density with Morse regularity, the…

December 2, 2024
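
The abstract describes hill-climbing on a graph as an analogue of gradient ascent in space. The sketch below illustrates that general idea under two stated assumptions that the excerpt does not confirm: node degree serves as the density proxy, and ties are broken by node id. Each node repeatedly moves to the highest-ranked node in its closed neighbourhood, and nodes that stop at the same local maximum share a cluster. The function name and example graph are hypothetical.

```python
def graph_max_shift(adj):
    """Cluster nodes by hill-climbing on node degree (sketch).

    adj maps each node (a comparable id such as an int) to the set of its
    neighbours in an undirected graph. Returns node -> local-maximum node.
    Assumption: degree is the density proxy; ties are broken by node id,
    which makes every climb strictly increasing and so guarantees it stops.
    """
    def rank(v):
        return (len(adj[v]), v)

    def climb(v):
        while True:
            best = max(adj[v] | {v}, key=rank)
            if best == v:
                return v        # local maximum reached
            v = best

    return {v: climb(v) for v in adj}

# Hypothetical graph: two star-shaped groups joined by the edge 3-11.
edges = [(0, 1), (0, 2), (0, 3), (10, 11), (10, 12), (10, 13), (3, 11)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)
print(graph_max_shift(adj))  # nodes 0-3 climb to hub 0; nodes 10-13 climb to hub 10
```
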