Tag: clustering
-
Learning Order Forest for Qualitative-Attribute Data Clustering
Learning Order Forest for Qualitative-Attribute Data Clustering arXiv:2603.03387v1 Announce Type: new Abstract: Clustering is a fundamental approach to understanding data patterns, wherein the intuitive Euclidean distance space is commonly adopted. However, this is not the case for implicit cluster distributions reflected by qualitative attribute values, e.g., the nominal values of attributes like symptoms, marital status,…
-
Empirical Cumulative Distribution Function Clustering for LLM-based Agent System Analysis
Empirical Cumulative Distribution Function Clustering for LLM-based Agent System Analysis arXiv:2602.16131v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly used as agents to solve complex tasks such as question answering (QA), scientific debate, and software development. A standard evaluation procedure aggregates multiple responses from LLM agents into a single final answer, often via…
-
Efficient Clustering in Stochastic Bandits
Efficient Clustering in Stochastic Bandits arXiv:2601.09162v1 Announce Type: cross Abstract: We study the Bandit Clustering (BC) problem under the fixed confidence setting, where the objective is to group a collection of data sequences (arms) into clusters through sequential sampling from adaptively selected arms at each time step while ensuring a fixed error probability at the…
-
Clustering Approaches for Mixed-Type Data: A Comparative Study
Clustering Approaches for Mixed-Type Data: A Comparative Study arXiv:2511.19755v1 Announce Type: new Abstract: Clustering is widely used in unsupervised learning to find homogeneous groups of observations within a dataset. However, clustering mixed-type data remains a challenge, as few existing approaches are suited for this task. This study presents the state-of-the-art of these approaches and compares…
-
Convex Clustering Redefined: Robust Learning with the Median of Means Estimator
Convex Clustering Redefined: Robust Learning with the Median of Means Estimator arXiv:2511.14784v1 Announce Type: new Abstract: Clustering approaches that utilize convex loss functions have recently attracted growing interest in the formation of compact data clusters. Although classical methods like k-means and its wide family of variants are still widely used, all of them require the…
-
Benchmarking of Clustering Validity Measures Revisited
Benchmarking of Clustering Validity Measures Revisited arXiv:2511.05983v1 Announce Type: new Abstract: Validation plays a crucial role in the clustering process. Many different internal validity indexes exist for the purpose of determining the best clustering solution(s) from a given collection of candidates, e.g., as produced by different algorithms or different algorithm hyper-parameters. In this study, we…
-
A New Framework for Convex Clustering in Kernel Spaces: Finite Sample Bounds, Consistency and Performance Insights
A New Framework for Convex Clustering in Kernel Spaces: Finite Sample Bounds, Consistency and Performance Insights arXiv:2511.05159v1 Announce Type: new Abstract: Convex clustering is a well-regarded clustering method, resembling the similar centroid-based approach of Lloyd’s $k$-means, without requiring a predefined cluster count. It starts with each data point as its centroid and iteratively merges them.…
-
Topology of Currencies: Persistent Homology for FX Co-movements: A Comparative Clustering Study
Topology of Currencies: Persistent Homology for FX Co-movements: A Comparative Clustering Study arXiv:2510.19306v1 Announce Type: new Abstract: This study investigates whether Topological Data Analysis (TDA) can provide additional insights beyond traditional statistical methods in clustering currency behaviours. We focus on the foreign exchange (FX) market, which is a complex system often exhibiting non-linear and high-dimensional…
-
Reliable data clustering with Bayesian community detection
Reliable data clustering with Bayesian community detection arXiv:2510.15013v1 Announce Type: new Abstract: From neuroscience and genomics to systems biology and ecology, researchers rely on clustering similarity data to uncover modular structure. Yet widely used clustering methods, such as hierarchical clustering, k-means, and WGCNA, lack principled model selection, leaving them susceptible to noise. A common workaround…
-
Causal Clustering for Conditional Average Treatment Effects Estimation and Subgroup Discovery
Causal Clustering for Conditional Average Treatment Effects Estimation and Subgroup Discovery arXiv:2509.05775v1 Announce Type: new Abstract: Estimating heterogeneous treatment effects is critical in domains such as personalized medicine, resource allocation, and policy evaluation. A central challenge lies in identifying subpopulations that respond differently to interventions, thereby enabling more targeted and effective decision-making. While clustering methods…
-
Stellar Flare Detection and Prediction Using Clustering and Machine Learning
Stellar Flare Detection and Prediction Using Clustering and Machine Learning Combining unsupervised clustering with supervised learning to detect and predict stellar flares The post Stellar Flare Detection and Prediction Using Clustering and Machine Learning appeared first on Towards Data Science. Diksha Sen Chaudhury Go to original source
-
funOCLUST: Clustering Functional Data with Outliers
funOCLUST: Clustering Functional Data with Outliers arXiv:2508.00110v1 Announce Type: new Abstract: Functional data present unique challenges for clustering due to their infinite-dimensional nature and potential sensitivity to outliers. An extension of the OCLUST algorithm to the functional setting is proposed to address these issues. The approach leverages the OCLUST framework, creating a robust method to…
-
Perfect Clustering in Very Sparse Diverse Multiplex Networks
Perfect Clustering in Very Sparse Diverse Multiplex Networks arXiv:2507.19423v1 Announce Type: new Abstract: The paper studies the DIverse MultiPLEx Signed Generalized Random Dot Product Graph (DIMPLE-SGRDPG) network model (Pensky (2024)), where all layers of the network have the same collection of nodes. In addition, all layers can be partitioned into groups such that the layers…
-
Robust Multi-Manifold Clustering via Simplex Paths
Robust Multi-Manifold Clustering via Simplex Paths arXiv:2507.10710v1 Announce Type: new Abstract: This article introduces a novel, geometric approach for multi-manifold clustering (MMC), i.e. for clustering a collection of potentially intersecting, d-dimensional manifolds into the individual manifold components. We first compute a locality graph on d-simplices, using the dihedral angle in between adjacent simplices as the…
-
GOLFS: Feature Selection via Combining Both Global and Local Information for High Dimensional Clustering
GOLFS: Feature Selection via Combining Both Global and Local Information for High Dimensional Clustering arXiv:2507.10956v1 Announce Type: new Abstract: It is important to identify the discriminative features for high dimensional clustering. However, due to the lack of cluster labels, the regularization methods developed for supervised feature selection can not be directly applied. To learn the…
-
LILI clustering algorithm: Limit Inferior Leaf Interval Integrated into Causal Forest for Causal Interference
LILI clustering algorithm: Limit Inferior Leaf Interval Integrated into Causal Forest for Causal Interference arXiv:2507.03271v1 Announce Type: new Abstract: Causal forest methods are powerful tools in causal inference. Similar to traditional random forest in machine learning, causal forest independently considers each causal tree. However, this independence consideration increases the likelihood that classification errors in one…
-
A Tutorial on Discriminative Clustering and Mutual Information
A Tutorial on Discriminative Clustering and Mutual Information arXiv:2505.04484v1 Announce Type: new Abstract: To cluster data is to separate samples into distinctive groups that should ideally have some cohesive properties. Today, numerous clustering algorithms exist, and their differences lie essentially in what can be perceived as “cohesive properties”. Therefore, hypotheses on the nature of clusters…
-
Statistical Inference for Clustering-based Anomaly Detection
Statistical Inference for Clustering-based Anomaly Detection arXiv:2504.18633v1 Announce Type: new Abstract: Unsupervised anomaly detection (AD) is a fundamental problem in machine learning and statistics. A popular approach to unsupervised AD is clustering-based detection. However, this method lacks the ability to guarantee the reliability of the detected anomalies. In this paper, we propose SI-CLAD (Statistical Inference…
-
Hierarchical clustering with maximum density paths and mixture models
Hierarchical clustering with maximum density paths and mixture models arXiv:2503.15582v1 Announce Type: new Abstract: Hierarchical clustering is an effective and interpretable technique for analyzing structure in data, offering a nuanced understanding by revealing insights at multiple scales and resolutions. It is particularly helpful in settings where the exact number of clusters is unknown, and provides…
-
Clustering Items through Bandit Feedback: Finding the Right Feature out of Many
Clustering Items through Bandit Feedback: Finding the Right Feature out of Many arXiv:2503.11209v1 Announce Type: new Abstract: We study the problem of clustering a set of items based on bandit feedback. Each of the $n$ items is characterized by a feature vector, with a possibly large dimension $d$. The items are partitioned into two unknown…
-
Deep Matrix Factorization with Adaptive Weights for Multi-View Clustering
Deep Matrix Factorization with Adaptive Weights for Multi-View Clustering arXiv:2412.02292v1 Announce Type: new Abstract: Recently, deep matrix factorization has been established as a powerful model for unsupervised tasks, achieving promising results, especially for multi-view clustering. However, existing methods often lack effective feature selection mechanisms and rely on empirical hyperparameter selection. To address these issues, we…
-
3D Clustering with Graph Theory: The Complete Guide
3D Clustering with Graph Theory: The Complete Guide Python Tutorial for Euclidean Clustering of 3D Point Clouds with Graph Theory. Fundamental concepts and sequential workflow for… Continue reading on Towards Data Science » Florent Poux, Ph.D. Go to original source
-
Graph Max Shift: A Hill-Climbing Method for Graph Clustering
Graph Max Shift: A Hill-Climbing Method for Graph Clustering arXiv:2411.18794v1 Announce Type: new Abstract: We present a method for graph clustering that is analogous with gradient ascent methods previously proposed for clustering points in space. We show that, when applied to a random geometric graph with data iid from some density with Morse regularity, the…