Tag: based
-
Dictionary Based Pattern Entropy for Causal Direction Discovery
arXiv:2603.04473v1 Announce Type: new Abstract: Discovering causal direction from temporal observational data is particularly challenging for symbolic sequences, where functional models and noise assumptions are often unavailable. We propose a novel Dictionary Based Pattern Entropy (DPE) framework that infers both the direction of causation and the specific…
-
Initialization-Aware Score-Based Diffusion Sampling
arXiv:2603.00772v1 Announce Type: new Abstract: Score-based generative models (SGMs) aim to generate samples from a target distribution by approximating the reverse-time dynamics of a stochastic differential equation. Despite their strong empirical performance, classical samplers initialized from a Gaussian distribution require a long noising time horizon, typically inducing a large number of…
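The following toy illustrates the initialization issue. It is not the paper's method. It assumes a 1-D Gaussian target N(MU, S^2) and the Ornstein-Uhlenbeck forward SDE dX = -X dt + sqrt(2) dW, so the score of every noised marginal p_t is known in closed form. The reverse-time SDE is integrated with Euler-Maruyama from a standard Gaussian start. MU, S, the horizon T and the step count are illustrative assumptions.

```python
import math
import random

# Illustrative 1-D target N(MU, S**2); these values are assumptions.
MU, S = 3.0, 0.5


def score(x: float, t: float) -> float:
    """Exact score d/dx log p_t(x) under the OU forward SDE dX = -X dt + sqrt(2) dW."""
    mean = MU * math.exp(-t)
    var = S**2 * math.exp(-2.0 * t) + 1.0 - math.exp(-2.0 * t)
    return -(x - mean) / var


def reverse_sample(T: float, n_steps: int, rng: random.Random) -> float:
    """Euler-Maruyama on the reverse-time SDE, started from N(0, 1)."""
    dt = T / n_steps
    y = rng.gauss(0.0, 1.0)  # Gaussian initialization, as in classical samplers
    for k in range(n_steps):
        t = T - k * dt  # forward time at the start of this reverse step
        # Reverse drift -f(y) + g^2 * score, with f(y) = -y and g^2 = 2.
        drift = y + 2.0 * score(y, t)
        y += drift * dt + math.sqrt(2.0 * dt) * rng.gauss(0.0, 1.0)
    return y


rng = random.Random(0)
samples = [reverse_sample(T=5.0, n_steps=200, rng=rng) for _ in range(2000)]
mean = sum(samples) / len(samples)
print(f"sample mean {mean:.3f} (target {MU})")
```

In this toy setup, shorten the horizon to T = 0.5 and keep the N(0, 1) start. The sample mean then lands noticeably below MU, because p_T is far from N(0, 1). That mismatch is what forces classical samplers to use long horizons.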
-
Amortised and provably-robust simulation-based inference
arXiv:2602.11325v1 Announce Type: new Abstract: Complex simulator-based models are now routinely used to perform inference across the sciences and engineering, but existing inference methods are often unable to account for outliers and other extreme values in data which occur due to faulty measurement instruments or human error. In this paper,…
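For readers new to the setting, the simplest simulation-based inference scheme is rejection ABC. It draws parameters from the prior, runs the simulator, and keeps the draws whose summary statistic lands within eps of the observed one. The sketch assumes a hypothetical Gaussian simulator, a uniform prior and the sample mean as the summary. It is not the robust method proposed in the paper.

```python
import random
import statistics


def simulator(theta, n, rng):
    """Hypothetical simulator: n i.i.d. draws from N(theta, 1)."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def rejection_abc(observed, n_sims, eps, rng):
    """Keep prior draws whose simulated sample mean is within eps of the observed one."""
    obs_stat = statistics.fmean(observed)
    accepted = []
    for _ in range(n_sims):
        theta = rng.uniform(-5.0, 5.0)  # assumed Uniform(-5, 5) prior
        sim_stat = statistics.fmean(simulator(theta, len(observed), rng))
        if abs(sim_stat - obs_stat) < eps:
            accepted.append(theta)
    return accepted


rng = random.Random(1)
observed = simulator(2.0, 50, rng)  # pretend data generated at theta = 2
posterior = rejection_abc(observed, n_sims=10000, eps=0.1, rng=rng)
print(len(posterior), statistics.fmean(posterior))
```

Because the summary is the sample mean, a single gross outlier in `observed` shifts every accepted parameter. This is the failure mode the abstract targets.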
-
A Diffusive Classification Loss for Learning Energy-based Generative Models
arXiv:2601.21025v1 Announce Type: new Abstract: Score-based generative models have recently achieved remarkable success. While they are usually parameterized by the score, an alternative way is to use a series of time-dependent energy-based models (EBMs), where the score is obtained from the negative input-gradient of the energy.…
-
Empirical Likelihood-Based Fairness Auditing: Distribution-Free Certification and Flagging
arXiv:2601.20269v1 Announce Type: new Abstract: Machine learning models in high-stakes applications, such as recidivism prediction and automated personnel selection, often exhibit systematic performance disparities across sensitive subpopulations, raising critical concerns regarding algorithmic bias. Fairness auditing addresses these risks through two primary functions: certification, which verifies adherence to…
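The empirical likelihood in the title is Owen's nonparametric likelihood ratio. A minimal sketch follows, assuming the audited quantity is a subgroup's mean error rate. The statistic -2 log R(mu) is computed by solving for a single Lagrange multiplier with bisection and is then compared with a chi-square(1) quantile. The data and thresholds are illustrative, and the paper's certification and flagging procedures are not reproduced.

```python
import math


def el_log_ratio(x, mu, tol=1e-12):
    """Owen's empirical likelihood statistic -2 log R(mu) for the mean of x."""
    z = [xi - mu for xi in x]
    if min(z) >= 0 or max(z) <= 0:
        return math.inf  # mu lies outside the convex hull of the data
    # The multiplier must keep every implied weight positive: 1 + lam * z_i > 0.
    lo, hi = -1.0 / max(z), -1.0 / min(z)

    def g(lam):
        return sum(zi / (1.0 + lam * zi) for zi in z)

    # g decreases strictly on (lo, hi), from +inf to -inf, so bisect for its root.
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    lam = 0.5 * (lo + hi)
    return 2.0 * sum(math.log1p(lam * zi) for zi in z)


# Hypothetical audit data: 0/1 error indicators for one subgroup (rate 0.30).
errors = [1] * 30 + [0] * 70
CHI2_1_95 = 3.841  # 95% quantile of the chi-square distribution with 1 df
for target in (0.30, 0.20):
    stat = el_log_ratio(errors, target)
    verdict = "reject" if stat > CHI2_1_95 else "retain"
    print(f"target {target:.2f}: -2 log R = {stat:.2f} -> {verdict}")
```

For 0/1 data the statistic coincides with the binomial likelihood-ratio statistic. In this example, an observed error rate of 0.30 over 100 cases is retained at 0.30 and rejected at 0.20, where the statistic is about 5.63.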
-
Semiparametric KSD test: unifying score and distance-based approaches for goodness-of-fit testing
arXiv:2512.20007v1 Announce Type: new Abstract: Goodness-of-fit (GoF) tests are fundamental for assessing model adequacy. Score-based tests are appealing because they require fitting the model only once under the null. However, extending them to powerful nonparametric alternatives is difficult due to the lack of suitable…
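The kernel Stein discrepancy (KSD) named in the title needs only the model's score, not its normalizing constant. The sketch below is a minimal 1-D version of the usual U-statistic estimate. It assumes an RBF kernel with bandwidth h and a standard normal model with score -x. The paper's semiparametric test is not implemented.

```python
import math
import random


def ksd_u(samples, score, h=1.0):
    """U-statistic estimate of the squared KSD in 1-D with an RBF kernel of bandwidth h."""
    n = len(samples)
    total = 0.0
    for i in range(n):
        x, sx = samples[i], score(samples[i])
        for j in range(n):
            if i == j:
                continue  # the U-statistic excludes diagonal terms
            y, sy = samples[j], score(samples[j])
            d = x - y
            k = math.exp(-d * d / (2.0 * h * h))
            dk_dx = -d / h**2 * k
            dk_dy = d / h**2 * k
            d2k_dxdy = (1.0 / h**2 - d * d / h**4) * k
            # Stein kernel u_p(x, y) built from the model score s = d/dx log p.
            total += sx * sy * k + sx * dk_dy + sy * dk_dx + d2k_dxdy
    return total / (n * (n - 1))


def normal_score(x):
    return -x  # score of the standard normal model


rng = random.Random(2)
good = [rng.gauss(0.0, 1.0) for _ in range(300)]
shifted = [rng.gauss(1.0, 1.0) for _ in range(300)]
print(ksd_u(good, normal_score), ksd_u(shifted, normal_score))
```

Samples drawn from the model give a value near zero. For N(1, 1) data against the N(0, 1) model with h = 1, the population value is 1/sqrt(3), about 0.58. Calibrating a rejection threshold (for example with a bootstrap) is a separate step and is not shown.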
-
Cluster-Based Generalized Additive Models Informed by Random Fourier Features
arXiv:2512.19373v1 Announce Type: new Abstract: Explainable machine learning aims to strike a balance between prediction accuracy and model transparency, particularly in settings where black-box predictive models, such as deep neural networks or kernel-based methods, achieve strong empirical performance but remain difficult to interpret. This work introduces…
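Random Fourier features replace a shift-invariant kernel with an explicit finite-dimensional feature map. The sketch below targets the Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2)). It assumes frequencies drawn from N(0, sigma^-2 I) and phases from U(0, 2 pi). How the paper uses these features to build its clusters and additive models is not covered.

```python
import math
import random


def sample_rff(dim, n_features, sigma, rng):
    """Draw frequencies w ~ N(0, sigma^-2 I) and phases b ~ U(0, 2*pi)."""
    w = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)] for _ in range(n_features)]
    b = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    return w, b


def rff_map(x, w, b):
    """Feature map z(x) with z(x) . z(y) ~= exp(-||x - y||^2 / (2 sigma^2))."""
    scale = math.sqrt(2.0 / len(b))
    return [scale * math.cos(sum(wi * xi for wi, xi in zip(w_d, x)) + b_d)
            for w_d, b_d in zip(w, b)]


rng = random.Random(3)
w, b = sample_rff(dim=2, n_features=5000, sigma=1.0, rng=rng)
x, y = [0.3, -0.5], [1.0, 0.2]
approx = sum(p * q for p, q in zip(rff_map(x, w, b), rff_map(y, w, b)))
exact = math.exp(-sum((p - q) ** 2 for p, q in zip(x, y)) / 2.0)  # sigma = 1
print(f"RFF {approx:.3f} vs exact {exact:.3f}")
```

The approximation error shrinks roughly like 1/sqrt(n_features). This lets linear or additive models on z(x) stand in for kernel methods.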
-
Disentangled representations via score-based variational autoencoders
arXiv:2512.17127v1 Announce Type: new Abstract: We present the Score-based Autoencoder for Multiscale Inference (SAMI), a method for unsupervised representation learning that combines the theoretical frameworks of diffusion models and VAEs. By unifying their respective evidence lower bounds, SAMI formulates a principled objective that learns representations through score-based guidance of…
-
Co-Hub Node Based Multiview Graph Learning with Theoretical Guarantees
arXiv:2512.12435v1 Announce Type: new Abstract: Identifying the graphical structure underlying observed multivariate data is essential in numerous applications. Current methodologies are predominantly confined to inferring a single graph under the assumption that the observed data are homogeneous. However, many contexts involve heterogeneous datasets that feature…
-
Generalized Inequality-based Approach for Probabilistic WCET Estimation
arXiv:2511.11682v1 Announce Type: new Abstract: Estimating the probabilistic Worst-Case Execution Time (pWCET) is essential for ensuring the timing correctness of real-time applications, such as in robot IoT systems and autonomous driving systems. While methods based on Extreme Value Theory (EVT) can provide tight bounds, they suffer from model…
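The abstract contrasts EVT fitting with inequality-based bounds. One classical distribution-free example, used here purely for illustration, is Cantelli's one-sided inequality P(X - mu >= k sigma) <= 1 / (1 + k^2). Solving 1 / (1 + k^2) = p gives an execution-time bound mu + sigma sqrt(1/p - 1) that is exceeded with probability at most p. This is not the paper's generalized inequality, and the timing data are hypothetical.

```python
import math
import statistics


def cantelli_bound(times, exceed_prob):
    """Time t with P(X >= t) <= exceed_prob by Cantelli, using sample mean/std as plug-ins."""
    if not 0.0 < exceed_prob < 1.0:
        raise ValueError("exceed_prob must be in (0, 1)")
    mu = statistics.fmean(times)
    sigma = statistics.stdev(times)
    k = math.sqrt(1.0 / exceed_prob - 1.0)  # solves 1 / (1 + k^2) = exceed_prob
    return mu + k * sigma


# Illustrative execution-time measurements in milliseconds.
times_ms = [1.21, 1.35, 1.18, 1.42, 1.30, 1.27, 1.55, 1.24]
print(f"bound at 1e-6: {cantelli_bound(times_ms, 1e-6):.2f} ms")
```

The guarantee holds for the true mean and standard deviation. Plugging in sample estimates, as above, is only an approximation, and such bounds tend to be conservative at very small exceedance probabilities.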
-
Score-based constrained generative modeling via Langevin diffusions with boundary conditions
arXiv:2510.23985v1 Announce Type: new Abstract: Score-based generative models based on stochastic differential equations (SDEs) achieve impressive performance in sampling from unknown distributions, but often fail to satisfy underlying constraints. We propose a constrained generative model using kinetic (underdamped) Langevin dynamics with specular reflection of velocity…
-
Learning Decentralized Routing Policies via Graph Attention-based Multi-Agent Reinforcement Learning in Lunar Delay-Tolerant Networks
arXiv:2510.20436v1 Announce Type: new Abstract: We present a fully decentralized routing framework for multi-robot exploration missions operating under the constraints of a Lunar Delay-Tolerant Network (LDTN). In this setting, autonomous rovers must relay collected data to a lander under intermittent connectivity…
-
Updated based on subreddit feedback. Applying for mid-senior based roles. Thank you
Submitted by /u/StormyT.
-
Why Task-Based Evaluations Matter
This article is adapted from a lecture series I gave at Deeplearn 2025: From Prototype to Production: Evaluation Strategies for Agentic Applications. Task-based evaluations, which measure an AI system’s performance in use-case-specific, real-world settings, are underadopted and understudied. There is still an outsized focus in AI literature on foundation model benchmarks.…
-
Simulation-based inference of yeast centromeres
arXiv:2509.00200v1 Announce Type: new Abstract: Chromatin folding and the spatial arrangement of chromosomes in the cell play a crucial role in DNA replication and gene expression. Improper chromatin folding could lead to malfunctions and, over time, diseases. For eukaryotes, centromeres are essential for proper chromosome segregation and folding.…
-
CP4SBI: Local Conformal Calibration of Credible Sets in Simulation-Based Inference
arXiv:2508.17077v1 Announce Type: new Abstract: Experimental scientists increasingly rely on simulation-based inference (SBI) to invert complex non-linear models with intractable likelihoods. However, posterior approximations obtained with SBI are often miscalibrated, causing credible regions to undercover true parameters. We develop CP4SBI, a model-agnostic…
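CP4SBI is described as local conformal calibration. For orientation, the sketch below applies plain (global) split conformal calibration to a deliberately overconfident Gaussian posterior approximation in a toy simulator. The simulator, the approximation and the standardized-residual score are all illustrative assumptions, and the local variant from the paper is not reproduced.

```python
import math
import random


def conformal_quantile(scores, alpha):
    """Split-conformal threshold: the ceil((n + 1)(1 - alpha))-th smallest score."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    return math.inf if rank > n else sorted(scores)[rank - 1]


def simulate_pair(rng):
    theta = rng.gauss(0.0, 1.0)      # prior draw (assumed N(0, 1))
    x = theta + rng.gauss(0.0, 1.0)  # toy simulator output
    return theta, x


def approx_posterior(x):
    # Overconfident stand-in for an SBI posterior: the exact posterior is
    # N(x / 2, 0.5), but this reports a standard deviation of 0.3.
    return x / 2.0, 0.3


def score(theta, x):
    m, s = approx_posterior(x)
    return abs(theta - m) / s  # standardized residual as nonconformity score


rng = random.Random(4)
cal_scores = [score(*simulate_pair(rng)) for _ in range(1000)]
q = conformal_quantile(cal_scores, alpha=0.1)
test_pairs = [simulate_pair(rng) for _ in range(5000)]
coverage = sum(score(t, x) <= q for t, x in test_pairs) / len(test_pairs)
print(f"q = {q:.2f}, empirical coverage = {coverage:.3f}")
```

The calibrated multiplier q comes out well above the Gaussian 1.645. This widens the interval m ± q s until it covers about 90% of fresh parameter draws, whereas m ± 1.645 s covers only about half of them.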
-
Statistical Inference for Autoencoder-based Anomaly Detection after Representation Learning-based Domain Adaptation
arXiv:2508.07049v1 Announce Type: new Abstract: Anomaly detection (AD) plays a vital role across a wide range of domains, but its performance might deteriorate when applied to target domains with limited data. Domain Adaptation (DA) offers a solution by transferring knowledge from a related source…
-
Stochastic Trace Optimization of Parameter Dependent Matrices Based on Statistical Learning Theory
arXiv:2508.05764v1 Announce Type: new Abstract: We consider matrices $\boldsymbol{A}(\boldsymbol{\theta})\in\mathbb{R}^{m\times m}$ that depend, possibly nonlinearly, on a parameter $\boldsymbol{\theta}$ from a compact parameter space $\Theta$. We present a Monte Carlo estimator for minimizing $\text{trace}(\boldsymbol{A}(\boldsymbol{\theta}))$ over all $\boldsymbol{\theta}\in\Theta$, and determine the sampling amount so that…
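A standard building block for Monte Carlo trace estimation is Hutchinson's estimator. For random vectors z with independent ±1 entries, E[z^T A z] = trace(A), so averaging z^T A z needs only matrix-vector products. The sketch assumes a small fixed symmetric matrix. It does not cover the parameter dependence A(theta) or the paper's sample-size analysis.

```python
import random


def hutchinson_trace(matvec, m, n_samples, rng):
    """Estimate trace(A) by averaging z^T A z over Rademacher vectors z."""
    total = 0.0
    for _ in range(n_samples):
        z = [rng.choice((-1.0, 1.0)) for _ in range(m)]
        az = matvec(z)  # only matrix-vector products are needed
        total += sum(zi * ai for zi, ai in zip(z, az))
    return total / n_samples


# Illustrative symmetric matrix with trace 9.
A = [[4.0, 1.0, 0.5],
     [1.0, 3.0, 0.2],
     [0.5, 0.2, 2.0]]


def matvec(v):
    return [sum(a * vi for a, vi in zip(row, v)) for row in A]


rng = random.Random(5)
estimate = hutchinson_trace(matvec, m=3, n_samples=20000, rng=rng)
print(f"estimate {estimate:.3f}, exact {sum(A[i][i] for i in range(3))}")
```

For symmetric A, each term has variance 2 times the sum of squared off-diagonal entries. The off-diagonal mass, not the trace itself, therefore drives how many samples are needed.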
-
DICOM De-Identification via Hybrid AI and Rule-Based Framework for Scalable, Uncertainty-Aware Redaction
arXiv:2507.23736v1 Announce Type: new Abstract: Access to medical imaging and associated text data has the potential to drive major advances in healthcare research and patient outcomes. However, the presence of Protected Health Information (PHI) and Personally Identifiable Information (PII) in Digital Imaging and…
-
Measuring Sample Quality with Copula Discrepancies
arXiv:2507.21434v1 Announce Type: new Abstract: The scalable Markov chain Monte Carlo (MCMC) algorithms that underpin modern Bayesian machine learning, such as Stochastic Gradient Langevin Dynamics (SGLD), sacrifice asymptotic exactness for computational speed, creating a critical diagnostic gap: traditional sample quality measures fail catastrophically when applied to biased samplers. While…
-
Deep Learning-Based Survival Analysis with Copula-Based Activation Functions for Multivariate Response Prediction
arXiv:2507.14641v1 Announce Type: new Abstract: This research integrates deep learning, copula functions, and survival analysis to effectively handle highly correlated and right-censored multivariate survival data. It introduces copula-based activation functions (Clayton, Gumbel, and their combinations) to model the nonlinear dependencies inherent in such…
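For reference, the two named copula families have closed forms. The sketch evaluates the bivariate Clayton and Gumbel copulas. As an assumed example of a "combination", it also evaluates a convex mixture of the two, since a convex combination of copulas is again a copula. The excerpt does not say how the paper turns these into activation functions, so that part is not shown.

```python
import math


def clayton(u, v, theta):
    """Clayton copula C(u, v) = (u^-theta + v^-theta - 1)^(-1/theta), theta > 0."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def gumbel(u, v, theta):
    """Gumbel copula C(u, v) = exp(-((-ln u)^theta + (-ln v)^theta)^(1/theta)), theta >= 1."""
    s = (-math.log(u)) ** theta + (-math.log(v)) ** theta
    return math.exp(-s ** (1.0 / theta))


def mixture(u, v, w, theta_c, theta_g):
    """Assumed 'combination': convex mixture w * Clayton + (1 - w) * Gumbel."""
    return w * clayton(u, v, theta_c) + (1.0 - w) * gumbel(u, v, theta_g)


print(clayton(0.3, 0.6, 2.0), gumbel(0.3, 0.6, 2.0), mixture(0.3, 0.6, 0.5, 2.0, 2.0))
```

Both families satisfy the copula boundary condition C(u, 1) = u. Gumbel with theta = 1 reduces to the independence copula uv.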
-
Diffusion-Based Hypothesis Testing and Change-Point Detection
arXiv:2506.16089v1 Announce Type: new Abstract: Score-based methods have recently seen increasing popularity in modeling and generation. Methods have been constructed to perform hypothesis testing and change-point detection with score functions, but these methods are in general not as powerful as their likelihood-based peers. Recent works consider generalizing the score-based…
-
Liouville PDE-based sliced-Wasserstein flow for fair regression
arXiv:2505.17204v1 Announce Type: new Abstract: The sliced Wasserstein flow (SWF), a nonparametric and implicit generative gradient flow, is applied to fair regression. We improve the SWF in several respects. First, the stochastic diffusive term from the Fokker-Planck equation-based Monte Carlo is transformed into a Liouville partial differential…
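The sliced Wasserstein distance behind SWF reduces a multivariate comparison to 1-D ones. Both samples are projected onto random directions, and in 1-D optimal transport is just sorting. The sketch is a minimal 2-D version that assumes equal sample sizes and uniformly random directions. The Liouville-PDE flow itself is not implemented.

```python
import math
import random


def wasserstein_1d(a, b, p=2):
    """W_p between equal-size 1-D empirical distributions: match sorted samples."""
    a, b = sorted(a), sorted(b)
    return (sum(abs(x - y) ** p for x, y in zip(a, b)) / len(a)) ** (1.0 / p)


def sliced_wasserstein(X, Y, n_proj, rng, p=2):
    """Monte Carlo sliced W_p for 2-D point clouds X and Y of equal size."""
    total = 0.0
    for _ in range(n_proj):
        angle = rng.uniform(0.0, math.pi)  # direction on the unit circle (up to sign)
        d = (math.cos(angle), math.sin(angle))
        px = [pt[0] * d[0] + pt[1] * d[1] for pt in X]
        py = [pt[0] * d[0] + pt[1] * d[1] for pt in Y]
        total += wasserstein_1d(px, py, p) ** p
    return (total / n_proj) ** (1.0 / p)


rng = random.Random(6)
X = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(200)]
Y = [(x + 3.0, y) for x, y in X]  # the same cloud translated by (3, 0)
sw = sliced_wasserstein(X, Y, n_proj=500, rng=rng)
print(sw)
```

For a pure translation by a vector t, each projected distance equals |t . d|. The estimate for a shift of (3, 0) therefore approaches 3/sqrt(2), about 2.12.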
-
An Efficient Transport-Based Dissimilarity Measure for Time Series Classification under Warping Distortions
arXiv:2505.05676v1 Announce Type: cross Abstract: Time Series Classification (TSC) is an important problem with numerous applications in science and technology. Dissimilarity-based approaches, such as Dynamic Time Warping (DTW), are classical methods for distinguishing time series when time deformations are confounding information. In this…
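DTW, the classical baseline named in the abstract, aligns two sequences by dynamic programming over monotone warping paths. The sketch is a minimal version with absolute-difference cost and no window constraint, both of which are assumptions. The paper's transport-based measure is a different method.

```python
def dtw(a, b):
    """Dynamic time warping distance with absolute-difference cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # D[i][j] = cheapest alignment cost of a[:i] with b[:j].
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend from a match, an insertion, or a deletion.
            D[i][j] = cost + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    return D[n][m]


print(dtw([1, 2, 3], [1, 2, 2, 3]))
```

Here the result is 0.0 because warping absorbs the repeated 2. A pointwise comparison is not even defined for sequences of unequal length.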
-
A Practical Guide to BERTopic for Transformer-Based Topic Modeling
Topic modeling has a wide range of use cases in the natural language processing (NLP) domain, such as document tagging, survey analysis, and content organization. As an unsupervised learning technique, it is cost-effective and reduces the resources required to…
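BERTopic's documentation describes its topic representations with class-based TF-IDF (c-TF-IDF). All documents of a topic are concatenated, and a word x's weight in topic c is its normalized frequency in c times log(1 + A / f_x). Here f_x is the word's frequency across all topics and A is the average number of words per topic. The sketch below re-implements that weighting for illustration, assuming whitespace tokenization and topic labels that are already assigned. BERTopic itself first embeds, reduces and clusters the documents, and its own vectorizer code is not shown.

```python
import math
from collections import Counter


def c_tf_idf(docs_per_topic):
    """Class-based TF-IDF: weight(x, c) = (tf_{x,c} / |c|) * log(1 + A / f_x)."""
    # Concatenate each topic's documents into one "class document".
    counts = {t: Counter(" ".join(docs).lower().split())
              for t, docs in docs_per_topic.items()}
    freq_all = Counter()  # f_x: frequency of each word across all topics
    for c in counts.values():
        freq_all.update(c)
    # A: average number of words per topic.
    avg_words = sum(sum(c.values()) for c in counts.values()) / len(counts)
    weights = {}
    for t, c in counts.items():
        n_words = sum(c.values())
        weights[t] = {w: (f / n_words) * math.log(1.0 + avg_words / freq_all[w])
                      for w, f in c.items()}
    return weights


# Hypothetical documents already grouped by topic label.
topics = {0: ["cats purr", "cats meow"], 1: ["dogs bark", "dogs meow"]}
w = c_tf_idf(topics)
for t, ws in w.items():
    print(t, sorted(ws, key=ws.get, reverse=True)[:2])
```

The word "meow", shared by both topics, is down-weighted relative to words that are distinctive for a single topic.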
-
Statistical Inference for Clustering-based Anomaly Detection
arXiv:2504.18633v1 Announce Type: new Abstract: Unsupervised anomaly detection (AD) is a fundamental problem in machine learning and statistics. A popular approach to unsupervised AD is clustering-based detection. However, this method lacks the ability to guarantee the reliability of the detected anomalies. In this paper, we propose SI-CLAD (Statistical Inference…
-
Gradient-based Sample Selection for Faster Bayesian Optimization
arXiv:2504.07742v1 Announce Type: new Abstract: Bayesian optimization (BO) is an effective technique for black-box optimization. However, its applicability is typically limited to moderate-budget problems due to the cubic complexity in computing the Gaussian process (GP) surrogate model. In large-budget scenarios, directly employing the standard GP model faces significant…
-
Accelerating Particle-based Energetic Variational Inference
arXiv:2504.03158v1 Announce Type: new Abstract: In this work, we propose a novel particle-based variational inference (ParVI) method that accelerates the EVI-Im. Inspired by energy quadratization (EQ) and operator splitting techniques for gradient flows, our approach efficiently drives particles towards the target distribution. Unlike EVI-Im, which employs the implicit Euler method…
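EVI-Im and the proposed method are not reproduced here. The general ParVI idea of moving interacting particles toward a target can be illustrated with a different, well-known method, Stein variational gradient descent (SVGD). This 1-D sketch assumes a standard normal target, an RBF kernel with fixed bandwidth h and a fixed step size.

```python
import math


def svgd_step(particles, grad_logp, step, h):
    """One SVGD update in 1-D with an RBF kernel of bandwidth h."""
    n = len(particles)
    updated = []
    for xi in particles:
        phi = 0.0
        for xj in particles:
            k = math.exp(-(xj - xi) ** 2 / (2.0 * h * h))
            # Kernel-weighted score pulls toward high density; the kernel
            # gradient term (xi - xj) / h^2 * k pushes particles apart.
            phi += k * grad_logp(xj) + (xi - xj) / (h * h) * k
        updated.append(xi + step * phi / n)
    return updated


# Standard normal target (score -x); particles start far away in [4, 6].
particles = [4.0 + 2.0 * i / 39 for i in range(40)]
for _ in range(400):
    particles = svgd_step(particles, lambda x: -x, step=0.1, h=1.0)
mean = sum(particles) / len(particles)
sd = (sum((p - mean) ** 2 for p in particles) / len(particles)) ** 0.5
print(f"mean {mean:.3f}, sd {sd:.3f}")
```

Each step here is an explicit update. The abstract contrasts EVI-Im's implicit Euler scheme with its own energy-quadratization and operator-splitting approach, which this sketch does not attempt.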
-
Bayesian Optimization of Robustness Measures Using Randomized GP-UCB-based Algorithms under Input Uncertainty
arXiv:2504.03172v1 Announce Type: new Abstract: Bayesian optimization based on Gaussian process upper confidence bound (GP-UCB) has a theoretical guarantee for optimizing black-box functions. Black-box functions often have input uncertainty, but even in this case, GP-UCB can be extended to optimize evaluation measures called…
-
Reheated Gradient-based Discrete Sampling for Combinatorial Optimization
arXiv:2503.04047v1 Announce Type: new Abstract: Recently, gradient-based discrete sampling has emerged as a highly efficient, general-purpose solver for various combinatorial optimization (CO) problems, achieving performance comparable to or surpassing the popular data-driven approaches. However, we identify a critical issue in these methods, which we term “wandering in contours”.…
-
Enhancing Gradient-based Discrete Sampling via Parallel Tempering
arXiv:2502.19240v1 Announce Type: new Abstract: While gradient-based discrete samplers are effective in sampling from complex distributions, they are susceptible to getting trapped in local minima, particularly in high-dimensional, multimodal discrete distributions, owing to the discontinuities inherent in these landscapes. To circumvent this issue, we combine parallel tempering, also…
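The sketch below illustrates only the parallel tempering half of the idea. It runs random-walk Metropolis chains at several inverse temperatures on a hypothetical bimodal energy over {0, ..., 39}. Adjacent chains swap states with the usual acceptance probability min(1, exp((beta_i - beta_j)(E_i - E_j))). The paper pairs tempering with gradient-based discrete proposals, which are not shown, and the energy and temperature ladder are illustrative.

```python
import math
import random


def energy(x):
    """Hypothetical bimodal energy on {0, ..., 39} with wells at 5 and 34."""
    return min((x - 5) ** 2, (x - 34) ** 2) / 4.0


def swap_prob(e_i, e_j, beta_i, beta_j):
    """Acceptance probability for exchanging states between two temperatures."""
    return min(1.0, math.exp((beta_i - beta_j) * (e_i - e_j)))


def parallel_tempering(betas, n_iters, rng, n_states=40):
    """Random-walk Metropolis at each inverse temperature plus adjacent swaps."""
    states = [rng.randrange(n_states) for _ in betas]
    cold_trace = []
    for _ in range(n_iters):
        for c, beta in enumerate(betas):
            prop = states[c] + rng.choice((-1, 1))
            # Out-of-range proposals are rejected, which keeps detailed balance.
            if 0 <= prop < n_states:
                log_acc = -beta * (energy(prop) - energy(states[c]))
                if rng.random() < math.exp(min(0.0, log_acc)):
                    states[c] = prop
        # Propose exchanging the states of one random adjacent pair of chains.
        c = rng.randrange(len(betas) - 1)
        e_c, e_n = energy(states[c]), energy(states[c + 1])
        if rng.random() < swap_prob(e_c, e_n, betas[c], betas[c + 1]):
            states[c], states[c + 1] = states[c + 1], states[c]
        cold_trace.append(states[0])  # betas[0] is the target (coldest) chain
    return cold_trace


rng = random.Random(7)
trace = parallel_tempering([1.0, 0.3, 0.1, 0.03], n_iters=20000, rng=rng)
left = sum(x < 20 for x in trace) / len(trace)
print(f"fraction of cold-chain samples in the left well: {left:.2f}")
```

On its own, the beta = 1 chain would rarely cross the energy barrier of 49 between the wells. Swaps with the hotter chains carry it between the two modes.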
-
Optimizing Likelihoods via Mutual Information: Bridging Simulation-Based Inference and Bayesian Optimal Experimental Design
arXiv:2502.08004v1 Announce Type: new Abstract: Simulation-based inference (SBI) is a method to perform inference on a variety of complex scientific models with challenging inference (inverse) problems. Bayesian Optimal Experimental Design (BOED) aims to efficiently use experimental resources to make better inferences. Various…
-
Deep Learning-based Approaches for State Space Models: A Selective Review
arXiv:2412.11211v1 Announce Type: new Abstract: State-space models (SSMs) offer a powerful framework for dynamical system analysis, wherein the temporal dynamics of the system are assumed to be captured through the evolution of the latent states, which govern the values of the observations. This paper provides…
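As a point of reference for the deep-learning approaches the review covers, the classical linear-Gaussian state-space model is filtered exactly by the Kalman filter. The scalar sketch below assumes the model x_t = a x_{t-1} + N(0, q), y_t = x_t + N(0, r) with known, illustrative parameters.

```python
import random


def kalman_filter(ys, a, q, r, m0, p0):
    """Scalar Kalman filter for x_t = a*x_{t-1} + N(0, q), y_t = x_t + N(0, r)."""
    m, p = m0, p0
    means = []
    for y in ys:
        # Predict: push the state estimate through the linear dynamics.
        m, p = a * m, a * a * p + q
        # Update: weight the new observation by the Kalman gain.
        k = p / (p + r)
        m, p = m + k * (y - m), (1.0 - k) * p
        means.append(m)
    return means, p


rng = random.Random(8)
a, q, r = 0.9, 0.1, 1.0
x, xs, ys = 0.0, [], []
for _ in range(2000):
    x = a * x + rng.gauss(0.0, q ** 0.5)
    xs.append(x)
    ys.append(x + rng.gauss(0.0, r ** 0.5))

means, p_final = kalman_filter(ys, a, q, r, m0=0.0, p0=1.0)
mse_filter = sum((m - xt) ** 2 for m, xt in zip(means, xs)) / len(xs)
mse_obs = sum((y - xt) ** 2 for y, xt in zip(ys, xs)) / len(xs)
print(f"filter MSE {mse_filter:.3f} vs raw observations {mse_obs:.3f}")
```

With these settings the filtered variance settles at the positive root of 0.81 P^2 + 0.29 P - 0.1 = 0, about 0.215, compared with an observation noise variance of 1.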
-
Sequential Controlled Langevin Diffusions
arXiv:2412.07081v1 Announce Type: new Abstract: An effective approach for sampling from unnormalized densities is based on the idea of gradually transporting samples from an easy prior to the complicated target distribution. Two popular methods are (1) Sequential Monte Carlo (SMC), where the transport is performed through successive annealed densities via prescribed…
-
Preference-based Pure Exploration
arXiv:2412.02988v1 Announce Type: new Abstract: We study the preference-based pure exploration problem for bandits with vector-valued rewards. The rewards are ordered using a (given) preference cone $\mathcal{C}$, and our goal is to identify the set of Pareto optimal arms. First, to quantify the impact of preferences, we derive a novel lower…
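Under a preference cone C, u is preferred to v when u - v lies in C. When C is the nonnegative orthant, this reduces to ordinary componentwise Pareto dominance. The sketch assumes that cone and known mean reward vectors for a hypothetical set of arms. The bandit problem itself, identifying this set from noisy samples, is not addressed.

```python
def dominates(u, v):
    """True if u is at least as good as v in every objective and better in one."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))


def pareto_optimal(arms):
    """Names of arms whose mean vectors no other arm dominates."""
    return [name for name, mu in arms.items()
            if not any(dominates(other, mu) for o, other in arms.items() if o != name)]


# Hypothetical mean reward vectors (objective 1, objective 2).
arms = {"a": (1.0, 2.0), "b": (2.0, 1.0), "c": (0.5, 0.5), "d": (1.5, 1.5)}
print(pareto_optimal(arms))
```

Arm "c" is dominated by every other arm, while "a", "b" and "d" each trade one objective against the other.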
-
Optimal Particle-based Approximation of Discrete Distributions (OPAD)
arXiv:2412.00545v1 Announce Type: new Abstract: Particle-based methods include a variety of techniques, such as Markov Chain Monte Carlo (MCMC) and Sequential Monte Carlo (SMC), for approximating a probabilistic target distribution with a set of weighted particles. In this paper, we prove that for any set of particles, there…
-
Functional relevance based on the continuous Shapley value
arXiv:2411.18575v1 Announce Type: new Abstract: The presence of Artificial Intelligence (AI) in our society is increasing, which brings with it the need to understand the behaviour of AI mechanisms, including machine learning predictive algorithms fed with tabular data, text, or images, among other types of data. This…
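The continuous Shapley value in the title generalizes the classical discrete one. For orientation, the brute-force sketch below computes the discrete Shapley value by averaging each player's marginal contribution over all orderings. The value function is a hypothetical assumption, and the cost grows factorially with the number of players.

```python
from itertools import permutations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values: average marginal contributions over all orderings."""
    phi = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            phi[p] += value(coalition | {p}) - value(coalition)
            coalition = coalition | {p}
    n_orders = factorial(len(players))
    return {p: total / n_orders for p, total in phi.items()}


# Hypothetical value function: additive feature weights plus an interaction
# bonus of 1.0 earned only when "age" and "income" are both present.
weights = {"age": 1.0, "income": 2.0, "tenure": 0.5}


def value(coalition):
    bonus = 1.0 if {"age", "income"} <= coalition else 0.0
    return sum(weights[p] for p in coalition) + bonus


phi = shapley_values(list(weights), value)
print(phi)
```

The values sum to value(all players) = 4.5, which is the efficiency property. The interaction bonus is split equally between the two features that create it, giving 1.5, 2.5 and 0.5.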