Tag: information

Data-Driven Information-Theoretic Causal Bounds under Unmeasured Confounding

arXiv:2601.17160v1. Abstract: We develop a data-driven information-theoretic framework for sharp partial identification of causal effects under unmeasured confounding. Existing approaches often rely on restrictive assumptions, such as bounded or discrete outcomes; require external inputs (for example, instrumental variables, proxies, or user-specified sensitivity parameters); necessitate full…

January 27, 2026
An approach to Fisher-Rao metric for infinite dimensional non-parametric information geometry

arXiv:2512.21451v1. Abstract: Being infinite dimensional, non-parametric information geometry has long faced an “intractability barrier” due to the fact that the Fisher-Rao metric is now a functional incurring difficulties in defining its inverse. This paper introduces a novel framework to resolve the…

December 29, 2025
BITS for GAPS: Bayesian Information-Theoretic Sampling for hierarchical GAussian Process Surrogates

arXiv:2511.16815v1. Abstract: We introduce the Bayesian Information-Theoretic Sampling for hierarchical GAussian Process Surrogates (BITS for GAPS) framework to emulate latent components in hybrid physical systems. BITS for GAPS supports serial hybrid modeling, where known physics governs part of the system and…

November 24, 2025
How to Perform Agentic Information Retrieval

Learn how to utilize AI agents to find information in your document corpus. By Eivind Kjosbakken, originally published on Towards Data Science.

November 20, 2025
Knowledge vs. Experience: Asymptotic Limits of Impatience in Edge Tenants

arXiv:2511.13763v1. Abstract: We study how two information feeds, a closed-form Markov estimator of residual sojourn and an online trained actor-critic, affect reneging and jockeying in a dual M/M/1 system. Analytically, for unequal service rates and total-time patience, we show that total wait…

November 19, 2025
Unifying Information-Theoretic and Pair-Counting Clustering Similarity

arXiv:2511.03000v1. Abstract: Comparing clusterings is central to evaluating unsupervised models, yet the many existing similarity measures can produce widely divergent, sometimes contradictory, evaluations. Clustering similarity measures are typically organized into two principal families, pair-counting and information-theoretic, reflecting whether they quantify agreement through element pairs or aggregate…

November 6, 2025
A novel Information-Driven Strategy for Optimal Regression Assessment

arXiv:2510.14222v1. Abstract: In Machine Learning (ML), a regression algorithm aims to minimize a loss function based on data. An assessment method in this context seeks to quantify the discrepancy between the optimal response for an input-output system and the estimate produced by a learned…

October 17, 2025
IndiSeek learns information-guided disentangled representations

arXiv:2509.21584v1. Abstract: Learning disentangled representations is a fundamental task in multi-modal learning. In modern applications such as single-cell multi-omics, both shared and modality-specific features are critical for characterizing cell states and supporting downstream analyses. Ideally, modality-specific features should be independent of shared ones while also capturing all…

September 29, 2025
An Information-Theoretic Framework for Credit Risk Modeling: Unifying Industry Practice with Statistical Theory for Fair and Interpretable Scorecards

arXiv:2509.09855v1. Abstract: Credit risk modeling relies extensively on Weight of Evidence (WoE) and Information Value (IV) for feature engineering, and Population Stability Index (PSI) for drift monitoring, yet their theoretical foundations remain disconnected. We…

September 15, 2025
The Information Dynamics of Generative Diffusion

arXiv:2508.19897v1. Abstract: Generative diffusion models have emerged as a powerful class of models in machine learning, yet a unified theoretical understanding of their operation is still developing. This perspective paper provides an integrated perspective on generative diffusion by connecting their dynamic, information-theoretic, and thermodynamic properties under…

August 28, 2025
Measuring Semantic Information Production in Generative Diffusion Models

arXiv:2506.10433v1. Abstract: It is well known that semantic and structural features of the generated images emerge at different times during the reverse dynamics of diffusion, a phenomenon that has been connected to physical phase transitions in magnets and other materials. In this paper, we…

June 13, 2025
CoT Information: Improved Sample Complexity under Chain-of-Thought Supervision

arXiv:2505.15927v1. Abstract: Learning complex functions that involve multi-step reasoning poses a significant challenge for standard supervised learning from input-output examples. Chain-of-thought (CoT) supervision, which provides intermediate reasoning steps together with the final output, has emerged as a powerful empirical technique, underpinning much of the…

May 23, 2025
Build and Query Knowledge Graphs with LLMs

Knowledge Graphs are relevant. A Knowledge Graph could be defined as a structured representation of information that connects concepts, entities, and their relationships in a way that mimics human understanding. It is often used to organise and integrate data from various sources, enabling machines to reason, infer, and retrieve relevant…

May 3, 2025
Retrieval Augmented Generation (RAG) — An Introduction

The model hallucinated! It was giving me OK answers and then it just started hallucinating. We’ve all heard or experienced it. Natural Language Generation models can sometimes hallucinate, i.e., they start generating text that is not quite accurate for the prompt provided. In layman’s terms, they start making…

April 22, 2025
Agentic GraphRAG for Commercial Contracts

In every business, legal contracts are foundational documents that define the relationships, obligations, and responsibilities between parties. Whether it’s a partnership agreement, an NDA, or a supplier contract, these documents often contain critical information that drives decision-making, risk management, and compliance. However, navigating and extracting insights from these contracts can…

April 3, 2025
Exploiting Concavity Information in Gaussian Process Contextual Bandit Optimization

arXiv:2503.10836v1. Abstract: The contextual bandit framework is widely used to solve sequential optimization problems where the reward of each decision depends on auxiliary context variables. In settings such as medicine, business, and engineering, the decision maker often possesses additional structural information on the…

March 17, 2025
Empirical Bound Information-Directed Sampling for Norm-Agnostic Bandits

arXiv:2503.05098v1. Abstract: Information-directed sampling (IDS) is a powerful framework for solving bandit problems which has shown strong results in both Bayesian and frequentist settings. However, frequentist IDS, like many other bandit algorithms, requires that one have prior knowledge of a (relatively) tight upper bound on…

March 10, 2025
Overcome Failing Document Ingestion & RAG Strategies with Agentic Knowledge Distillation

Many generative AI use cases still revolve around Retrieval Augmented Generation (RAG), yet consistently fall short of user expectations. Despite the growing body of research on RAG improvements and even adding Agents into the process, many solutions still fail to return exhaustive results,…

March 6, 2025
Evolution of Information in Interactive Decision Making: A Case Study for Multi-Armed Bandits

arXiv:2503.00273v1. Abstract: We study the evolution of information in interactive decision making through the lens of a stochastic multi-armed bandit problem. Focusing on a fundamental example where a unique optimal arm outperforms the rest by a fixed margin, we…

March 4, 2025
AI-Powered Information Extraction and Matchmaking

Developing an application for extracting key profile information from CVs and recommending jobs aligned with the profile. By Umair Ali Khan, originally published on Towards Data Science.

January 2, 2025
An information theoretic limit to data amplification

arXiv:2412.18041v1. Abstract: In recent years generative artificial intelligence has been used to create data to support science analysis. For example, Generative Adversarial Networks (GANs) have been trained using Monte Carlo simulated input and then used to generate data for the same problem. This has the…

December 25, 2024
Statistical Undersampling with Mutual Information and Support Points

arXiv:2412.14527v1. Abstract: Class imbalance and distributional differences in large datasets present significant challenges for classification tasks in machine learning, often leading to biased models and poor predictive performance for minority classes. This work introduces two novel undersampling approaches: mutual information-based stratified simple random sampling and…

December 20, 2024
An Information-Theoretic Analysis of Thompson Sampling for Logistic Bandits

arXiv:2412.02861v1. Abstract: We study the performance of the Thompson Sampling algorithm for logistic bandit problems, where the agent receives binary rewards with probabilities determined by a logistic function $\exp(\beta \langle a, \theta \rangle)/(1+\exp(\beta \langle a, \theta \rangle))$. We focus on the setting where…

December 5, 2024