Category: aimlds

How to Use Pre-Trained Language Models for Regression

Why and how to convert mT5 into a regression metric for numerical prediction. By Aden Haussmann on Towards Data Science.

January 19, 2025
Satellite Image Classification with Deep Learning — Complete Project

A Comprehensive Guide Using PyTorch and CNNs. By Leo Anello on Towards Data Science.

January 18, 2025
My Experience Switching From Power BI to Looker (as a Senior Data Analyst)

What you need to know before you switch from Power BI to Looker. By Tomas Jancovic (It’s AI Thomas) on Towards Data Science.

January 18, 2025
Where to Start When Data is Limited

A launch pad for projects with small datasets. Machine Learning (ML) has driven remarkable breakthroughs in computer vision, natural language processing, and speech recognition, largely due to the abundance of data in these fields. However, many challenges, especially those tied to specific product features or…

January 18, 2025
Learning from Machine Learning | Sebastian Raschka: Mastering ML and Pushing AI Forward Responsibly

Sebastian Raschka has helped demystify deep learning for thousands through his books, tutorials and teachings. Sebastian Raschka has helped shape how thousands of data scientists and machine learning engineers learn their craft. As a passionate coder and proponent of open-source software,…

January 18, 2025
A Practical Exploration of Sora — Intuitively and Exhaustively Explained

A new cutting-edge video generation tool, and the theory behind it. By Daniel Warfield on Towards Data Science.

January 18, 2025
Generative Models with ELBOs Converging to Entropy Sums

arXiv:2501.09022v1. Announce Type: new. Abstract: The evidence lower bound (ELBO) is one of the most central objectives for probabilistic unsupervised learning. For the ELBOs of several generative models and model classes, we here prove convergence to entropy sums. As one result, we provide a list of generative…

January 17, 2025
Estimating shared subspace with AJIVE: the power and limitation of multiple data matrices

arXiv:2501.09336v1. Announce Type: new. Abstract: Integrative data analysis often requires disentangling joint and individual variations across multiple datasets, a challenge commonly addressed by the Joint and Individual Variation Explained (JIVE) model. While numerous methods have been developed to estimate the shared subspace…

January 17, 2025
On the convergence of noisy Bayesian Optimization with Expected Improvement

arXiv:2501.09262v1. Announce Type: new. Abstract: Expected improvement (EI) is one of the most widely-used acquisition functions in Bayesian optimization (BO). Despite its proven success in applications for decades, important open questions remain on the theoretical convergence behaviors and rates for EI. In this paper, we…

January 17, 2025
Predictions as Surrogates: Revisiting Surrogate Outcomes in the Age of AI

arXiv:2501.09731v1. Announce Type: new. Abstract: We establish a formal connection between the decades-old surrogate outcome model in biostatistics and economics and the emerging field of prediction-powered inference (PPI). The connection treats predictions from pre-trained models, prevalent in the age of AI, as cost-effective surrogates…

January 17, 2025
Gradient Descent Converges Linearly to Flatter Minima than Gradient Flow in Shallow Linear Networks

arXiv:2501.09137v1. Announce Type: cross. Abstract: We study the gradient descent (GD) dynamics of a depth-2 linear neural network with a single input and output. We show that GD converges at an explicit linear rate to a global minimum of the training…

January 17, 2025
Learnings from a Machine Learning Engineer — Part 4: The Model

Practical insights for a data-driven approach to model optimization. By David Martin on Towards Data Science.

January 17, 2025
Learnings from a Machine Learning Engineer — Part 3: The Evaluation

Practical insights for a data-driven approach to model optimization. By David Martin on Towards Data Science.

January 17, 2025
Learnings from a Machine Learning Engineer — Part 2: The Data Sets

Practical insights for a data-driven approach to model optimization. By David Martin on Towards Data Science.

January 17, 2025
Top 3 Questions to Ask in Near Real-Time Data Solutions

Questions that guide architectural decisions to balance functional requirements with non-functional ones, like latency and scalability. By Shawn Shi on Towards Data Science.

January 17, 2025
The Data Analyst Every CEO Wants

Data Analyst is probably the most underrated job in the data industry. By Benoit Pimpaud on Towards Data Science.

January 17, 2025
A Constant Velocity Latent Dynamics Approach for Accelerating Simulation of Stiff Nonlinear Systems

arXiv:2501.08423v1. Announce Type: new. Abstract: Solving stiff ordinary differential equations (StODEs) requires sophisticated numerical solvers, which are often computationally expensive. In particular, StODEs often cannot be solved with traditional explicit time integration schemes and one must resort to costly implicit methods to…

January 16, 2025
A Theory of Optimistically Universal Online Learnability for General Concept Classes

arXiv:2501.08551v1. Announce Type: new. Abstract: We provide a full characterization of the concept classes that are optimistically universally online learnable with $\{0, 1\}$ labels. The notion of optimistically universal online learning was defined in [Hanneke, 2021] in order to understand learnability under minimal assumptions.…

January 16, 2025
Causal vs. Anticausal merging of predictors

arXiv:2501.08426v1. Announce Type: cross. Abstract: We study the differences arising from merging predictors in the causal and anticausal directions using the same data. In particular we study the asymmetries that arise in a simple model where we merge the predictors using one binary variable as target and two continuous…

January 16, 2025
Quantum Reservoir Computing and Risk Bounds

arXiv:2501.08640v1. Announce Type: cross. Abstract: We propose a way to bound the generalisation errors of several classes of quantum reservoirs using the Rademacher complexity. We give specific, parameter-dependent bounds for two particular quantum reservoir classes. We analyse how the generalisation bounds scale with growing numbers of qubits. Applying our…

January 16, 2025
Diagonal Over-parameterization in Reproducing Kernel Hilbert Spaces as an Adaptive Feature Model: Generalization and Adaptivity

arXiv:2501.08679v1. Announce Type: cross. Abstract: This paper introduces a diagonal adaptive kernel model that dynamically learns kernel eigenvalues and output coefficients simultaneously during training. Unlike fixed-kernel methods tied to the neural tangent kernel theory, the diagonal adaptive kernel model adapts…

January 16, 2025
A 12-step visual guide to understanding NeRF (Representing Scenes as Neural Radiance Fields)

A Beginner’s 12-Step Visual Guide to Understanding NeRF: Neural Radiance Fields for Scene Representation and View Synthesis. A basic understanding of NeRF’s workings through visual representations. Who should read this article? This article aims to provide a basic beginner level…

January 16, 2025
Basics of GANs & SMOTE for Data Augmentation

GANs and SMOTE Explained with Bartending: Data Science for Machine Learning Series (1). By Sunghyun Ahn on Towards Data Science.

January 16, 2025
Learnings from a Machine Learning Engineer — Part 1: The Data

Practical insights for a data-driven approach to model optimization. By David Martin on Towards Data Science.

January 16, 2025
Water Cooler Small Talk: Benford’s Law

A look into the strange first digit distribution of naturally occurring datasets. By Maria Mouschoutzi, PhD on Towards Data Science.

January 16, 2025
Qubits Explained: Everything You Need to Know

A deep dive into the building block of quantum computers. By Sara A. Metwalli on Towards Data Science.

January 16, 2025
Concentration of Measure for Distributions Generated via Diffusion Models

arXiv:2501.07741v1. Announce Type: new. Abstract: We show via a combination of mathematical arguments and empirical evidence that data distributions sampled from diffusion models satisfy a Concentration of Measure Property saying that any Lipschitz $1$-dimensional projection of a random vector is not too far from its mean…

January 15, 2025
On the use of Statistical Learning Theory for model selection in Structural Health Monitoring

arXiv:2501.08050v1. Announce Type: new. Abstract: Whenever data-based systems are employed in engineering applications, defining an optimal statistical representation is subject to the problem of model selection. This paper focusses on how well models can generalise in Structural Health Monitoring (SHM). Although…

January 15, 2025
On the Statistical Capacity of Deep Generative Models

arXiv:2501.07763v1. Announce Type: new. Abstract: Deep generative models are routinely used in generating samples from complex, high-dimensional distributions. Despite their apparent successes, their statistical properties are not well understood. A common assumption is that with enough training data and sufficiently large neural networks, deep generative model samples…

January 15, 2025
Globally Convergent Variational Inference

arXiv:2501.08201v1. Announce Type: new. Abstract: In variational inference (VI), an approximation of the posterior distribution is selected from a family of distributions through numerical optimization. With the most common variational objective function, known as the evidence lower bound (ELBO), only convergence to a local optimum can be guaranteed. In this work,…

January 15, 2025
Avoiding subtraction and division of stochastic signals using normalizing flows: NFdeconvolve

arXiv:2501.08288v1. Announce Type: new. Abstract: Across the scientific realm, we find ourselves subtracting or dividing stochastic signals. For instance, consider a stochastic realization, $x$, generated from the addition or multiplication of two stochastic signals $a$ and $b$, namely $x=a+b$ or $x = ab$. For…

January 15, 2025
Hands-On Delivery Routes Optimization (TSP) with AI, Using LKH and Python

Here’s how to optimize the delivery routes, from theory to code. By Piero Paialunga on Towards Data Science.

January 15, 2025
How To: Forecast Time Series Using Lags

Lag columns can significantly boost your model’s performance. By Haden Pelletier on Towards Data Science.

January 15, 2025
Static and Dynamic Attention: Implications for Graph Neural Networks

Examining the expressive capacity of Graph Attention Networks. In graph representation learning, neighborhood aggregation is one of the most well-studied and investigated areas, among which attention-based methods largely remain state-of-the-art. Leveraging learnable attention scores for weighted aggregations, graph attention networks exhibit higher expressivity…

January 15, 2025
Deep Dive into KV-Caching In Mistral

Ever wondered why the time to first token in LLMs is high but subsequent tokens are super fast? In this post, I dive into the details of KV-Caching used in Mistral, a topic I initially found quite daunting. However, as I delved deeper, it became a fascinating subject, especially when…

January 15, 2025
Scale Experiment Decision-Making with Programmatic Decision Rules

Decide what to do with experiment results in code. The experiment lifecycle is like the human lifecycle. First, a person or idea is born, then it develops, then it is tested, then its test ends, and then the Gods (or Product Managers) decide its worth.…

January 15, 2025
Counterfactually Fair Reinforcement Learning via Sequential Data Preprocessing

arXiv:2501.06366v1. Announce Type: new. Abstract: When applied in healthcare, reinforcement learning (RL) seeks to dynamically match the right interventions to subjects to maximize population benefit. However, the learned policy may disproportionately allocate efficacious actions to one subpopulation, creating or exacerbating disparities in other socioeconomically-disadvantaged subgroups. These biases…

January 14, 2025
Computational and Statistical Asymptotic Analysis of the JKO Scheme for Iterative Algorithms to update distributions

arXiv:2501.06408v1. Announce Type: new. Abstract: The seminal paper of Jordan, Kinderlehrer, and Otto introduced what is now widely known as the JKO scheme, an iterative algorithmic framework for computing distributions. This scheme can be interpreted as a Wasserstein gradient flow…

January 14, 2025
Variable Selection Methods for Multivariate, Functional, and Complex Biomedical Data in the AI Age

arXiv:2501.06868v1. Announce Type: new. Abstract: Many problems within personalized medicine and digital health rely on the analysis of continuous-time functional biomarkers and other complex data structures emerging from high-resolution patient monitoring. In this context, this work proposes new optimization-based variable selection…

January 14, 2025
Dynamic Causal Structure Discovery and Causal Effect Estimation

arXiv:2501.06534v1. Announce Type: new. Abstract: To represent the causal relationships between variables, a directed acyclic graph (DAG) is widely utilized in many areas, such as social sciences, epidemics, and genetics. Many causal structure learning approaches are developed to learn the hidden causal structure utilizing deep-learning approaches. However,…

January 14, 2025
Automatic Double Reinforcement Learning in Semiparametric Markov Decision Processes with Applications to Long-Term Causal Inference

arXiv:2501.06926v1. Announce Type: new. Abstract: Double reinforcement learning (DRL) enables statistically efficient inference on the value of a policy in a nonparametric Markov Decision Process (MDP) given trajectories generated by another policy. However, this approach necessarily requires stringent overlap between…

January 14, 2025
Machine Learning: From 0 to Something

How I learned ML foundations to tackle a complex problem. By Ricardo Ribas on Towards Data Science.

January 14, 2025
Four Ways to Improve Statistical Power in A/B Testing (Without Increasing Test Duration, Duh)

In A/B testing, you often have to balance statistical power and how long the test takes. Learn how Allocation, Effect Size, CUPED & Binarization can help you.

January 14, 2025
The AI (R)Evolution, Looking From 2024 Into the Immediate Future

Witnessing rapid innovation, fierce competition, and transformative tools for life, work, and human development. By LucianoSphere (Luciano Abriata, PhD) on Towards Data Science.

January 14, 2025
Contextual Topic Modelling in Chinese Corpora with KeyNMF

A comprehensive guide on getting the most out of your Chinese topic models, from preprocessing to interpretation. With our recent paper on discourse dynamics in European Chinese diaspora media, our team has tapped into an almost unanimous frustration with the quality of topic modelling approaches when applied…

January 14, 2025
llama.cpp: Writing A Simple C++ Inference Program for GGUF LLM Models

Exploring llama.cpp internals and a basic chat program flow. llama.cpp has revolutionized the space of LLM inference through wide adoption and simplicity. It has enabled enterprises and individual developers to deploy LLMs on devices ranging from SBCs…

January 14, 2025
Covariate Dependent Mixture of Bayesian Networks

arXiv:2501.05745v1. Announce Type: new. Abstract: Learning the structure of Bayesian networks from data provides insights into underlying processes and the causal relationships that generate the data, but its usefulness depends on the homogeneity of the data population, a condition often violated in real-world applications. In such cases, using a…

January 13, 2025
Outlyingness Scores with Cluster Catch Digraphs

arXiv:2501.05530v1. Announce Type: new. Abstract: This paper introduces two novel outlyingness scores (OSs) based on Cluster Catch Digraphs (CCDs): Outbound Outlyingness Score (OOS) and Inbound Outlyingness Score (IOS). These scores enhance the interpretability of outlier detection results. Both OSs employ graph-, density-, and distribution-based techniques, tailored to high-dimensional data…

January 13, 2025
Analog Bayesian neural networks are insensitive to the shape of the weight distribution

arXiv:2501.05564v1. Announce Type: cross. Abstract: Recent work has demonstrated that Bayesian neural networks (BNNs) trained with mean field variational inference (MFVI) can be implemented in analog hardware, promising orders of magnitude energy savings compared to the standard digital implementations. However, while Gaussians…

January 13, 2025
rmlnomogram: An R package to construct an explainable nomogram for any machine learning algorithms

arXiv:2501.05772v1. Announce Type: cross. Abstract: Background: Current nomograms can only be created for regression algorithms. Providing nomograms for any machine learning (ML) algorithm may accelerate model deployment in clinical settings or improve model availability. We developed an R package and web…

January 13, 2025
Random Sparse Lifts: Construction, Analysis and Convergence of finite sparse networks

arXiv:2501.05930v1. Announce Type: cross. Abstract: We present a framework to define a large class of neural networks for which, by construction, training by gradient flow provably reaches arbitrarily low loss when the number of parameters grows. Distinct from the fixed-space global optimality of non-convex…

January 13, 2025
Weekly Entering & Transitioning – Thread 13 Jan, 2025 – 20 Jan, 2025

Welcome to this week’s entering & transitioning thread! This thread is for any questions about getting started, studying, or transitioning into the data science field. Topics include: learning resources (e.g. books, tutorials, videos); traditional education (e.g. schools, degrees, electives); alternative education (e.g.…

January 13, 2025
Where do you go to stay up to date on data analytics/science?

Are there any people or organizations you follow on YouTube, Twitter, Medium, LinkedIn, or some other website/blog/podcast that you always tend to keep going back to? My previous career absolutely lacked all the professional “content creators” that data analytics has, so I was…

January 13, 2025
Is data science at meta just a/b testing?

I’ve been at Meta a year and all I do is run a/b tests. In my old jobs I used to build models and products using data science. Does this happen under a different job title here or am I just in the wrong department? Submitted by /u/Longjumping-Will-127…

January 13, 2025
How we matured Fisher, our A/B testing library

Submitted by /u/chomoloc0.

January 13, 2025
200 applications – no response, please help. I have applied for data science (associate or mid-level) positions. Thank you

Submitted by /u/Sad_Campaign713.

January 13, 2025
Using Constraint Programming to Solve Math Theorems

Case study: the quasigroups existence problem. TL;DR: Some mathematical theorems can be solved by combinatorial exploration. In this article, we focus on the problem of the existence of some quasigroups. We will demonstrate the existence or non-existence of some quasigroups using NuCS. NuCS is a fast constraint…

January 13, 2025
What is MicroPython? Do I Need to Know it as a Data Scientist?

In this year’s edition of the Stack Overflow survey, MicroPython appears among the Most Popular Technologies with 1.6%. But why? By Sarah Lea on Towards Data Science.

January 13, 2025
Your Classifier Is Broken, But It Is Still Useful

When you run a binary classifier over a population you get an estimate of the proportion of true positives in that population. This is known as the prevalence. But that estimate is biased, because no classifier is perfect. For example, if…

January 13, 2025
What Would a Stoic Do? — An AI-Based Decision-Making Model

Using AI to build Marcus Aurelius’ reincarnation. By Pol Marin on Towards Data Science.

January 13, 2025
LightGBM: The Fastest Option of Gradient Boosting

Learn how to implement a fast and effective Gradient Boosting model using Python. By Gustavo R Santos on Towards Data Science.

January 13, 2025
Machine Learning + openAI: solving a text classification problem

How I migrated an old solution to a more elegant, robust and scalable solution using text classification from openAI. By Ricardo Ribas on Towards Data Science.

January 12, 2025
Exploring New Hyperparameter Dimensions with Laplace Approximated Bayesian Optimization

Is it better than grid search? When I notice my model is overfitting, I often think, “It is time to regularize”. But how do I decide which regularization method to use (L1, L2) and what parameters to choose? Typically, I perform hyperparameter optimization…

January 12, 2025
Building Visual Agents that can Navigate the Web Autonomously

A step-by-step guide to creating visual agents that can navigate the web autonomously. By Luís Roque on Towards Data Science.

January 12, 2025
A Visual Understanding of Neural Networks

The math behind neural networks visually explained. By Reza Bagheri on Towards Data Science.

January 12, 2025
3 Powerful Examples of the Python Re Library

Explore the power of regex and save time in data analysis. By Suraj Gurav on Towards Data Science.

January 11, 2025
Solving A Rubik’s Cube with Supervised Learning — Intuitively and Exhaustively Explained

A Popular Toy in a Brave New World. By Daniel Warfield on Towards Data Science.

January 11, 2025
Model Calibration, Explained: A Visual Guide with Code Examples for Beginners

Model Evaluation & Optimization. When all models have similar accuracy, now what? You’ve trained several classification models, and they all seem to be performing well with high accuracy scores. Congratulations! But hold on: is one model truly better than the others? Accuracy alone doesn’t tell the…

January 11, 2025
Sustainable Business Strategy with Data Analytics

Use data analytics to help companies design and implement strategic sustainability roadmaps to reduce their environmental footprint. “Consensus means that everyone agrees to say collectively what no one believes individually.” This quote captures a critical issue many companies face during their strategic…

January 11, 2025
Linearizing Llama

Speeding up Llama: A hybrid approach to attention mechanisms. In this article, we will see how to replace softmax self-attention in Llama-3.2-1B with hybrid attention combining softmax sliding window and linear attention. This implementation will help us better understand the growing interest in linear attention…

January 11, 2025
Deep Transfer $Q$-Learning for Offline Non-Stationary Reinforcement Learning

arXiv:2501.04870v1. Announce Type: new. Abstract: In dynamic decision-making scenarios across business and healthcare, leveraging sample trajectories from diverse populations can significantly enhance reinforcement learning (RL) performance for specific target populations, especially when sample sizes are limited. While existing transfer learning methods primarily focus on linear regression settings,…

January 10, 2025
RieszBoost: Gradient Boosting for Riesz Regression

arXiv:2501.04871v1. Announce Type: new. Abstract: Answering causal questions often involves estimating linear functionals of conditional expectations, such as the average treatment effect or the effect of a longitudinal modified treatment policy. By the Riesz representation theorem, these functionals can be expressed as the expected product of the conditional expectation…

January 10, 2025
Towards understanding the bias in decision trees

arXiv:2501.04903v1. Announce Type: new. Abstract: There is a widespread and longstanding belief that machine learning models are biased towards the majority (or negative) class when learning from imbalanced data, leading them to neglect or ignore the minority (or positive) class. In this study, we show that this belief…

January 10, 2025
Optimality and Adaptivity of Deep Neural Features for Instrumental Variable Regression

arXiv:2501.04898v1. Announce Type: new. Abstract: We provide a convergence analysis of deep feature instrumental variable (DFIV) regression (Xu et al., 2021), a nonparametric approach to IV regression using data-adaptive features learned by deep neural networks in two stages. We prove that the DFIV algorithm…

January 10, 2025
Non-asymptotic analysis of the performance of the penalized least trimmed squares in sparse models

arXiv:2501.04946v1. Announce Type: new. Abstract: The least trimmed squares (LTS) estimator is a renowned robust alternative to the classic least squares estimator and is popular in location, regression, machine learning, and AI literature. Many studies exist on LTS, including its robustness,…

January 10, 2025
The Best Way to Prepare for Data Science and Machine Learning Interviews

Never get stumped again. By Marina Wyss – Gratitude Driven on Towards Data Science.

January 10, 2025
Sentiment Analysis with Transformers: A Complete Deep Learning Project — PT. I

Master Fine-Tuning Transformers, Comparing Deep Learning Architectures, and Deploying Sentiment Analysis Models. By Leo Anello on Towards Data Science.

January 10, 2025
What to Do If the Logit Decision Boundary Fails?

Feature engineering for classification models using Bayesian Machine Learning. By Lukasz Gatarek, Towards Data Science.

January 10, 2025
How to Run Jupyter Notebooks and Generate HTML Reports with Python Scripts

A step-by-step guide to automating Jupyter Notebook execution and report generation using Python. By Amanda Iglesias Moreno, Towards Data Science.

January 10, 2025
Building Autonomous Multi-Tool Agents with Gemini 2.0 and LangGraph

A practical tutorial with full code examples for building and running multi-tool agents. By Youness Mansar, Towards Data Science.

January 10, 2025
Mixing Times and Privacy Analysis for the Projected Langevin Algorithm under a Modulus of Continuity

arXiv:2501.04134v1. Announce Type: new. Abstract: We study the mixing time of the projected Langevin algorithm (LA) and the privacy curve of noisy Stochastic Gradient Descent (SGD), beyond nonexpansive iterations. Specifically, we derive new mixing time bounds for the projected LA…

January 9, 2025
Generation from Noisy Examples

arXiv:2501.04179v1. Announce Type: new. Abstract: We continue to study the learning-theoretic foundations of generation by extending the results from Kleinberg and Mullainathan [2024] and Li et al. [2024] to account for noisy example streams. In the noiseless setting of Kleinberg and Mullainathan [2024] and Li et al. [2024], an adversary picks…

January 9, 2025
Statistical Uncertainty Quantification for Aggregate Performance Metrics in Machine Learning Benchmarks

arXiv:2501.04234v1. Announce Type: new. Abstract: Modern artificial intelligence is supported by machine learning models (e.g., foundation models) that are pretrained on a massive data corpus and then adapted to solve a variety of downstream tasks. To summarize performance across multiple tasks, evaluation metrics are…

January 9, 2025
Circuit Complexity Bounds for Visual Autoregressive Model

arXiv:2501.04299v1. Announce Type: new. Abstract: Understanding the expressive ability of a specific model is essential for grasping its capacity limitations. Recently, several studies have established circuit complexity bounds for Transformer architecture. Besides, the Visual AutoRegressive (VAR) model has risen to be a prominent method in the field of…

January 9, 2025
On weight and variance uncertainty in neural networks for regression tasks

arXiv:2501.04272v1. Announce Type: new. Abstract: We consider the problem of weight uncertainty proposed by [Blundell et al. (2015). Weight uncertainty in neural network. In International conference on machine learning, 1613-1622, PMLR.] in neural networks (NNs) specialized for regression tasks. We further investigate the effect…

January 9, 2025
Missing Data in Time-Series? Machine Learning Techniques (Part 2)

Using Clustering Algorithms to Handle Missing Time-Series Data. By Sara Nóbrega, Towards Data Science.

January 9, 2025
Advanced SQL Techniques for Unstructured Data Handling

Everything you need to know to get started with text mining. By Jiayan Yin, Towards Data Science.

January 9, 2025
Bayesian A/B Testing Falls Short

Why Bayesian A/B testing can lead to misunderstandings, inflated false positive rates, introduce bias and complicate results. Over the past decade, I’ve engaged in countless discussions about Bayesian A/B testing versus Frequentist A/B testing. In nearly every conversation, I’ve maintained the same viewpoint:…

January 9, 2025
Method of Moments Estimation with Python Code

How to understand and implement the estimator from scratch. Let’s say you are in a customer care center, and you would like to know the probability distribution of the number of calls per minute, or in other words, you want to answer the question:…

January 9, 2025
Statistical Learnability of Strategic Linear Classifiers: A Proof Walkthrough

With the help of an intricate geometric construction, we can prove that instance-wise cost functions quickly drive SVC to infinity. In the previous article in this series, we examined the concept of strategic VC dimension (SVC) and its connection to the Fundamental Theorem of Strategic Learning.…

January 9, 2025
Class-Balance Bias in Regularized Regression

arXiv:2501.03821v1. Announce Type: new. Abstract: Regularized models are often sensitive to the scales of the features in the data and it has therefore become standard practice to normalize (center and scale) the features before fitting the model. But there are many different ways to normalize the features and the choice…

January 8, 2025
Structure-Preference Enabled Graph Embedding Generation under Differential Privacy

arXiv:2501.03451v1. Announce Type: new. Abstract: Graph embedding generation techniques aim to learn low-dimensional vectors for each node in a graph and have recently gained increasing research attention. Publishing low-dimensional node vectors enables various graph analysis tasks, such as structural equivalence and link prediction. Yet, improper publication opens…

January 8, 2025
Coupled Hierarchical Structure Learning using Tree-Wasserstein Distance

arXiv:2501.03627v1. Announce Type: cross. Abstract: In many applications, both data samples and features have underlying hierarchical structures. However, existing methods for learning these latent structures typically focus on either samples or features, ignoring possible coupling between them. In this paper, we introduce a coupled hierarchical structure learning method…

January 8, 2025
Deep Networks are Reproducing Kernel Chains

arXiv:2501.03697v1. Announce Type: cross. Abstract: Identifying an appropriate function space for deep neural networks remains a key open question. While shallow neural networks are naturally associated with Reproducing Kernel Banach Spaces (RKBS), deep networks present unique challenges. In this work, we extend RKBS to chain RKBS (cRKBS), a new…

January 8, 2025
Symmetry and Generalisation in Machine Learning

arXiv:2501.03858v1. Announce Type: cross. Abstract: This work is about understanding the impact of invariance and equivariance on generalisation in supervised learning. We use the perspective afforded by an averaging operator to show that for any predictor that is not equivariant, there is an equivariant predictor with strictly lower test…

January 8, 2025
How To Learn Math for Machine Learning, Fast

Even with zero math background. Do you want to become a Data Scientist or machine learning engineer, but you feel intimidated by all the math involved? I get it. I’ve been there. I dropped out of High School after 10th grade, so I…

January 8, 2025
How Recurrent Neural Networks (RNNs) Are Revolutionizing Decision-Making Research

A deep dive into the world of computational modeling and its applications. By Kaushik Rajan, Towards Data Science.

January 8, 2025
Understanding the Evolution of ChatGPT: Part 1—An In-Depth Look at GPT-1 and What Inspired It

Tracing the roots of ChatGPT: GPT-1, the foundation of OpenAI’s LLMs. The GPT (Generative Pre-Training) model family, first introduced by OpenAI in 2018, is another important application of the Transformer architecture. It has since evolved through versions like…

January 8, 2025
How to Securely Connect Microsoft Fabric to Azure Databricks SQL API

Integration architecture focusing on security and access control. Microsoft Fabric and Azure Databricks are both powerhouses in the data analytics field. These platforms can be used end-to-end in a medallion architecture, from data ingestion to creating data…

January 8, 2025
How to Build an AI Agent for Data Analytics Without Writing SQL

Create a comprehensive AI agent from the ground up utilizing LangChain and DuckDB. By Chengzhi Zhao, Towards Data Science.

January 8, 2025