Category: aimlds

Modeling COVID-19 spread in the USA using metapopulation SIR models coupled with graph convolutional neural networks

Modeling COVID-19 spread in the USA using metapopulation SIR models coupled with graph convolutional neural networks arXiv:2501.02043v1 Announce Type: new Abstract: Graph convolutional neural networks (GCNs) have shown tremendous promise in addressing data-intensive challenges in recent years. In particular, some attempts have been made to improve predictions of Susceptible-Infected-Recovered (SIR) models by incorporating human mobility…
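
The abstract builds on SIR dynamics coupled across locations through human mobility. The sketch below covers only that mechanistic backbone, not the paper's GCN model. It assumes a discrete-time SIR model for each location and a hypothetical row-stochastic mobility matrix `M` that mixes infection pressure between locations. `beta` and `gamma` are illustrative transmission and recovery rates.

```python
import numpy as np


def metapop_sir_step(S, I, R, M, beta, gamma):
    """Advance a discrete-time metapopulation SIR model by one step.

    S, I, R: per-location compartment counts (1-D arrays).
    M: hypothetical row-stochastic mobility matrix; M[i, j] is the share of
       location i's contacts that take place in location j.
    """
    N = S + I + R
    prevalence = I / N                  # fraction infectious in each location
    force = beta * (M @ prevalence)     # infection hazard felt by residents of i
    new_infections = S * force
    new_recoveries = gamma * I
    return (S - new_infections,
            I + new_infections - new_recoveries,
            R + new_recoveries)


S = np.array([990.0, 1000.0, 500.0])
I = np.array([10.0, 0.0, 0.0])   # outbreak seeded in location 0 only
R = np.zeros(3)
M = np.array([[0.80, 0.15, 0.05],
              [0.10, 0.85, 0.05],
              [0.05, 0.05, 0.90]])

for _ in range(50):
    S, I, R = metapop_sir_step(S, I, R, M, beta=0.3, gamma=0.1)

print(np.round(I, 1))  # locations 1 and 2 become infected only through mobility
```

Each location's population S + I + R stays constant. The only route for infection into locations 1 and 2 is the off-diagonal mobility weights.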

January 7, 2025
Majorization-Minimization Dual Stagewise Algorithm for Generalized Lasso

Majorization-Minimization Dual Stagewise Algorithm for Generalized Lasso arXiv:2501.02197v1 Announce Type: new Abstract: The generalized lasso is a natural generalization of the celebrated lasso approach to handle structural regularization problems. Many important methods and applications fall into this framework, including fused lasso, clustered lasso, and constrained lasso. To elevate its effectiveness in large-scale problems, extensive research…
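
The generalized lasso replaces the plain $\ell_1$ penalty with $\lambda\|D\beta\|_1$ for a chosen penalty matrix $D$. The objective is therefore $\tfrac12\|y - X\beta\|_2^2 + \lambda\|D\beta\|_1$. The sketch below only evaluates that objective and is not the paper's majorization-minimization dual stagewise algorithm. Taking $D$ as the first-difference matrix gives the 1-D fused lasso, and $D = I$ recovers the ordinary lasso. The helper names are hypothetical.

```python
import numpy as np


def first_difference_matrix(p):
    """(p-1) x p matrix whose row i is e_{i+1} - e_i (fused-lasso penalty)."""
    return np.diff(np.eye(p), axis=0)


def generalized_lasso_objective(beta, X, y, D, lam):
    """0.5 * ||y - X beta||^2 + lam * ||D beta||_1."""
    residual = y - X @ beta
    return 0.5 * residual @ residual + lam * np.abs(D @ beta).sum()


X = np.eye(4)
y = np.array([1.0, 1.0, 3.0, 3.0])
D = first_difference_matrix(4)

# A piecewise-constant fit pays only for its single jump of size 2.
piecewise = np.array([1.0, 1.0, 3.0, 3.0])
print(generalized_lasso_objective(piecewise, X, y, D, lam=0.5))  # 1.0
```

With `D = np.eye(4)`, the same coefficients are charged for their full magnitude, giving 0.5 × 8 = 4.0. This is why the fused penalty favours piecewise-constant solutions.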

January 7, 2025
Beyond Log-Concavity and Score Regularity: Improved Convergence Bounds for Score-Based Generative Models in W2-distance

Beyond Log-Concavity and Score Regularity: Improved Convergence Bounds for Score-Based Generative Models in W2-distance arXiv:2501.02298v1 Announce Type: new Abstract: Score-based Generative Models (SGMs) aim to sample from a target distribution by learning score functions using samples perturbed by Gaussian noise. Existing convergence bounds for SGMs in the $\mathcal{W}_2$-distance rely on stringent assumptions about the data…

January 7, 2025
Robust Multi-Dimensional Scaling via Accelerated Alternating Projections

Robust Multi-Dimensional Scaling via Accelerated Alternating Projections arXiv:2501.02208v1 Announce Type: new Abstract: We consider the robust multi-dimensional scaling (RMDS) problem in this paper. The goal is to localize point locations from pairwise distances that may be corrupted by outliers. Inspired by classic MDS theories, and nonconvex works for the robust principal component analysis (RPCA) problem,…

January 7, 2025
Who Wrote This? Zero-Shot Statistical Tests for LLM-Generated Text Detection using Finite Sample Concentration Inequalities

Who Wrote This? Zero-Shot Statistical Tests for LLM-Generated Text Detection using Finite Sample Concentration Inequalities arXiv:2501.02406v1 Announce Type: new Abstract: Verifying the provenance of content is crucial to the function of many organizations, e.g., educational institutions, social media platforms, firms, etc. This problem is becoming increasingly difficult as text generated by Large Language Models (LLMs)…

January 7, 2025
The State of Quantum Computing: Where Are We Today?

The State of Quantum Computing: Where Are We Today? And what we need to overcome. By Sara A. Metwalli, on Towards Data Science.

January 7, 2025
Encapsulation: A Software Engineering Concept Data Scientists Must Know To Succeed

Encapsulation: A Software Engineering Concept Data Scientists Must Know To Succeed Simple concepts that differentiate professionals from amateurs. By Benjamin Lee, on Towards Data Science.

January 7, 2025
In Defense of Statistical Significance

In Defense of Statistical Significance We have to draw the line somewhere. It’s become something of a meme that statistical significance is a bad standard. Several recent blogs have made the rounds, making the case that statistical significance is a “cult” or “arbitrary.” If you’d like a classic polemic (and…

January 7, 2025
AI Agents Hype, Explained — What You Really Need to Know to Get Started

AI Agents Hype, Explained — What You Really Need to Know to Get Started I’ll set the record straight: AI Agents are not new but advanced. Learn how they’ve evolved and where to get started. By Marc Nehme, on Towards Data Science.

January 7, 2025
Data behind the Luck, Ambition, and a Billion-Dollar Dream: Lottery

Data behind the Luck, Ambition, and a Billion-Dollar Dream: Lottery Using Seattle’s local retail store data for consumer patterns of the lottery (SQL, Python). By Sunghyun Ahn, on Towards Data Science.

January 7, 2025
Guaranteed Nonconvex Low-Rank Tensor Estimation via Scaled Gradient Descent

Guaranteed Nonconvex Low-Rank Tensor Estimation via Scaled Gradient Descent arXiv:2501.01696v1 Announce Type: new Abstract: Tensors, which give a faithful and effective representation to deliver the intrinsic structure of multi-dimensional data, play a crucial role in an increasing number of signal processing and machine learning problems. However, tensor data are often accompanied by arbitrary signal corruptions,…

January 6, 2025
Signal Recovery Using a Spiked Mixture Model

Signal Recovery Using a Spiked Mixture Model arXiv:2501.01840v1 Announce Type: new Abstract: We introduce the spiked mixture model (SMM) to address the problem of estimating a set of signals from many randomly scaled and noisy observations. Subsequently, we design a novel expectation-maximization (EM) algorithm to recover all parameters of the SMM. Numerical experiments show that…

January 6, 2025
Unified Native Spaces in Kernel Methods

Unified Native Spaces in Kernel Methods arXiv:2501.01825v1 Announce Type: new Abstract: There exists a plethora of parametric models for positive definite kernels, and their use is ubiquitous in disciplines as diverse as statistics, machine learning, numerical analysis, and approximation theory. Usually, the kernel parameters index certain features of an associated process. Amongst those features, smoothness…

January 6, 2025
Transfer Neyman-Pearson Algorithm for Outlier Detection

Transfer Neyman-Pearson Algorithm for Outlier Detection arXiv:2501.01525v1 Announce Type: cross Abstract: We consider the problem of transfer learning in outlier detection where target abnormal data is rare. While transfer learning has been considered extensively in traditional balanced classification, the problem of transfer in outlier detection and more generally in imbalanced classification settings has received less…

January 6, 2025
Many of Your DPOs are Secretly One: Attempting Unification Through Mutual Information

Many of Your DPOs are Secretly One: Attempting Unification Through Mutual Information arXiv:2501.01544v1 Announce Type: cross Abstract: Post-alignment of large language models (LLMs) is critical in improving their utility, safety, and alignment with human intentions. Direct preference optimisation (DPO) has become one of the most widely used algorithms for achieving this alignment, given its ability…

January 6, 2025
Weekly Entering & Transitioning – Thread 06 Jan, 2025 – 13 Jan, 2025

Weekly Entering & Transitioning – Thread 06 Jan, 2025 – 13 Jan, 2025 Welcome to this week’s entering & transitioning thread! This thread is for any questions about getting started, studying, or transitioning into the data science field. Topics include: Learning resources (e.g. books, tutorials, videos) Traditional education (e.g. schools, degrees, electives) Alternative education (e.g.…

January 6, 2025
data experience

data experience Submitted by /u/fool126.

January 6, 2025
Do you prepare for interviews first or apply for jobs first?

Do you prepare for interviews first or apply for jobs first? I’ve started looking for a new job and find myself in a bit of a dilemma that I’m hoping you might have some experience with. Every day, I come across roles that seem like a great fit, but I hesitate to apply because I…

January 6, 2025
What’s your biggest time sink as a data scientist?

What’s your biggest time sink as a data scientist? I’ve got a few ideas for DS tooling I was thinking of taking on as a side project, so this is a bit of a market research post. I’m curious what data-scientist specific task/problem is the biggest time suck for you at work. I feel like…

January 6, 2025
How are these companies building video/image generation tools? From scratch, fine-tuning Llama, or something else?

How are these companies building video/image generation tools? From scratch, fine-tuning Llama, or something else? There’s an enormous amount of LLM-based tools popping up lately, especially in video/image generation, each tied to a different company. Meanwhile, we only see a handful of really good open-source LLM models available. So, my question is: How are these…

January 6, 2025
Predicting a Ball Trajectory

Predicting a Ball Trajectory Polynomial Fit in Python with NumPy. By Florian Trautweiler, on Towards Data Science.
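
Polynomial regression can fit a projectile path because, if drag is ignored, height is quadratic in time. The sketch below uses NumPy's `np.polyfit` on synthetic noisy observations. The values are assumed: launch height 2 m, vertical speed 10 m/s, g = 9.81 m/s². The article's own data and code may differ.

```python
import numpy as np

# Synthetic observations (assumed values): launched from 2 m at 10 m/s upward.
g_true = 9.81
t = np.linspace(0.0, 1.5, 16)
rng = np.random.default_rng(0)
height = 2.0 + 10.0 * t - 0.5 * g_true * t**2 + rng.normal(0.0, 0.02, t.size)

# Least-squares quadratic fit; coefficients come back highest power first.
a, b, c = np.polyfit(t, height, deg=2)
g_est = -2.0 * a
trajectory = np.poly1d([a, b, c])

print(f"estimated g: {g_est:.2f} m/s^2, launch height: {c:.2f} m")
print(f"extrapolated height at t = 1.8 s: {trajectory(1.8):.2f} m")
```

The fitted coefficients map back to physics: the quadratic term is $-g/2$, the linear term is the initial vertical speed, and the constant is the launch height. Extrapolating beyond the observed times amplifies any fit error.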

January 6, 2025
LangChain Meets Home Assistant: Unlock the Power of Generative AI in Your Smart Home

LangChain Meets Home Assistant: Unlock the Power of Generative AI in Your Smart Home Learn how to create an agent that understands your home’s context, learns your preferences, and interacts with you and your home to accomplish activities you find valuable. This article describes the architecture and design of…

January 6, 2025
Awesome Plotly with Code Series (Part 7): Cropping the y-axis in Bar Charts

Awesome Plotly with Code Series (Part 7): Cropping the y-axis in Bar Charts Is there ever a good reason for starting a bar chart above zero? By Jose Parreño, on Towards Data Science.

January 6, 2025
Mastering the Basics: How Linear Regression Unlocks the Secrets of Complex Models

Mastering the Basics: How Linear Regression Unlocks the Secrets of Complex Models Full explanation of Linear Regression and how it learns. Just like Mr. Miyagi taught young Daniel LaRusso karate through repetitive simple chores, which ultimately transformed him into the Karate Kid, mastering foundational algorithms like linear regression…

January 5, 2025
Journey to Full-Stack Data Scientist: Model Deployment

Journey to Full-Stack Data Scientist: Model Deployment An introduction to productionizing machine learning models using APIs and Docker. Growing Responsibilities of Data Scientists The title of data scientist is ever-changing and often vague. It usually involves one who is fluent in mathematics, programming, and machine learning. They spend time cleaning data, building models, fine-tuning, and conducting…

January 5, 2025
The Next Frontier in LLM Accuracy

The Next Frontier in LLM Accuracy Exploring the Power of Lamini Memory Tuning. Accuracy is often critical for LLM applications, especially in cases such as API calling or summarisation of financial reports. Fortunately, there are ways to enhance precision. The best practices to improve accuracy include the following steps: You can start…

January 5, 2025
How to Tell Among Two Regression Models with Statistical Significance

How to Tell Among Two Regression Models with Statistical Significance Diving into the F-test for nested models with algorithms, examples and code. By LucianoSphere (Luciano Abriata, PhD), on Towards Data Science.
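
For nested linear models, the F-statistic compares two quantities: the drop in residual sum of squares per extra parameter, and the full model's residual variance. The formula is $F = \frac{(\mathrm{RSS}_r - \mathrm{RSS}_f)/(p_f - p_r)}{\mathrm{RSS}_f/(n - p_f)}$. Under the null that the extra coefficients are zero, it follows an $F_{p_f - p_r,\, n - p_f}$ distribution. The sketch below uses NumPy least squares and `scipy.stats.f` on simulated data; the function name and data are hypothetical.

```python
import numpy as np
from scipy import stats


def nested_f_test(X_reduced, X_full, y):
    """F-test of a reduced OLS model against a full model that nests it.

    Both design matrices include an intercept column, and the columns of
    X_reduced are assumed to be a subset of those in X_full.
    """
    def rss(X):
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ coef
        return resid @ resid

    n = len(y)
    p_r, p_f = X_reduced.shape[1], X_full.shape[1]
    rss_r, rss_f = rss(X_reduced), rss(X_full)
    f_stat = ((rss_r - rss_f) / (p_f - p_r)) / (rss_f / (n - p_f))
    p_value = stats.f.sf(f_stat, p_f - p_r, n - p_f)  # upper-tail probability
    return f_stat, p_value


# Simulated data in which x2 has a real (if modest) effect.
rng = np.random.default_rng(42)
n = 200
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 0.5 * x2 + rng.normal(size=n)

ones = np.ones(n)
X_reduced = np.column_stack([ones, x1])
X_full = np.column_stack([ones, x1, x2])
f_stat, p_value = nested_f_test(X_reduced, X_full, y)
print(f"F = {f_stat:.1f}, p = {p_value:.2g}")
```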

January 4, 2025
The Cultural Impact of AI Generated Content: Part 2

The Cultural Impact of AI Generated Content: Part 2 What can we do about the increasingly sophisticated AI generated content in our lives? In my prior column, I established how AI generated content is expanding online, and described scenarios to illustrate why it’s occurring. (Please read that before you go on…

January 4, 2025
Chi-Squared Test: Comparing Variations Through Soccer

Chi-Squared Test: Comparing Variations Through Soccer Understanding Different Types of Chi-Squared Tests: A/B Testing for Data Science Series (8). By Sunghyun Ahn, on Towards Data Science.
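
One of the chi-squared variants is the test of independence on a contingency table. The sketch below uses `scipy.stats.chi2_contingency` with hypothetical match counts: rows are two teams, and columns are win/draw/loss. The article's soccer data is not reproduced here.

```python
from scipy.stats import chi2_contingency

# Hypothetical counts: rows are two teams, columns are win / draw / loss.
observed = [[30, 10, 20],
            [20, 15, 25]]

chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p_value:.3f}")
print(expected)  # counts expected if the result were independent of the team
```

Each expected count is (row total × column total) / grand total. The degrees of freedom are (rows − 1) × (columns − 1) = 2 here.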

January 4, 2025
Non-Technical Principles All Data Scientists Should Have

Non-Technical Principles All Data Scientists Should Have Making you a better data scientist, and enhancing your career. By Marc Matterson, on Towards Data Science.

January 4, 2025
What I’m Updating in My AI Ethics Class for 2025

What I’m Updating in My AI Ethics Class for 2025 What happened in 2024 that is new and significant in the world of AI ethics? The new technology developments have come in fast, but what has ethical or values implications that are going to matter long-term? I’ve been working on updates for my 2025 class…

January 4, 2025
Post Launch Evaluation of Policies in a High-Dimensional Setting

Post Launch Evaluation of Policies in a High-Dimensional Setting arXiv:2501.00119v1 Announce Type: new Abstract: A/B tests, also known as randomized controlled experiments (RCTs), are the gold standard for evaluating the impact of new policies, products, or decisions. However, these tests can be costly in terms of time and resources, potentially exposing users, customers, or other…

January 3, 2025
Efficient Human-in-the-Loop Active Learning: A Novel Framework for Data Labeling in AI Systems

Efficient Human-in-the-Loop Active Learning: A Novel Framework for Data Labeling in AI Systems arXiv:2501.00277v1 Announce Type: new Abstract: Modern AI algorithms require labeled data. In the real world, the majority of data are unlabeled. Labeling the data is costly. This is particularly true for some areas requiring special skills, such as reading radiology images by physicians. To…

January 3, 2025
Different thresholding methods on Nearest Shrunken Centroid algorithm

Different thresholding methods on Nearest Shrunken Centroid algorithm arXiv:2501.00632v1 Announce Type: new Abstract: This article considers the impact of different thresholding methods on the Nearest Shrunken Centroid algorithm, which is popularly referred to as the Prediction Analysis of Microarrays (PAM) for high-dimensional classification. PAM uses soft thresholding to achieve high computational efficiency and high classification accuracy…
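
PAM uses soft thresholding to shrink each class centroid's standardized difference from the overall centroid toward zero: $d' = \operatorname{sign}(d)\,(|d| - \Delta)_+$. A feature drops out of the classifier when its difference shrinks to zero for every class. The sketch below contrasts this with hard thresholding, one plausible example of the alternatives the abstract refers to. This excerpt does not state which methods the paper actually compares.

```python
import numpy as np


def soft_threshold(d, delta):
    """Shrink every value toward zero by delta; |d| <= delta becomes 0."""
    return np.sign(d) * np.maximum(np.abs(d) - delta, 0.0)


def hard_threshold(d, delta):
    """Keep values with |d| > delta unchanged and zero out the rest."""
    return np.where(np.abs(d) > delta, d, 0.0)


# Hypothetical standardized centroid differences for one class.
d = np.array([-2.5, -0.4, 0.0, 0.9, 3.0])
print(soft_threshold(d, 1.0))
print(hard_threshold(d, 1.0))
```

Both rules zero out the same small differences. Soft thresholding also shrinks the surviving values by Δ, which makes the shrunken centroids a continuous function of Δ.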

January 3, 2025
A Distributional Evaluation of Generative Image Models

A Distributional Evaluation of Generative Image Models arXiv:2501.00744v1 Announce Type: new Abstract: Generative models are ubiquitous in modern artificial intelligence (AI) applications. Recent advances have led to a variety of generative modeling approaches that are capable of synthesizing highly realistic samples. Despite these developments, evaluating the distributional match between the synthetic samples and the target…

January 3, 2025
Ensuring superior learning outcomes and data security for authorized learner

Ensuring superior learning outcomes and data security for authorized learner arXiv:2501.00754v1 Announce Type: new Abstract: The learner’s ability to generate a hypothesis that closely approximates the target function is crucial in machine learning. Achieving this requires sufficient data; however, unauthorized access by an eavesdropping learner can lead to security risks. Thus, it is important to…

January 3, 2025
Demand Forecasting with Darts: A Tutorial

Demand Forecasting with Darts: A Tutorial A hands-on tutorial with Python and Darts for demand forecasting, showcasing the power of TiDE and TFT. Demand forecasting for retailing companies can become a complex task, as several factors need to be considered from the start of the project to the final deployment. This…

January 3, 2025
The Fallacy of Complacent Distroless Containers

The Fallacy of Complacent Distroless Containers Making containers smaller is the most popular practice when reducing your attack surface. But how real is this sense of security? By Cristovao Cordeiro, on Towards Data Science.

January 3, 2025
Mastering Sensor Fusion: Color Image Obstacle Detection with KITTI Data — Part 2

Mastering Sensor Fusion: Color Image Obstacle Detection with KITTI Data — Part 2 How to use color image data for object detection in the context of obstacle detection. The concept of sensor fusion is a decision-making mechanism that can be applied to different problems and using different…

January 3, 2025
Sensor Fusion — KITTI — ‘Lidar-based Obstacle Detection’ — Part-1

Sensor Fusion — KITTI — ‘Lidar-based Obstacle Detection’ — Part-1 Mastering Sensor Fusion: LiDAR Obstacle Detection with KITTI Data — Part 1 How to use Lidar data for obstacle detection with unsupervised learning Sensor fusion, multi-modal perception, autonomous vehicles — if these keywords pique your interest, this Medium blog is for you. Join me as I explore the fascinating world of LiDAR and color image-based environment…

January 3, 2025
How to Stand Out in The Data Science Job Market

How to Stand Out in The Data Science Job Market How to have the edge in your data science application. By Egor Howell, on Towards Data Science.

January 3, 2025
AI-Powered Information Extraction and Matchmaking

AI-Powered Information Extraction and Matchmaking Developing an application for extracting key profile information from CVs and recommending jobs aligned with the profile. By Umair Ali Khan, on Towards Data Science.

January 2, 2025
Scaling Statistics: Incremental Standard Deviation in SQL with dbt

Scaling Statistics: Incremental Standard Deviation in SQL with dbt Why scan yesterday’s data when you can increment today’s? SQL aggregation functions can be computationally expensive when applied to large datasets. As datasets grow, recalculating metrics over the entire dataset repeatedly becomes inefficient. To address this challenge, incremental aggregation is often employed: a method…
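
Incremental standard deviation relies on merging a stored aggregate with a new batch without rescanning history. The sketch below shows the merge math in Python rather than dbt SQL. It assumes each partial aggregate keeps the count, the mean, and the sum of squared deviations (`m2`), and combines them with Chan et al.'s parallel formula. The class and method names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the mean

    @staticmethod
    def from_values(values):
        """Aggregate a batch of raw values (only the new rows are scanned)."""
        n = len(values)
        if n == 0:
            return RunningStats()
        mean = sum(values) / n
        return RunningStats(n, mean, sum((v - mean) ** 2 for v in values))

    def merge(self, other: "RunningStats") -> "RunningStats":
        """Combine two partial aggregates (Chan et al. parallel formula)."""
        n = self.n + other.n
        if n == 0:
            return RunningStats()
        delta = other.mean - self.mean
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta**2 * self.n * other.n / n
        return RunningStats(n, mean, m2)

    def stddev(self, sample: bool = True) -> float:
        """Sample (n-1) or population (n) standard deviation; needs n >= 2 for sample."""
        denom = self.n - 1 if sample else self.n
        return math.sqrt(self.m2 / denom)


history = RunningStats.from_values([4.0, 7.0, 13.0, 16.0])  # previously stored aggregate
today = RunningStats.from_values([10.0, 12.0])               # today's rows only
combined = history.merge(today)
print(combined.n, round(combined.mean, 4), round(combined.stddev(), 4))
```

A dbt incremental model could store the same three columns per partition and merge them in SQL. Storing raw `SUM(x)` and `SUM(x*x)` is simpler, but it can lose precision through cancellation when values are large relative to their spread.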

January 2, 2025
GDD: Generative Driven Design

GDD: Generative Driven Design Reflective generative AI software components as a development paradigm Nowhere has the proliferation of generative AI tooling been more aggressive than in the world of software development. It began with GitHub Copilot’s supercharged autocomplete, then exploded into direct code-along integrated tools like Aider and Cursor that allow software engineers to dictate…

January 2, 2025
Transforming Data into Solutions: Building a Smart App with Python and AI

Transforming Data into Solutions: Building a Smart App with Python and AI Some financial analysts worry that artificial intelligence may not justify the massive investments being made in the field. While I understand their concerns, I see things differently. I’m neither an AI Boomer nor an AI Doomer — I believe AI has the potential to drive…

January 2, 2025
Multi-Agentic RAG with Hugging Face Code Agents

Multi-Agentic RAG with Hugging Face Code Agents Using Qwen2.5-7B-Instruct powered code agents to create a local, open source, multi-agentic RAG system. Large Language Models have shown impressive capabilities and they are still undergoing steady improvements with each new generation of models released. Applications such as chatbots and summarisation can directly exploit…

January 1, 2025
Stop the Count! Why Putting A Time Limit on Metrics is Critical for Fast and Accurate Experiments

Stop the Count! Why Putting A Time Limit on Metrics is Critical for Fast and Accurate Experiments Why your experiments might never reach significance. Experiments usually compare the frequency of an event (or some other sum metric) after either exposure (treatment) or non-exposure (control) to some intervention. For example:…

January 1, 2025
Partial Dependence Plots: How to Discover Variables Influencing a Model

Partial Dependence Plots: How to Discover Variables Influencing a Model Have you ever wondered how machine learning models are constructed? ‘Explainability of machine learning models’ and ‘machine learning… By Mythili Krishnan, on Towards Data Science.
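
A partial dependence curve for feature $j$ is the model's average prediction when $x_j$ is fixed at a grid value and every other feature keeps its observed value. The NumPy sketch below uses a hypothetical model `y = 2*x0 + x1**2`. For this model, the curve over `x0` should be exactly linear with slope 2. scikit-learn offers the same computation in `sklearn.inspection.partial_dependence`.

```python
import numpy as np


def partial_dependence_curve(predict, X, feature, grid):
    """Average prediction with column `feature` set to each grid value."""
    averages = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature] = value          # override one feature for every row
        averages.append(predict(X_mod).mean())
    return np.array(averages)


def model(X):
    """Hypothetical fitted model: y = 2*x0 + x1**2."""
    return 2.0 * X[:, 0] + X[:, 1] ** 2


rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
grid = np.linspace(-2.0, 2.0, 5)
pd_x0 = partial_dependence_curve(model, X, feature=0, grid=grid)
print(np.round(pd_x0, 3))  # linear in the grid with slope 2
```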

January 1, 2025
Top 12 Skills Data Scientists Need to Succeed in 2025

Top 12 Skills Data Scientists Need to Succeed in 2025 It’s (not) all about LLMs and AI tools. By Benjamin Bodner, on Towards Data Science.

January 1, 2025
Creating SMOTE Oversampling from Scratch

Creating SMOTE Oversampling from Scratch A Python tutorial on how to implement oversampling and how to make custom variations. By Hari Devanathan, on Towards Data Science.
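
SMOTE builds each synthetic minority sample in three steps. It picks a minority point, chooses one of that point's $k$ nearest minority neighbours, and interpolates a random fraction of the way between them. The from-scratch NumPy sketch below follows those steps. It computes brute-force distances, so it suits small data only. `imblearn.over_sampling.SMOTE` in imbalanced-learn is the usual library implementation.

```python
import numpy as np


def smote(X_minority, n_synthetic, k=5, seed=None):
    """Generate synthetic minority samples by interpolating toward neighbours."""
    rng = np.random.default_rng(seed)
    n = len(X_minority)
    k = min(k, n - 1)
    # Pairwise Euclidean distances within the minority class.
    dists = np.linalg.norm(X_minority[:, None, :] - X_minority[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)          # a point is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]

    synthetic = np.empty((n_synthetic, X_minority.shape[1]))
    for s in range(n_synthetic):
        i = rng.integers(n)                  # random minority point
        j = neighbours[i, rng.integers(k)]   # one of its k nearest neighbours
        gap = rng.random()                   # uniform fraction in [0, 1)
        synthetic[s] = X_minority[i] + gap * (X_minority[j] - X_minority[i])
    return synthetic


# Hypothetical minority-class points.
X_min = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.0], [3.0, 3.0]])
X_new = smote(X_min, n_synthetic=10, k=2, seed=0)
print(X_new.shape)
```

Every synthetic point lies on a segment between two real minority points. This is also why custom variations usually change how the neighbour or the gap is chosen.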

January 1, 2025
Surrogate Modeling for Explainable Predictive Time Series Corrections

Surrogate Modeling for Explainable Predictive Time Series Corrections arXiv:2412.19897v1 Announce Type: new Abstract: We introduce a local surrogate approach for explainable time-series forecasting. An initially non-interpretable predictive model is used to improve the forecast of a classical time-series ‘base model’. ‘Explainability’ of the correction is provided by fitting the base model again to the data…

December 31, 2024
Confidence Interval Construction and Conditional Variance Estimation with Dense ReLU Networks

Confidence Interval Construction and Conditional Variance Estimation with Dense ReLU Networks arXiv:2412.20355v1 Announce Type: new Abstract: This paper addresses the problems of conditional variance estimation and confidence interval construction in nonparametric regression using dense networks with the Rectified Linear Unit (ReLU) activation function. We present a residual-based framework for conditional variance estimation, deriving nonasymptotic bounds…

December 31, 2024
Deep Generalized Schrödinger Bridges: From Image Generation to Solving Mean-Field Games

Deep Generalized Schrödinger Bridges: From Image Generation to Solving Mean-Field Games arXiv:2412.20279v1 Announce Type: new Abstract: Generalized Schrödinger Bridges (GSBs) are a fundamental mathematical framework used to analyze the most likely particle evolution based on the principle of least action including kinetic and potential energy. In parallel to their well-established presence in the theoretical realms…

December 31, 2024
Distributionally Robust Optimization via Iterative Algorithms in Continuous Probability Spaces

Distributionally Robust Optimization via Iterative Algorithms in Continuous Probability Spaces arXiv:2412.20556v1 Announce Type: new Abstract: We consider a minimax problem motivated by distributionally robust optimization (DRO) when the worst-case distribution is continuous, leading to significant computational challenges due to the infinite-dimensional nature of the optimization problem. Recent research has explored learning the worst-case distribution using…

December 31, 2024
Testing and Improving the Robustness of Amortized Bayesian Inference for Cognitive Models

Testing and Improving the Robustness of Amortized Bayesian Inference for Cognitive Models arXiv:2412.20586v1 Announce Type: new Abstract: Contaminant observations and outliers often cause problems when estimating the parameters of cognitive models, which are statistical models representing cognitive processes. In this study, we test and improve the robustness of parameter estimation using amortized Bayesian inference (ABI)…

December 31, 2024
Lessons from COVID-19: Why Probability Distributions Matter

Lessons from COVID-19: Why Probability Distributions Matter Understanding Distributions with Extremes: Probability for Data Science Series (END). By Sunghyun Ahn, on Towards Data Science.

December 31, 2024
From Default Python Line Chart to Journal-Quality Infographics

From Default Python Line Chart to Journal-Quality Infographics Transform boring default Matplotlib line charts into stunning, customized visualizations. Everyone who has used Matplotlib knows how ugly the default charts look. In this series of posts, I’ll share some tricks to make your visualizations stand out and reflect your individual style.…

December 31, 2024
How to Ensure the Stability of a Model Using Jackknife Estimation

How to Ensure the Stability of a Model Using Jackknife Estimation How to ensure the robustness of a model and detect influential data observations. By Paula LC, on Towards Data Science.
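
The jackknife recomputes a statistic once with each observation left out. The spread of those leave-one-out values gives two quantities. The bias estimate is $(n-1)(\bar\theta_{(\cdot)} - \hat\theta)$. The standard error is $\sqrt{\tfrac{n-1}{n}\sum_i(\hat\theta_{(i)} - \bar\theta_{(\cdot)})^2}$. An unusually large shift in one leave-one-out value flags an influential point. The sketch below runs on hypothetical data. For the sample mean, the jackknife standard error equals $s/\sqrt{n}$ exactly, which makes a handy check.

```python
import numpy as np


def jackknife(data, statistic):
    """Leave-one-out jackknife: bias-corrected estimate, bias, standard error."""
    data = np.asarray(data)
    n = len(data)
    theta_full = statistic(data)
    loo = np.array([statistic(np.delete(data, i)) for i in range(n)])
    theta_bar = loo.mean()
    bias = (n - 1) * (theta_bar - theta_full)
    se = np.sqrt((n - 1) / n * np.sum((loo - theta_bar) ** 2))
    return theta_full - bias, bias, se, loo


x = np.array([2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 9.7])  # hypothetical sample
est, bias, se, loo = jackknife(x, np.mean)
print(est, bias, se)

# The observation whose removal moves the estimate most is the most influential.
print(x[np.argmax(np.abs(loo - loo.mean()))])
```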

December 31, 2024
Building a Custom AI Jira Agent

Building a Custom AI Jira Agent How I used Google Mesop, Django, LangChain Agents, CO-STAR & Chain-of-Thought (CoT) prompting combined with the Jira API to better automate Jira. The inspiration for this project came from hosting a Jira ticket creation tool on a web application I had developed for internal users.…

December 31, 2024
Mastering Model Uncertainty: Thresholding Techniques in Deep Learning

Mastering Model Uncertainty: Thresholding Techniques in Deep Learning A few words on thresholding, the softmax activation function, introducing an extra label, and considerations regarding output activation functions. In many real-world applications, machine learning models are not designed to make decisions in an all-or-nothing manner. Instead, there are situations where it is more…
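
Thresholding the softmax output lets a classifier abstain. The classifier predicts the top class only if its probability clears a cut-off, and otherwise returns an extra "unknown" label. In the NumPy sketch below, the threshold 0.7 and the label `-1` are arbitrary choices. Raw softmax scores are often poorly calibrated, so in practice the threshold would be tuned on validation data.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def predict_with_threshold(logits, threshold=0.7, unknown=-1):
    """Return the argmax class, or `unknown` when the top probability is too low."""
    probs = softmax(logits)
    labels = probs.argmax(axis=-1)
    return np.where(probs.max(axis=-1) >= threshold, labels, unknown)


logits = np.array([[4.0, 0.5, 0.1],   # confident: top class has probability ~0.95
                   [1.0, 1.1, 0.9]])  # ambiguous: probabilities near 1/3 each
print(predict_with_threshold(logits))
```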

December 31, 2024
Neural Networks Perform Sufficient Dimension Reduction

Neural Networks Perform Sufficient Dimension Reduction arXiv:2412.19033v1 Announce Type: new Abstract: This paper investigates the connection between neural networks and sufficient dimension reduction (SDR), demonstrating that neural networks inherently perform SDR in regression tasks under appropriate rank regularizations. Specifically, the weights in the first layer span the central mean subspace. We establish the statistical consistency…

December 30, 2024
Adaptive Conformal Inference by Betting

Adaptive Conformal Inference by Betting arXiv:2412.19318v1 Announce Type: new Abstract: Conformal prediction is a valuable tool for quantifying predictive uncertainty of machine learning models. However, its applicability relies on the assumption of data exchangeability, a condition which is often not met in real-world scenarios. In this paper, we consider the problem of adaptive conformal inference…
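
The paper targets the non-exchangeable case. The baseline it relaxes is standard split conformal prediction, which assumes exchangeability. The sketch below implements that baseline, not the betting method. It uses absolute residuals as conformity scores and the finite-sample quantile level $\lceil (n+1)(1-\alpha) \rceil / n$. The model and data are simulated stand-ins, and `np.quantile(..., method=...)` requires NumPy 1.22 or later.

```python
import numpy as np


def split_conformal_interval(cal_pred, cal_true, test_pred, alpha=0.1):
    """Split conformal prediction interval with absolute-residual scores.

    Assumes calibration and test points are exchangeable.
    """
    scores = np.abs(cal_true - cal_pred)
    n = len(scores)
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)  # finite-sample correction
    q = np.quantile(scores, level, method="higher")
    return test_pred - q, test_pred + q


def predict(x):
    """Stand-in for an already fitted regression model (assumed)."""
    return 3.0 * x


rng = np.random.default_rng(1)
x = rng.uniform(0.0, 10.0, 2000)
y = 3.0 * x + rng.normal(0.0, 2.0, x.size)
cal, test = slice(0, 1000), slice(1000, 2000)

lower, upper = split_conformal_interval(predict(x[cal]), y[cal], predict(x[test]), alpha=0.1)
coverage = np.mean((y[test] >= lower) & (y[test] <= upper))
print(f"empirical coverage: {coverage:.3f}")  # should land near the 0.90 target
```

When the data drift over time, this fixed-quantile interval can under- or over-cover. Adaptive methods such as the one in the abstract are meant to address that failure mode.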

December 30, 2024
Localized exploration in contextual dynamic pricing achieves dimension-free regret

Localized exploration in contextual dynamic pricing achieves dimension-free regret arXiv:2412.19252v1 Announce Type: new Abstract: We study the problem of contextual dynamic pricing with a linear demand model. We propose a novel localized exploration-then-commit (LetC) algorithm which starts with a pure exploration stage, followed by a refinement stage that explores near the learned optimal pricing policy,…

December 30, 2024
Asymptotically Optimal Search for a Change Point Anomaly under a Composite Hypothesis Model

Asymptotically Optimal Search for a Change Point Anomaly under a Composite Hypothesis Model arXiv:2412.19392v1 Announce Type: new Abstract: We address the problem of searching for a change point in an anomalous process among a finite set of M processes. Specifically, we address a composite hypothesis model in which each process generates measurements following a common…

December 30, 2024
Low-Rank Contextual Reinforcement Learning from Heterogeneous Human Feedback

Low-Rank Contextual Reinforcement Learning from Heterogeneous Human Feedback arXiv:2412.19436v1 Announce Type: new Abstract: Reinforcement learning from human feedback (RLHF) has become a cornerstone for aligning large language models with human preferences. However, the heterogeneity of human feedback, driven by diverse individual contexts and preferences, poses significant challenges for reward learning. To address this, we propose…

December 30, 2024
My Data Science Manifesto from a Self Taught Data Scientist

My Data Science Manifesto from a Self Taught Data Scientist Background I’m a self-taught data scientist, with about 5 years of data analyst experience and now about 5 years as a Data Scientist. I’m more math minded than the average person, but I’m not special. I have a bachelor’s degree in mechanical engineering, and have…

December 30, 2024
Weekly Entering & Transitioning – Thread 30 Dec, 2024 – 06 Jan, 2025

Weekly Entering & Transitioning – Thread 30 Dec, 2024 – 06 Jan, 2025 Welcome to this week’s entering & transitioning thread! This thread is for any questions about getting started, studying, or transitioning into the data science field. Topics include: Learning resources (e.g. books, tutorials, videos) Traditional education (e.g. schools, degrees, electives) Alternative education (e.g.…

December 30, 2024
recommend me the best statistics textbook for data science

recommend me the best statistics textbook for data science I am an intermediate-level student who has already studied stats, but I want to revisit it from a DS and ML perspective. Submitted by /u/Emotional-Rhubarb725.

December 30, 2024
Looking for some Senior DS Advice

Looking for some Senior DS Advice Hello everyone, I think this is okay to be a post since it’s not about entering/transitioning, but if I need to repost in the weekly threads please let me know! TLDR: I started working as a Data Scientist at a medium to large company almost 3 years ago. I…

December 30, 2024
What are some of the most interesting applied ml papers/blogs you read in 2024 or projects you worked on

What are some of the most interesting applied ml papers/blogs you read in 2024 or projects you worked on I am looking for some interesting successful/unsuccessful real-world machine learning applications. You are also free to share experiences building applications with machine learning that have actually had some real world impact. Something of this type: LinkedIn…

December 30, 2024
Introducing n-Step Temporal-Difference Methods

Introducing n-Step Temporal-Difference Methods Dissecting “Reinforcement Learning” by Richard S. Sutton with custom Python implementations, Episode V. By Oliver S, on Towards Data Science.
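
The n-step TD target sums $n$ discounted rewards and then bootstraps from the current value estimate: $G_{t:t+n} = R_{t+1} + \gamma R_{t+2} + \dots + \gamma^{n-1}R_{t+n} + \gamma^n V(S_{t+n})$. The value estimate is then updated as $V(S_t) \leftarrow V(S_t) + \alpha\,[G_{t:t+n} - V(S_t)]$, following n-step TD prediction as described by Sutton and Barto. The tabular sketch below uses hypothetical function names. Near the end of an episode, the reward list is shorter and the bootstrap term is dropped.

```python
def n_step_return(rewards, bootstrap_value, gamma):
    """Discounted sum of the given rewards plus gamma^n * bootstrap_value."""
    g = 0.0
    for k, r in enumerate(rewards):
        g += gamma**k * r
    return g + gamma ** len(rewards) * bootstrap_value


def n_step_td_update(V, state, rewards, next_state, gamma, alpha, terminal=False):
    """Tabular n-step TD(0)-style update of V[state]; returns the target G."""
    bootstrap = 0.0 if terminal else V[next_state]
    G = n_step_return(rewards, bootstrap, gamma)
    V[state] += alpha * (G - V[state])
    return G


# Hypothetical values: three rewards of 1, then bootstrap from V["D"] = 8.
V = {"A": 0.0, "D": 8.0}
G = n_step_td_update(V, "A", [1.0, 1.0, 1.0], "D", gamma=0.5, alpha=0.1)
print(G, V["A"])  # 1 + 0.5 + 0.25 + 0.125 * 8 = 2.75, then 0.1 * 2.75
```

With n = 1 this reduces to one-step TD, and as n grows to the episode length it becomes a Monte Carlo return.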

December 30, 2024
Superposition: What Makes it Difficult to Explain Neural Network

Superposition: What Makes it Difficult to Explain Neural Network When there are more features than model dimensions. It would be ideal if the world of neural networks represented a one-to-one relationship: each neuron activates on one and only one feature. In such a world, interpreting the model would be straightforward: this neuron fires for…

December 30, 2024
Segmenting Water in Satellite Images Using Paligemma

Segmenting Water in Satellite Images Using Paligemma Some insights on using Google’s latest Vision Language Model. Multimodal models are architectures that simultaneously integrate and process different data types, such as text, images,…

December 30, 2024
Deep Dive into Multithreading, Multiprocessing, and Asyncio

Deep Dive into Multithreading, Multiprocessing, and Asyncio How to choose the right concurrency model. Python provides three main approaches to handle multiple tasks simultaneously: multithreading, multiprocessing, and asyncio. Choosing the right model is crucial for maximising your program’s performance and efficiently using system resources. (P.S. It is also a common interview…
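
For I/O-bound work, asyncio interleaves tasks on a single thread while each one awaits. Total time therefore approaches the longest single wait rather than the sum of all waits. In the sketch below, `asyncio.sleep` stands in for network or disk I/O, and the task function is hypothetical. CPU-bound work would instead go to `multiprocessing` or `concurrent.futures.ProcessPoolExecutor`, because in standard CPython builds the GIL prevents threads from running Python bytecode in parallel.

```python
import asyncio
import time


async def fake_io(task_id: int, delay: float) -> int:
    """Hypothetical I/O-bound task; the sleep stands in for a network wait."""
    await asyncio.sleep(delay)
    return task_id


async def main():
    start = time.perf_counter()
    # Five 0.2 s waits run concurrently, so this takes about 0.2 s, not 1.0 s.
    results = await asyncio.gather(*(fake_io(i, 0.2) for i in range(5)))
    return results, time.perf_counter() - start


results, elapsed = asyncio.run(main())
print(results, f"{elapsed:.2f}s")
```

`asyncio.gather` returns results in the order the awaitables were passed, not the order in which they finish.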

December 29, 2024
Measuring Cross-Product Adoption Using dbt_set_similarity

Measuring Cross-Product Adoption Using dbt_set_similarity Enhancing cross-product insights within dbt workflows. For multi-product companies, one critical metric is often what is called “cross-product adoption” (i.e., understanding how users engage with multiple offerings in a given product portfolio). One measure suggested to calculate cross-product or cross-feature usage in the popular book Hacking Growth [1] is…
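
The excerpt is cut off before it names the measure, so the following is an assumption. One common set-similarity metric for cross-product adoption is the Jaccard index, $|A \cap B| / |A \cup B|$, computed over the sets of users of two products. The plain-Python sketch below uses hypothetical user IDs. It does not reproduce the `dbt_set_similarity` package's macros.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Size of the intersection over size of the union; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical user IDs per product.
users_by_product = {
    "product_a": {1, 2, 3, 4},
    "product_b": {3, 4, 5},
    "product_c": {6},
}

for p, q in combinations(sorted(users_by_product), 2):
    score = jaccard(users_by_product[p], users_by_product[q])
    print(f"{p} vs {q}: {score:.2f}")
```

A score near 1 means the two products share almost the same user base. A score of 0 means no user adopted both.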

December 29, 2024
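The excerpt breaks off before naming the measure, so the following is only an assumption about the kind of metric a set-similarity package computes: the Jaccard index |A ∩ B| / |A ∪ B| between the sets of users active in two products, written in plain Python rather than dbt SQL. The function name and the user IDs are hypothetical.

```python
from typing import Hashable, Iterable


def jaccard(a: Iterable[Hashable], b: Iterable[Hashable]) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B|; defined here as 0.0 when both sets are empty."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


# Hypothetical user IDs active in two products.
product_a_users = {"u1", "u2", "u3", "u4"}
product_b_users = {"u3", "u4", "u5"}
print(jaccard(product_a_users, product_b_users))  # 2 shared users / 5 distinct users
```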
Building Trust in LLM Answers: Highlighting Source Texts in PDFs

Building Trust in LLM Answers: Highlighting Source Texts in PDFs. 100% accuracy isn’t everything: helping users navigate the document is the real value. Continue reading on Towards Data Science » Angela & Kezhan Shi

December 28, 2024
Introduction to the Finite Normal Mixtures in Regression with R

Introduction to the Finite Normal Mixtures in Regression with R. How to make linear regression flexible enough for non-linear data. Linear regression is usually considered not flexible enough to tackle nonlinear data; from a theoretical viewpoint, it is not capable of dealing with them. However, we…

December 28, 2024
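As an illustration of the idea (in Python rather than the article's R, and not the article's code), the sketch below fits a k-component finite mixture of normal linear regressions, y | x ~ Σ_j π_j N(β_{0j} + β_{1j} x, σ_j²), with the EM algorithm. EM alternates between estimating how likely each point is to belong to each component and refitting each line by weighted least squares. The residual-based initialization and the function name are assumptions, and the sketch has no safeguard against a component collapsing to zero weight.

```python
import numpy as np


def fit_mixture_of_regressions(x, y, k=2, n_iter=100):
    """EM for a k-component finite mixture of Gaussian linear regressions."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(y)
    X = np.column_stack([np.ones(n), x])  # design matrix with an intercept column
    # Initialize by splitting points into k groups by their residual from one OLS fit.
    beta0, *_ = np.linalg.lstsq(X, y, rcond=None)
    ranks = np.argsort(np.argsort(y - X @ beta0))
    resp = np.eye(k)[ranks * k // n]  # one-hot responsibilities, shape (n, k)
    for _ in range(n_iter):
        # M-step: weighted least squares, weighted residual variance, mixing weights.
        betas = np.empty((k, 2))
        sigmas = np.empty(k)
        for j in range(k):
            w = resp[:, j]
            Xw = X * w[:, None]
            betas[j] = np.linalg.solve(X.T @ Xw, Xw.T @ y)
            resid = y - X @ betas[j]
            sigmas[j] = np.sqrt(max((w * resid ** 2).sum() / w.sum(), 1e-12))
        pis = resp.mean(axis=0)
        # E-step: log of pi_j * N(y_i | x_i beta_j, sigma_j^2), then normalize per point.
        resid = y[:, None] - X @ betas.T  # shape (n, k)
        log_dens = (np.log(pis) - np.log(sigmas) - 0.5 * np.log(2 * np.pi)
                    - 0.5 * (resid / sigmas) ** 2)
        log_dens -= log_dens.max(axis=1, keepdims=True)  # numerical stabilization
        resp = np.exp(log_dens)
        resp /= resp.sum(axis=1, keepdims=True)
    return betas, sigmas, pis
```

On data drawn from two well-separated lines, the recovered slopes should be close to the true ones. With overlapping components, EM can stop at a local optimum, which is why several restarts are common in practice.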
Master Bots Before Starting with AI Agents: Simple Steps to Create a Mastodon Bot with Python

Master Bots Before Starting with AI Agents: Simple Steps to Create a Mastodon Bot with Python. I recently published a post on Mastodon that was shared by six other accounts within two minutes. Curious, I visited the profiles and… Continue reading on Towards Data Science » Sarah Lea

December 28, 2024
Unlocking the Untapped Potential of Retrieval-Augmented Generation (RAG) Pipelines

Unlocking the Untapped Potential of Retrieval-Augmented Generation (RAG) Pipelines. Essential Metrics and Methods to Enhance Performance Across Retrieval, Generation, and End-to-End Pipelines. Continue reading on Towards Data Science » Saleh Alkhalifa

December 28, 2024
How To Start A Data Science Blog on Medium

How To Start A Data Science Blog on Medium. Tips on how to get started, write your first article, and get noticed. Continue reading on Towards Data Science » Haden Pelletier

December 28, 2024
Track Computer Vision Experiments with MLflow

Track Computer Vision Experiments with MLflow. Discover how to set up an efficient MLflow environment to track your experiments, compare them, and choose the best model for deployment. Continue reading on Towards Data Science » Yağmur Çiğdem Aktaş

December 27, 2024
Jingle Bells and Statistical Tests

Jingle Bells and Statistical Tests. Data Types, Hypotheses and Statistical Tests That Fit Them with Festive Christmas Market Examples 🎄🎅🎡. Continue reading on Towards Data Science » Gizem Kaya

December 27, 2024
How Neural Networks Learn: A Probabilistic Viewpoint

How Neural Networks Learn: A Probabilistic Viewpoint. Understanding loss functions for training neural networks. Machine learning is very hands-on, and everyone charts their own path. There isn’t a standard set of courses to follow, as was traditionally the case. There’s no ‘Machine Learning 101,’ so to speak. However, this sometimes leaves gaps in understanding. If you’re…

December 27, 2024
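One standard result behind the probabilistic view of loss functions (stated here as a general fact, not as a summary of the article) is that common losses are negative log-likelihoods. Mean squared error is the Gaussian negative log-likelihood with a fixed variance, up to a scale factor and an additive constant, and binary cross-entropy is the Bernoulli negative log-likelihood. A small plain-Python sketch with hypothetical helper names:

```python
import math
from typing import Sequence


def mse(y: Sequence[float], mu: Sequence[float]) -> float:
    """Mean squared error between targets y and predictions mu."""
    return sum((yi - mi) ** 2 for yi, mi in zip(y, mu)) / len(y)


def gaussian_nll(y: Sequence[float], mu: Sequence[float], sigma: float = 1.0) -> float:
    """Average negative log-likelihood of targets y under N(mu, sigma^2)."""
    return sum(0.5 * math.log(2 * math.pi * sigma ** 2) + (yi - mi) ** 2 / (2 * sigma ** 2)
               for yi, mi in zip(y, mu)) / len(y)


def bernoulli_nll(y: Sequence[int], p: Sequence[float]) -> float:
    """Average negative log-likelihood of 0/1 labels under Bernoulli(p), i.e. binary cross-entropy."""
    return -sum(yi * math.log(pr) + (1 - yi) * math.log(1 - pr)
                for yi, pr in zip(y, p)) / len(y)


# With sigma = 1 the Gaussian NLL equals 0.5 * MSE plus the constant 0.5 * log(2*pi),
# so minimizing either one over the model parameters gives the same solution.
y, mu = [1.0, 2.0, 3.0], [1.5, 2.0, 2.0]
print(gaussian_nll(y, mu), 0.5 * mse(y, mu) + 0.5 * math.log(2 * math.pi))
```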
Linearizing Attention

Linearizing Attention. Breaking the quadratic barrier: modern alternatives to softmax attention. Large Language Models are great, but they have a drawback: they use softmax attention, which can be computationally intensive because its cost grows quadratically with sequence length. In this article, we will explore whether the softmax can be replaced to achieve linear time complexity.

December 27, 2024
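One well-known way to drop the softmax is kernelized linear attention (Katharopoulos et al., 2020), which replaces exp(q·k) with φ(q)·φ(k) for a positive feature map φ, here φ(x) = elu(x) + 1. Because Σ_j φ(k_j) v_j^T can be computed once and reused for every query, the cost becomes linear in the sequence length N (and quadratic in the feature dimension) instead of quadratic in N. This NumPy sketch covers only the non-causal case and is not necessarily the variant the article settles on.

```python
import numpy as np


def softmax_attention(Q, K, V):
    """Standard scaled dot-product attention; builds an (N, N) score matrix."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V


def elu_plus_one(x):
    """phi(x) = elu(x) + 1: equals x + 1 for x > 0 and exp(x) otherwise, so it stays positive."""
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))


def linear_attention(Q, K, V):
    """Kernelized attention without materializing the (N, N) matrix."""
    phi_q, phi_k = elu_plus_one(Q), elu_plus_one(K)  # (N, d)
    kv = phi_k.T @ V                                  # (d, d_v), summed over all keys once
    z = phi_k.sum(axis=0)                             # (d,), shared normalizer
    return (phi_q @ kv) / (phi_q @ z)[:, None]
```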
Understanding the Mathematics of PPO in Reinforcement Learning

Understanding the Mathematics of PPO in Reinforcement Learning. Deep dive into RL with PPO for beginners. Introduction. Reinforcement Learning (RL) is a branch of Artificial Intelligence that enables agents to learn how to interact with their environment. These agents, which range from robots to software features or autonomous systems, learn through…

December 27, 2024
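The core of PPO's mathematics is the clipped surrogate objective from Schulman et al. (2017), L^CLIP(θ) = E_t[min(r_t(θ) Â_t, clip(r_t(θ), 1 - ε, 1 + ε) Â_t)], where r_t(θ) = π_θ(a_t|s_t) / π_θ_old(a_t|s_t). A minimal NumPy sketch, assuming log-probabilities and advantage estimates have already been computed (the function name is hypothetical):

```python
import numpy as np


def ppo_clip_objective(logp_new: np.ndarray, logp_old: np.ndarray,
                       advantages: np.ndarray, eps: float = 0.2) -> float:
    """Clipped surrogate objective L^CLIP, averaged over a batch of timesteps.

    The policy is trained to maximize this value; a loss for gradient descent is its negative.
    """
    ratio = np.exp(logp_new - logp_old)  # r_t = pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantages
    return float(np.minimum(unclipped, clipped).mean())  # pessimistic (lower) bound
```

Taking the minimum makes the objective pessimistic: once the ratio moves more than ε away from 1 in the direction the advantage favors, the objective stops rewarding further change, which keeps each policy update small.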
Understanding When and How to Implement FastAPI Middleware (Examples and Use Cases)

Understanding When and How to Implement FastAPI Middleware (Examples and Use Cases). Supercharge Your FastAPI with Middleware: Practical Use Cases and Examples. Continue reading on Towards Data Science » Mike Huls

December 26, 2024
Three Important Pandas Functions You Need to Know

Three Important Pandas Functions You Need to Know. Master these techniques to stand out as a Python developer. Continue reading on Towards Data Science » Jiayan Yin

December 26, 2024
Decoding the Hack behind Accurate Weather Forecasting: Variational Data Assimilation

Decoding the Hack behind Accurate Weather Forecasting: Variational Data Assimilation. Learn how to implement variational data assimilation, with mathematical details and PyTorch for an efficient implementation. Continue reading on Towards Data Science » Wencong Yang, PhD

December 26, 2024
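For a linear observation operator, three-dimensional variational assimilation (3D-Var) minimizes J(x) = ½(x - x_b)^T B^{-1}(x - x_b) + ½(y - Hx)^T R^{-1}(y - Hx), where x_b is the background (prior) state, B and R are the background and observation error covariances, y holds the observations, and H maps the state to observation space. The sketch below uses NumPy instead of the article's PyTorch and plain gradient descent on J; in this linear-Gaussian case it can be checked against the closed-form analysis x_a = x_b + K(y - Hx_b) with K = BH^T(HBH^T + R)^{-1}. Function names and the step size are assumptions.

```python
import numpy as np


def var3d_cost_and_grad(x, xb, B_inv, y, H, R_inv):
    """3D-Var cost J(x) and its gradient for a linear observation operator H."""
    db = x - xb     # departure from the background
    do = y - H @ x  # observations minus the model's observation equivalent
    J = 0.5 * db @ B_inv @ db + 0.5 * do @ R_inv @ do
    grad = B_inv @ db - H.T @ R_inv @ do
    return J, grad


def var3d_gradient_descent(xb, B, y, H, R, lr=0.1, n_iter=2000):
    """Minimize J by gradient descent; lr must stay below 2 / (largest eigenvalue of B^-1 + H^T R^-1 H)."""
    B_inv, R_inv = np.linalg.inv(B), np.linalg.inv(R)
    x = xb.astype(float).copy()
    for _ in range(n_iter):
        _, grad = var3d_cost_and_grad(x, xb, B_inv, y, H, R_inv)
        x -= lr * grad
    return x


def var3d_closed_form(xb, B, y, H, R):
    """Analysis x_a = x_b + K (y - H x_b) with gain K = B H^T (H B H^T + R)^{-1}."""
    K = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)
    return xb + K @ (y - H @ xb)
```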
Data-Driven Priors in the Maximum Entropy on the Mean Method for Linear Inverse Problems

Data-Driven Priors in the Maximum Entropy on the Mean Method for Linear Inverse Problems arXiv:2412.17916v1 Announce Type: new Abstract: We establish the theoretical framework for implementing the maximum entropy on the mean (MEM) method for linear inverse problems in the setting of approximate (data-driven) priors. We prove a.s. convergence for empirical means and further develop…

December 25, 2024
An information theoretic limit to data amplification

An information theoretic limit to data amplification arXiv:2412.18041v1 Announce Type: new Abstract: In recent years generative artificial intelligence has been used to create data to support science analysis. For example, Generative Adversarial Networks (GANs) have been trained using Monte Carlo simulated input and then used to generate data for the same problem. This has the…

December 25, 2024
Fréchet regression for multi-label feature selection with implicit regularization

Fréchet regression for multi-label feature selection with implicit regularization arXiv:2412.18247v1 Announce Type: new Abstract: Fréchet regression extends linear regression to model complex responses in metric spaces, making it particularly relevant for multi-label regression, where each instance can have multiple associated labels. However, variable selection within this framework remains underexplored. In this paper, we propose…

December 25, 2024
Heterogeneous transfer learning for high dimensional regression with feature mismatch

Heterogeneous transfer learning for high dimensional regression with feature mismatch arXiv:2412.18081v1 Announce Type: new Abstract: We consider the problem of transferring knowledge from a source, or proxy, domain to a new target domain for learning a high-dimensional regression model with possibly different features. Recently, the statistical properties of homogeneous transfer learning have been investigated. However,…

December 25, 2024
A Statistical Framework for Ranking LLM-Based Chatbots

A Statistical Framework for Ranking LLM-Based Chatbots arXiv:2412.18407v1 Announce Type: new Abstract: Large language models (LLMs) have transformed natural language processing, with frameworks like Chatbot Arena providing pioneering platforms for evaluating these models. By facilitating millions of pairwise comparisons based on human judgments, Chatbot Arena has become a cornerstone in LLM evaluation, offering rich datasets…

December 25, 2024
Probability Distributions: Poisson vs. Binomial Distribution

Probability Distributions: Poisson vs. Binomial Distribution. Using Soccer to Understand the Difference Between Poisson & Binomial: Probability for Data Science Series (3). Continue reading on Towards Data Science » Sunghyun Ahn

December 25, 2024
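A short sketch of the difference, with made-up soccer-style numbers that are not from the article. The binomial distribution counts successes in a fixed number n of independent trials with success probability p (say, penalties scored out of five attempts), so it cannot exceed n. The Poisson distribution counts events in a fixed interval at an average rate λ (say, goals per match) and has no upper limit. When n is large and p is small, Binomial(n, λ/n) is close to Poisson(λ).

```python
import math


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p): k successes in n independent trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam): k events in an interval with mean count lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


# Hypothetical numbers: 5 penalties at a 75% conversion rate vs. 2.5 goals per match on average.
for k in range(6):
    print(k, round(binomial_pmf(k, 5, 0.75), 4), round(poisson_pmf(k, 2.5), 4))

# Poisson limit: Binomial(n, lam / n) approaches Poisson(lam) as n grows.
print(binomial_pmf(2, 1000, 2.5 / 1000), poisson_pmf(2, 2.5))
```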
2024 Survival Guide for Machine Learning Engineer Interviews

2024 Survival Guide for Machine Learning Engineer Interviews. A year-end summary for junior-level MLE interview preparation. Job-seeking is hard! In today’s market, job-seeking for machine learning-related roles is more complex than ever. Even though public reports claim that demand for machine learning engineers (MLEs) is growing fast, the fact is that the market has…

December 25, 2024
A Bird’s-Eye View of Linear Algebra: Orthonormal Matrices

A Bird’s-Eye View of Linear Algebra: Orthonormal Matrices. Orthonormal matrices: the most elegant matrices in all of linear algebra. Continue reading on Towards Data Science » Rohit Pandey

December 25, 2024
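A quick numerical check of the defining property Q^T Q = I (square matrices with this property are more often called orthogonal matrices) and two of its consequences: the inverse is simply the transpose, and multiplying by Q preserves vector lengths. This NumPy sketch builds Q from the QR decomposition of a random matrix, which is just one convenient way to obtain an example; the helper name is hypothetical.

```python
import numpy as np


def is_orthonormal(M: np.ndarray, tol: float = 1e-10) -> bool:
    """True if the columns of M are orthonormal, i.e. M^T M equals the identity."""
    return np.allclose(M.T @ M, np.eye(M.shape[1]), atol=tol)


rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
Q, _ = np.linalg.qr(A)  # QR decomposition: Q has orthonormal columns

x = rng.normal(size=4)
print(is_orthonormal(A), is_orthonormal(Q))                 # a random A is not; Q is
print(np.allclose(np.linalg.inv(Q), Q.T))                    # inverse equals transpose
print(np.isclose(np.linalg.norm(Q @ x), np.linalg.norm(x)))  # lengths are preserved
```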
I’m Doing the Advent of Code 2024 in Python — Day 4

I’m Doing the Advent of Code 2024 in Python — Day 4. Let’s see how many stars we’ll collect. Continue reading on Towards Data Science » Soner Yıldırım

December 25, 2024
Design Patterns with Python for Machine Learning Engineers: Template Method

Design Patterns with Python for Machine Learning Engineers: Template Method. Learn how to use the Template Method design pattern to enhance your code. Continue reading on Towards Data Science » Marcello Politi

December 25, 2024
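A minimal sketch of the Template Method pattern in an ML setting (the class names and steps are hypothetical, not taken from the article): the base class fixes the order of steps in `run()`, subclasses supply the concrete steps, and `preprocess` is an optional hook with a default implementation.

```python
from abc import ABC, abstractmethod


class TrainingPipeline(ABC):
    """Template Method: run() fixes the sequence of steps; subclasses fill them in."""

    def run(self, raw):
        data = self.load(raw)
        data = self.preprocess(data)
        model = self.train(data)
        return self.evaluate(model, data)

    @abstractmethod
    def load(self, raw): ...

    def preprocess(self, data):
        # Hook with a default implementation that subclasses may override.
        return data

    @abstractmethod
    def train(self, data): ...

    @abstractmethod
    def evaluate(self, model, data): ...


class MeanBaseline(TrainingPipeline):
    """Hypothetical concrete pipeline: the 'model' is the mean of the targets."""

    def load(self, raw):
        return [float(v) for v in raw]

    def train(self, data):
        return sum(data) / len(data)

    def evaluate(self, model, data):
        # Mean absolute error of the constant prediction.
        return sum(abs(v - model) for v in data) / len(data)


print(MeanBaseline().run(["1", "2", "3"]))  # mean is 2.0, so MAE = (1 + 0 + 1) / 3
```

The benefit is that the control flow lives in one place: adding a new model means writing a new subclass rather than copying the pipeline.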
Robust random graph matching in dense graphs via vector approximate message passing

Robust random graph matching in dense graphs via vector approximate message passing arXiv:2412.16457v1 Announce Type: new Abstract: In this paper, we focus on the matching recovery problem between a pair of correlated Gaussian Wigner matrices with a latent vertex correspondence. We are particularly interested in a robust version of this problem such that our observation…

December 24, 2024
Fast Multi-Group Gaussian Process Factor Models

Fast Multi-Group Gaussian Process Factor Models arXiv:2412.16773v1 Announce Type: new Abstract: Gaussian processes are now commonly used in dimensionality reduction approaches tailored to neuroscience, especially to describe changes in high-dimensional neural activity over time. As recording capabilities expand to include neuronal populations across multiple brain areas, cortical layers, and cell types, interest in extending Gaussian…

December 24, 2024