Category: datascience

  • What a Drunk Man Can Teach Us About Time Series Forecasting

    What a Drunk Man Can Teach Us About Time Series Forecasting Autocorrelation & The Random Walk explained with a drunk man 🍺 Let me illustrate this statistical concept with an example we can all visualize. Imagine a drunk man wandering a city. His steps are completely random and unpredictable. Here’s the intuition: – His current…
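
The random-walk intuition in the excerpt above can be sketched in a few lines of Python. This is a minimal illustration, not the post author's code: the ±1 step size, the fixed seed, and the helper name `lag1_autocorr` are assumptions made for the example. It contrasts the individual steps, which should show almost no lag-1 autocorrelation, with the resulting positions, which should be strongly autocorrelated.

```python
import random


def lag1_autocorr(xs):
    """Sample lag-1 autocorrelation of a sequence (hypothetical helper)."""
    n = len(xs)
    mean = sum(xs) / n
    num = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(n - 1))
    den = sum((x - mean) ** 2 for x in xs)
    return num / den


# Assumption: each step is +1 or -1 with equal probability.
rng = random.Random(0)
steps = [rng.choice((-1, 1)) for _ in range(10_000)]

# The position at time t is the running sum of all steps so far.
positions = []
pos = 0
for s in steps:
    pos += s
    positions.append(pos)

print("lag-1 autocorrelation of steps:    ", round(lag1_autocorr(steps), 3))
print("lag-1 autocorrelation of positions:", round(lag1_autocorr(positions), 3))
```

Because each position is the previous position plus one unpredictable step, the previous position is the best available guess for the current one, which is why the positions look highly autocorrelated even though the steps do not.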

  • Relationship between ROC AUC and Gain curve?

    Relationship between ROC AUC and Gain curve? Heya, I’ve been studying the gains curve, and I’ve noticed there’s a relationship between the gains curve and the ROC curve: the smaller the base rate, the closer the gains curve is to the ROC curve. Anyway, onto the point: is it fair to assume that for two models if…

  • Oscillatory Coordination in Cognitive Architectures: Old Dog, New Math

    Oscillatory Coordination in Cognitive Architectures: Old Dog, New Math Been working in AI since before it was cool (think 80s expert systems, not ChatGPT hype). Lately I’ve been developing this cognitive architecture called OGI that uses Top-K gating between specialized modules. Works well, proved the stability, got the complexity down to O(k²). But something’s been…

  • Weekly Entering & Transitioning – Thread 22 Sep, 2025 – 29 Sep, 2025

    Weekly Entering & Transitioning – Thread 22 Sep, 2025 – 29 Sep, 2025 Welcome to this week’s entering & transitioning thread! This thread is for any questions about getting started, studying, or transitioning into the data science field. Topics include: Learning resources (e.g. books, tutorials, videos) Traditional education (e.g. schools, degrees, electives) Alternative education (e.g.…

  • Need input from mid-career Data Scientists (2-5 year range)

    Need input from mid-career Data Scientists (2-5 year range) I am a DS with 2 YOE (plus about 6 co-ops). I’m looking for feedback from folks who have specifically transitioned out of the early-career phase and into the mid-career phase. (Unfortunately I don’t have any in my immediate network.) Context: I’m coming up to 2 years in my role and have…

  • Is it due to the tech recession?

    Is it due to the tech recession? We know that in many companies Data Scientists are Product Analytics / Data Analysts. I thought it was because MLEs had absorbed the duties of DSs, but I have noticed that this may not be exactly the case. There are basically three distinct roles: Data Analyst / Product…

  • What’s the right thing to say to salary expectations question?

    What’s the right thing to say to salary expectations question? I have usually come across two types of scenarios here and I am not sure what’s the best way to deal with them. I ask for a range and they give you a range. Should you just say you’re okay with the range? But what if I make…

  • Updated based on subreddit feedback. Applying for mid-senior based roles. Thank you

    Updated based on subreddit feedback. Applying for mid-senior based roles. Thank you submitted by /u/StormyT

  • Weekly Entering & Transitioning – Thread 15 Sep, 2025 – 22 Sep, 2025

    Weekly Entering & Transitioning – Thread 15 Sep, 2025 – 22 Sep, 2025 Welcome to this week’s entering & transitioning thread! This thread is for any questions about getting started, studying, or transitioning into the data science field. Topics include: Learning resources (e.g. books, tutorials, videos) Traditional education (e.g. schools, degrees, electives) Alternative education (e.g.…

  • Has anyone validated synthetic financial data (Gaussian Copula vs CTGAN) in practice?

    Has anyone validated synthetic financial data (Gaussian Copula vs CTGAN) in practice? I’ve been experimenting with generating synthetic datasets for financial indicators (GDP, inflation, unemployment, etc.) and found that CTGAN offered stronger privacy protection in simple linkage tests, but its overall analytical utility was much weaker. In contrast, Gaussian Copula provided reasonably strong privacy and…

  • Texts for creating better visualizations/presentations?

    Texts for creating better visualizations/presentations? I started working for an HR team and have been tasked with creating visualizations, both in PowerPoint (I’ve been using Seaborn and Matplotlib for visualizations) and PowerBI Dashboards. I’ve been having a lot of fun creating visualizations, but I’m looking for a few texts or maybe courses/videos about design. Anything…

  • Does Meta only have product analytics?

    Does Meta only have product analytics? I have been told that all Meta data scientists are product analysts, meaning that they do A/B tests and SQL. Despite this, I’ve been told by friends of mine that Google, Amazon, Uber… all have two different types of data scientist: one doing product analytics and…

  • Database tools and method for tree structured data?

    Database tools and method for tree structured data? I have a database structure which I believe is very common, and very general, so I’m wondering how this is tackled. The database is structured like: -> Project (Name of project) -> Category (simple word, ~20 categories) -> Study Study is a directory containing: – README with date…

  • Weekly Entering & Transitioning – Thread 08 Sep, 2025 – 15 Sep, 2025

    Weekly Entering & Transitioning – Thread 08 Sep, 2025 – 15 Sep, 2025 Welcome to this week’s entering & transitioning thread! This thread is for any questions about getting started, studying, or transitioning into the data science field. Topics include: Learning resources (e.g. books, tutorials, videos) Traditional education (e.g. schools, degrees, electives) Alternative education (e.g.…

  • 🚀 Perpetual ML Suite: Now Live on the Snowflake Marketplace!

    🚀 Perpetual ML Suite: Now Live on the Snowflake Marketplace! submitted by /u/mutlu_simsek

  • Europe Salary Thread 2025 – What’s your role and salary?

    Europe Salary Thread 2025 – What’s your role and salary? The yearly Europe-centric salary thread. You can find the last one here: https://old.reddit.com/r/datascience/comments/1fxrmzl/europe_salary_thread_2024_whats_your_role_and/ I think it’s worthwhile to learn from one another and see what different flavours of data scientists, analysts and engineers are out there in the wild. In my opinion, this is especially…

  • Help me evaluate a new job offer – Stay or go?

    Help me evaluate a new job offer – Stay or go? Hi all, I’m having a really hard time deciding whether or not to take an offer I’ve recently received, and would really appreciate some advice and a sense check. For context, I generally feel my current role is comfortable but I’m starting to plateau after…

  • How to evaluate data transformations?

    How to evaluate data transformations? There are several well-established benchmarks for text-to-SQL tasks like BIRD, Spider, and WikiSQL. However, I’m working on a data transformation system that handles per-row transformations with contextual understanding of the input data. The challenge is that most existing benchmarks focus on either: Pure SQL generation (BIRD, Spider) Simple data cleaning…

  • Weekly Entering & Transitioning – Thread 01 Sep, 2025 – 08 Sep, 2025

    Weekly Entering & Transitioning – Thread 01 Sep, 2025 – 08 Sep, 2025 Welcome to this week’s entering & transitioning thread! This thread is for any questions about getting started, studying, or transitioning into the data science field. Topics include: Learning resources (e.g. books, tutorials, videos) Traditional education (e.g. schools, degrees, electives) Alternative education (e.g.…

  • How do I prepare for my data science job as a new grad?

    How do I prepare for my data science job as a new grad? I just graduated with my bachelor’s in May. Recently, I’ve been fortunate enough to receive an offer as a data scientist I at a unicorn where most of the people on the DS team have PhDs. My job starts in a month…

  • Let’s Build Something Together

    Let’s Build Something Together Hey everyone, After my last post about my struggles in finding a remote job, I was honestly blown away. I got over 50 messages not with job offers, but with stories, frustrations, and suggestions. The common theme? Many of us are stuck. Some are trying to break into the market, others…

  • Advice for DS/AS/MLE interviews

    Advice for DS/AS/MLE interviews I am looking for data scientist (ML heavy), applied scientist or ML engineer roles in product-based companies. For my interview preparation, I am unsure about which book or resources to pick so that I can cover the rigor of ML rounds in these interviews. I have a background in CS and…

  • Career Dilemma

    Career Dilemma submitted by /u/NervousVictory1792

  • Weekly Entering & Transitioning – Thread 25 Aug, 2025 – 01 Sep, 2025

    Weekly Entering & Transitioning – Thread 25 Aug, 2025 – 01 Sep, 2025 Welcome to this week’s entering & transitioning thread! This thread is for any questions about getting started, studying, or transitioning into the data science field. Topics include: Learning resources (e.g. books, tutorials, videos) Traditional education (e.g. schools, degrees, electives) Alternative education (e.g.…

  • Day to day work at lead/principal data scientist

    Day to day work at lead/principal data scientist Hi, I have 9 years of experience in ML/DL. I have been looking for a role as a lead/principal DS. Can you tell me what expectations you face in the role? Data science knowledge? MLOps knowledge? Team management? submitted by /u/sourabharsh

  • Google’s new Research : Measuring the environmental impact of delivering AI at Google Scale

    Google’s new Research : Measuring the environmental impact of delivering AI at Google Scale Google has released an important research paper measuring the impact of AI on the environment, estimating the carbon emissions, water use, and energy consumption of running a prompt on Gemini. Surprisingly, the numbers have been quite low…

  • Generating passages similar in style to a set of 9 examples (Question)

    Generating passages similar in style to a set of 9 examples (Question) Hello everyone, I hope I can find some guidance here for a project in generative AI. I have a set of 9 short passages from a TOEFL-like English test. I need to generate more passages that match the style of the example set.…

  • NVIDIA new paper : Small Language Models are the Future of Agentic AI

    NVIDIA new paper : Small Language Models are the Future of Agentic AI NVIDIA have just published a paper claiming SLMs (small language models) are the future of agentic AI. They give a number of reasons why they think so, an important one being that SLMs are cheap. Agentic AI requires just a tiny…

  • Weekly Entering & Transitioning – Thread 18 Aug, 2025 – 25 Aug, 2025

    Weekly Entering & Transitioning – Thread 18 Aug, 2025 – 25 Aug, 2025 Welcome to this week’s entering & transitioning thread! This thread is for any questions about getting started, studying, or transitioning into the data science field. Topics include: Learning resources (e.g. books, tutorials, videos) Traditional education (e.g. schools, degrees, electives) Alternative education (e.g.…

  • Dijkstra defeated: New Shortest Path Algorithm revealed

    Dijkstra defeated: New Shortest Path Algorithm revealed Dijkstra, the go-to shortest path algorithm (time complexity O(m + n log n) with a Fibonacci heap), has now been outperformed by a new algorithm from a top Chinese university, which looks like a hybrid of the Bellman-Ford and Dijkstra algorithms. Paper: https://arxiv.org/abs/2504.17033 Algorithm explained with example: https://youtu.be/rXFtoXzZTF8?si=OiB6luMslndUbTrz submitted by /u/Technical-Love-8479

  • Curious to know about people who switched from DS to DE or SWE or Solutions Architect

    Curious to know about people who switched from DS to DE or SWE or Solutions Architect Hello, I was just curious to know about people who have switched from DS to DE or SWE or Solutions Architect. If you have done it, what was your rationale behind doing it, what pushed or motivated you for…

  • R-Zero : Self-Evolving Reasoning LLM from Zero Data

    R-Zero : Self-Evolving Reasoning LLM from Zero Data R-Zero by Tencent introduces a concept to train LLMs without any labelled data and aims towards self-improving AI without human intervention. It works on a principle similar to GANs, i.e. it involves a Challenger and a Solver, where one generates questions and the other solves them. Paper : https://arxiv.org/abs/2508.05004?ref=mackenziemorehead.com Video…

  • How different is “Senior Data Analyst” from “Data Scientist”?

    How different is “Senior Data Analyst” from “Data Scientist”? I often see Senior DA roles that seem focused on using R/Python for analysis (vs. Excel and Power BI), but don’t have any insight into the day-to-day of these roles. At the senior level, how different is Data Analyst from Data Scientist? submitted by /u/empirical-sadboy

  • Weekly Entering & Transitioning – Thread 11 Aug, 2025 – 18 Aug, 2025

    Weekly Entering & Transitioning – Thread 11 Aug, 2025 – 18 Aug, 2025 Welcome to this week’s entering & transitioning thread! This thread is for any questions about getting started, studying, or transitioning into the data science field. Topics include: Learning resources (e.g. books, tutorials, videos) Traditional education (e.g. schools, degrees, electives) Alternative education (e.g.…

  • Catch-22: Learning R through “hands on” Projects

    Catch-22: Learning R through “hands on” Projects I often get told “learn data science by doing hands-on projects” and then I get all fired up and motivated to learn, and then I open up R…. And then I stare at a blank screen because I don’t know the syntax from memory. And then I tell…

  • AI isn’t taking your job. Executives are.

    AI isn’t taking your job. Executives are. If AI is ready to replace developers, why aren’t developers replacing themselves with AI and just taking it easy at work? I’m a Director at my company. I’m in the meetings and helping set up the tools that cost people their jobs. Here’s how they work: Claude AI…

  • Burnout, disillusionment, and imposter syndrome after 1 year in DS. Am I just an API monkey? Reality check needed.

    Burnout, disillusionment, and imposter syndrome after 1 year in DS. Am I just an API monkey? Reality check needed. Hey folks, I am about a year into my first data science job. It took roughly a year and more than 400 applications to land it, so the idea of another long search is scary. Early…

  • Business focused data science

    Business focused data science As a microbiology researcher, I’m far away from the business world. I do more -omics and growth curves and molecular techniques, but I want to move away from biology. I believe the bridge that can help me do that is data. I have experience with R and Excel. I’m looking…

  • Personal projects and skill set

    Personal projects and skill set Hi everyone, I was just wondering how you guys specify skills acquired from your personal projects in your CV. I’m in the midst of a pretty large project – an end-to-end pipeline for predicting real-time win probabilities in a game. This includes a lot…

  • Weekly Entering & Transitioning – Thread 04 Aug, 2025 – 11 Aug, 2025

    Weekly Entering & Transitioning – Thread 04 Aug, 2025 – 11 Aug, 2025 Welcome to this week’s entering & transitioning thread! This thread is for any questions about getting started, studying, or transitioning into the data science field. Topics include: Learning resources (e.g. books, tutorials, videos) Traditional education (e.g. schools, degrees, electives) Alternative education (e.g.…

  • Built this out of pure laziness for all my Feature engineering/model training jobs

    Built this out of pure laziness for all my Feature engineering/model training jobs Built this out of pure laziness: a lightweight Telegram bot that lets me: – Get Databricks job alerts – Check today’s status – Repair failed runs – Pause/reschedule. All from my phone. No laptop. No dashboard. Just / commands. submitted by…

  • Is there a term for internal processing vs data that needs to be stakeholder/customer facing?

    Is there a term for internal processing vs data that needs to be stakeholder/customer facing? For example, I had my physical credit card stolen. I was trying to get information from the CC company about when the card was used so that the local PD could check security cameras. (We thought it was a particular person…

  • Hi! I am a junior dev and need advice regarding fraud/risk scoring (not credit) on my rules-based fraud detection system.

    Hi! I am a junior dev and need advice regarding fraud/risk scoring (not credit) on my rules-based fraud detection system. So our team has developed a rules-based fraud detection system. Now we have received a new requirement that we have to score every transaction by how risky it is, or if flagged as fraud how…

  • Weekly Entering & Transitioning – Thread 28 Jul, 2025 – 04 Aug, 2025

    Weekly Entering & Transitioning – Thread 28 Jul, 2025 – 04 Aug, 2025 Welcome to this week’s entering & transitioning thread! This thread is for any questions about getting started, studying, or transitioning into the data science field. Topics include: Learning resources (e.g. books, tutorials, videos) Traditional education (e.g. schools, degrees, electives) Alternative education (e.g.…

  • New Grad Data Scientist feeling overwhelmed and disillusioned at first job

    New Grad Data Scientist feeling overwhelmed and disillusioned at first job Hi all, I recently graduated with a degree in Data Science and just started my first job as a data scientist. The company is very focused on staying ahead/keeping up with the AI hype train and wants my team (which has no other data…

  • Why does OneHotEncoder give better results than get_dummies/reindex?

    Why does OneHotEncoder give better results than get_dummies/reindex? I can’t figure out why I get a better score with OneHotEncoder: preprocessor = ColumnTransformer( transformers=[ ('cat', categorical_transformer, categorical_cols) ], remainder='passthrough' # <-- this keeps the numerical columns ) model_GBR = GradientBoostingRegressor(n_estimators=1100, loss='squared_error', subsample=0.35, learning_rate=0.05, random_state=1) GBR_Pipeline = Pipeline(steps=[('preprocessor', preprocessor), ('model', model_GBR)]) than get_dummies/reindex: X_test…

  • Can LLMs Reason – I don’t know, depends on the definition of reasoning. Denny Zhou – Founder/Lead of Google Deepmind LLM Reasoning Team

    Can LLMs Reason – I don’t know, depends on the definition of reasoning. Denny Zhou – Founder/Lead of Google Deepmind LLM Reasoning Team AI influencers: LLMs can think given this godly prompt bene gesserit oracle of the world blahblah, hence xxx/yyy/zzz is dead. See more below. Meanwhile, literally the founder/lead of the reasoning team: https://preview.redd.it/z9uwnummqeff1.png?width=652&format=png&auto=webp&s=c84727d328d059504adf64768b8badac45d20611…

  • Anomaly detection with only categorical variables

    Anomaly detection with only categorical variables Hello everyone, I have an anomaly detection project but all of my data is categorical. I suppose I could try and ask them to change it to prediction, but does anyone have any advice? The goal: there are groups within the data, and I need to do an analysis to…

  • Weekly Entering & Transitioning – Thread 21 Jul, 2025 – 28 Jul, 2025

    Weekly Entering & Transitioning – Thread 21 Jul, 2025 – 28 Jul, 2025 Welcome to this week’s entering & transitioning thread! This thread is for any questions about getting started, studying, or transitioning into the data science field. Topics include: Learning resources (e.g. books, tutorials, videos) Traditional education (e.g. schools, degrees, electives) Alternative education (e.g.…

  • Company Killed University Programs

    Company Killed University Programs Normally, I would have a post around this time hyping up fall recruiting and trying to provide pointers. The company I work for has decided to hire no additional entry level data scientists this year outside of intern return offers. They have also cut the number of intern positions in half…

  • Detect LLM hallucinations using uncertainty quantification techniques with UQLM

    Detect LLM hallucinations using uncertainty quantification techniques with UQLM UQLM (uncertainty quantification for language models) is an open source Python package for generation time, zero-resource hallucination detection. It leverages state-of-the-art uncertainty quantification (UQ) techniques from the academic literature to compute response-level confidence scores based on response consistency (in multiple responses to the same prompt), token…

  • How would you structure a project (data frame) to scrape and track listing changes over time?

    How would you structure a project (data frame) to scrape and track listing changes over time? I’m working on a project where I want to scrape data daily (e.g., real estate listings from a site like RentFaster or Zillow) and track how each listing changes over time. I want to be able to answer questions…

  • Generating random noise for media data

    Generating random noise for media data Hey everyone – I work on an ML team in the industry, and I’m currently building a predictive model to catch signals in live media data to sense when potential viral moments or crises are happening for brands. We have live media trackers at my company that capture all…

  • Weekly Entering & Transitioning – Thread 14 Jul, 2025 – 21 Jul, 2025

    Weekly Entering & Transitioning – Thread 14 Jul, 2025 – 21 Jul, 2025 Welcome to this week’s entering & transitioning thread! This thread is for any questions about getting started, studying, or transitioning into the data science field. Topics include: Learning resources (e.g. books, tutorials, videos) Traditional education (e.g. schools, degrees, electives) Alternative education (e.g.…

  • How much DSA for FAANG+ ?

    How much DSA for FAANG+ ? Hello all, I am going to be graduating in 6 months and have been practicing Leetcode as I believe this to be my weakest point. I have solved 250 LC with 130 Easy and 120 Hard, covering concepts like arrays, hashing, binary trees, SQL, linked list, two pointers, stack,…

  • Toto: A Foundation Time-Series Model Optimized for Observability Data

    Toto: A Foundation Time-Series Model Optimized for Observability Data Datadog open-sourced Toto (Time Series Optimized Transformer for Observability), a model purpose-built for observability data. Toto is currently the most extensively pretrained time-series foundation model: The pretraining corpus contains 2.36 trillion tokens, with ~70% coming from Datadog’s private telemetry dataset. Also, Toto currently ranks 2nd in…

  • How do you efficiently traverse hundreds of features in the dataset?

    How do you efficiently traverse hundreds of features in the dataset? I’m currently working on a fintech classification algorithm with close to a thousand features, which is very tiresome. I’m not a domain expert, so creating sensible hypotheses is difficult. How do you tackle EDA and forming reasonable hypotheses in these cases? Even with proper documentation…

  • The right questions to find clusters (tangles)

    The right questions to find clusters (tangles) Hey everyone, I’m currently working on my bachelor’s thesis and I’m hitting a creative block on a central part – maybe you have some ideas or impulses for me. My dataset consists of 100,000 cleaned job postings from Kaggle (title + description). The goal of my thesis is…

  • Weekly Entering & Transitioning – Thread 07 Jul, 2025 – 14 Jul, 2025

    Weekly Entering & Transitioning – Thread 07 Jul, 2025 – 14 Jul, 2025 Welcome to this week’s entering & transitioning thread! This thread is for any questions about getting started, studying, or transitioning into the data science field. Topics include: Learning resources (e.g. books, tutorials, videos) Traditional education (e.g. schools, degrees, electives) Alternative education (e.g.…

  • Long-timers at companies — what’s your secret?

    Long-timers at companies — what’s your secret? Hi everyone, I’ve been a job hopper throughout my career—never stayed at one place for more than 1-2 years, usually for various reasons. Now, I’m entering a phase where I want to get more settled. I’m about to start a new job and would love to hear from…

  • Reliable DS Adjacent Fields Hiring for Bachelor’s Degree?

    Reliable DS Adjacent Fields Hiring for Bachelor’s Degree? Hello all. To try and condense a lot of context for this question, I am an adult who went back to school to complete my bachelor’s, in order to support myself and my partner on one income. Admittedly, I did this because I heard how good data…

  • A Brief Guide to UV

    A Brief Guide to UV Python has been largely devoid of easy to use environment and package management tooling, with various developers employing their own cocktail of pip, virtualenv, poetry, and conda to get the job done. However, it looks like uv is rapidly emerging to be a standard in the industry, and I’m super…

  • With Generative AI looking so ominous, would there be any further research in any other domains like Computer Vision or NLP or Graph Analytics ever?

    With Generative AI looking so ominous, would there be any further research in any other domains like Computer Vision or NLP or Graph Analytics ever? So as the title suggests, the last few years have been just Generative AI all over the place. Every new piece of research is somehow focussed on it. So does this mean other…

  • Weekly Entering & Transitioning – Thread 30 Jun, 2025 – 07 Jul, 2025

    Weekly Entering & Transitioning – Thread 30 Jun, 2025 – 07 Jul, 2025 Welcome to this week’s entering & transitioning thread! This thread is for any questions about getting started, studying, or transitioning into the data science field. Topics include: Learning resources (e.g. books, tutorials, videos) Traditional education (e.g. schools, degrees, electives) Alternative education (e.g.…

  • ICs who pivoted: did you go engineering or management?

    ICs who pivoted: did you go engineering or management? Hitting that point where I feel like I need to pick a lane. Curious what others did. Did you double down on technical stuff (data engineering/MLE/SWE), switch to the product side, or move into people management? submitted by /u/ergodym

  • Unpopular Opinion: These are the most useless posters on LinkedIn

    Unpopular Opinion: These are the most useless posters on LinkedIn LinkedIn influencers love to treat the two roles as different species. In most enterprises, especially in mid to small orgs, these roles are largely overlapping. submitted by /u/OverratedDataScience

  • How’s the job market for Bayesian statistics?

    How’s the job market for Bayesian statistics? I’m a data scientist with 1 YOE. I’ve mostly worked on credit scoring models, SQL, and Power BI. Lately, I’ve been thinking of going deeper into Bayesian statistics and I’m currently going through the Statistical Rethinking book. But I’m wondering: is it worth focusing heavily on Bayesian stats? Or…

  • Is ML/AI engineering increasingly becoming less focused on model training and more focused on integrating LLMs to build web apps?

    Is ML/AI engineering increasingly becoming less focused on model training and more focused on integrating LLMs to build web apps? One thing I’ve noticed recently is that increasingly, a lot of AI/ML roles seem to be focused on ways to integrate LLMs to build web apps that automate some kind of task, e.g. chatbot with…

  • Weekly Entering & Transitioning – Thread 23 Jun, 2025 – 30 Jun, 2025

    Weekly Entering & Transitioning – Thread 23 Jun, 2025 – 30 Jun, 2025 Welcome to this week’s entering & transitioning thread! This thread is for any questions about getting started, studying, or transitioning into the data science field. Topics include: Learning resources (e.g. books, tutorials, videos) Traditional education (e.g. schools, degrees, electives) Alternative education (e.g.…

  • [Project] I just open-sourced a plugin to stop AI from hallucinating your schemas

    [Project] I just open-sourced a plugin to stop AI from hallucinating your schemas Hey r/datascience 👋 Using AI tools like Copilot or Cursor can be a total headache for data science work. You’re trying to join tables, and it confidently suggests customer_id when your table actually uses cust_pk. Or worse, it just invents tables that…

  • I have run DS interviews and wow!

    I have run DS interviews and wow! Hey all, I have been responsible for technical interviews for a Data Scientist position and the experience was quite surprising to me. I thought some of you may appreciate some insights. A few disclaimers: I have no previous experience running interviews and have had no training at all…

  • Would you do this job if you were rich enough to retire?

    Would you do this job if you were rich enough to retire? Curious about your perspective on this. Many of us got into the field because it was lucrative and ensures a stable living, but it is also intrinsically interesting to study and challenge yourself. The personalities attracted to tech are often fun and make work…

  • ML case study rounds

    ML case study rounds I am asking this in the context of interviews. In almost every company these days, there is an ML case study round where the focus is on solving a real-world case study. Idk if this is somewhat similar to ML system design or not (I think ML system design rounds are…

  • Weekly Entering & Transitioning – Thread 16 Jun, 2025 – 23 Jun, 2025

    Weekly Entering & Transitioning – Thread 16 Jun, 2025 – 23 Jun, 2025 Welcome to this week’s entering & transitioning thread! This thread is for any questions about getting started, studying, or transitioning into the data science field. Topics include: Learning resources (e.g. books, tutorials, videos) Traditional education (e.g. schools, degrees, electives) Alternative education (e.g.…

  • Don’t be the data scientist who’s in love with models, be the one who solves real problems

    Don’t be the data scientist who’s in love with models, be the one who solves real problems I work at a company with around 100 data scientists, ML and data engineers. The most frustrating part of working with many data scientists, and honestly, I see this on this sub all the time too, is how obsessed…

  • Books on applied data science for B2B marketing?

    Books on applied data science for B2B marketing? There’s this thread from 3 years ago: https://www.reddit.com/r/datascience/comments/ram75g/books_on_applied_data_science_for_b2b_marketing/ Unfortunately, it never got any book recommendations – I’m in pretty much the exact same position as the OP of the linked thread and am looking for resources that explain the best methods and provide practical how-tos for marketing…

  • “Data Annotation” spam

    “Data Annotation” spam Anyone else’s job search site just absolutely spammed by Data Annotation? If I look up Data, ML, AI, or anything similar in my area I get 2-3 pages of their job postings. submitted by /u/MahaloMerky

  • Significant humor

    Significant humor Saw this and found it hilarious, thought I’d share it here as this is one of the few places this joke might actually land. Datetime.now() + timedelta(days=4) submitted by /u/MamboAsher

  • Weekly Entering & Transitioning – Thread 09 Jun, 2025 – 16 Jun, 2025

    Weekly Entering & Transitioning – Thread 09 Jun, 2025 – 16 Jun, 2025 Welcome to this week’s entering & transitioning thread! This thread is for any questions about getting started, studying, or transitioning into the data science field. Topics include: Learning resources (e.g. books, tutorials, videos) Traditional education (e.g. schools, degrees, electives) Alternative education (e.g.…

  • PhD vs Masters prepared data scientist expectations.

    PhD vs Masters prepared data scientist expectations. Is there anything more that you expect from a data scientist with a PhD versus a data scientist with just a master’s degree, given the same level of experience? For the companies that I’ve worked with, most data science teams were mixes of folks with master’s degrees and…

  • What is your domain and what are the most important technical skills that help you stand out in your domain?

    What is your domain and what are the most important technical skills that help you stand out in your domain? Aside from soft skills and domain expertise, ofc those are a given. I’m manufacturing-adjacent (closer to product development and validation). Design of experiments has been my most useful data-related skill. I’m always being asked “We…

  • Data analyst vs. engineer? At non-profit

    Data analyst vs. engineer? At non-profit Hi all, I am the only Data Analyst at a medium-sized company related to shared transportation (adjacent to Lime Scooter/Bike). I’m pretty early in my career (grad from college 3 years ago). My role encompasses a LOT of responsibilities that aren’t traditionally under “data analyst”, the biggest of which…

  • You can now automate deep dives, with clear actionable recommendations based on data.

    You can now automate deep dives, with clear actionable recommendations based on data. submitted by /u/phicreative1997

  • Weekly Entering & Transitioning – Thread 02 Jun, 2025 – 09 Jun, 2025

    Weekly Entering & Transitioning – Thread 02 Jun, 2025 – 09 Jun, 2025 Welcome to this week’s entering & transitioning thread! This thread is for any questions about getting started, studying, or transitioning into the data science field. Topics include: Learning resources (e.g. books, tutorials, videos) Traditional education (e.g. schools, degrees, electives) Alternative education (e.g.…

  • How I scraped 4.1 million jobs with GPT4o-mini

    How I scraped 4.1 million jobs with GPT4o-mini Background: During my PhD in Data Science at Stanford, I got sick and tired of ghost jobs & 3rd party offshore agencies on LinkedIn & Indeed. So I wrote a script that fetches jobs from 100k+ company websites’ career pages and uses GPT4o-mini to extract relevant information…

  • Can data science be used in computer networking (if not can it be used in cybersecurity)?

    Can data science be used in computer networking (if not can it be used in cybersecurity)? Hi, I’m a high schooler (junior year) who is extremely interested in data science to the point where it is the main career field I want to go into. However, I got enrolled in a program where we train…

  • Advice on processing ~1M jobs/month with LLaMA for cost savings

    Advice on processing ~1M jobs/month with LLaMA for cost savings I’m using GPT-4o-mini to process ~1 million jobs/month. It’s doing things like deduplication, classification, title normalization, and enrichment. This setup is fast and easy, but the cost is starting to hurt. I’m considering distilling this pipeline into an open-source LLM, like LLaMA 3 or Mistral,…

  • What is your functional area?

    What is your functional area? I don’t mean industry. I mean product, operations, etc. I work in operations. I don’t grow the business. I keep the business alive. submitted by /u/Trick-Interaction396

  • Weekly Entering & Transitioning – Thread 26 May, 2025 – 02 Jun, 2025

    Weekly Entering & Transitioning – Thread 26 May, 2025 – 02 Jun, 2025 Welcome to this week’s entering & transitioning thread! This thread is for any questions about getting started, studying, or transitioning into the data science field. Topics include: Learning resources (e.g. books, tutorials, videos) Traditional education (e.g. schools, degrees, electives) Alternative education (e.g.…

  • 2025 stack check: which DS/ML tools am I missing?

    2025 stack check: which DS/ML tools am I missing? Hi all, I work in ad-tech, where my job is to improve the product with data-driven algorithms, mostly on tabular datasets (CTR models, bidding, attribution, the usual). Current work stack (quite classic I guess) pandas, numpy, scikit-learn, xgboost, statsmodels PyTorch (light use) JupyterLab & notebooks matplotlib,…

  • Found a really amazing video, providing context to the breakthrough as well as the misconceived hype around AlphaEvolve

    Found a really amazing video, providing context to the breakthrough as well as the misconceived hype around AlphaEvolve I am sure by now most of us have seen or at least heard about AlphaEvolve and its many breakthroughs, including the 4*4 MM improvement. While this was a fantastic step forward in constrained optimisation problems…

  • Can you explain to me the product analytics job?

    Can you explain to me the product analytics job? I’ve watched videos about Data Scientist Product Analytics but I still don’t understand if the job would excite me. Can someone explain it more in depth so that I can understand if I like it? I like the data science job (I am pursuing a…

  • Is studying Data Science still worth it?

    Is studying Data Science still worth it? Hi everyone, I’m currently studying data science, but I’ve been hearing that the demand for data scientists is decreasing significantly. I’ve also been told that many data scientists are essentially becoming analysts, while the machine learning side of things is increasingly being handled by engineers. Does it still…

  • Weekly Entering & Transitioning – Thread 19 May, 2025 – 26 May, 2025

    Weekly Entering & Transitioning – Thread 19 May, 2025 – 26 May, 2025 Welcome to this week’s entering & transitioning thread! This thread is for any questions about getting started, studying, or transitioning into the data science field. Topics include: Learning resources (e.g. books, tutorials, videos) Traditional education (e.g. schools, degrees, electives) Alternative education (e.g.…

  • Weekly Entering & Transitioning – Thread 12 May, 2025 – 19 May, 2025

    Weekly Entering & Transitioning – Thread 12 May, 2025 – 19 May, 2025 Welcome to this week’s entering & transitioning thread! This thread is for any questions about getting started, studying, or transitioning into the data science field. Topics include: Learning resources (e.g. books, tutorials, videos) Traditional education (e.g. schools, degrees, electives) Alternative education (e.g.…

  • Weekly Entering & Transitioning – Thread 05 May, 2025 – 12 May, 2025

    Weekly Entering & Transitioning – Thread 05 May, 2025 – 12 May, 2025 Welcome to this week’s entering & transitioning thread! This thread is for any questions about getting started, studying, or transitioning into the data science field. Topics include: Learning resources (e.g. books, tutorials, videos) Traditional education (e.g. schools, degrees, electives) Alternative education (e.g.…

  • Weekly Entering & Transitioning – Thread 28 Apr, 2025 – 05 May, 2025

    Weekly Entering & Transitioning – Thread 28 Apr, 2025 – 05 May, 2025 Welcome to this week’s entering & transitioning thread! This thread is for any questions about getting started, studying, or transitioning into the data science field. Topics include: Learning resources (e.g. books, tutorials, videos) Traditional education (e.g. schools, degrees, electives) Alternative education (e.g.…

  • Weekly Entering & Transitioning – Thread 21 Apr, 2025 – 28 Apr, 2025

    Weekly Entering & Transitioning – Thread 21 Apr, 2025 – 28 Apr, 2025 Welcome to this week’s entering & transitioning thread! This thread is for any questions about getting started, studying, or transitioning into the data science field. Topics include: Learning resources (e.g. books, tutorials, videos) Traditional education (e.g. schools, degrees, electives) Alternative education (e.g.…

  • Weekly Entering & Transitioning – Thread 14 Apr, 2025 – 21 Apr, 2025

    Weekly Entering & Transitioning – Thread 14 Apr, 2025 – 21 Apr, 2025 Welcome to this week’s entering & transitioning thread! This thread is for any questions about getting started, studying, or transitioning into the data science field. Topics include: Learning resources (e.g. books, tutorials, videos) Traditional education (e.g. schools, degrees, electives) Alternative education (e.g.…

  • Weekly Entering & Transitioning – Thread 07 Apr, 2025 – 14 Apr, 2025

    Weekly Entering & Transitioning – Thread 07 Apr, 2025 – 14 Apr, 2025 Welcome to this week’s entering & transitioning thread! This thread is for any questions about getting started, studying, or transitioning into the data science field. Topics include: Learning resources (e.g. books, tutorials, videos) Traditional education (e.g. schools, degrees, electives) Alternative education (e.g.…