{"id":355,"date":"2024-12-04T07:02:35","date_gmt":"2024-12-04T07:02:35","guid":{"rendered":"https:\/\/mailitics.com\/index.php\/2024\/12\/04\/becoming-a-data-scientist-what-i-would-do-if-i-had-to-start-over-655f0476b462\/"},"modified":"2024-12-04T07:02:35","modified_gmt":"2024-12-04T07:02:35","slug":"becoming-a-data-scientist-what-i-would-do-if-i-had-to-start-over-655f0476b462","status":"publish","type":"post","link":"https:\/\/mailitics.com\/index.php\/2024\/12\/04\/becoming-a-data-scientist-what-i-would-do-if-i-had-to-start-over-655f0476b462\/","title":{"rendered":"Becoming a Data Scientist: What I Would Do If I Had to Start Over"},"content":{"rendered":"<div>\n<h4>Breaking into data science: The Good, the Bad, and the Python\u00a0Bugs<\/h4>\n<figure><img decoding=\"async\" alt=\"\" src=\"https:\/\/cdn-images-1.medium.com\/max\/1024\/0*AElcN6BNlVV21Wy-\"><figcaption>Photo by <a href=\"https:\/\/unsplash.com\/@markusspiske?utm_source=medium&amp;utm_medium=referral\">Markus Spiske<\/a> on\u00a0<a href=\"https:\/\/unsplash.com\/?utm_source=medium&amp;utm_medium=referral\">Unsplash<\/a><\/figcaption><\/figure>\n<p>Martin Luther King Jr. is famous for his speech, <strong>\u201c<\/strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/I_Have_a_Dream\">I Have a Dream<\/a>.<strong>\u201d<\/strong> He delivered it at the Lincoln Memorial in Washington, D.C., on August 28, 1963, in front of approximately 250,000 persons. It\u2019s considered one of the most important speeches of the 20th century. 
It played a crucial role in the civil rights movement for Black Americans.<\/p>\n<p>During this speech, he said that he dreamed of a day when his four children would live in a nation where they would not be judged by the color of their skin but by the content of their character.<\/p>\n<p>I also had a dream several years ago. It was not as glorious as Martin Luther King&#8217;s, nor did it reshape the course of history. I aspired to become a data scientist.<\/p>\n<p>It wasn\u2019t for the prestige or because it was (and still is) trendy, but because I genuinely love working with data, solving complex problems, and leveraging insights to drive business results. Becoming a data scientist was where my unique skills and passions met. You know, that <a href=\"https:\/\/medium.com\/towards-data-science\/the-one-mindset-change-that-launched-me-into-data-science-3f72bd1df46f\">sweet spot that leads to a fulfilling career<\/a>.<\/p>\n<p>My journey wasn\u2019t straightforward. I didn\u2019t know where to start, nor did I know what to do next. I took various courses, many of which turned out to be unhelpful. I also read countless articles about data science. While becoming a data scientist requires hard work, I spent a lot of effort on things that ultimately weren\u2019t necessary.<\/p>\n<p>I wish someone had given me the guidance I\u2019m about to share with you. That is the purpose of this article. The good news? Following these steps won\u2019t guarantee a job as a data scientist, but it will significantly improve your chances\u2026 even without a PhD! I know several professionals who have excelled as data scientists without doctorates. 
Success in this field is mainly about persistence and practical experience.<\/p>\n<h3>Start Somewhere, Start\u00a0Now<\/h3>\n<blockquote><p>\u201c<a href=\"https:\/\/en.wikiquote.org\/wiki\/Plato\">The beginning is the most important part of any\u00a0work<\/a>.\u201d<\/p><\/blockquote>\n<blockquote><p>\u2014 Plato<\/p><\/blockquote>\n<p><a href=\"https:\/\/pmc.ncbi.nlm.nih.gov\/articles\/PMC3591461\/\">Research shows<\/a> that toddlers take about 14,000 steps and experience about 100 falls per day over 2\u20133 months before mastering walking. Yet they persist, never considering giving\u00a0up.<\/p>\n<p>As adults, we often do the opposite. We tend to give up as soon as we encounter obstacles. Where an adult might see 100 failures, a baby sees 100 learning opportunities. The baby doesn\u2019t overanalyze its failures or overcalculate the risks. It simply starts, tries, falls, and tries\u00a0again!<\/p>\n<p>Consider the story of Justin Kan, the co-founder of Twitch. His entrepreneurial journey didn\u2019t start with a blockbuster success. It began with what he called a \u201c<a href=\"https:\/\/www.linkedin.com\/posts\/justinkan_shitty-first-startups-activity-6870154231052025856-gzFV\/\">shitty first startup<\/a>\u201d named Kiko, an online calendar app. Kiko was competing against giants like Google Calendar, and it was eventually sold on eBay for $258,100!<\/p>\n<p>Next, he launched Justin.tv, a platform where he live-streamed his life 24\/7. Justin.tv eventually became Twitch, a live-streaming platform focused on gaming. In 2014, Amazon acquired Twitch for $970\u00a0million!<\/p>\n<p>As Justin Kan stated, \u201c<a href=\"https:\/\/www.linkedin.com\/posts\/justinkan_shitty-first-startups-activity-6870154231052025856-gzFV\/\">Don\u2019t wait. Go build your first shitty startup\u00a0now.<\/a>\u201d<\/p>\n<p>This advice applies to your journey into data science as well. Start somewhere. Begin your learning process now. 
Even if your first attempt feels \u201cshitty\u201d and you\u2019re unsure of where to start, it\u2019s okay. You can build upon your initial efforts, and nothing prevents you from adjusting your direction as you progress. You need to start now and somewhere.<\/p>\n<figure><img decoding=\"async\" alt=\"\" src=\"https:\/\/cdn-images-1.medium.com\/max\/1024\/0*DmPNJbH838EBYLrF\"><figcaption>Photo by <a href=\"https:\/\/unsplash.com\/@vladbagacian?utm_source=medium&amp;utm_medium=referral\">Vlad Bagacian<\/a> on\u00a0<a href=\"https:\/\/unsplash.com\/?utm_source=medium&amp;utm_medium=referral\">Unsplash<\/a><\/figcaption><\/figure>\n<h3>So\u2026 Where Do I\u00a0Start?<\/h3>\n<p>The <a href=\"https:\/\/en.wikipedia.org\/wiki\/Beauvais_Cathedral\">Cathedral of Beauvais<\/a> in France was intended to be the tallest cathedral in the world during the 13th century. Its ambitious design pushed the limits of Gothic architecture. However, one notable collapse occurred in 1284 when the choir vault fell due to insufficient foundations and structural support. It remains unfinished to this\u00a0day.<\/p>\n<p>This serves as a strong analogy for your journey into data science. You may be tempted (we all are) to dive directly into the exciting parts, such as deep learning models, LLMs, or the latest machine learning frameworks. But like the Cathedral of Beauvais, your ambitious plan could fail without a solid foundation. Learning the basics first is crucial to ensure your knowledge is robust enough to support more advanced concepts.<\/p>\n<h4><strong>Mathematics: Your Universal Language<\/strong><\/h4>\n<p>Think of mathematics as the language of patterns. There is mathematics everywhere. 
And honestly, if you don\u2019t like mathematics, perhaps a career in data science isn\u2019t the right choice for\u00a0you.<\/p>\n<p>You don\u2019t need to become a mathematician, but you do need to understand the following key concepts:<\/p>\n<ul>\n<li>\n<strong>Linear algebra<\/strong> (matrices, vectors, etc.): Think of matrices and vectors as the language in which data communicates. Understanding these concepts allows you to manipulate data structures for machine learning algorithms.<\/li>\n<li>\n<strong>Calculus<\/strong> (differentiation, integration, gradients, etc.): These concepts are essential for optimizing models. For example, training a neural network relies on gradients to minimize a loss function.<\/li>\n<li>\n<strong>Statistics<\/strong> (distributions, descriptive statistics, etc.): This is where you learn to interpret the stories data tells. Understanding concepts like distributions and descriptive statistics allows you to make informed decisions based on patterns in\u00a0data.<\/li>\n<\/ul>\n<h4>Diving into Programming<\/h4>\n<p>With your mathematical foundation in place, programming will bring your ideas to life. While some argue for learning R for data science, Python stands out for its versatility and widespread use in the industry. Furthermore, most people I know use Python. It will be more than good enough for most use cases. Focus\u00a0on:<\/p>\n<ul>\n<li>\n<strong>Basic syntax and functions<\/strong>: understand how Python works at a fundamental level. It\u2019s like learning an alphabet before writing\u00a0stories.<\/li>\n<li>\n<strong>Data structures<\/strong>: lists, dictionaries, and tuples. Knowing how to use them is crucial for handling real-world data.<\/li>\n<li>\n<strong>Control flow statements<\/strong>: master \u201cif statements,\u201d \u201cfor loops,\u201d and \u201cwhile loops.\u201d These allow you to implement logic that can solve complex problems. 
With simple statements, you can accomplish much more than you\u00a0think!<\/li>\n<li>\n<strong>Object-oriented programming<\/strong>: understand the concepts of classes, methods, and objects. This allows you to write organized, reusable code. It also facilitates collaboration with\u00a0others.<\/li>\n<\/ul>\n<h4>SQL: Your Database\u00a0Language<\/h4>\n<p>Data is often stored in databases that you need to access and manipulate. SQL is your language to interact with this\u00a0data.<\/p>\n<ul>\n<li>\n<strong>Interacting with databases<\/strong>: Learn basic SQL commands such as SELECT, INSERT, UPDATE, and DELETE to retrieve, update, and manage\u00a0data.<\/li>\n<\/ul>\n<h4><strong>Machine Learning: Turning Data into\u00a0Insights<\/strong><\/h4>\n<p>Once you understand mathematics, programming, and data handling, you can move on to machine learning. Focus\u00a0on:<\/p>\n<ul>\n<li>\n<strong>Understanding algorithms:<\/strong> start by learning algorithms like linear regression, decision trees, and clustering methods. These are the building blocks for more complex\u00a0models.<\/li>\n<li>\n<strong>Supervised vs unsupervised learning:<\/strong> understand the difference between these two core types of machine learning. Supervised learning trains models on labeled data, whereas unsupervised learning finds structure, such as clusters, in unlabeled data.<\/li>\n<li>\n<strong>Model evaluation:<\/strong> Learn how to assess the performance of your models using metrics like F1 score for classification models, word error rate for speech recognition, or RMSE for time-series forecasting.<\/li>\n<li>\n<strong>Feature engineering<\/strong>: It\u2019s the art of transforming your raw data so your models can understand it. Often, this makes more of a difference than using a fancy algorithm. 
You can see an example\u00a0<a href=\"https:\/\/levelup.gitconnected.com\/want-to-decrease-your-models-prediction-errors-by-20-follow-this-simple-trick-97354102098e\">here<\/a>.<\/li>\n<li>\n<strong>Libraries and frameworks:<\/strong> Familiarize yourself with popular Python libraries for machine learning, such as scikit-learn, TensorFlow, and\u00a0PyTorch.<\/li>\n<\/ul>\n<p>Remember, machine learning is not just about applying algorithms. It\u2019s about understanding the problem you\u2019re trying to solve and choosing the right approach.<\/p>\n<h4>Business Sense: Turning Technical Skill into Business\u00a0Impact<\/h4>\n<p>Many people contact me about starting a career in data science. They typically have impressive qualifications, such as PhDs and a strong background in mathematics. However, even with these credentials, many struggle to break into the field. The reason? They lack business\u00a0sense.<\/p>\n<p>Technical skills are essential. However, here\u2019s the truth. The best AI model is worth $0 if it doesn\u2019t solve a business problem. I\u2019ve seen brilliant data scientists fail because they built sophisticated models that no one used. The key? Learn to think like a business\u00a0owner.<\/p>\n<p>For instance:<\/p>\n<ul>\n<li>\n<strong>Translating business problems<\/strong>: Instead of just building a predictive model, you should ask, \u201cHow does this model support decision-making within the business?\u201d<\/li>\n<li>\n<strong>Prioritizing impact<\/strong>: Focus on problems where data science can generate the most value rather than pursuing complex solutions that don\u2019t solve a business\u00a0problem.<\/li>\n<\/ul>\n<h4>Focus on the Essentials<\/h4>\n<p><a href=\"https:\/\/en.wikipedia.org\/wiki\/Vilfredo_Pareto\">Vilfredo Pareto<\/a> was an Italian polymath who contributed to multiple fields, such as economics and sociology. One of the concepts he is known for is Pareto optimality. 
It describes a situation where resources are allocated so efficiently that no one can be made better off without making someone else worse\u00a0off.<\/p>\n<p>However, his most famous observation came while he was studying wealth distribution in Italy. He discovered that 20% of the population owned 80% of the land. He also noticed the same pattern in other countries, such as Prussia, England, and\u00a0France.<\/p>\n<p>This observation led to the formulation of what we know today as the Pareto Principle or the 80\/20 rule. In other words, roughly 20% of the causes are responsible for roughly 80% of the\u00a0effects.<\/p>\n<p>For example, in business, it\u2019s often observed that 80% of sales come from 20% of customers. In quality control, 80% of problems stem from 20% of the causes. In the workplace, 20% of our tasks contribute to 80% of what we deliver. We tend to use about 20% of what we own 80% of the time. And the list goes\u00a0on.<\/p>\n<p>The same idea applies to your journey of becoming a data scientist. Instead of trying to master every possible topic, focus on taking just one course for each key area: mathematics for data science, Python, SQL, machine learning, and business analytics. That\u2019s it. Focus on the core 20% of skills (or even less) that yield 80% of your\u00a0results.<\/p>\n<p>Remember, don\u2019t get caught in the trap of \u201ctutorial hell,\u201d where you constantly consume new content but never deeply understand what you\u2019re learning. Becoming a skilled data scientist is mostly about gaining experience, as in any other job. It means applying what you\u2019ve learned to real-world projects.<\/p>\n<p>When you don\u2019t understand something, search for it, learn it, and then return to your project. 
Repeat this process as often as needed to reinforce your knowledge and skills.<\/p>\n<figure><img decoding=\"async\" alt=\"\" src=\"https:\/\/cdn-images-1.medium.com\/max\/1024\/0*aCkcUDVwjNk1JtOc\"><figcaption>Photo by <a href=\"https:\/\/unsplash.com\/@austindistel?utm_source=medium&amp;utm_medium=referral\">Austin Distel<\/a> on\u00a0<a href=\"https:\/\/unsplash.com\/?utm_source=medium&amp;utm_medium=referral\">Unsplash<\/a><\/figcaption><\/figure>\n<h3>Create Your Own Work Experience<\/h3>\n<blockquote><p>\u201c<a href=\"https:\/\/en.wikipedia.org\/wiki\/Ut_est_rerum_omnium_magister_usus\">Experience is the teacher of all\u00a0things<\/a>.\u201d<\/p><\/blockquote>\n<blockquote><p>\u2014 Julius\u00a0Caesar<\/p><\/blockquote>\n<p>After completing the basic courses, enhance your skills by applying what you\u2019ve learned to real-world projects.<\/p>\n<p>Building expertise in any field requires significant dedication and practice. Ericsson, Krampe, and Tesch-R\u00f6mer&#8217;s <a href=\"https:\/\/psycnet.apa.org\/buy\/1993-40718-001\">study<\/a> of elite musicians is the source of the popular rule of thumb that developing expertise takes around 10,000 hours of deliberate practice. Elite performers, such as concert musicians and professional athletes, often dedicate around four hours of focused practice per day to perfect their\u00a0skills.<\/p>\n<p>The same principle applies to data science. Mastery doesn\u2019t happen overnight. It requires consistent effort and experience. By dedicating time daily to apply what you\u2019ve learned and solve real-world problems, you\u2019re moving closer to becoming an expert in the\u00a0field.<\/p>\n<h4>Ok\u2026 But How Do I Gain Experience?<\/h4>\n<p>It\u2019s simpler than most people think. Yet, many get paralyzed trying to figure out the \u201cperfect\u201d starting point. As I said earlier, the most crucial step is to start now and somewhere. 
It\u2019s okay to make mistakes and adapt your approach as you\u00a0learn.<\/p>\n<p>Your professional background isn\u2019t a limitation, even if it\u2019s not in data science. It\u2019s quite the opposite. It\u2019s an\u00a0asset.<\/p>\n<p>Every field, whether marketing, healthcare, finance, or law, has problems that can be solved with data. A marketer might analyze customer engagement patterns. Someone with a finance background might want to forecast the stock\u00a0market.<\/p>\n<p>I once coached someone with a background in finance who didn\u2019t know where to start. I advised him to create an ARIMA model, a relatively simple classical time-series model, to forecast Canadian housing\u00a0prices.<\/p>\n<p>It was nothing groundbreaking, but it was real and relevant. Not only did it leverage his domain expertise and technical skills, but it also focused on a topic that was in high demand (Canadian housing\u00a0prices).<\/p>\n<p>If you\u2019re still unsure, start with something you genuinely enjoy. This is the key. When you&#8217;re truly interested, you are far more likely to put in those 10,000 hours of practice we discussed earlier. You\u2019re also more likely to approach challenges with determination and view setbacks as learning opportunities rather than reasons to\u00a0quit.<\/p>\n<p>It can be anything. If you\u2019re an artist, you may use computer vision to analyze visual patterns or create generative art with neural networks. A healthcare worker may want to predict patient outcomes. Someone in environmental science might model climate change impacts using large datasets. The list goes\u00a0on.<\/p>\n<p>If possible, consider using Large Language Models (LLMs). It\u2019s definitely not mandatory. However, LLMs have become popular recently, especially after ChatGPT\u2019s launch in late 2022. Companies are rapidly adopting them. 
It offers a fantastic opportunity to develop expertise in a cutting-edge field.<\/p>\n<p>There are several frameworks for building applications with LLMs. One of them is <a href=\"https:\/\/python.langchain.com\/docs\/introduction\/\">LangChain<\/a>. However, LLMs should complement, not replace, your understanding of basic machine learning. If you find LLMs too complex, start with something simple.<\/p>\n<p>Once you\u2019ve built something, share it with the world. Write articles on Medium or publish your code on GitHub. Doing so showcases your work. Start with a basic model or project. Then, iteratively enhance\u00a0it.<\/p>\n<p>For example, you could start with a simple ARIMA model to forecast housing prices. Then, you could switch to a more sophisticated multivariate model (like a transformer-based time series model). You could incorporate features such as interest rates, the debt-to-income ratio, and the unemployment rate. Finally, you could compare that model to your ARIMA baseline.<\/p>\n<p>As you incorporate additional features or refine your algorithms, update your GitHub repository and write follow-up articles on your progress. This demonstrates your skills and commitment to continuous learning. It\u2019s one of the best (if not the best) ways to learn and showcase your capabilities.<\/p>\n<h3>Conclusion<\/h3>\n<p>Thank you for reading the article! Remember, as Voltaire wisely said, \u201c<a href=\"https:\/\/en.wikipedia.org\/wiki\/Perfect_is_the_enemy_of_good\">Perfect is the enemy of good.<\/a>\u201d Just start now and somewhere. You don\u2019t need to wait for the perfect project or idea to take action. As you gain hands-on experience, it will become clearer what your next steps should\u00a0be.<\/p>\n<h3>Liked this article? 
Show your\u00a0support!<\/h3>\n<p>\ud83e\udd1d <a href=\"https:\/\/www.linkedin.com\/in\/philippe-ostiguy\/\">Connect with me on LinkedIn<\/a> to stay in touch and discuss opportunities.<\/p>\n<hr>\n<p><a href=\"https:\/\/towardsdatascience.com\/becoming-a-data-scientist-what-i-would-do-if-i-had-to-start-over-655f0476b462\">Becoming a Data Scientist: What I Would Do If I Had to Start Over<\/a> was originally published in <a href=\"https:\/\/towardsdatascience.com\/\">Towards Data Science<\/a> on Medium, where people are continuing the conversation by highlighting and responding to this story.<\/p>\n<\/div>\n<p>Philippe Ostiguy, M. Sc.<br \/>\n<a href=\"https:\/\/medium.com\/m\/global-identity-2?redirectUrl=https%3A%2F%2Ftowardsdatascience.com%2Fbecoming-a-data-scientist-what-i-would-do-if-i-had-to-start-over-655f0476b462\">Go to original source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Becoming a Data Scientist: What I Would Do If I Had to Start Over Breaking into data science: The Good, the Bad, and the Python\u00a0Bugs Photo by Markus Spiske on\u00a0Unsplash Martin Luther King Jr. 
is famous for his speech, \u201cI Have a Dream.\u201d He delivered it at the Lincoln Memorial in Washington, D.C., on August [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[62,277,83,70,397,398],"tags":[399,84,106],"class_list":["post-355","post","type-post","status-publish","format-standard","hentry","category-aimldsaimlds","category-careers","category-data-science","category-machine-learning","category-productivity","category-work","tag-becoming","tag-data","tag-scientist"],"_links":{"self":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/355"}],"collection":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/comments?post=355"}],"version-history":[{"count":0,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/355\/revisions"}],"wp:attachment":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/media?parent=355"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/categories?post=355"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/tags?post=355"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}