{"id":872,"date":"2024-12-30T07:03:13","date_gmt":"2024-12-30T07:03:13","guid":{"rendered":"https:\/\/mailitics.com\/index.php\/2024\/12\/30\/my_data_science_manifesto_from_a_self_taught_data\/"},"modified":"2024-12-30T07:03:13","modified_gmt":"2024-12-30T07:03:13","slug":"my_data_science_manifesto_from_a_self_taught_data","status":"publish","type":"post","link":"https:\/\/mailitics.com\/index.php\/2024\/12\/30\/my_data_science_manifesto_from_a_self_taught_data\/","title":{"rendered":"My Data Science Manifesto from a Self Taught Data Scientist"},"content":{"rendered":"<p>    My Data Science Manifesto from a Self Taught Data Scientist<br \/>\n \t<BR><br \/>\n<BR><\/BR><br \/>\n    <!-- no image --><br \/>\n \t<BR><br \/>\n<BR><\/BR><\/p>\n<div>\n<!-- SC_OFF --><\/p>\n<div class=\"md\">\n<p><strong>Background<\/strong><\/p>\n<p>I\u2019m a self-taught data scientist, with about 5 years of data analyst experience and now about 5 years as a Data Scientist. I\u2019m more math minded than the average person, but I\u2019m not special. I have a bachelor\u2019s degree in mechanical engineering, and have worked alongside 6 data scientists, 4 of whom have PhDs and 2 of whom have master\u2019s degrees. Despite probably being 6th out of 7 in natural ability, I have been the 2nd most productive data scientist in the group.<\/p>\n<p><strong>Gatekeeping<\/strong><\/p>\n<p>Every day someone on this subreddit asks some variation of \u201cwhat do I need to know to get started in ML\/DS?\u201d The answers are always smug and give some insane list of courses and topics one must master. As someone who\u2019s been on both sides, I find this attitude extremely annoying, and it is rampant in the industry. I don\u2019t think you can be bad at math, have no prerequisite knowledge, and still be successful, but the level needed is greatly exaggerated. 
Most of the people telling you these things are just posturing due to insecurity.<\/p>\n<p>As a mechanical engineering student, I had at least 3 calculus courses, a linear algebra course, and a probability course, but that was 10+ years before I attempted to become a DS, and by then I didn\u2019t remember much at all. This sub, and others like it, made me think I had to be an expert in all these topics and many more to even think about trying to become a data scientist. <\/p>\n<p>When I started my journey, I would take courses in coding, calculus, stats, linear algebra, and so on. I\u2019d take a course, do OK in it, and move on to the next thing. However, eventually I\u2019d get defeated because I realized I couldn\u2019t remember much from the courses I took 3 months prior. It just felt like too much information for me to hold at a single time while working a full-time job.<\/p>\n<p><strong>What you actually need<\/strong><\/p>\n<p>The reality is, 95% of the time you only need a basic understanding of these topics. Specific projects often require a deeper dive into SOMETHING else, but that&#8217;s on a case-by-case basis, and you figure that out as you go.<\/p>\n<p>For calculus, you don&#8217;t need to know how to integrate multivariable functions by hand. You need to know that the derivative is a function that represents the slope of the original function, and that a point where the derivative = 0 is a candidate for a local min\/max. You need to know that an integral is the area under the curve.<\/p>\n<p>For stats, you need to understand what a p-value represents. You don&#8217;t need to know all the different tests and when to use them. You need to know that they exist and why you need them. When it&#8217;s time to use one, just google it, and figure out which one best suits your use case.<\/p>\n<p>For linear algebra, you don&#8217;t need to know how to solve for eigenvectors by hand, or whatever other specific things you do in that class. You need to know how to \u2018read\u2019 it. 
It is also helpful to know some properties from linear algebra, such as the fact that the cross product of 2 vectors yields a vector perpendicular to both.<\/p>\n<p>For probability, you need to understand basic things, but again, just google your specific problem.<\/p>\n<p>You don&#8217;t need to be an expert software dev. You need to write OK code, and be able to use ChatGPT to help you improve it little by little.<\/p>\n<p>You don&#8217;t need to know how to build all the algorithms by hand. A general understanding of how they work is enough in 95% of cases.<\/p>\n<p>Of all of those things, the only thing you absolutely NEED to get started is basic coding ability. <\/p>\n<p>By far the number one technical ability you need to &#8216;master&#8217; is understanding how to test and evaluate your models\/algos. If you can ensure that you&#8217;re accurately evaluating your model, with metrics that correctly align with the use case, that&#8217;s enough to start providing some real value. I often see people asking things like &#8220;should I do this feature engineering technique for this problem?&#8221; or \u201cwhich of these algorithms will perform best?\u201d. The answer is &#8220;I don&#8217;t know, try it, measure it, and see&#8221;. Understanding how the algorithms work can give you clues about what you should try, but at the end of the day, you should just try it and see. <\/p>\n<p>Despite the posturing in the industry, very few people are experts in all these domains. Some people are better at talking the talk than others, but at the end of the day, you WILL have to constantly research and learn on a project-by-project basis. That\u2019s what makes it fun and interesting. You do not need to be an expert before getting started.<\/p>\n<p>The reason I\u2019m near the top in productivity while being near the bottom in natural and technical ability is my 5 years of experience as a data analyst at my company. During this time, I got really good at exploring my company\u2019s data. 
When you are stumped on a problem, intelligently visualizing the data often reveals the solution. I\u2019ve also had the luxury of analyzing our data from many different perspectives. I\u2019d have assignments from marketing, product, tech support, customer service, software, firmware, and other technical teams. I understand the company as a whole better than the other data scientists. I\u2019m also just aware of more \u2018tips and tricks\u2019 than anyone else. <\/p>\n<p>Good domain knowledge and data exploration skills with average technical skills will outperform good technical skills with average domain knowledge and data exploration almost every time. <\/p>\n<p><strong>Advice for those who are self-taught<\/strong><\/p>\n<p>I\u2019ve been on the hiring side of things a few times now, and the market is certainly difficult. I think it would be very difficult for someone to online course and side project themselves directly into a DS job. The side project would have to be extremely impressive to be considered. However, I think my path is repeatable.<\/p>\n<p>I taught myself basic SQL and Tableau and completed a few side projects. I accepted a job as a data analyst at a medium-sized company (100-200 total employees), on a team where DS and DA shared the same boss. <\/p>\n<p>For the first year or two, I excelled at my role as a DA. I made my boss aware that I wanted to become a DS eventually. He started to make me a small part of some DS projects, running queries, building dashboards to track performance, and things like that. I was also a part of some of the meetings, so I got some insight into how certain problems were approached. <\/p>\n<p>My boss made me aware that I would need to teach myself coding and machine learning. My role in the data science projects grew over time, but I was ultimately blocked from becoming a DS because I kept trying, and failing, to learn coding and the 25 areas of expertise Reddit tells you that you need by taking MOOCs. 
<\/p>\n<p>Eventually, I paid up for Dataquest. I naively thought the course would teach me everything I needed to know. While you will not be proficient in anything DS upon completing it, the interactive format made it easy to jump into 30-60 minutes of structured coding every day. As with learning a real language, consistency is vital. <\/p>\n<p>Once I got to the point where I could do some basic coding, I began my own side project. This is where the real learning began. The Titanic problem is fine for day 1, but you really need a project of your own. I picked a project that I was interested in and that had a function I would actually use (I&#8217;m on V3 of this project and it&#8217;s grown to a level that I never could&#8217;ve dreamed of at the time). This was crucial in ensuring that I stuck with the project. When I didn\u2019t know how to do something in the project, I would research it and figure it out. This is how it works in the real world.<\/p>\n<p>After 3 months of Dataquest and another 3 of a project (along with 4 years of being a data analyst), I convinced my boss to assign me a DS project. I worked alongside another data scientist, but I owned the project, and they were mostly there for guidance and to code some of the more complex things. I excelled at that project, was promoted to data scientist, and began getting projects of my own with less and less oversight. I&#8217;ve been promoted twice since then.<\/p>\n<p>I&#8217;d like to add that you can almost certainly do all this in less time than it took me. I wasted a lot of time spinning my wheels. ChatGPT is also a great resource that could increase your learning speed. Just don&#8217;t use it blindly.<\/p>\n<p><strong>Tldr:<\/strong> Sir this is Wendy\u2019s.<\/p>\n<p><strong>Edit:<\/strong> I\u2019m not saying to never go deeper into things; I\u2019m literally always learning. I go deeper into things all the time. 
That is often in very specific domains that are never mentioned here, but you don&#8217;t need to be a master in all things to excel. Be able to understand the generalities of those domains, and dig deeper when the problem calls for it. A concept is much more likely to stick when you learn it with a direct application in mind.<\/p>\n<p>I thought it went without saying, but I\u2019m not saying those things I listed are literally the only things you need to know about those topics; I was just giving examples of where relatively simple concepts were way more important than specifics.<\/p>\n<\/p><\/div>\n<p><!-- SC_ON -->   submitted by   <a href=\"https:\/\/www.reddit.com\/user\/irndk10\"> \/u\/irndk10 <\/a> <br \/> <span><a href=\"https:\/\/www.reddit.com\/r\/datascience\/comments\/1hp7pim\/my_data_science_manifesto_from_a_self_taught_data\/\">[link]<\/a><\/span>   <span><a href=\"https:\/\/www.reddit.com\/r\/datascience\/comments\/1hp7pim\/my_data_science_manifesto_from_a_self_taught_data\/\">[comments]<\/a><\/span>\n<\/div>\n<p> \t<BR><br \/>\n <BR><\/BR><br \/>\n    \/u\/irndk10<br \/>\n \t<BR><br \/>\n<BR><\/BR><br \/>\n<a href=\"https:\/\/www.reddit.com\/r\/datascience\/comments\/1hp7pim\/my_data_science_manifesto_from_a_self_taught_data\/\">Go to original source<\/a><br \/>\n \t<BR><br \/>\n <BR><\/BR><\/p>\n","protected":false},"excerpt":{"rendered":"<p>My Data Science Manifesto from a Self Taught Data Scientist Background I\u2019m a self-taught data scientist, with about 5 years of data analyst experience and now about 5 years as a Data Scientist. I\u2019m more math minded than the average person, but I\u2019m not special. 
I have a bachelor\u2019s degree in mechanical engineering, and have [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[62,99],"tags":[1016,84,106],"class_list":["post-872","post","type-post","status-publish","format-standard","hentry","category-aimldsaimlds","category-datascience","tag-courses","tag-data","tag-scientist"],"_links":{"self":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/872"}],"collection":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/comments?post=872"}],"version-history":[{"count":0,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/872\/revisions"}],"wp:attachment":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/media?parent=872"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/categories?post=872"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/tags?post=872"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}