{"id":5273,"date":"2025-07-14T07:02:21","date_gmt":"2025-07-14T07:02:21","guid":{"rendered":"https:\/\/mailitics.com\/index.php\/2025\/07\/14\/the_right_questions_to_find_clusters_tangles\/"},"modified":"2025-07-14T07:02:21","modified_gmt":"2025-07-14T07:02:21","slug":"the_right_questions_to_find_clusters_tangles","status":"publish","type":"post","link":"https:\/\/mailitics.com\/index.php\/2025\/07\/14\/the_right_questions_to_find_clusters_tangles\/","title":{"rendered":"The right questions to find clusters (tangles)"},"content":{"rendered":"<div>\n<!-- SC_OFF --><\/p>\n<div class=\"md\">\n<p><strong>Hey everyone,<\/strong><\/p>\n<p>I\u2019m currently working on my bachelor\u2019s thesis and I\u2019m hitting a creative block on a central part \u2013 maybe you have some ideas or impulses for me.<\/p>\n<p>My dataset consists of 100,000 cleaned job postings from Kaggle (title + description). The goal of my thesis is to use a method called <strong>Tangles<\/strong> (probably no one knows it, it\u2019s a rather specific approach from my studies) to find interesting clusters in this data \u2013 similar to embedding-based clustering methods, but with the key difference that it requires <strong>interpretable, binary decisions<\/strong>. 
Sounds theoretical, but it\u2019s actually pretty cool:<\/p>\n<p>You ask the dataset <strong>yes\/no questions<\/strong> (e.g., <em>\u201cDoes the job require a lot of travel?\u201d<\/em>), and based on the answer patterns, a kind of profile emerges \u2013 and from these profiles, groups that belong together can be formed.<\/p>\n<p>The goal is to group jobs that don\u2019t obviously belong together at first glance, but do share certain underlying similarities (e.g., requirements, tasks) that cause them to respond similarly to the questions.<\/p>\n<p><strong>One example:<\/strong><\/p>\n<p>Questions like:<\/p>\n<ul>\n<li>Does the job require a lot of travel?<\/li>\n<li>Do you need a driver\u2019s license?<\/li>\n<li>Do you have to be physically fit?<\/li>\n<\/ul>\n<p>=&gt; could group <em>Sales Managers<\/em> and <em>Truck Drivers<\/em> together \u2013 even though those jobs seem very different at first. These kinds of connections are what I find exciting.<\/p>\n<p>What I\u2019m <strong>not<\/strong> looking for are questions like:<\/p>\n<ul>\n<li>Is this a data science job?<\/li>\n<li>Do you need to know how to code?<\/li>\n<li>Is it IT-related?<\/li>\n<\/ul>\n<p>To me, those are more like categories or classifications that make the clustering too obvious \u2013 they just confirm what you already know. I\u2019m more interested in <strong>surprising, layered similarities<\/strong>.<\/p>\n<p>So here\u2019s my question for you:<\/p>\n<p>Do you have any interesting <strong>yes\/no questions<\/strong> from your daily work or knowledge that could be applied to any kind of job posting \u2013 and that might result in <strong>interesting, possibly unexpected groupings<\/strong>?<\/p>\n<p>Whether you work in trades, healthcare, IT, management, or research \u2013 <strong>every perspective helps!<\/strong><\/p>\n<p>In the end, I need at least 40 such questions (the more, the better), but right now I\u2019m really struggling to come up with good ones. Even GPT &amp; co. 
haven\u2019t been much help \u2013 they usually just spit out generic stuff.<\/p>\n<p>Even <strong>one<\/strong> good question from you would be incredibly helpful. \ud83d\ude4f Advice on how to find such questions, or on whether my approach makes sense at all, would help too.<\/p>\n<p>Thanks in advance for thinking along!<\/p>\n<\/p><\/div>\n<p><!-- SC_ON -->   submitted by   <a href=\"https:\/\/www.reddit.com\/user\/juggerjaxen\"> \/u\/juggerjaxen <\/a> <br \/> <span><a href=\"https:\/\/www.reddit.com\/r\/datascience\/comments\/1lyciw1\/the_right_questions_to_find_clusters_tangles\/\">[link]<\/a><\/span>   <span><a href=\"https:\/\/www.reddit.com\/r\/datascience\/comments\/1lyciw1\/the_right_questions_to_find_clusters_tangles\/\">[comments]<\/a><\/span>\n<\/div>\n<p><a href=\"https:\/\/www.reddit.com\/r\/datascience\/comments\/1lyciw1\/the_right_questions_to_find_clusters_tangles\/\">Go to original source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>The right questions to find clusters (tangles) Hey everyone, I\u2019m currently working on my bachelor\u2019s thesis and I\u2019m hitting a creative block on a central part \u2013 maybe you have some ideas or impulses for me. My dataset consists of 100,000 cleaned job postings from Kaggle (title + description). 
The goal of my thesis is [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[62,99],"tags":[918,108,1342],"class_list":["post-5273","post","type-post","status-publish","format-standard","hentry","category-aimldsaimlds","category-datascience","tag-job","tag-my","tag-questions"],"_links":{"self":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/5273"}],"collection":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/comments?post=5273"}],"version-history":[{"count":0,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/5273\/revisions"}],"wp:attachment":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/media?parent=5273"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/categories?post=5273"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/tags?post=5273"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}