{"id":10498,"date":"2026-02-16T07:02:24","date_gmt":"2026-02-16T07:02:24","guid":{"rendered":"https:\/\/mailitics.com\/index.php\/2026\/02\/16\/best_technique_for_training_models_on_a_sample_of\/"},"modified":"2026-02-16T07:02:24","modified_gmt":"2026-02-16T07:02:24","slug":"best_technique_for_training_models_on_a_sample_of","status":"publish","type":"post","link":"https:\/\/mailitics.com\/index.php\/2026\/02\/16\/best_technique_for_training_models_on_a_sample_of\/","title":{"rendered":"Best technique for training models on a sample of data?"},"content":{"rendered":"<p>    Best technique for training models on a sample of data?<br \/>\n \t<BR><br \/>\n<BR><\/BR><br \/>\n    <!-- no image --><br \/>\n \t<BR><br \/>\n<BR><\/BR><\/p>\n<div>\n<!-- SC_OFF --><\/p>\n<div class=\"md\">\n<p>Due to memory limits on my work computer I&#8217;m unable to train machine learning models on our entire analysis dataset. Given my data is highly imbalanced I&#8217;m under-sampling from the majority class of the binary outcome. <\/p>\n<p>What is the proper method to train ML models on sampled data with cross-validation and holdout data?<\/p>\n<p>After training on my under-sampled data should I do a final test on a portion of &#8220;unsampled data&#8221; to choose the best ML model?<\/p>\n<\/p><\/div>\n<p><!-- SC_ON -->   submitted by   <a href=\"https:\/\/www.reddit.com\/user\/RobertWF_47\"> \/u\/RobertWF_47 <\/a> <br \/> <span><a href=\"https:\/\/www.reddit.com\/r\/datascience\/comments\/1r53jy3\/best_technique_for_training_models_on_a_sample_of\/\">[link]<\/a><\/span>   <span><a href=\"https:\/\/www.reddit.com\/r\/datascience\/comments\/1r53jy3\/best_technique_for_training_models_on_a_sample_of\/\">[comments]<\/a><\/span>\n<\/div>\n<p> \t<BR><br \/>\n <BR><\/BR><br \/>\n    \/u\/RobertWF_47<br \/>\n \t<BR><br \/>\n<BR><\/BR><br \/>\n<a href=\"https:\/\/www.reddit.com\/r\/datascience\/comments\/1r53jy3\/best_technique_for_training_models_on_a_sample_of\/\">Go to original source<\/a><br \/>\n \t<BR><br \/>\n <BR><\/BR><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Best technique for training models on a sample of data? Due to memory limits on my work computer I&#8217;m unable to train machine learning models on our entire analysis dataset. Given my data is highly imbalanced I&#8217;m under-sampling from the majority class of the binary outcome. What is the proper method to train ML models [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[62,99],"tags":[1015,84,73],"class_list":["post-10498","post","type-post","status-publish","format-standard","hentry","category-aimldsaimlds","category-datascience","tag-best","tag-data","tag-models"],"_links":{"self":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/10498"}],"collection":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/comments?post=10498"}],"version-history":[{"count":0,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/10498\/revisions"}],"wp:attachment":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/media?parent=10498"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/categories?post=10498"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/tags?post=10498"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}