Tag: clean
-
Has anyone tried training models on raw discussions instead of curated datasets?
I’ve always followed the usual advice when training models: clean the data, normalize everything, remove noise, structure it nicely. Recently I tried something different. Instead of polished datasets, I fed models long, messy discussion threads: real conversations, people arguing, correcting themselves, misunderstanding…