{"id":6315,"date":"2025-08-25T07:03:20","date_gmt":"2025-08-25T07:03:20","guid":{"rendered":"https:\/\/mailitics.com\/index.php\/2025\/08\/25\/generating_passages_similar_in_style_to_a_set_of\/"},"modified":"2025-08-25T07:03:20","modified_gmt":"2025-08-25T07:03:20","slug":"generating_passages_similar_in_style_to_a_set_of","status":"publish","type":"post","link":"https:\/\/mailitics.com\/index.php\/2025\/08\/25\/generating_passages_similar_in_style_to_a_set_of\/","title":{"rendered":"Generating passages similar in style to a set of 9 examples (Question)"},"content":{"rendered":"<div>\n<div class=\"md\">\n<p>Hello everyone,<br \/> I hope I can find some guidance here for a project in generative AI.<\/p>\n<p>I have a set of 9 short passages from a TOEFL-like English test, and I need to generate more passages that match the style of the example set. Each passage is 50&#8211;100 words long and is cut off in the middle of a sentence; the examinee&#8217;s task is to choose, out of 4 options, the answer that correctly completes the text.<\/p>\n<p>Here&#8217;s what I considered:<\/p>\n<ol>\n<li>Ask ChatGPT to generate a similar passage using few-shot prompting.<\/li>\n<li>Build a scoring \/ distance method that measures how far the generated passage is from the example set.<\/li>\n<li>Ask ChatGPT to revise the passage, repeating until I&#8217;m satisfied with the score.<\/li>\n<\/ol>\n<p>Some questions:<\/p>\n<ol>\n<li>For the scoring method, I&#8217;m considering TF-IDF over POS (part-of-speech) tags and function words. Is that a good idea? Any other suggestions? I did consider embeddings, but wouldn&#8217;t they favor passages that are similar in content rather than in style?<\/li>\n<li>How would you generate 3 wrong answers that also fit the style of the wrong answers in the examples? I thought I&#8217;d cluster the examples&#8217; wrong answers into 3 categories using k-means, figure out what distinguishes each cluster from the others, and ask ChatGPT to generate one wrong answer from each category (e.g. bad grammar \/ contradictory information \/ etc.).<\/li>\n<li>Any other approaches you&#8217;d suggest? Could I build a generative model that takes in an article (e.g. a Wikipedia article) and modifies it so that its format and style match the examples, or is the example set too small for that?<\/li>\n<\/ol>\n<\/div>\n<p>submitted by <a href=\"https:\/\/www.reddit.com\/user\/RunOrDieTrying\"> \/u\/RunOrDieTrying <\/a> <br \/> <span><a href=\"https:\/\/www.reddit.com\/r\/datascience\/comments\/1mz89ao\/generating_passages_similar_in_style_to_a_set_of\/\">[link]<\/a><\/span>   <span><a href=\"https:\/\/www.reddit.com\/r\/datascience\/comments\/1mz89ao\/generating_passages_similar_in_style_to_a_set_of\/\">[comments]<\/a><\/span><\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Generating passages similar in style to a set of 9 examples (Question) Hello everyone I hope I can find some guidance here for a project in generative AI. I have a set of 9 short passages from a TOEFL-like English test. I need to generate more passages that match the style of the example set. 
[&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[62,99],"tags":[200,3577,3578],"class_list":["post-6315","post","type-post","status-publish","format-standard","hentry","category-aimldsaimlds","category-datascience","tag-examples","tag-passages","tag-style"],"_links":{"self":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/6315"}],"collection":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/comments?post=6315"}],"version-history":[{"count":0,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/6315\/revisions"}],"wp:attachment":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/media?parent=6315"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/categories?post=6315"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/tags?post=6315"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}