{"id":567,"date":"2024-12-14T07:03:29","date_gmt":"2024-12-14T07:03:29","guid":{"rendered":"https:\/\/mailitics.com\/index.php\/2024\/12\/14\/is-complex-writing-nothing-but-formulas-289e0a33793f\/"},"modified":"2024-12-14T07:03:29","modified_gmt":"2024-12-14T07:03:29","slug":"is-complex-writing-nothing-but-formulas-289e0a33793f","status":"publish","type":"post","link":"https:\/\/mailitics.com\/index.php\/2024\/12\/14\/is-complex-writing-nothing-but-formulas-289e0a33793f\/","title":{"rendered":"Is Complex Writing Nothing But Formulas?"},"content":{"rendered":"<p>Is Complex Writing Nothing But Formulas?<\/p>\n<div>\n<h4>Text analytics hints at how volumes of writing get\u00a0created<\/h4>\n<p>In the broadest of strokes, Natural Language Processing transforms language into constructs that can be usefully manipulated. Since deep-learning embeddings have proven so powerful, they\u2019ve also become the default: pick a model, embed your data, pick a metric, do some RAG. To add new value, it helps to have a different take on crunching language. <br \/>The one I\u2019ll share today started years ago, with a single\u00a0book.<\/p>\n<p><em>The Orchid Thief <\/em>is both non-fiction and full of mischief. I had first read it in my 20s, skipping most of the historical anecdata, itching for its first-person accounts. At the time, I laughed out loud but turned the pages in quiet fury, that someone could live so deeply and write so well. I wasn\u2019t all that sure these were different things.<\/p>\n<p>Within a year I had moved to London to start anew. <br \/>I went into financial services, which is like a theme park for nerds. And, for the ensuing decade, I would only take jobs with lots of\u00a0writing.<\/p>\n<p>Lots being the operative word.<\/p>\n<p>Behind the modern fa\u00e7ade of professional services, British industry is alive to its old factories and shipyards. 
It employs Alice to do a thing, and then hand it over to Bob; he turns some screws, and it\u2019s on to Charlie. One month on, we all do it again. As a newcomer, I noticed habits weren\u2019t so much a ditch to fall into as a mound to\u00a0stake.<\/p>\n<p>I was also reading lots. Okay, I was reading the <em>New Yorker<\/em>. My favourite thing was to flip a fresh one on its cover, open it from the back, and read the opening sentences of one Anthony Lane, who writes film reviews. For years and years, not once did I go see a\u00a0movie.<\/p>\n<p>Every now and again, a flicker would catch me off-guard. A barely-there thread between the <em>New Yorker <\/em>corpus and my non-Pulitzer outputs. In both corpora, each piece was different to its siblings, but also\u2026<em>not quite. <\/em>Similarities echoed. And I knew the ones in my work had arisen out of a repetitive process.<\/p>\n<p>In 2017 I began meditating on the threshold separating writing that <em>feels formulaic<\/em> from writing that can be spelled out explicitly <em>as a\u00a0formula<\/em>.<\/p>\n<p>The argument goes like this: a high volume of repetition hints at a (typically tacit) form of algorithmic decision-making. But procedural repetition leaves fingerprints. Trace the fingerprints to surface the procedure; suss out the algorithm; and the software practically writes\u00a0itself.<\/p>\n<p>In my last job, I was no longer writing lots. My software\u00a0was.<\/p>\n<p>Companies can, in principle, learn enough about their own flows to reap enormous gains, but few bother. Folks seem far more enthralled with what <em>somebody else<\/em> is\u00a0doing.<\/p>\n<p>For example, my bosses, and later my clients, kept wishing their staff could mimic the <em>Economist<\/em>\u2019s house style. 
But how would you find which steps the <em>Economist<\/em> takes to end up sounding the way it\u00a0does?<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2A82IO4SAXlPwtsyX8ih9bPw.png?ssl=1\"><figcaption>Image by\u00a0author<\/figcaption><\/figure>\n<h4>Enter Text Analytics<\/h4>\n<p>Read a single <em>Economist<\/em> article, and it feels breezy and confident. Read lots of them, and they sound kind of alike. A full printed magazine comes out once a week. Yeah, I was betting on\u00a0process.<\/p>\n<p>For fun, let\u2019s apply a readability function (one that scores text in years of education) to several hundred <em>Economist<\/em> articles. Let\u2019s also do the same to hundreds of articles published by a frustrated European asset\u00a0manager.<\/p>\n<p>Then, let\u2019s get ourselves a histogram to see how those readability scores are distributed.<\/p>\n<p>Just two functions, and look at the insights we\u00a0get!<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2A8LrA79XdxEheUQRW_243BQ.png?ssl=1\"><figcaption>Readability profile. Source:\u00a0FinText<\/figcaption><\/figure>\n<p>Notice how separated the curves are; this asset manager is <em>not<\/em> sounding like the <em>Economist<\/em>. We could drill down further to see what\u2019s causing this disparity. (For a start, it\u2019s often <a href=\"https:\/\/www.fintext.io\/blog\/in-charts-asset-managers-struggle-with-long-sentences\/\">crazy-long sentences<\/a>.)<\/p>\n<p>But also, notice how the <em>Economist<\/em> puts a hard limit on the readability score they allow. 
The curve is inorganic, betraying that they apply a strict readability check in their editing\u00a0process.<\/p>\n<p>Finally\u200a\u2014\u200aand many of my clients struggled with this\u200a\u2014\u200athe <em>Economist<\/em> vows to write plainly enough that an average high-schooler could take it\u00a0in.<\/p>\n<p>I had expected these charts. I had scribbled them on paper. But when a real one first lit up my screen, it was as though language herself had\u00a0giggled.<\/p>\n<p>Now, I wasn\u2019t exactly the first on the scene. In 1964, statisticians Frederick Mosteller and David Wallace landed on the cover of <em>Time<\/em> magazine, their forensic literary analysis <a href=\"https:\/\/web.stanford.edu\/group\/cslipublications\/cslipublications\/site\/1575865521.shtml\">settling a 140-year-old debate<\/a> over the authorship of a dozen famous, anonymously written essays.<\/p>\n<p>But forensic analytics always looks at the single item in relation to two corpora: the one created by the suspected author, and one representing the null hypothesis. Comparative analytics only cares about comparing bodies of\u00a0text.<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2Adbtq5FGuzlI6hWF5p29rpQ.png?ssl=1\"><figcaption>Image by\u00a0author<\/figcaption><\/figure>\n<h4><strong>Building a Text Analytics Engine<\/strong><\/h4>\n<p>Let\u2019s retrace our steps: given a corpus, we applied the same function (the readability function) to each of the texts. This mapped the corpus onto a set (in this case, of numbers). On this set we applied another function (the histogram). Finally, we did this for two different corpora\u200a\u2014\u200aand compared the\u00a0results.<\/p>\n<p>If you squint, you\u2019ll see I\u2019ve just described Excel.<\/p>\n<p>What looks like a table is actually <strong>a <em>pipeline<\/em><\/strong>, crunching columns sequentially. 
First a function runs down the column; then further functions act on its results; finally, comparative-analysis functions set those results side by side.<\/p>\n<p>Well, I wanted Excel, but for\u00a0text.<\/p>\n<p>Not strings\u200a\u2014\u200atext. I wanted to apply functions like Count Verbs, First Paragraph Subject or First Important Sentence. And it had to be flexible enough that I could ask <em>any question<\/em>; who knows what would end up mattering?<\/p>\n<p>In 2020 this kind of solution did not exist, so I built it. And boy did this software not \u2018practically write itself\u2019! Making it possible to ask any question required some good architectural decisions, which I got wrong twice before ironing out the\u00a0kinks.<\/p>\n<p>In the end, functions are defined once, by what they do to a single input text. Then, you pick and choose the pipeline steps, and the corpora on which they\u00a0act.<\/p>\n<p>With that, I started a writing-tech consulting company, <a href=\"https:\/\/www.fintext.io\/\">FinText<\/a>. I planned to build while working with clients, and see what\u00a0stuck.<\/p>\n<h4><strong>What the Market\u00a0Said<\/strong><\/h4>\n<p>The first commercial use case I came up with was <a href=\"https:\/\/www.fintext.io\/case-studies\/benchmarking\/social-listening-for-investment-marketing\/\">social listening<\/a>. Market research and polling are big business. It was the height of the pandemic, and everyone was at home. I figured that processing active chatter on dedicated online communities could be a new way to access client thinking.<\/p>\n<p>Any first software client would have felt special, but <a href=\"https:\/\/www.fintext.io\/case-studies\/benchmarking\/help-them-buy-your-esg-funds\/\">this one<\/a> was thrilling, because my concoction actually helped real people get out of a tight\u00a0spot:<\/p>\n<p>Working towards a big event, they had planned to launch a flagship report, with data from a paid YouGov survey. But its results were tepid. 
So, with their remaining budget, they bought a FinText study. It was our findings that they put front and centre in their <a href=\"https:\/\/uksif.org\/public-and-investor-attitudes-to-good-money-2020\/\">final\u00a0report<\/a>.<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2AW_q__FqPYGsMQYcd7VuiDg.png?ssl=1\"><figcaption>Social listening on Reddit \u2018Investing\u2019, 2020. Source:\u00a0FinText<\/figcaption><\/figure>\n<p>But social listening did not take off. Investment land is quirky because pools of money will always need a home; the only question is who\u2019s the landlord. Industry people I talked to mostly wanted to know what their competitors were up\u00a0to.<\/p>\n<p>So the second use case\u200a\u2014\u200acompetitive content analytics\u200a\u2014\u200awas met with a warmer response. I sold about half a dozen companies on this solution (including <a href=\"https:\/\/www.fintext.io\/case-studies\/benchmarking\/competitive-content-analytics-for-aviva-investors\/\">Aviva Investors<\/a>).<\/p>\n<p>All along, our engine was collecting data no one else had. Such was my savvy that it wasn\u2019t even my idea to run training sessions; a client asked for one first. That\u2019s how I learned that companies like buying training.<\/p>\n<p>Otherwise, my steampunk take on writing was proving tricky to sell. It was all too abstract. What I needed was a dashboard: pretty charts, with real numbers, crunched from live data. 
A pipeline did the crunching, and I hired a small team to do the pretty\u00a0charts.<\/p>\n<p><iframe loading=\"lazy\" src=\"https:\/\/cdn.embedly.com\/widgets\/media.html?src=https%3A%2F%2Fplayer.vimeo.com%2Fvideo%2F1038176512%3Fapp_id%3D122963&amp;dntp=1&amp;display_name=Vimeo&amp;url=https%3A%2F%2Fvimeo.com%2F1038176512&amp;image=https%3A%2F%2Fi.vimeocdn.com%2Fvideo%2F1960321860-fccae5ab38b85c30ae2fec7c810e6f6d078d59c01c7f9980e458ec65ee6e21e6-d_1280&amp;type=text%2Fhtml&amp;schema=vimeo\" width=\"1280\" height=\"720\" frameborder=\"0\" scrolling=\"no\"><a href=\"https:\/\/medium.com\/media\/390bcbefe41a5a37c51aa4fdef0744ab\/href\">https:\/\/medium.com\/media\/390bcbefe41a5a37c51aa4fdef0744ab\/href<\/a><\/iframe><\/p>\n<p>Within the dashboard, two charts showed a breakdown of topics, and the rest dissected the writing style. I\u2019ll say a few words about this\u00a0choice.<\/p>\n<p>Everyone believes what they say matters. If others don\u2019t care, it\u2019s really a <em>moral<\/em> failure on their part: valuing style over substance. A bit like how bad taste is something only other people\u00a0have.<\/p>\n<p>Scientists have counted clicks, tracked eyes, monitored scrolls, timed attention. We know it takes a split second for readers to decide whether something is \u201cfor them\u201d, and they decide by vaguely comparing new information to what they already like. Style is an entry\u00a0pass.<\/p>\n<h4><strong>What the Dashboard Showed<\/strong><\/h4>\n<p>Before, I hadn\u2019t been tracking the data being collected, but now I had all those pretty charts. And they showed I had been both right and very, very\u00a0wrong.<\/p>\n<p>Initially, I only had direct knowledge of a few large investment firms, and had suspected their competitors\u2019 flows looked much the same. This proved\u00a0correct.<\/p>\n<p>But I had also assumed that slightly smaller companies would have only slightly fewer outputs. 
This just isn\u2019t\u00a0true.<\/p>\n<p>Text analytics proved helpful only if a company already had the capacity to produce writing. Otherwise, what they needed was a working factory. There were too few companies in the first bucket, because everyone else was crowding the\u00a0second.<\/p>\n<h4><strong>Epilogue<\/strong><\/h4>\n<p>As a product, text analytics has been a mixed bag. It made some money, could probably have made some more, but was unlikely to become a runaway\u00a0success.<\/p>\n<p>Also, I\u2019d lost my appetite for the <em>New Yorker<\/em>. At some point it all tipped too far towards the formulaic, and the magic was\u00a0gone.<\/p>\n<p>Words are now in their wholesale era, what with large language models like ChatGPT. Early on, I considered applying pipelines to discern whether text is machine-generated, but what would be the\u00a0point?<\/p>\n<p>Instead, in late 2023 I began working on a solution that helps companies expand their capacity to write for expert clients. It\u2019s an altogether different adventure, still in its\u00a0infancy.<\/p>\n<p>In the end, I came to think of text analytics as an extra pair of glasses. On occasion, it turns fuzziness sharp. 
I keep it in my pocket, just in\u00a0case.<\/p>\n<hr>\n<p><a href=\"https:\/\/towardsdatascience.com\/is-complex-writing-nothing-but-formulas-289e0a33793f\">Is Complex Writing Nothing But Formulas?<\/a> was originally published in <a href=\"https:\/\/towardsdatascience.com\/\">Towards Data Science<\/a> on Medium.<\/p>\n<\/div>\n<p>Vered Zimmerman<br \/>\n<a href=\"https:\/\/medium.com\/m\/global-identity-2?redirectUrl=https%3A%2F%2Ftowardsdatascience.com%2Fis-complex-writing-nothing-but-formulas-289e0a33793f\">Go to original source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Is Complex Writing Nothing But Formulas? Text analytics hints at how volumes of writing get\u00a0created In the broadest of strokes, Natural Language Processing transforms language into constructs that can be usefully manipulated. 
Since deep-learning embeddings have proven so powerful, they\u2019ve also become the default: pick a model, embed your data, pick a metric, do some [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[62,247,83,240,260,680],"tags":[267,16,681],"class_list":["post-567","post","type-post","status-publish","format-standard","hentry","category-aimldsaimlds","category-automation","category-data-science","category-editors-pick","category-nlp","category-text-analytics","tag-but","tag-so","tag-writing"],"_links":{"self":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/567"}],"collection":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/comments?post=567"}],"version-history":[{"count":0,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/567\/revisions"}],"wp:attachment":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/media?parent=567"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/categories?post=567"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/tags?post=567"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}