{"id":1352,"date":"2025-01-22T07:03:00","date_gmt":"2025-01-22T07:03:00","guid":{"rendered":"https:\/\/mailitics.com\/index.php\/2025\/01\/22\/lyrec-a-song-recommender-that-reads-between-the-lyrics-eca5ea2ae8c8\/"},"modified":"2025-01-22T07:03:00","modified_gmt":"2025-01-22T07:03:00","slug":"lyrec-a-song-recommender-that-reads-between-the-lyrics-eca5ea2ae8c8","status":"publish","type":"post","link":"https:\/\/mailitics.com\/index.php\/2025\/01\/22\/lyrec-a-song-recommender-that-reads-between-the-lyrics-eca5ea2ae8c8\/","title":{"rendered":"LyRec: A Song Recommender That Reads Between the Lyrics"},"content":{"rendered":"<p>    LyRec: A Song Recommender That Reads Between the Lyrics<br \/>\n \t<BR><br \/>\n<BR><\/BR><br \/>\n    <!-- no image --><br \/>\n \t<BR><br \/>\n<BR><\/BR><\/p>\n<div>\n<h4>This is how I built an emotionally intelligent LLM-powered song recommendation system.<\/h4>\n<figure><img decoding=\"async\" alt=\"\" src=\"https:\/\/cdn-images-1.medium.com\/max\/1024\/0*ZKFSA1Mk60ex3J3o\"><figcaption>Photo by <a href=\"https:\/\/unsplash.com\/@davfts?utm_source=medium&amp;utm_medium=referral\">David Pup\u0103z\u0103<\/a> on\u00a0<a href=\"https:\/\/unsplash.com\/?utm_source=medium&amp;utm_medium=referral\">Unsplash<\/a><\/figcaption><\/figure>\n<p>Do you remember the last time you found yourself obsessing over a song? Maybe it was the raw emotion that resonated with you, or perhaps it was the lyrics that kept you hooked. Or maybe you loved the story it tells. Wouldn\u2019t it be nice if there was a way to find songs that express a similar emotion, share similar lyrical elements, or paint similar\u00a0imagery?<\/p>\n<p>In this article, I\u2019ll show you how I built <strong><em>LyRec <\/em><\/strong>(see what I did there? \ud83d\ude09), a recommendation system for songs that lets you do this! 
Here\u2019s a little demo (it runs on my\u00a0Mac!).<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/800\/1%2AfEr12zSJf0XGBOQpXbC4Sg.gif?ssl=1\"><\/figure>\n<p>Don\u2019t worry; you won\u2019t need to know the nitty-gritty of recommendation systems to understand this. Along the way, I\u2019ll explain my design choices, and hopefully, you\u2019ll learn a thing or two about semantic search and Retrieval-Augmented Generation (RAG). So let\u2019s get\u00a0started!<\/p>\n<h3>Setting the Goals\u00a0\ud83c\udfaf<\/h3>\n<p>First, I needed to set some <strong>realistic<\/strong> goals for <strong><em>LyRec<\/em><\/strong>. Here\u2019s the list I finalized.<\/p>\n<blockquote><p>1. Given the lyrics of a (seen or unseen) song, <strong>LyRec<\/strong> should be able to suggest similar songs from its database.<br \/>2. Given a free-form text input, <strong>LyRec<em> <\/em><\/strong>should be able to find songs that match this description. The text input may describe an emotion, mood, story, specific elements, and so on.<br \/>3. Given both lyrics and a description, <strong>LyRec<em> <\/em><\/strong>should be able to find the best matching\u00a0songs.<\/p><\/blockquote>\n<h3>The Approach\u00a0\ud83e\udd14<\/h3>\n<p>With the goals set, it was time to come up with a suitable approach for these tasks. I decided to go with an <strong>embedding-based<\/strong> <strong>semantic search<\/strong>. If you are unfamiliar with this concept, here\u2019s a quick overview.<\/p>\n<blockquote><p>\n<strong>Semantic search<\/strong> is a technique where we search (given a query) a database by understanding the contextual meaning of words rather than relying solely on keyword matching. An increasingly popular way of doing this is with high-dimensional <strong>vector embeddings<\/strong>. First, using a language model, we compute a vector representation (embedding) for each entry in our database and store it alongside the entry. 
When a new query comes in, we generate its embedding using the same model and compute the similarity between the query embedding and all the embeddings stored in the database. Finally, we return the entries with the highest embedding similarity.<\/p><\/blockquote>\n<p>For <strong><em>LyRec<\/em><\/strong>, the entries would be song lyrics, and the query would be either new song lyrics or a user-written free-form text. So, we would essentially compute semantic similarity between lyrics embeddings and the query. Initially, this is where I\u00a0stopped.<\/p>\n<p>After giving it some more thought, I realized that while computing the similarity between two lyrics embeddings was fine, it was probably not ideal to compute the similarity between a textual description and lyrics. So, I decided to generate another set of embeddings that would capture the description of the songs. But how do I get those song descriptions? Easy! I just asked another LLM to summarize each song in our database. Then, I used the embedding model to generate embeddings from these summaries. So, when the query is a free-form text, <strong><em>LyRec<\/em><\/strong> would use these summary embeddings to compute semantic similarities.<\/p>\n<p>Okay, now <strong><em>LyRec<\/em><\/strong> has two ways to find similar songs. But what if we wanted to use both song lyrics and a description for a recommendation? Well, there could be many ways to combine lyrics and summary similarity scores. But I took a slightly different route of <strong>re-ranking<\/strong>.<\/p>\n<blockquote><p>In RAG pipelines, <strong>re-ranking<\/strong> is often done with a model (trained for predicting relevance scores), which is more accurate (but costlier) than embedding similarity. The initially retrieved documents (based on embedding similarity) and the query are passed to this re-ranker, which assigns a relevance score to each document. 
Then, based on this new score, the documents are reordered.<\/p><\/blockquote>\n<p>Taking inspiration from this, I came up with the following approach. First, <strong><em>LyRec<\/em><\/strong> will fetch the most similar songs based on the lyrics embedding, and then these songs will be re-ranked based on the summary embedding similarity scores. It\u2019s worth pointing out that I am not using a re-ranker model. Instead, I\u2019m just using a different embedding for the re-ranking step. You may ask, why not use summary embeddings in the first step and lyrics embeddings in the second? Well, to be honest, I don\u2019t have a good answer to that. I simply preferred the outputs of this ordering (for a small set of queries). Maybe you can try the other\u00a0way!<\/p>\n<p>I hope the overall approach is now clear. Let\u2019s get into the implementation!<\/p>\n<h3>The Implementation \ud83d\udc68\u200d\ud83d\udcbb<\/h3>\n<h4>Dataset<\/h4>\n<p>Of course, the first thing I needed was a song lyrics dataset. Fortunately, I found one on Kaggle! This dataset is under a Creative Commons (CC0: Public Domain)\u00a0license.<\/p>\n<p><a href=\"https:\/\/www.kaggle.com\/datasets\/notshrirang\/spotify-million-song-dataset\/data\">Spotify Million Song Dataset<\/a><\/p>\n<p>This dataset contains about <em>60K<\/em><strong><em> <\/em><\/strong>song lyrics along with the title and artist name. I know <em>60K<\/em> might not cover all the songs you love, but I think it\u2019s a good starting point for\u00a0<strong><em>LyRec.<\/em><\/strong><\/p>\n<pre>import pandas as pd<br><br># root_dir is the folder that holds the downloaded Kaggle CSV<br>songs_df = pd.read_csv(f\"{root_dir}\/spotify_millsongdata.csv\")<br>songs_df = songs_df.drop(columns=[\"link\"])<br># 1-based ID for each song (row index + 1)<br>songs_df[\"song_id\"] = songs_df.index + 1<\/pre>\n<p>I didn\u2019t need to perform any pre-processing on this data. 
I just removed the <em>link<\/em> column and added an <em>ID<\/em> for each\u00a0song.<\/p>\n<h4>Models<\/h4>\n<p>I needed to select two LLMs: one for computing the embeddings and another for generating the song summaries. Picking the right LLM for your task can be a little tricky because of the sheer number of them! It\u2019s a good idea to look at a leaderboard to find the current best ones. For the embedding model, I checked the MTEB leaderboard hosted by Hugging Face.<\/p>\n<p><a href=\"https:\/\/huggingface.co\/spaces\/mteb\/leaderboard\">MTEB Leaderboard &#8211; a Hugging Face Space by mteb<\/a><\/p>\n<p>I was looking for a smaller model (obviously!) without compromising too much on accuracy; hence, I decided on <a href=\"https:\/\/huggingface.co\/Alibaba-NLP\/gte-Qwen2-1.5B-instruct\"><strong>GTE-Qwen2-1.5B-Instruct<\/strong><\/a>.<\/p>\n<pre>from sentence_transformers import SentenceTransformer<br>import torch<br><br>model = SentenceTransformer(<br>    \"Alibaba-NLP\/gte-Qwen2-1.5B-instruct\",<br>    model_kwargs={\"torch_dtype\": torch.float16}<br>)<\/pre>\n<p>For the summarizer, I just needed a small, instruction-following LLM, so I went with <a href=\"https:\/\/huggingface.co\/google\/gemma-2-2b-it\"><strong>Gemma-2-2b-It<\/strong><\/a>. In my experience, it\u2019s one of the best small models as of\u00a0now.<\/p>\n<pre>import torch<br>from transformers import pipeline<br><br>pipe = pipeline(<br>    \"text-generation\",<br>    model=\"google\/gemma-2-2b-it\",<br>    model_kwargs={\"torch_dtype\": torch.bfloat16},<br>    device=\"cuda\",<br>)<\/pre>\n<h4>Pre-computing the Embeddings<\/h4>\n<p>Computing the lyrics embeddings was pretty straightforward. 
I just used the\u00a0<em>.encode(\u2026)<\/em> method with a <em>batch_size<\/em> of 32 for faster processing.<\/p>\n<pre>import numpy as np<br><br>song_lyrics = songs_df[\"text\"].values<br><br>lyrics_embeddings = model.encode(<br>    song_lyrics,<br>    batch_size=32,<br>    show_progress_bar=True<br>)<br><br>np.save(f\"{root_dir}\/60k_song_lyrics_embeddings.npy\", lyrics_embeddings)<\/pre>\n<p>At this point, I stored these embeddings in a\u00a0<em>.npy<\/em> file. I could have used a more structured format, but it did the job for\u00a0me.<\/p>\n<p>Coming to the summary embeddings, I first needed to generate the summaries. I had to ensure that the summary captured the emotion and the song\u2019s theme while not being too lengthy. So, I came up with the following prompt for\u00a0Gemma-2.<\/p>\n<pre>You are an expert song summarizer. <br>You will be given the full lyrics to a song. <br>Your task is to write a concise, cohesive summary that <br>captures the central emotion, overarching theme, and <br>narrative arc of the song in 150 words.<br><br>{song lyrics}<\/pre>\n<p>Here\u2019s the code snippet for summary generation. For simplicity, the following shows sequential processing. I have included the batch-processing version in the GitHub\u00a0repo.<\/p>\n<pre>from tqdm import tqdm<br><br>tqdm.pandas()  # registers DataFrame.progress_apply<br><br>def get_summary(song_lyrics):<br>    messages = [<br>        {\"role\": \"user\", <br>         \"content\": f'''You are an expert song summarizer. <br>You will be given the full lyrics to a song. <br>Your task is to write a concise, cohesive summary that <br>captures the central emotion, overarching theme, and <br>narrative arc of the song in 150 words.\\n\\n{song_lyrics}'''},<br>    ]<br><br>    outputs = pipe(messages, max_new_tokens=256)<br>    # The last message in the returned chat is the assistant's reply<br>    assistant_response = outputs[0][\"generated_text\"][-1][\"content\"].strip()<br>    return assistant_response<br><br>songs_df[\"summary\"] = songs_df[\"text\"].progress_apply(get_summary)<\/pre>\n<p>Unsurprisingly, this step took the most time. 
Luckily, this needs to be done only once, and then again only for new songs when we update the database.<\/p>\n<p>Then, I computed and stored the summary embeddings just like\u00a0before.<\/p>\n<pre>song_summary = songs_df[\"summary\"].values<br><br>summary_embeddings = model.encode(<br>    song_summary,<br>    batch_size=32,<br>    show_progress_bar=True<br>)<br><br>np.save(f\"{root_dir}\/60k_song_summary_embeddings.npy\", summary_embeddings)<\/pre>\n<h4>Vector Search<\/h4>\n<p>With the embeddings in place, it was time to implement the semantic search based on embedding similarity. There are a lot of awesome open-source vector databases available for this job. I decided to use a simple one called <strong>FAISS<\/strong> (Facebook AI Similarity Search). It just takes two lines to add the embeddings into the database. First, we create a FAISS index. Here, we need to specify the similarity metric to use for searching and the dimension of the vectors. I used the <strong>dot product<\/strong> (inner product) as the similarity measure. Then, we add the embeddings to the\u00a0index.<\/p>\n<blockquote><p>Note: Our database is small enough to do an exhaustive search using dot product. For larger databases, it\u2019s recommended to perform an approximate nearest neighbor (ANN) search. FAISS has support for\u00a0that.<\/p><\/blockquote>\n<pre>import faiss<br>import numpy as np<br><br>lyrics_embeddings = np.load(f\"{root_dir}\/60k_song_lyrics_embeddings.npy\")<br>lyrics_index = faiss.IndexFlatIP(lyrics_embeddings.shape[1])<br>lyrics_index.add(lyrics_embeddings.astype(np.float32))<br><br>summary_embeddings = np.load(f\"{root_dir}\/60k_song_summary_embeddings.npy\")<br>summary_index = faiss.IndexFlatIP(summary_embeddings.shape[1])<br>summary_index.add(summary_embeddings.astype(np.float32))<\/pre>\n<p>To find the most similar songs given a query, we first need to generate the query embedding and then call the\u00a0<em>.search(\u2026) <\/em>method on the index. 
Under the hood, this method computes the similarity between the query and every entry in our database and returns the top <em>k<\/em> entries and the corresponding scores. Here\u2019s the code performing a semantic search on lyrics embeddings.<\/p>\n<pre>query_lyrics = 'Imagine the last song you fell in love with'<br># The model expects an instruction-style query prompt<br>query_embedding = model.encode(<br>    f\"Instruct: Given the lyrics, retrieve relevant songs\\nQuery: {query_lyrics}\"<br>)<br>query_embedding = query_embedding.reshape(1, -1).astype(np.float32)<br>lyrics_scores, lyrics_ids = lyrics_index.search(query_embedding, 10)<\/pre>\n<p>Notice that I added a simple instruction prompt to the query. This is recommended for this model. The same applies to the summary embeddings.<\/p>\n<pre>query_description = 'Describe the type of song you wanna listen to'<br>query_embedding = model.encode(<br>    f\"Instruct: Given a description, retrieve relevant songs\\nQuery: {query_description}\"<br>)<br>query_embedding = query_embedding.reshape(1, -1).astype(np.float32)<br>summary_scores, summary_ids = summary_index.search(query_embedding, 10)<\/pre>\n<blockquote><p>Pro tip: How do you do a sanity check?<br \/>Just put any entry from the database in the query and see if the search returns that same entry as the top-scoring result!<\/p><\/blockquote>\n<h4>Implementing the\u00a0Features<\/h4>\n<p>At this stage, I had the building blocks of <strong><em>LyRec<\/em><\/strong>. Now, it was time to put these together. Remember the three goals I set in the beginning? Here\u2019s how I implemented those.<\/p>\n<p>To keep things tidy, I created a class named <strong><em>LyRec <\/em><\/strong>that would have a method for each feature. The first two features are pretty straightforward to implement.<\/p>\n<p>The method\u00a0.<em>get_songs_with_similar_lyrics(\u2026) <\/em>takes a song (lyrics) and a whole number <em>k <\/em>as input and returns a list of <em>k<\/em> most similar songs based on the lyrics similarity. 
Each element in the list is a dictionary containing the artist\u2019s name, song title, and\u00a0lyrics.<\/p>\n<p>Similarly,\u00a0<em>.get_songs_with_similar_description(\u2026) <\/em>takes a free-form text and a whole number <em>k <\/em>as input and returns a list of <em>k<\/em> most similar songs based on the description.<\/p>\n<p>Here\u2019s the relevant code\u00a0snippet.<\/p>\n<pre>class LyRec:<br>    def __init__(self, songs_df, lyrics_index, summary_index, embedding_model):<br>        self.songs_df = songs_df<br>        self.lyrics_index = lyrics_index<br>        self.summary_index = summary_index<br>        self.embedding_model = embedding_model<br><br>    def get_records_from_id(self, song_ids):<br>        # FAISS ids are 0-based row positions; song_id is 1-based<br>        songs = []<br>        for _id in song_ids:<br>            songs.extend(self.songs_df[self.songs_df[\"song_id\"]==_id+1].to_dict(orient='records'))<br>        return songs<br><br>    def get_songs_with_similar_lyrics(self, query_lyrics, k=10):<br>        query_embedding = self.embedding_model.encode(<br>            f\"Instruct: Given the lyrics, retrieve relevant songs\\nQuery: {query_lyrics}\"<br>        ).reshape(1, -1).astype(np.float32)<br><br>        scores, song_ids = self.lyrics_index.search(query_embedding, k)<br>        return self.get_records_from_id(song_ids[0])<br><br>    def get_songs_with_similar_description(self, query_description, k=10):<br>        query_embedding = self.embedding_model.encode(<br>            f\"Instruct: Given a description, retrieve relevant songs\\nQuery: {query_description}\"<br>        ).reshape(1, -1).astype(np.float32)<br><br>        scores, song_ids = self.summary_index.search(query_embedding, k)<br>        return self.get_records_from_id(song_ids[0])<\/pre>\n<p>The final feature was a little tricky to implement. Recall that we need to first retrieve the top songs based on lyrics and then re-rank them based on the textual description. The first retrieval was easy. 
For the second one, we only need to consider the top-scoring songs. I decided to create a temporary FAISS index with the top songs and then search for the songs with the highest summary similarity scores. Here\u2019s my implementation.<\/p>\n<pre>def get_songs_with_similar_lyrics_and_description(self, query_lyrics, query_description, k=10):<br>    query_lyrics_embedding = self.embedding_model.encode(<br>        f\"Instruct: Given the lyrics, retrieve relevant songs\\nQuery: {query_lyrics}\"<br>    ).reshape(1, -1).astype(np.float32)<br><br>    # Stage 1: retrieve the 500 closest songs by lyrics similarity<br>    scores, song_ids = self.lyrics_index.search(query_lyrics_embedding, 500)<br>    top_k_indices = song_ids[0]<br><br>    # Collect the summary embeddings of those candidates<br>    summary_candidates = []<br>    for idx in top_k_indices:<br>        emb = self.summary_index.reconstruct(int(idx))<br>        summary_candidates.append(emb)<br>    summary_candidates = np.array(summary_candidates, dtype=np.float32)<br><br>    temp_index = faiss.IndexFlatIP(summary_candidates.shape[1])<br>    temp_index.add(summary_candidates)<br><br>    query_description_embedding = self.embedding_model.encode(<br>        f\"Instruct: Given a description, retrieve relevant songs\\nQuery: {query_description}\"<br>    ).reshape(1, -1).astype(np.float32)<br><br>    # Stage 2: re-rank the candidates by summary similarity<br>    scores, temp_ids = temp_index.search(query_description_embedding, k)<br>    # Map positions in the temporary index back to the original ids<br>    final_song_ids = [top_k_indices[i] for i in temp_ids[0]]<br><br>    return self.get_records_from_id(final_song_ids)<\/pre>\n<p>Voila! Finally, <strong><em>LyRec<\/em><\/strong> is ready. You can find the complete implementation in this repo. Please leave a star if you find this helpful!\u00a0\ud83d\ude03<\/p>\n<p><a href=\"https:\/\/github.com\/Suji04\/LyRec\">GitHub &#8211; Suji04\/LyRec: LyRec: Recommending Songs from User Query using LLMs<\/a><\/p>\n<h3>The Result\u00a0\ud83e\udd41<\/h3>\n<h4>Using Lyrics<\/h4>\n<p>Now it\u2019s time to see <strong><em>LyRec<\/em><\/strong> in action. For the first example, I\u2019m taking Ed Sheeran\u2019s <em>Perfect<\/em> <em>\u2764\ufe0f. 
<\/em>Here are the top few songs suggested by <strong><em>LyRec<\/em><\/strong> solely based on the lyrics. If you listen to these songs (which I highly recommend you do), you\u2019ll find many similar elements to <em>Perfect\u2019s<\/em> lyrics!<\/p>\n<p><iframe loading=\"lazy\" src=\"https:\/\/cdn.embedly.com\/widgets\/media.html?src=https%3A%2F%2Fopen.spotify.com%2Fembed%2Ftrack%2F6LxpgHc7rOPEa89SQNkcWl%3Futm_source%3Doembed&amp;display_name=Spotify&amp;url=https%3A%2F%2Fopen.spotify.com%2Ftrack%2F6LxpgHc7rOPEa89SQNkcWl&amp;image=https%3A%2F%2Fimage-cdn-ak.spotifycdn.com%2Fimage%2Fab67616d00001e02aca693379ab1a76a5fbac026&amp;type=text%2Fhtml&amp;schema=spotify\" width=\"456\" height=\"152\" frameborder=\"0\" scrolling=\"no\"><a href=\"https:\/\/medium.com\/media\/97894843921047d6ee4ae87a475eed46\/href\">https:\/\/medium.com\/media\/97894843921047d6ee4ae87a475eed46\/href<\/a><\/iframe><iframe loading=\"lazy\" src=\"https:\/\/cdn.embedly.com\/widgets\/media.html?src=https%3A%2F%2Fopen.spotify.com%2Fembed%2Ftrack%2F6zC0mpGYwbNTpk9SKwh08f%3Futm_source%3Doembed&amp;display_name=Spotify&amp;url=https%3A%2F%2Fopen.spotify.com%2Ftrack%2F6zC0mpGYwbNTpk9SKwh08f&amp;image=https%3A%2F%2Fimage-cdn-ak.spotifycdn.com%2Fimage%2Fab67616d00001e026f093a6ae88a5ca8ed53b9f7&amp;type=text%2Fhtml&amp;schema=spotify\" width=\"456\" height=\"152\" frameborder=\"0\" scrolling=\"no\"><a href=\"https:\/\/medium.com\/media\/89856ff3c4ab2170bd4979b5ee38943f\/href\">https:\/\/medium.com\/media\/89856ff3c4ab2170bd4979b5ee38943f\/href<\/a><\/iframe><iframe loading=\"lazy\" src=\"https:\/\/cdn.embedly.com\/widgets\/media.html?src=https%3A%2F%2Fopen.spotify.com%2Fembed%2Ftrack%2F5FrQ7gD8TngRoWdbQA5K5f%3Futm_source%3Doembed&amp;display_name=Spotify&amp;url=https%3A%2F%2Fopen.spotify.com%2Ftrack%2F5FrQ7gD8TngRoWdbQA5K5f&amp;image=https%3A%2F%2Fimage-cdn-ak.spotifycdn.com%2Fimage%2Fab67616d00001e025f5a5b5eeacbd0de4d5e75e4&amp;type=text%2Fhtml&amp;schema=spotify\" width=\"456\" height=\"152\" 
frameborder=\"0\" scrolling=\"no\"><a href=\"https:\/\/medium.com\/media\/e1597bb92259b8628be3c7be66490c4d\/href\">https:\/\/medium.com\/media\/e1597bb92259b8628be3c7be66490c4d\/href<\/a><\/iframe><\/p>\n<h4>Using Description<\/h4>\n<p>Let\u2019s try the search by prompt feature. I gave <strong><em>LyRec <\/em><\/strong>this description.<\/p>\n<blockquote><p>\n<strong>Prompt:<\/strong> I want a dreamy, soft song about reminiscing on childhood memories, with a bittersweet feeling of nostalgia and the desire to return to simpler\u00a0times.<\/p><\/blockquote>\n<p><strong><em>LyRec<\/em><\/strong> obliged my request and returned the following. I think they are pretty good suggestions! Please listen for yourself.<\/p>\n<p><iframe loading=\"lazy\" src=\"https:\/\/cdn.embedly.com\/widgets\/media.html?src=https%3A%2F%2Fopen.spotify.com%2Fembed%2Ftrack%2F6KpLEkhkg0FR4J9x0fbIRP%3Futm_source%3Doembed&amp;display_name=Spotify&amp;url=https%3A%2F%2Fopen.spotify.com%2Ftrack%2F6KpLEkhkg0FR4J9x0fbIRP&amp;image=https%3A%2F%2Fimage-cdn-ak.spotifycdn.com%2Fimage%2Fab67616d00001e024820e7c4a2998d0a06eea546&amp;type=text%2Fhtml&amp;schema=spotify\" width=\"456\" height=\"152\" frameborder=\"0\" scrolling=\"no\"><a href=\"https:\/\/medium.com\/media\/b7116db8fae778caf6b5860fc79af216\/href\">https:\/\/medium.com\/media\/b7116db8fae778caf6b5860fc79af216\/href<\/a><\/iframe><iframe loading=\"lazy\" src=\"https:\/\/cdn.embedly.com\/widgets\/media.html?src=https%3A%2F%2Fopen.spotify.com%2Fembed%2Ftrack%2F0dDept3NG63mlj7iJ8sVho%3Futm_source%3Doembed&amp;display_name=Spotify&amp;url=https%3A%2F%2Fopen.spotify.com%2Ftrack%2F0dDept3NG63mlj7iJ8sVho&amp;image=https%3A%2F%2Fimage-cdn-ak.spotifycdn.com%2Fimage%2Fab67616d00001e02ce8443b89b042130a2e6115b&amp;type=text%2Fhtml&amp;schema=spotify\" width=\"456\" height=\"152\" frameborder=\"0\" scrolling=\"no\"><a 
href=\"https:\/\/medium.com\/media\/b24180eda162f5f2bd886b38be05dec9\/href\">https:\/\/medium.com\/media\/b24180eda162f5f2bd886b38be05dec9\/href<\/a><\/iframe><iframe loading=\"lazy\" src=\"https:\/\/cdn.embedly.com\/widgets\/media.html?src=https%3A%2F%2Fopen.spotify.com%2Fembed%2Ftrack%2F02yEDRRkdaj37Gh6x0wlQr%3Futm_source%3Doembed&amp;display_name=Spotify&amp;url=https%3A%2F%2Fopen.spotify.com%2Ftrack%2F02yEDRRkdaj37Gh6x0wlQr&amp;image=https%3A%2F%2Fimage-cdn-fa.spotifycdn.com%2Fimage%2Fab67616d00001e02e6d8335484840fbd06de9235&amp;type=text%2Fhtml&amp;schema=spotify\" width=\"456\" height=\"152\" frameborder=\"0\" scrolling=\"no\"><a href=\"https:\/\/medium.com\/media\/e0d2fbc1dd30d18261ef43d5bb004bc1\/href\">https:\/\/medium.com\/media\/e0d2fbc1dd30d18261ef43d5bb004bc1\/href<\/a><\/iframe><\/p>\n<h4>Using Lyrics + Description<\/h4>\n<p>Okay, so, finally, let\u2019s try the last feature that allows both lyrics and description as input. My input to <strong><em>LyRec <\/em><\/strong>is the following.<\/p>\n<blockquote><p>\n<strong>Lyrics:<\/strong> Blinding Lights by The Weeknd<br \/><strong>Prompt:<\/strong> I\u2019m looking for an upbeat pop track that references nighttime energy.<\/p><\/blockquote>\n<p>Let\u2019s see what <strong><em>LyRec<\/em><\/strong> has to offer this\u00a0time.<\/p>\n<p><iframe loading=\"lazy\" src=\"https:\/\/cdn.embedly.com\/widgets\/media.html?src=https%3A%2F%2Fopen.spotify.com%2Fembed%2Ftrack%2F4wDezvrA2mv0JTSHkLwOTW%3Futm_source%3Doembed&amp;display_name=Spotify&amp;url=https%3A%2F%2Fopen.spotify.com%2Ftrack%2F4wDezvrA2mv0JTSHkLwOTW&amp;image=https%3A%2F%2Fimage-cdn-ak.spotifycdn.com%2Fimage%2Fab67616d00001e025a0718a187d947570d56cab3&amp;type=text%2Fhtml&amp;schema=spotify\" width=\"456\" height=\"152\" frameborder=\"0\" scrolling=\"no\"><a href=\"https:\/\/medium.com\/media\/8885eef21dddd644abbb89fa03e23748\/href\">https:\/\/medium.com\/media\/8885eef21dddd644abbb89fa03e23748\/href<\/a><\/iframe><iframe loading=\"lazy\" 
src=\"https:\/\/cdn.embedly.com\/widgets\/media.html?src=https%3A%2F%2Fopen.spotify.com%2Fembed%2Ftrack%2F7ceSE8oIDnbxNxqz96zOrN%3Futm_source%3Doembed&amp;display_name=Spotify&amp;url=https%3A%2F%2Fopen.spotify.com%2Ftrack%2F7ceSE8oIDnbxNxqz96zOrN&amp;image=https%3A%2F%2Fimage-cdn-ak.spotifycdn.com%2Fimage%2Fab67616d00001e02c648a42b5dad72c8aafceeec&amp;type=text%2Fhtml&amp;schema=spotify\" width=\"456\" height=\"152\" frameborder=\"0\" scrolling=\"no\"><a href=\"https:\/\/medium.com\/media\/818713c690aae19225984faf7c2eeca7\/href\">https:\/\/medium.com\/media\/818713c690aae19225984faf7c2eeca7\/href<\/a><\/iframe><iframe loading=\"lazy\" src=\"https:\/\/cdn.embedly.com\/widgets\/media.html?src=https%3A%2F%2Fopen.spotify.com%2Fembed%2Ftrack%2F6XzfupaB5ikgUMnrFgJBvS%3Futm_source%3Doembed&amp;display_name=Spotify&amp;url=https%3A%2F%2Fopen.spotify.com%2Ftrack%2F6XzfupaB5ikgUMnrFgJBvS&amp;image=https%3A%2F%2Fimage-cdn-fa.spotifycdn.com%2Fimage%2Fab67616d00001e0217dd812df38fed44d6d2036e&amp;type=text%2Fhtml&amp;schema=spotify\" width=\"456\" height=\"152\" frameborder=\"0\" scrolling=\"no\"><a href=\"https:\/\/medium.com\/media\/e6585d907c58fa8225a8b24406855837\/href\">https:\/\/medium.com\/media\/e6585d907c58fa8225a8b24406855837\/href<\/a><\/iframe><\/p>\n<p>I think these are pretty good suggestions! I\u2019d highly encourage you to play with <strong><em>LyRec<\/em><\/strong> on your own. The embedding model is comparatively lightweight and can be run without expensive GPUs. I ran it on my M1 Pro. I have included the lyrics dataset (with the generated summaries) and the embeddings in the\u00a0repo.<\/p>\n<h3>The UI\u00a0\u2728<\/h3>\n<p>I don\u2019t want to spend too much time talking about how I built the UI for <strong><em>LyRec<\/em><\/strong> as this is not the focus of this article. You can find the UI code in my repo. 
Here are a few key\u00a0points.<\/p>\n<ul>\n<li>ChatGPT helped me create the web\u00a0app!<\/li>\n<li>Tech stack: Flask, HTML,\u00a0CSS<\/li>\n<li>For some reason, FAISS was not working on my Mac, so I used another similar library called <strong>Annoy<\/strong> (by Spotify!) for the web app. Everything else is kept unchanged.<\/li>\n<\/ul>\n<p>Here are a few screenshots. All images, unless otherwise mentioned, are by the\u00a0author.<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2AZ9tZOg0ky6SL7uNxvrwR8g.png?ssl=1\"><\/figure>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2AfjeNhi--13EedTZUxWHyOw.png?ssl=1\"><\/figure>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2AE-vET779SKyQwUIUnodlFw.png?ssl=1\"><\/figure>\n<h3>What\u2019s Next?\u00a0\u23ed\ufe0f<\/h3>\n<p>Now that I have shown you how <strong><em>LyRec<\/em><\/strong> works, let\u2019s talk about some of the limitations and possible improvements.<\/p>\n<h4>Song Popularity<\/h4>\n<p>While experimenting with <strong><em>LyRec<\/em><\/strong>, I realized it\u2019d sometimes recommend songs that rarely attract any listeners. While that\u2019s great for song (and artist) discovery, I guess popularity can be a helpful signal for quality. So, the final recommendation list could be sorted by song popularity to make the recommendations more robust.<\/p>\n<h4>Song Metadata<\/h4>\n<p>Currently, <strong><em>LyRec<\/em><\/strong> uses only the lyrics, but songs are often associated with rich metadata, e.g., <em>genre, tempo, key, valence score (Spotify), artist name, release date,<\/em> etc. 
If included in the song summary, these features could improve the search and, hence, the recommendation.<\/p>\n<h4>Prompt Expansion<\/h4>\n<p>Let\u2019s be honest, you don\u2019t always want to write a detailed prompt for the text input. Here, we can use an LLM to write a better prompt from the user\u2019s rough input and then use it as the query. This, in theory, should result in better retrieval.<\/p>\n<p>That\u2019s all I had for you today. I hope you enjoyed reading it. Until next time\u2026 Happy learning!<\/p>\n<hr>\n<p><a href=\"https:\/\/towardsdatascience.com\/lyrec-a-song-recommender-that-reads-between-the-lyrics-eca5ea2ae8c8\">LyRec: A Song Recommender That Reads Between the Lyrics \ud83c\udfb6<\/a> was originally published in <a href=\"https:\/\/towardsdatascience.com\/\">Towards Data Science<\/a> on Medium, where people are continuing the conversation by highlighting and responding to this story.<\/p>\n<\/div>\n<p> \t<BR><br \/>\n <BR><\/BR><br \/>\n    Sujan Dutta<br \/>\n \t<BR><br \/>\n<BR><\/BR><br \/>\n<a href=\"https:\/\/medium.com\/m\/global-identity-2?redirectUrl=https%3A%2F%2Ftowardsdatascience.com%2Flyrec-a-song-recommender-that-reads-between-the-lyrics-eca5ea2ae8c8\">Go to original source<\/a><br \/>\n \t<BR><br \/>\n <BR><\/BR><\/p>\n","protected":false},"excerpt":{"rendered":"<p>LyRec: A Song Recommender That Reads Between the Lyrics This is how I built an emotionally intelligent LLM-powered song recommendation system. Photo by David Pup\u0103z\u0103 on\u00a0Unsplash Do you remember the last time you found yourself obsessing over a song? 
Maybe it was the raw emotion that resonated with you, or perhaps it was the lyrics [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[62,69,83,71,70,537],"tags":[1405,1407,1406],"class_list":["post-1352","post","type-post","status-publish","format-standard","hentry","category-aimldsaimlds","category-artificial-intelligence","category-data-science","category-large-language-models","category-machine-learning","category-recommendation-system","tag-lyrec","tag-lyrics","tag-song"],"_links":{"self":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/1352"}],"collection":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/comments?post=1352"}],"version-history":[{"count":0,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/1352\/revisions"}],"wp:attachment":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/media?parent=1352"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/categories?post=1352"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/tags?post=1352"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}