“Men lie, women lie, numbers don’t” – Jay Z
Among the many things rappers like to boast about, some, like money, are relatively easy to quantify, whereas rhyming skills have been very difficult to measure, up until now. In this post, I’ll present Raplyzer, a computer program that automatically detects rhymes in rap lyrics and that I use to rank popular rappers by their average Rhyme factor. I’ll also present another program called BattleBot, a search engine for rhyming rap lines based on the algorithm used in Raplyzer.
Rap Rhyming 101
In rap lyrics, assonance, where words don’t necessarily have the same ending but share a vowel sound, is the most typical form of rhyming nowadays. In multi-syllable rhymes (multis), it is not only the last syllable but multiple syllables that share a vowel sound. For example:
“This is a job – I get paid to sling some raps,
What you made last year was less than my income tax” 
As one author puts it: “Multis are hallmarks of all the dopest flows, and all the best rappers use them”.
Automatic Rhyme Detection
I’ve developed an algorithm called Raplyzer for detecting assonance rhymes. If you’re not interested in the technical details, you might want to skip directly to the results in the next section.
In order to detect rhymes, the key thing is to find matching vowel sound sequences. Unfortunately, vowel sounds can’t be trivially extracted from English text since words are not pronounced as they are written (as opposed to the Finnish language for which I originally developed Raplyzer). Luckily, there’s a great open source speech synthesizer, eSpeak, which can be used to obtain a phonetic transcription of the lyrics.
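As a rough illustration of this step, here is a minimal Python sketch that calls eSpeak with the `-x` (print phoneme mnemonics) and `-q` (no audio) options and then keeps only the vowel symbols. The set `ASSUMED_VOWELS` and the function names are assumptions made for this example, not Raplyzer's actual list or API.

```python
import subprocess

# Assumption: characters treated as vowel symbols in eSpeak's ASCII phoneme
# mnemonics. This set is illustrative only, not Raplyzer's actual list.
ASSUMED_VOWELS = set("aeiouAEIOUV@03")


def transcribe(text: str) -> str:
    """Return eSpeak's phoneme mnemonics for `text` (needs espeak on PATH)."""
    result = subprocess.run(
        ["espeak", "-xq", text],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def vowels_only(transcription: str) -> list[str]:
    """Keep only the assumed vowel symbols, one string per transcribed word."""
    return [
        "".join(ch for ch in word if ch in ASSUMED_VOWELS)
        for word in transcription.split()
    ]
```

For instance, `vowels_only(transcribe("phonetics"))` would feed the transcription of a single word into the vowel filter; stress marks and consonants are simply dropped.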
From the phonetic transcription, we remove everything but vowels and do the following:
- Go through a song word by word.
- For each word, find the longest matching vowel sequence that ends with one of the 15 previous words: start with the last vowels of the two words; if they are the same, proceed to the second-to-last vowels, then the third-to-last, and so on, ignoring word boundaries, until the first non-matching vowels are encountered.
- Compute the average rhyme length (= Rhyme factor) by averaging the lengths of the longest matching vowel sequences of all words.
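The three steps above can be sketched in Python roughly as follows. This is a simplified illustration and not the Raplyzer source: it takes one vowel string per word as input, skips words without vowels, and stops a match when the two spans would overlap. Those last two choices, like the function names, are assumptions of this sketch.

```python
def longest_match(flat, end_i, end_j):
    """Count matching vowels walking backwards from positions end_i and end_j."""
    k = 0
    # Stop at the start of the song or when the two spans would overlap.
    while (end_j - k >= 0 and end_i - k > end_j
           and flat[end_i - k] == flat[end_j - k]):
        k += 1
    return k


def rhyme_factor(word_vowels, window=15):
    """Average, over all words, of the longest match with a previous word."""
    flat = []   # all vowels of the song, in order
    ends = []   # index in `flat` of each word's last vowel
    for vowels in word_vowels:
        if not vowels:
            continue  # assumption: words without vowels are skipped
        flat.extend(vowels)
        ends.append(len(flat) - 1)
    if not ends:
        return 0.0
    total = 0
    for i, end_i in enumerate(ends):
        candidates = ends[max(0, i - window):i]  # up to `window` previous words
        total += max(
            (longest_match(flat, end_i, end_j) for end_j in candidates),
            default=0,
        )
    return total / len(ends)
```

Because the vowels are flattened into one sequence, a match can run backwards across word boundaries, which is what lets multi-word multis be picked up.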
When finding the longest matching vowel sequence, we do not accept matches where any of the rhyming words are exactly the same, since typically some phrases, e.g., in the chorus, are repeated several times and these shouldn’t count as rhymes. Also, before running the phonetic transcription, we remove all duplicate lines from the text to normalize the lyrics, as in some cases the lyrics contain the chorus repeated many times, whereas in some cases they might just have “Chorus 4X”. And when matching vowels, we accept certain pairs of vowel phonemes that sound very similar (as specified here).
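A small Python sketch of these two normalizations might look like this. The case-insensitive line comparison and the entries of `SIMILAR_PAIRS` are hypothetical choices for the example; the actual list of accepted vowel pairs is the one linked above.

```python
def drop_duplicate_lines(lyrics: str) -> str:
    """Keep only the first occurrence of each line (e.g. a repeated chorus)."""
    seen = set()
    kept = []
    for line in lyrics.splitlines():
        key = line.strip().lower()  # assumption: compare case-insensitively
        if not key or key in seen:
            continue
        seen.add(key)
        kept.append(line.strip())
    return "\n".join(kept)


# Hypothetical examples of vowel symbols treated as equivalent.
SIMILAR_PAIRS = {frozenset(("a", "A")), frozenset(("i", "I"))}


def vowels_match(a: str, b: str) -> bool:
    """True if two vowel symbols are identical or listed as similar."""
    return a == b or frozenset((a, b)) in SIMILAR_PAIRS
```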
Most of the rhymes are typically located at the end of a line, but since it’s not always straightforward to infer line endings from the lyrics files, rhyme lengths are averaged over all words. This way we can capture not only end rhymes but also internal rhymes. On the other hand, the algorithm might detect some matching vowel sequences that are not intended as rhymes. In order to suppress the effect of such false positives, we only consider the rhymes that consist of at least two vowels.
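One way to read this thresholding, sketched in Python: matches shorter than two vowels still count as a word in the average but contribute a length of zero. That interpretation, and the name `average_rhyme_length`, are assumptions of this example.

```python
def average_rhyme_length(lengths, min_len=2):
    """Average match lengths, counting matches shorter than min_len as zero."""
    if not lengths:
        return 0.0
    return sum(n if n >= min_len else 0 for n in lengths) / len(lengths)
```

With per-word match lengths `[0, 1, 2, 3]`, the single-vowel match is ignored and the average becomes (2 + 3) / 4.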
For more details about how Raplyzer works, you might want to go directly to the source code which is freely available on GitHub or just ask me.
The Longest Multis Are Written by…
I scraped the lyrics of 94 rap artists from a lyrics website. Intro, Outro, Skit, and Interlude tracks were filtered out, leaving me with a total of 10,082 songs. For each artist, I computed the Rhyme factor averaged over all the songs of the artist. The full results can be found at http://koti.kapsi.fi/emalmi/raplyzer_results.html. In the table below, I’ve listed the top-5 and a selection of other artists that are the most familiar to me.
| Rank | Artist | Rhyme factor |
|---|---|---|
| 26. | Jedi Mind Tricks | 1.067 |
| 30. | The Notorious B.I.G. | 1.059 |
| 94. | The Lonely Island | 0.870 |
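The per-artist averaging and filtering described above could be sketched like this in Python. Filtering by keywords in the track title is an assumption of this example (the post does not say how those tracks were identified), and the input tuples are a made-up format.

```python
import re
from statistics import mean

# Assumption: non-song tracks are recognized by a keyword in their title.
SKIP = re.compile(r"\b(intro|outro|skit|interlude)\b", re.IGNORECASE)


def rank_artists(songs):
    """songs: iterable of (artist, title, rhyme_factor) tuples.

    Returns (artist, average rhyme factor) pairs, best first.
    """
    per_artist = {}
    for artist, title, factor in songs:
        if SKIP.search(title):
            continue
        per_artist.setdefault(artist, []).append(factor)
    ranking = [(artist, mean(factors)) for artist, factors in per_artist.items()]
    return sorted(ranking, key=lambda pair: pair[1], reverse=True)
```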
Some of the results are not too surprising; for instance Rakim, who is #2, is known for “his pioneering use of internal rhymes and multisyllabic rhymes” according to Wikipedia. Similarly, Inspectah Deck (#1) from Wu-Tang Clan uses lots of multis in his lyrics. As a benchmark, I took all the poems by William Shakespeare and computed Shakespeare’s Rhyme factor (0.952). He falls way behind the majority of rappers, which is understandable since multis are not commonly used in poetry.
Here are some of the longest multis detected by Raplyzer (rhyming part is shown in boldface, diphthongs are counted as two distinct vowels):
Tech N9ne — It’s Alive: “Six six triple eight forty-six ninety-nine three / Sick with nickel plates whorry chicks mighty mine be” (15 rhyming vowels)
Shai Linne — Solus Christus: “My vision is clear, my eyesight more vivid / I’m commissioned here by Christ, I’m salt in it” (13 rhyming vowels)
MF Doom — Born Like This: “Dimes quiet as minds by design, mighty fine / Slight rewind, tightly bind, blind lead blind” (12 rhyming vowels)
Rakim — I Know: “You gone love this, it’s marvelous, baby / It gotta thug’s twist-it start to get crazy” (9 rhyming vowels)
One should note that Raplyzer assumes a typical American English pronunciation of the words (as defined by the eSpeak software), so some artists who use a lot of multis but often construct them by bending words are not as high on the list as one might expect. One example of such an artist is Eminem who, according to Stat Quo, “makes words rhyme that typically don’t rhyme together—he’s good at that. It’s about how he pronounces it”.
Another way of analyzing rap lyrics computationally is to estimate the size of artists’ vocabularies, as famously done by Matt Daniels. In order to get a more holistic picture of the rhyming skills of different rappers, I computed both the Rhyme factor and the vocabulary size and plotted the results. Instead of the first 35,000 words, I used the first 20,000 words of each artist for the vocabulary size estimation in order to be able to include more artists, which seemed to have little effect on the order of the artists. (You can view the figure below in full resolution by clicking it.)
One interesting point made by Daniels is that Jay Z contrasts his lyrical skills with Common and Talib Kweli in his track “Moment of Clarity” saying that he has had to dumb down his lyrics to double his dollars. Daniels points out that both Common and Kweli rank higher on the vocabulary scale, and interestingly, this also holds for the Rhyme factor scale.
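The vocabulary size estimate mentioned above boils down to counting the unique words among an artist's first N words. A minimal Python sketch, assuming a simple regular-expression tokenizer (the tokenization used in the actual analysis is not specified here):

```python
import re


def vocabulary_size(lyrics: str, first_n: int = 20000) -> int:
    """Number of unique words among the first `first_n` words of the lyrics."""
    # Assumption: words are runs of letters, digits and straight apostrophes.
    words = re.findall(r"[a-z0-9']+", lyrics.lower())
    return len(set(words[:first_n]))
```

Using the same cutoff for every artist keeps the numbers comparable, since a longer text would otherwise yield more unique words almost by definition.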
“They said I rap like a robot, so call me rap-bot”
To further demonstrate that the algorithm described above is able to find meaningful rhymes and to let people play with it, I decided to create a website called BattleBot, with the help of my friend Stephen Fenech, who took care of web programming. On this site you can “spit” any line that comes to your mind and BattleBot will respond with a list of the best rhyming lines found among the half a million lines from the 94 artists I’ve analyzed for this post. Check it out at:
What sets good rap lyrics apart from the rest is not only how well they are written technically. A good rapper can simultaneously write complex multisyllable rhymes and tell a coherent story. Furthermore, he or she may spice up the lyrics with some clever wordplay and metaphors that make the listener think or even laugh.
BattleBot exemplifies the fact that it’s possible to come up with lines that form a technically good rhyme but are otherwise totally unrelated. However, rappers beware! I’m currently working on an improved version of BattleBot which tries to find lines that both rhyme well and are semantically similar 😉
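To give an idea of how such a purely technical rhyme search can work, here is a toy Python sketch that ranks candidate lines by the number of vowels their endings share with the query line. It is not BattleBot's actual implementation; the function names and the corpus format are made up for this illustration.

```python
def shared_suffix(a, b):
    """Length of the common ending of two vowel sequences."""
    k = 0
    while k < len(a) and k < len(b) and a[-1 - k] == b[-1 - k]:
        k += 1
    return k


def best_rhymes(query_vowels, corpus, top=3):
    """corpus: list of (line_text, line_vowels) pairs.

    Returns up to `top` (score, line_text) pairs, longest rhyme first.
    """
    scored = [(shared_suffix(query_vowels, vowels), text)
              for text, vowels in corpus]
    scored = [pair for pair in scored if pair[0] >= 2]  # at least two vowels
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top]
```

Nothing in this scoring looks at what the lines mean, which is exactly why the best matches can be technically strong but otherwise unrelated.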
While doing this analysis I’ve discovered several great artists that were previously unknown to me and also learned a lot about rap. I’d like to thank Joonas “Skandaali” Palmgren and Tommi Terä for teaching me many new things about rhyming. I would also like to thank Jouni Harjumäki, for linguistic consultation, Niki Paajala, who suggested several key artists I had overlooked in my initial analyses, and many other people, with whom I have had the chance to talk about this project, giving me lots of useful ideas.
 Edwards, P. How to Rap: The Art and Science of the Hip-Hop MC, 2009.
59 thoughts on “Algorithm That Counts Rap Rhymes and Scouts Mad Lines”
Kewl. Check out this other cornerstone of dropping science: http://genius.com/posts/1669-The-rapper-s-flow-encyclopedia
Analysis about using different flows. Would be a nice addition to have automatic analysis tool for this aspect to measure who’s the best on all meaningful aspects, although I guess it’s much harder to recognize speech (rhythm) over beats – should be doable!
That’s a nice post! Indeed it would be nice to be able to discover that kind of rhythmic pattern by analyzing the soundtrack directly. It could also make it possible to detect bent rhymes.
Not an expert on speech recognition but I think you’re right that it would be very challenging to recognize speech with a beat in the background. Even for a human it’s sometimes very challenging when it comes to rap 🙂
“be very challenging to recognize speech with a beat in the background.”
Very true — but you can just remove the beat. People do Vocals Only all the time
Example : https://www.youtube.com/watch?v=T8a-4WOaBK4
Pretty interesting results, but I definitely see room for improvement, though I admit I don’t completely follow the logic for a rhyming vowel. For example MF Doom’s — Born Like This: “Dimes quiet as minds by design, mighty fine / Slight rewind, tightly bind, blind lead blind” (12 rhyming vowels)
Every word except for “as” is part of the rhyme pattern.
Raplyzer assumes that the matching vowel sequences are continuous (i.e. there are no extra vowels within either part of the multisyllabic rhyme). That’s why “as” is breaking the pattern.
This is cool, Shai is a beast, but there’s more to rapping than rhymes... I wonder if the algorithm took into account delivery, cultural relevance and overall song topic focus. Also different styles of rap beats need more dense lyrical content… just a brief mech eng perspective
Hi Justinn. Sure, Raplyzer only measures one aspect of the technical skills of the rapper.
I don’t see the Beastie Boys on the list, could you analyze them? thx
Why does Paleface have a rhyme factor of 1.222 at https://blogs.aalto.fi/mining4meaning/2014/08/25/rap_algoritmi/ but 1.132 here? Is it a consequence of using different data sets, which would mean these two values are not comparable in any way?
Yes, for the Finnish version of this analysis I only used the Finnish lyrics by Paleface, whereas now I’m only using the English lyrics by him.
And in any case the numbers are not comparable. English words are typically shorter and thus lines have more words than in Finnish. Therefore I’m now scanning 15 previous words when looking for a rhyme, whereas for the Finnish version I only scanned 10 previous words.
This is so cool man. Great project here. I wonder if you can test out Bun B, Just curious where he would land on this list.
No love for Chiddy Bang?
You might like my Twitter bot, @gutendelight, which rhymes a line of classic hip-hop with a line of classic literature from Project Gutenberg:
Well, I’m Imp the Dimp, the ladies’ pimp,
What is that to thee, you ugly imp?
He said, “Sit down, punk, I wanna talk to you
Each has his lot, and bears the fate he drew.”
She came into the bar, she came into the scene
You are down and out and don’t you forget it, old bean.
No Canibus? No Big L? No Big Pun?
How is Chief Keef in the 7th place???
Pretty awesome, but I don’t see Eminem on the list of top rhymers. Accurate? You tell me, but my guess is that his words don’t rhyme at face value, he’s just a master of bending sounds to keep rhymes and assonance going line after line.
e.g. – orange four-inch door hinge
I agree, it’s quite impressive how Eminem is able to bend the words to make them rhyme! Although that particular rhyme you mention will be recognized by Raplyzer as such.
Will the -ge and -ch also be recognized and not just vowels?
They are both palato-alveolar stop consonants.. one is voiced, the other is unvoiced.
i dont get it,where is too short?where chuck d?flava?sir mix a lot?this machine lies!!!
As a general comment: I tried my best to include all the artists I could find on some top lists and who had enough lyrics available. Nevertheless, people have suggested several artists I think should have been included. If I decide to update the results at some point, I will try to take these suggestions into account.
In the meantime, I recommend trying out the program yourself. If you have any experience in programming, it should be very straightforward: https://github.com/ekQ/raplysaattori
Reblogged this on Ideas, thoughts and Free Software and commented:
Incredible analytics on hiphop. It can tell why I love true rappers and measure the way they can outperform other famous but ridiculous rappers that are all image and no substance. I read the article and it goes in depth into what a complex rhyme is about. Multi-syllabic rhymes are not the easiest thing to imagine, but with this being open source, you could get some kind of mathematical rating and measure your favorite songs.
No R.A The Rugged Man?
He’s included in the full results as linked above: http://koti.kapsi.fi/emalmi/raplyzer_results.html
Uhm…excuse me but, I’m confused. Wiz Khalifa, Nicki Minaj and Lil Wayne are ranked higher than Eminem? Even WITH his bent words he should be WAY high on that list! Your system is broken and terribly flawed. Points for the ingenuity and effort though.
Awesome idea, and an interesting way to answer one of the world’s most frequently asked and unproven questions ever. There are a million and one rappers so it would take a while, but at least this algorithm is a step in the right direction.
No Del the funkee homosapien?(deltron 3030)
If you do include it. Deltron 3030 is the album.
Also. No one from Rhymesayers? Hieroglyphics?
Great job! I’m really interested in making my own version of your analyzer for the French language (cuz I’m French). Is BattleBot also open source?
Is BattleBot also open source? I’m really interested in forking your version of the analyzer and making my own for the French language!
Currently, only Raplyzer (https://github.com/ekQ/raplysaattori) is open source. We might open source other components, like BattleBot, in the future, but you should not expect this to happen very soon, as it would require quite a lot of cleaning and additional documentation.
However, I’d be very happy to see Raplyzer extended to French and other languages, which should be actually pretty straightforward to do, since eSpeak already supports dozens of different languages.
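As a hedged sketch of what switching the language could look like in Python: eSpeak's `-v` option selects the voice, so a French transcription would use a voice such as `fr`. The helper names below are hypothetical and not part of Raplyzer, and the vowel handling would still need to be adapted to the phoneme symbols of the new language.

```python
import subprocess


def espeak_command(text: str, voice: str = "en-us") -> list[str]:
    """Build an eSpeak call that prints phoneme mnemonics for a given voice."""
    # -v selects the voice, -x prints phoneme mnemonics, -q disables audio.
    return ["espeak", "-v", voice, "-xq", text]


def transcribe(text: str, voice: str = "en-us") -> str:
    """Run eSpeak (must be installed) and return the phonetic transcription."""
    result = subprocess.run(
        espeak_command(text, voice),
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```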
Hi Eric — Very cool program here. I am an English professor who loves thinking about language and rhyme, but I’m new to both looking at rap and to the whole field of datamining. I wonder if you might be willing to help me out with something. I am interested in writing a bit about the Broadway musical “Hamilton” and I suspect that the lyrics of the musical, overall, would end up being very high on the scale for use of multis. Would it be possible to have your program analyze the text of the lyrics? (They are available on Genius.com: http://genius.com/albums/Lin-manuel-miranda/Hamilton-original-broadway-cast-recording ) If this is something you are interested in and would be willing to run the program on this data, I would really love to see the results. (I printed out the lyrics and have been going through with an actual highlighter, marking the multis. I am a bit of a dinosaur and would love to do this in a more 21st-century way, but don’t have the technical know-how.)
Hi Sara, that’s an interesting idea! I did run the lyrics of the musical through the program, and their rhyme density is 0.915. This result is comparable to the rhyme densities of the rappers in the low-end of the ranking list, which, I think, is pretty impressive for a musical.
It would be, of course, more fair to compare Hamilton with other musicals, and I think it might stand out in that comparison since the lyrics seem to contain a lot of multis. Here are a few multis picked up by the algorithm:
“Diametric’ly opposed, foes.
Previously closed, Bros.”
“Washington hires Hamilton right on sight
But Hamilton still wants to fight not write.”
Oh, thanks so much Eric! I am very excited to have that number. It also occurred to me that it would be interesting to run the songs through individually, since some are more clearly rap-influenced and some are not. And I do think the overall number would stand out in comparison to musicals. I think the fact that the lyrics get close to Ice Cube’s number is great. Another couple of my favorite lengthy multis are:
“Jefferson appears with a dinner and invite/ Madison responds with Virginian insight”
“I’m the oldest and the prettiest and the gossip in New York City is insidious”
There are so many great four- and five-syllable rhymes and other complexities. So glad I happened to find your blog. I will keep following.
Thanks a lot for the interesting article 🙂 I read it almost a year ago via Engadget and because of it I found eSpeak. This helped me a lot in realizing my Rapping.Reviews concept, which is doing NLP in the opposite direction compared to Raplyzer (but it is similar to BattleBot).
Instead of analyzing existing rhymes, Rapping.Reviews automatically analyzes the reviews of Movies and TV Series and presents them in the form of a rap battle music video. So if you’re ever in doubt which movie or series to watch, or you’re just looking for some entertainment, I hope you’ll check out https://rapping.reviews. Currently the site contains raps about more than 13,000 Movies and TV Series.
Some raps work well, some don’t – that’s part of the joke. Personally I liked the rap from ‘The Hateful Eight’: https://rapping.reviews/video/tt3460252/
The rhymes generated by Rapping.Reviews are definitely a lot simpler than the ones you’ve analyzed though (only end rhyme). Anyway I hope you like it.
Thanks for the interesting research! 🙂
P.S. the JetPack comments plugin was acting funny when I submitted my reply earlier, so I’m trying again.
Thanks for your comment and for Rapping.Reviews – really neat work!! Battle rapping is something where I would like to take BattleBot / DeepBeat in the future as well. Let’s continue the discussion over email.
Very cool tool. I’m having a lot of trouble getting it to work on my Mac. I can’t seem to find a command line version of espeak for Mac. Trying to run it on my work PC is even worse: it insists there is no module named NumPy, and all I can do is shout “yes there is! It’s right there!” at the screen, which doesn’t really help. I doubt you have the time to walk someone through it, but if you do, I’d appreciate it. I can see getting a lot of use out of this for my own work.
First make sure you have Homebrew installed. Then you can install eSpeak by:
brew install espeak
You can test that it works by running
espeak -xq "phonetics"
and making sure it outputs: f@n'EtIks
NumPy can be installed using pip:
pip install numpy
In case you don’t have pip, you can get it by:
brew install python
Hope this helps!
Clearly I am late to the party here — but had to post a quick comment to highlight the fact that I am going to come back to this particular page to read it in more detail, but more importantly, peep your extended site here and try to hit you up at some point.
I am the other person on the planet who has a deep analytics background, development experience (albeit modest), and has an affinity for multi-syllabic rap lyrics lol…like 3 and a half years ago I dropped https://www.youtube.com/watch?v=jkA4lnsIqKE (“wanna make your own economy?” ), which, while the delivery could stand for plenty of improvement, the lyricism is on point by almost anyone’s standards (and is all based on the same tri-syllabic rhyme).
Anyway dude – just wanted to say that I was GLAD to stumble across this post and your blog, and hope you’re still active, as it’s one of the rare blog sites that truly piqued my interest and made me want to engage, and write a post about the fact!
Always nice to see multiple subject matters and skill-sets converge into something that most people can appreciate from some angle, but few appreciate from all angles.
Thanks a lot for your comment – I don’t feel alone in this world anymore 😛 And cool lyrics indeed!
Please see if you find this useful to improve upon your work.
View at Medium.com
This is gold. Any type of quantifiable data that can be used in the ever popular “Who is the greatest rapper of all-time” debate is very useful. Right now Black Thought of the Roots is a very hot topic in that debate, especially after his HOT97 verse that dropped a few months ago. Where would he rate on this scale? I’m sure people would also love to see Kendrick Lamar, Mos Def, Andre 3000 (by himself) and Lupe Fiasco rated as well.
You’re interested enough in rap that you came up with a Rhyme Factor but Kool G Rap isn’t familiar to you?
He is included in the full results which are mentioned in the text: http://koti.kapsi.fi/emalmi/raplyzer_results.html
In fact, he’s almost at the top: 11th out of 94.