Tuesday, June 17, 2008

Text Mining and the Politicization of Beatles Lyrics

In this post I will show you a small experiment I designed to apply some basic text-mining concepts to the analysis of rock lyrics. I am not a huge fan of The Beatles, but I really enjoy listening to some of their later albums, and I wondered whether those albums were the most politically inclined. The answer was YES.

I compared the bag (multiset) of words of every Beatles studio album with a reference bag of words extracted from the Wikipedia article on Politics. The comparison used two similarity measures: Jaccard similarity and a Sorensen-like similarity I devised. Both range from 0.0 (nothing in common with the Politics article) to 1.0 (very similar to it).
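To make the bag-of-words comparison concrete, here is a minimal sketch. It assumes a simple lowercase regex tokenizer and the multiset generalization of Jaccard (sum of the minimum counts over sum of the maximum counts). The post does not say which tokenizer or which Jaccard variant was actually used, and the names `bag_of_words` and `jaccard` are hypothetical.

```python
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count each word token (a multiset of words).

    Assumption: a token is a run of letters or apostrophes; no stemming
    or stop-word removal is applied.
    """
    return Counter(re.findall(r"[a-z']+", text.lower()))


def jaccard(a: Counter, b: Counter) -> float:
    """Multiset Jaccard similarity: sum of min counts / sum of max counts.

    Returns 1.0 for identical bags and 0.0 when no word is shared
    (or when both bags are empty).
    """
    words = set(a) | set(b)
    union = sum(max(a[w], b[w]) for w in words)
    if union == 0:
        return 0.0
    intersection = sum(min(a[w], b[w]) for w in words)
    return intersection / union


# Toy strings for illustration only, not the real lyrics or article.
album = bag_of_words("love love me do")
politics = bag_of_words("love is all you need")
print(jaccard(album, politics))  # 1 shared "love" / 8 total -> 0.125
```

In the real experiment, `album` would hold the concatenated lyrics of one studio album and `politics` the text of the Wikipedia article.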


With the former measure (Jaccard), the most political album was Sgt. Pepper's Lonely Hearts Club Band; with the latter, it was The White Album, my favorite Beatles album! The latter measure is more robust because it uses relative word frequencies instead of raw occurrence counts. After that album, the lyrics show a strong depoliticization.
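The exact formula of the Sorensen-like measure is not given in the post, so the following is only a sketch of one plausible frequency-based variant: the Sorensen-Dice coefficient computed over relative word frequencies, 2 * sum(min(p, q)) / (sum(p) + sum(q)). The names `relative_frequencies` and `sorensen_freq` are hypothetical. The example also illustrates one reason a frequency-based measure can be more robust: repeating every word of a bag leaves its relative frequencies, and hence the score, unchanged, while the count-based Jaccard score shifts.

```python
from collections import Counter


def relative_frequencies(bag: Counter) -> dict[str, float]:
    """Turn raw word counts into relative frequencies that sum to 1."""
    total = sum(bag.values())
    return {w: c / total for w, c in bag.items()} if total else {}


def sorensen_freq(a: Counter, b: Counter) -> float:
    """Sorensen-Dice coefficient over relative word frequencies.

    Computes 2 * sum(min(p, q)) / (sum(p) + sum(q)). Because each
    frequency distribution sums to 1, this equals the total overlap of
    the two distributions: 1.0 for identical word proportions, 0.0 when
    no word is shared.
    """
    p, q = relative_frequencies(a), relative_frequencies(b)
    denom = sum(p.values()) + sum(q.values())
    if denom == 0:
        return 0.0
    overlap = sum(min(p[w], q[w]) for w in p.keys() & q.keys())
    return 2 * overlap / denom


# Toy bags for illustration only.
album = Counter({"love": 2, "me": 1, "do": 1})
politics = Counter({"love": 1, "is": 1, "all": 1, "you": 1, "need": 1})
doubled = Counter({w: 2 * c for w, c in album.items()})  # every word repeated

print(sorensen_freq(album, politics))    # about 0.2
print(sorensen_freq(doubled, politics))  # still about 0.2
```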