Final Report

Zach Loery (zloery)

What makes music pop?: An exploration of popularity in modern music


For this project, I explored the factors behind popular music, with the end goal of finding a way to predict a song’s popularity based on its features. I acquired features for 1 million songs from the Million Song Database, a collection of about 1 million songs released between 1922 and 2013. I additionally scraped the Billboard Top 100 Chart historical data, and matched songs from the database to this chart to determine their popularity. I ran several machine learning techniques to predict popularity from database features, but ultimately found nothing. I then switched to analyzing artists, and considered the terms used to describe artists as well as two different measures of popularity. This analysis proved more effective, and I found some interesting insights into the definitions of various genres as well as a useful metric for determining whether an artist is rising or falling in popularity.



Data from the Million Song database is easily acquired from their website ( While the full dataset is over 300GB, they also offer a smaller 300MB subset that keeps only metadata about the songs without the musical analysis. They also offer several SQL databases, including one containing the terms associated with each artists (according to Musicbrainz, an online music analysis website.) Due to my lack of music theory background I chose to only use the smaller subset and used 6 simple metadata features (duration, key, loudness, mode, tempo, and time signature) as inputs to the various machine learning algorithms I used.

However, the only measure of song popularity in this database was a measure called “hotttnesss”, an algorithmic blend of sales, number of plays, mentions and reviews in music blogs, and more. Unfortunately, the number given represented “hotttnesss at a certain point in time” – as all songs were pulled into the database in late August 2013, any song released much earlier had a very low hotttnesss, even if it was popular when it was released.

To compensate for this with separate popularity data, I scraped the Billboard Top 100 Chart historical data using a publicly available (but unofficial) API available at I then matched each song in the MSD with the Billboard data by finding entries with the same song title and musician. I did clean the data slightly at this step as the Billboard and MSD handled featured artists differently, with MSD ignoring them and Billboard listing artists as “Artist #1 Featuring Artist #2”. However, other than removing any text occurring after “ Featuring”, I did not modify the data.

Matching using song titles and artists, I found that roughly 1% of the songs in the MSD were present on the Billboard chart at some point, with roughly .5% peaking at position 40 or higher and roughly .8% being on the chart for 15 weeks or more. I used these three metrics to create three separate sets of binary labels (“popular” vs. “unpopular”), which I used as true labels for the purposes of machine learning.

My first attempt at machine learning was to use the labels with the 6 features mentioned above, then train different classifiers (Naïve Bayes, logistic regression, and SVM) to predict popularity. While these classifiers were reaching extremely high levels of accuracy (99+%), I realized they were simply predicting “unpopular” for every song due to the large disparity between unpopular (99 to 99.5%) and popular (.5 to 1%) labels. I also measured the Matthews correlation coefficient, which was under .1 for all classifiers and thus confirmed my suspicion that they were not accurately classifying the data.

This was solved using SMOTE, a well-established method of oversampling that generates new examples of a minority class as combinations of current examples. However, even this was not enough to build an effective classifier, as these new classifiers had accuracies barely above random and continued to have very low Matthews correlation coefficients.

At this point I began making simple scatterplots of features and looking at simple statistics like mean and standard deviation. These simple checks confirmed that there were no significant differences between the classes along any one feature, and ultimately led to me dropping this line of insight. It’s clearer in hindsight that these features would likely not be strong enough to predict popularity, but given that I don’t have much music theory background I felt that I would not be able to create useful features or interpret results that came from using the more complex features available in the dataset.

At this point in the project, I shifted my focus to analyzing the popularity of artists, rather than individual songs. The benefit of this was that artists had two useful features that songs did not – the first was a list of terms describing the artist (generally related to genre, but also location or time period) from Musicbrainz, and the second was a measure called “familiarity”. While the specifics of how this feature are generated are not available, it’s described in the abstract as the probability a random person will recognize this artist. Like hotttnesss, it’s dependent on time, meaning that the database contained the familiarity of each artist on a certain date in 2013, rather than in general.

The first idea I had was to explore different genres of music using the tags. I hypothesized that different genres of music would have different levels of popularity and as such would have more similar artists within the genre. To define the concept of a “genre”, I clustered the artists into groups using KMeans clustering with K=5. With 7943 unique terms, I chose to make the feature vectors simple a binary list of length 7943 where each item represented a specific term being present or not present.

The results of this clustering were promising – the data was split into 5 clusters, and looking at the most common terms of each cluster I found that they roughly split into distinct genres such as “rock/ alternative/indie” and “electronic/house”. Additionally, when performing PCA to reduce the 7943 features to 2 for the sake of plotting, it was clear that the clusters had fairly distinct locations, but varying sizes. This difference in sizes (representing a greater variety of terms within a cluster) supported my hypothesis that certain genres would have more similar artists, and confirmed my thinking that there is useful insight present in the tags for each artist. However, due to time constraints I was unable to explore this area further.

The other interesting aspect of the dataset I explored was the familiarity of each artist. While the field cannot be used with the Billboard data for the same reason as the hotttnesss field (being time dependent, it represents the familiarity of an artist in late 2013 rather than over time), it can be used in combination with hotttnesss to get a better idea of an artist’s real popularity. I hypothesized that the ratio between the two could be a good indication of an artist’s rising or falling popularity: a high ratio of hotttnesss to familiarity would indicate an artist is surging in popularity but still unknown, while a low ratio would indicate an artist people have heard of but no longer listen to.

Taking this ratio for each artist and plotting it, the data seems to confirm my hypothesis fairly strongly. Of the 53,947 artists, those with the lowest ratios are famous older artists (The Kinks, Santana, Crosby and Nash), while those with the highest ratios are up and coming artists at the very beginning of their fame (M.I.A, Missy Elliot). One issue with this ratio, however, is that hotttnesss is heavily biased by recent events. For example, within one week of the data being pulled there was an NSYNC reunion concert and a memorial concert for Elliot Smith, both of whom experienced a surge in hotttnesss and ultimately had very high ratios.

Additionally, anyone actually using this ratio as a metric for popularity would need to threshold at a certain minimum level of familiarity. The very highest ratios come from bands with virtually 0 familiarity – if they have any hotttnesss at all (such as their first review in a music blog), their ratio spikes much higher than artist with any level of familiarity could reach. However, excluding all artists with a familiarity below roughly .4 does a reasonable job of ensuring these outliers aren’t considered, which would be useful for a music label or anyone else concerned with the changing popularities of different artists.



Ultimately, I completed most of what I had planned to do in the midterm report. Even though my initial analysis failed, I did find some useful insights into what genre can help a song be popular and found a useful new metric for clarifying this popularity. As planned in the midterm report, I dropped the idea of splitting the songs by decade of release after I failed to find any meaningful results using the metadata features, though I feel I made up for this with the artist analysis I did instead. Additionally, I had to cut back on the visualizations I made due to time constraints and the fact that my group shrank from 3 people to 1. However, I still feel that I was able to accurately show the conclusions I found, so I consider that part of the project fairly successful.

While this project was a first analysis of the Million Song Database, there are still plenty of useful insights I chose not to explore. In particular, someone with a stronger background in music theory might be able to make use of the wide array of musical analysis features (including timing information of every section and beat) to predict a song’s popularity better than I could. It’s also possible that even these features aren’t enough – especially in modern music, popularity may be as much a combination of the artist’s fame, record label, and social media presence as it is of the music itself, and if this is the case then analysis of the music will ultimately fail to find useful insights. Still, a classifier capable of determining a song’s popularity would be only a step away from a generator of popular music, which would be incredibly lucrative, so it may still be worth investigating.


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s