Will you love your favorite musician’s next album? Whatever your answer, Shanda Innovations (the tech incubator of China’s Shanda corporation) may know better than you do. A Music Data Science Hackathon, organised by Data Science London on the Kaggle platform and sponsored by EMI and EMC, challenged data teams to answer this question about taste by developing an algorithm to predict “listener’s appreciation of songs and artists, based on listeners demographics, listeners words of appreciation, and interviews contained in EMI’s One Million Interview Dataset,” as noted on EMI’s music data science website. Shanda won the competition over 138 competitors who vied to create predictive analytics that took into account age, geography and tastes to predict how a person would rate a song. Alex Knapp’s Forbes article on the competition shows that the competing teams’ analyses challenge traditional marketing approaches and beliefs.
In her 2011 TED talk, “Social Media and the End of Gender,” Johanna Blakley explains that while advertisers still rely on the “same ole, same ole” demographic information of age, race and gender to predict whom to market to and how, communities of interest have changed. Now, people are linked not only by identity categories, but also by tastes and preferences. In line with Blakley’s research, Knapp notes that Shanda’s analysis shows that age and socioeconomic data “weren’t accurate predictors of songs”; rather, “general interests and attitudes were much better drivers of predictions.”
In a Kaggle blog post about their winning algorithm, the Shanda team shares: “We were very surprised to find that the variation of the track scores given by different people was a lot more than we expected. For instance, User ID 41072 scored 100 to track 156 whereas User ID 41286 gave merely 4 to the same track! It was very interesting to find that people were so different in music preference and we believed that was why so many different types of music existed.” Anthony Goldbloom, Kaggle President, notes that what competitors found with regard to age contradicted common assumptions. According to Goldbloom: “As it turns out, older, retired people were much less discriminating and more open in their musical taste than younger people, which is the opposite of the stereotype.”
Shanda also explains that, to develop its analysis, the team mapped the words that participants in EMI’s interview dataset used to describe artists “to some keyword IDs and used these IDs in the logistic regression model, which greatly improved the performance…the main machine learning methods were SVD++ and Logistic Regression.” The team used the C++ and Python programming languages and thanked the APEX team for its SVDFeature toolkit, which it also employed. More innovative projects like EMI’s impressive dataset, and more hackathons with some of the world’s most brilliant data scientists, will undoubtedly produce even more disruptive marketing insights and enhanced music personalization technology.
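
For readers curious what “mapping words to keyword IDs” might look like in practice, here is a toy Python sketch of the general idea. It is not Shanda’s code: the sample words, the binary liked/disliked label, and every function name are invented for illustration, and it does not attempt SVD++ or use the SVDFeature toolkit. It only shows how descriptive words can be turned into integer IDs that feed a small logistic regression.

```python
import math

# Hypothetical toy data: words a listener used about an artist, and whether
# they liked the track (1) or not (0). The real dataset used numeric ratings.
SAMPLES = [
    (["catchy", "fun"], 1),
    (["boring", "dated"], 0),
    (["catchy", "original"], 1),
    (["boring", "noisy"], 0),
    (["fun", "original"], 1),
    (["dated", "noisy"], 0),
]


def build_keyword_ids(samples):
    """Give each distinct word an integer ID, in order of first appearance."""
    ids = {}
    for words, _ in samples:
        for word in words:
            ids.setdefault(word, len(ids))
    return ids


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, active_ids):
    """Probability of 'liked' given the keyword IDs present in one sample."""
    return sigmoid(bias + sum(weights[i] for i in active_ids))


def train(samples, keyword_ids, epochs=200, lr=0.5):
    """Fit one weight per keyword ID with plain stochastic gradient descent."""
    weights = [0.0] * len(keyword_ids)
    bias = 0.0
    for _ in range(epochs):
        for words, label in samples:
            active = [keyword_ids[w] for w in words]
            error = predict(weights, bias, active) - label
            bias -= lr * error
            for i in active:
                weights[i] -= lr * error
    return weights, bias


keyword_ids = build_keyword_ids(SAMPLES)
weights, bias = train(SAMPLES, keyword_ids)

p_like = predict(weights, bias, [keyword_ids["catchy"], keyword_ids["fun"]])
p_dislike = predict(weights, bias, [keyword_ids["boring"], keyword_ids["noisy"]])
print(f"catchy + fun   -> {p_like:.2f}")
print(f"boring + noisy -> {p_dislike:.2f}")
```

Each word gets its own learned weight, so a model like this can pick up that listeners who call an artist “catchy” tend to rate the songs higher, which is the kind of attitude signal the article describes as more predictive than age or income.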