With a monthly audience of more than 72 million active listeners, Pandora holds a significant lead in the streaming radio market. But the Oakland-based company’s advantage over its competitors, which now include Apple’s iTunes Radio, goes beyond monthly listener metrics. Pandora has garnered its sizeable following by delivering impressively on its promise to “give people music they love”. Indeed, the most attractive premise of streaming radio, and the area where Pandora excels, is that listeners can create stations that play songs matched to their individual tastes with only minimal user input. Doing this consistently requires big data and the ability to leverage it in meaningful ways.

Pandora’s chief edge over rival streaming services lies in its eight-year cache of more than 200 million users’ listening habits. In addition to the age, gender and zip code you provide at signup, Pandora tracks not only every song you’ve liked (thumbs up) and hated (thumbs down), but also where and when you listen and which devices you use to do so. On a per-user basis this data is obviously attractive to advertisers (Pandora also offers a paid, ad-free subscription option). But when you combine it in aggregate across the entire user base and couple it with the company’s unique song classification system, you’ve got a formidable lead in song recommendation that may prove difficult for even Apple to overcome. You can read my comparison of personalized stations between iTunes Radio, Pandora and Spotify to see how each service fared head to head.
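To make the idea of aggregation concrete, here is a minimal Python sketch. It assumes a hypothetical event record carrying a user, a song, a thumb (+1 or -1), the hour of listening and a device; Pandora’s real schema is not public. The function rolls individual thumbs up and down into a per-song approval rate across all users, the kind of aggregate signal a single listener’s history cannot provide.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ListeningEvent:
    # Hypothetical record; field names are illustrative, not Pandora's schema.
    user_id: str
    song_id: str
    thumb: int    # +1 for thumbs up, -1 for thumbs down
    hour: int     # local hour of day, 0-23 (the "when")
    device: str   # e.g. "phone", "car", "web" (the "where" and "which device")


def approval_rates(events):
    """Return {song_id: fraction of that song's thumbs that were thumbs up}."""
    ups = defaultdict(int)
    totals = defaultdict(int)
    for e in events:
        totals[e.song_id] += 1
        if e.thumb > 0:
            ups[e.song_id] += 1
    return {song: ups[song] / totals[song] for song in totals}


events = [
    ListeningEvent("u1", "s1", +1, 8, "phone"),
    ListeningEvent("u2", "s1", +1, 21, "web"),
    ListeningEvent("u3", "s1", -1, 17, "car"),
    ListeningEvent("u1", "s2", -1, 9, "phone"),
]
print(approval_rates(events))  # {'s1': 0.6666666666666666, 's2': 0.0}
```

The hour and device fields are carried along but unused here; a fuller sketch could group by them to see, for instance, which songs do well in the car versus at a desk.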

Musical DNA

Understanding Pandora’s success, however, requires going back to an idea that was a commercial failure. In 2000, Tim Westergren and a small team of entrepreneurs launched the Music Genome Project. Its aim was as bold as its concept was simple: to create a database of discrete musical characteristics for a given song in order to identify other songs with similar qualities. Essentially, the idea requires a two-pronged approach. The first is to break a song down to its “DNA” level, identifying and rating elements like meter, tonality, tempo and instrumentation. Then, armed with enough of these data points, you can devise an algorithm to find other songs that have similar, if not identical, attributes.

Although computer algorithms would be used to create these musical matches, the Music Genome Project team decided that decoding a song’s “DNA” required a human touch. To that end, they hired musicians whose job it was (and still is) to listen to each song and describe it using as many as 450 predetermined attributes, each of which is given a numerical value. These scores are the fundamental data points the algorithm uses to determine song matches.
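The two-pronged approach (score each song’s attributes, then match on those scores) can be sketched in a few lines of Python. This is a minimal illustration, assuming each song is reduced to a vector of numeric scores; the four attribute names, the 0-5 scale and the song data are hypothetical, and cosine similarity is a stand-in measure, since Pandora has not published the matching algorithm it actually uses.

```python
import math

# Hypothetical attribute order for every vector below (0-5 scale).
# Real Music Genome Project entries use up to 450 attributes.
ATTRIBUTES = ["tempo", "major_tonality", "acoustic_guitar", "vocal_grit"]

SONGS = {
    "song_a": [4.0, 1.0, 5.0, 2.0],
    "song_b": [4.0, 1.5, 4.5, 2.0],
    "song_c": [1.0, 5.0, 0.0, 4.5],
}


def cosine_similarity(x, y):
    """Cosine of the angle between two score vectors (1.0 = same profile)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def most_similar(song_id, songs):
    """Return the other song whose scores best match song_id's scores."""
    target = songs[song_id]
    candidates = (s for s in songs if s != song_id)
    return max(candidates, key=lambda s: cosine_similarity(target, songs[s]))


print(most_similar("song_a", SONGS))  # song_b
```

Here song_b wins because its scores differ from song_a’s only slightly on two attributes, while song_c has an almost opposite profile.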

Attempts to license this music recommendation data to outside companies met with a tepid response. So in 2005 Tim Westergren co-founded Pandora, which uses Music Genome Project data to generate custom playlists that users launch by typing in a single artist, genre or composer.
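Seeding a station from a single artist could work along these lines. The sketch below is hypothetical, since the source does not describe Pandora’s seeding logic: it averages the seed artist’s song vectors into one profile, then ranks songs by other artists by Euclidean distance to that profile. The catalog, artist names and scores are illustrative.

```python
import math

# Hypothetical catalog: song_id -> (artist, attribute scores).
CATALOG = {
    "s1": ("Artist X", [4.0, 1.0, 5.0]),
    "s2": ("Artist X", [3.0, 1.0, 4.0]),
    "s3": ("Artist Y", [3.5, 1.2, 4.4]),
    "s4": ("Artist Z", [0.5, 4.8, 0.2]),
}


def artist_profile(artist, catalog):
    """Average, attribute by attribute, the vectors of every song by artist."""
    vectors = [vec for a, vec in catalog.values() if a == artist]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def seed_station(artist, catalog, n=2):
    """Return up to n songs by other artists, closest to the artist profile first."""
    profile = artist_profile(artist, catalog)
    others = [s for s, (a, _) in catalog.items() if a != artist]
    others.sort(key=lambda s: math.dist(profile, catalog[s][1]))
    return others[:n]


print(seed_station("Artist X", CATALOG))  # ['s3', 's4']
```

Averaging is only one design choice; a real service might instead keep each seed song separate and interleave their neighbors. `math.dist` requires Python 3.8 or later.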

Big Data in action

I recently had a conversation with Pandora’s Chief Scientist (yes, that’s an official title) and VP of Playlists, Eric Bieschke, about just what goes into cueing up songs a listener will like, time after time. While the song analysis of the Music Genome Project plays a crucial role in track selection (streaming radio rivals often rely on third-party recommendation services like The Echo Nest and Gracenote), Bieschke was quick to stress that having eight years’ worth of very detailed user behavior data at his team’s disposal is equally important. “With so much information about user listening habits,” he says, “we’ve learned some interesting things about how people listen to music.”

Amadou Diallo