How Music Recommendation Systems Like Spotify Work

A recommender system suggests personalized items, such as music or products, that a user may like. A good music recommender, like Spotify's, should automatically detect a listener's preferences and generate playlists accordingly.

Collaborative filtering is a popular technique that finds users with listening patterns and tastes similar to ours and suggests the songs on which our histories differ. A novel recommendation is a song one user has listened to but the other has not, offered when the two have significant overlap in tastes, listening histories, and habits. User ratings weight these suggestions. This method is hard to implement when starting out, because acquiring enough user ratings takes time.
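The overlap-then-suggest idea can be sketched in a few lines. This is a toy illustration, not any real system's algorithm; the names, the Jaccard similarity measure, and the 0.5 threshold are all assumptions made for the example.

```python
def overlap(a, b):
    """Jaccard similarity between two sets of listened-to songs."""
    return len(a & b) / len(a | b)

def novel_recommendations(user_history, other_history, min_overlap=0.5):
    """Suggest songs the other user has heard but this user has not,
    only when the two histories overlap enough (hypothetical threshold)."""
    if overlap(user_history, other_history) < min_overlap:
        return set()
    return other_history - user_history

alice = {"song_a", "song_b", "song_c", "song_d"}
bob = {"song_a", "song_b", "song_c", "song_e"}

print(novel_recommendations(alice, bob))  # {'song_e'}
```

Here Alice and Bob share three of five songs (overlap 0.6), so Alice is offered the one song only Bob has heard.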

Three components are modelled: users, items, and user-item matching. User modelling, composed of stable and fluid attributes, captures differences in geography, gender, age, interests, and music preferences, and also accounts for emotional states such as moods and opinions. Interestingly, intelligence, personality, and music preferences are linked; for instance, agreeableness correlates with a preference for energetic and rhythmic music.

Item modeling describes three types of metadata — editorial, cultural, and acoustic. Editorial metadata covers the album name, contributing musicians, track titles, genres, and so on. Cultural metadata captures emergent listening trends, categories, and similarities between songs. Acoustic metadata describes the non-lyrical content: beat, tempo, pitch, instrumentation, and mood.
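The three metadata layers can be pictured as a simple schema. The field names below are illustrative, not any real catalogue's format:

```python
from dataclasses import dataclass

@dataclass
class EditorialMetadata:
    title: str           # supplied by the publisher/label
    album: str
    genre: str
    musicians: list

@dataclass
class CulturalMetadata:
    listening_trend: str  # emergent from aggregate user behaviour
    similar_songs: list   # songs listeners treat as similar

@dataclass
class AcousticMetadata:
    tempo_bpm: float      # derived from the audio signal itself
    pitch_class: str
    mood: str

@dataclass
class Track:
    editorial: EditorialMetadata
    cultural: CulturalMetadata
    acoustic: AcousticMetadata

track = Track(
    editorial=EditorialMetadata(title="Example Song", album="Example Album",
                                genre="indie", musicians=["A. Artist"]),
    cultural=CulturalMetadata(listening_trend="rising",
                              similar_songs=["song_b", "song_c"]),
    acoustic=AcousticMetadata(tempo_bpm=120.0, pitch_class="C", mood="upbeat"),
)
```

The split matters because each layer comes from a different source: editors, listener behaviour, and audio analysis respectively.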

User-item matching classifies listeners into four groups — savants, enthusiasts, casuals, and indifferents. These categories determine how much music should be surfaced from the long tail of interesting but obscure songs in the popularity distribution curve.

To address subjectivity in music, emotion-based and context-based models have been proposed. Emotional modeling often uses 2D valence-arousal axes to represent emotional states while listening. Valence is how positive or negative the music is, and arousal is how exciting or calming it is. Perceptual features such as energy, rhythm, harmony, and temporal and spectral characteristics feed automatic emotion recognition.
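A minimal sketch of how valence-arousal coordinates map to emotional categories. The quadrant labels and the [-1, 1] ranges are assumptions chosen for illustration:

```python
def emotion_quadrant(valence, arousal):
    """Map a (valence, arousal) point, each in [-1, 1], to a quadrant label.
    Labels are illustrative, not a standard taxonomy."""
    if valence >= 0 and arousal >= 0:
        return "happy/excited"    # positive and energetic
    if valence < 0 and arousal >= 0:
        return "angry/tense"      # negative and energetic
    if valence < 0:
        return "sad/depressed"    # negative and calm
    return "relaxed/content"      # positive and calm

print(emotion_quadrant(0.8, 0.6))   # happy/excited
print(emotion_quadrant(0.7, -0.5))  # relaxed/content
```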

Another emotional model, the circumplex model, conceptualizes affect as points on a circle in the following order: pleasure (0°), excitement (45°), arousal (90°), distress (135°), displeasure (180°), depression (225°), sleepiness (270°), and relaxation (315°). A further model, MIREX, categorises emotion into five mood clusters.
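Because the circumplex places pleasure along the valence axis and arousal perpendicular to it, a (valence, arousal) point can be snapped to the nearest of the eight labels by its angle. This is a sketch of that geometry, not a published algorithm:

```python
import math

LABELS = ["pleasure", "excitement", "arousal", "distress",
          "displeasure", "depression", "sleepiness", "relaxation"]

def circumplex_label(valence, arousal):
    """Snap a (valence, arousal) point to the nearest 45-degree sector,
    with pleasure at 0 degrees and sectors proceeding counter-clockwise."""
    angle = math.degrees(math.atan2(arousal, valence)) % 360
    return LABELS[round(angle / 45) % 8]

print(circumplex_label(1, 0))   # pleasure (0 degrees)
print(circumplex_label(1, 1))   # excitement (45 degrees)
print(circumplex_label(0, -1))  # sleepiness (270 degrees)
```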

Context-based models use internet data, such as social media signals — likes, comments, tags, friendship networks — from Facebook and Twitter, to infer user preferences.

Collaborative filtering comes in three types: memory-based, model-based, and hybrid. The memory-based method finds the nearest neighbours among modelled users whose past ratings correlate strongly, and draws suggestions from them. The model-based method applies machine learning (ML) techniques, incorporating the models above, to predict users' ratings of music. Hybrid approaches are the most accurate, combining the advantages and dampening the disadvantages of the other methods.
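A minimal memory-based collaborative filter can be shown end to end: score every other user by cosine similarity over co-rated songs, then borrow the best neighbour's rating for an unheard song. The data, names, and the single-neighbour prediction rule are all toy assumptions:

```python
import math

# Hypothetical user -> {song: rating} data.
ratings = {
    "alice": {"song_a": 5, "song_b": 4, "song_c": 1},
    "bob":   {"song_a": 5, "song_b": 5, "song_c": 1, "song_d": 4},
    "carol": {"song_a": 1, "song_b": 2, "song_c": 5, "song_d": 1},
}

def cosine_sim(u, v):
    """Cosine similarity between two rating dicts over their shared songs."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[s] * v[s] for s in shared)
    norm_u = math.sqrt(sum(u[s] ** 2 for s in shared))
    norm_v = math.sqrt(sum(v[s] ** 2 for s in shared))
    return dot / (norm_u * norm_v)

def predict(user, song):
    """Predict user's rating of song from the most similar neighbour
    who has rated it; None if no such neighbour exists."""
    neighbours = [(cosine_sim(ratings[user], ratings[v]), v)
                  for v in ratings if v != user and song in ratings[v]]
    if not neighbours:
        return None
    _, best = max(neighbours)
    return ratings[best][song]

print(predict("alice", "song_d"))  # 4 -- bob is alice's nearest neighbour
```

Alice's ratings track Bob's almost exactly and oppose Carol's, so the filter predicts she will rate song_d the way Bob did. A model-based method would instead fit a learned model (e.g. matrix factorization) to the same ratings matrix.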

The main problems in music recommendation are popularity bias and the complexity of subjectivity. Artists in the long tail of the popularity curve rarely get exposure, because few listens mean few ratings. Suggesting popular music reduces risk but compromises personalization. Significant human effort is required to collect user ratings and to evaluate recommendation accuracy by establishing a ground truth. Arbitrary listening sequences can also confuse models.

The above is my summary of this paper.
