danicodes

Hip Hop Leads and Features Reflection

Modeling relationships between artists as a network

Brief: Using a (messy) database of artists and their music, I’ve attempted to find a relationship between the number of songs on which a hip-hop artist is the ‘lead’ and the number on which they are ‘featured’, to see whether this had any potential effect on the artist’s Spotify popularity.
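As a rough illustration of the quantities involved, here is a minimal plain-Python sketch that counts leads and features per artist and measures how a lead/feature balance lines up with popularity. It is not the original analysis. The record shape (one `(lead_artist, [featured_artists])` pair per song) and the helper names `count_roles`, `lead_share` and `pearson` are hypothetical. `lead_share` uses leads / (leads + features) instead of a raw leads / features ratio (an assumption on my part) so that artists who are never featured don't cause a division by zero.

```python
import math
from collections import Counter

# Hypothetical input shape: one (lead_artist, [featured_artists]) pair per song.
# The real database's fields may be named and structured differently.
Song = tuple[str, list[str]]


def count_roles(songs: list[Song]) -> tuple[Counter, Counter]:
    """Count how many songs each artist leads and how many they are featured on."""
    leads: Counter = Counter()
    features: Counter = Counter()
    for lead, featured in songs:
        leads[lead] += 1
        for artist in set(featured):  # set() ignores duplicate credits on one song
            features[artist] += 1
    return leads, features


def lead_share(artist: str, leads: Counter, features: Counter) -> float:
    """Fraction of an artist's appearances that are as lead, from 0.0 to 1.0.

    leads / (leads + features) is used instead of leads / features so that
    artists who are never featured do not cause a division by zero.
    """
    total = leads[artist] + features[artist]
    return leads[artist] / total if total else 0.0


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / (spread_x * spread_y)
```

With a (hypothetical) `popularity` dict taken from the Spotify data, `pearson([lead_share(a, leads, features) for a in artists], [popularity[a] for a in artists])` summarizes the linear association in one number. On Python 3.10+, `statistics.correlation` computes the same coefficient. A correlation only shows that the two move together, not that one has an effect on the other.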

What I realized: Looking back at my first attempt at this problem, I see that I used k-means clustering in an attempt to separate and classify the hip-hop artists in terms of their ratio of leads to features. However, k-means is an unsupervised learning algorithm: it takes no labeled data and simply partitions the artists into k groups around the nearest cluster centers. Those groups are clusters by distance, not classes of hip-hop artists. Classifying artists would have required a supervised algorithm, such as k-nearest neighbors, trained on a dataset with correct classifications, meaning I would have had to separate and identify some classes of hip-hop artists myself in order for the algorithm to place the other artists.
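To make the unsupervised point concrete, here is a minimal k-means (Lloyd's algorithm) sketch in plain Python over hypothetical `(leads, features)` points, one per artist. It is not the code from my first attempt. Notice what the function accepts: a list of points and a number k, with no labels anywhere, and the cluster indices it returns are arbitrary ids rather than named categories. The first-k-points initialization is a simplifying assumption; scikit-learn's `KMeans` defaults to k-means++ initialization instead.

```python
import math

# One point per artist: (number of songs led, number of songs featured on).
Point = tuple[float, float]


def kmeans(points: list[Point], k: int, max_iters: int = 100) -> tuple[list[int], list[Point]]:
    """Cluster points with Lloyd's algorithm.

    Note what is NOT an input: there are no labels. The returned cluster
    indices are arbitrary ids, not named classes of artists.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Naive deterministic initialization: use the first k points as centroids.
    centroids: list[Point] = list(points[:k])
    assignment: list[int] = []
    for _ in range(max_iters):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its member points.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return assignment, centroids
```

By contrast, a k-nearest-neighbors classifier would need a second input, the known class of each training artist, which is exactly the labeled data this problem did not have.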

After getting some feedback on my initial approach, I’ve decided that this problem would be better modeled as a network.
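The post doesn't spell out how the network is built, so the following is only one plausible reading, sketched in plain Python: each artist is a node, and each song adds a directed edge from its lead artist to every featured artist, weighted by how many songs share that pairing. Under that assumption, an artist's incoming edge weight counts the times they were featured, and their outgoing weight counts the features they brought onto their own songs. The input shape and the names `build_feature_graph`, `out_strength` and `in_strength` are hypothetical.

```python
from collections import defaultdict

# Hypothetical input shape: one (lead_artist, [featured_artists]) pair per song.
Song = tuple[str, list[str]]


def build_feature_graph(songs: list[Song]) -> dict[str, dict[str, int]]:
    """Directed, weighted graph: an edge lead -> featured artist, weighted by
    the number of songs on which that pairing occurs."""
    graph = defaultdict(lambda: defaultdict(int))
    for lead, featured in songs:
        for artist in set(featured):
            if artist != lead:  # ignore an artist crediting themselves
                graph[lead][artist] += 1
    # Return plain dicts so later lookups cannot silently insert new keys.
    return {lead: dict(nbrs) for lead, nbrs in graph.items()}


def out_strength(graph: dict[str, dict[str, int]], artist: str) -> int:
    """Total weight leaving an artist: features they brought onto their own songs."""
    return sum(graph.get(artist, {}).values())


def in_strength(graph: dict[str, dict[str, int]], artist: str) -> int:
    """Total weight arriving at an artist: the number of times they were featured."""
    return sum(nbrs.get(artist, 0) for nbrs in graph.values())
```

Songs with no featured artists add no edges, so solo leads would need to be stored separately, for example as a node attribute. The same structure maps directly onto a networkx `DiGraph` with a `weight` edge attribute, where `in_degree(weight="weight")` gives the incoming strength.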

Truthfully, I had just learned about k-means and was eager to see it in action on some data, whether or not that was the ‘right’ thing to do. Now I see that there is a clearer way to model the relationship between my variables.

Below is a 3-D graph showing the relationship between an artist’s number of leads, number of features, and Spotify popularity rating.

Relationship between Hip-Hop Artists' Popularity and Number of Songs as Leading or Supporting Artist