Spotify Graph Dashboard (Part I): Creating a Spotify Graph with TigerGraph
How to Create a Graph Modelling Spotify Data from Kaggle with TigerGraph
Overview
Objective
Spotify is a digital music, podcast, and video service used by many people worldwide. Songs on Spotify have several characteristics, such as danceability and loudness. Using graph technology, specifically TigerGraph, this data can be mapped onto a graph database and then visualised. In this first blog, we’ll focus on creating the Spotify graph; we’ll create Spotify dashboards in subsequent blogs.
Tools
In this blog, we’ll be using:
- TigerGraph (specifically TG Cloud and pyTigerGraph)
- Kaggle (specifically the Multi-Genre Playlists Dataset)
- Colab
Part I: Set Up Your Solution on TG Cloud
First, we’ll set up our solution on TG Cloud. (See a thorough walkthrough here.) To do so, navigate to https://tgcloud.io/, log in or sign up, then click the “My Solutions” tab. Finally, click on the blue “Create Solution” button on the top right.
On the first page, press “Blank.” Since we’ll be loading in our own data, we won’t use a Starter Kit. Press “Next,” then press “Next” again on the second page. The second page will create a free solution on TigerGraph.
On the third page, modify the details of your solution. The subdomain must be unique. Keep note of your subdomain and password, as we’ll be using them in the Python portion.
Finally, double-check that everything looks good in the final step, then press “Submit” to provision the solution! This might take a few minutes.
Wait till the “Status” of the solution says “Ready.”
Once it’s green, you’re ready to create your schema and load data.
Part II: Connect to the Solution and Create the Schema
Step I: Install and Import pyTigerGraph and Connect to your Solution
Open Google Colab (or a Python file). First, we’ll need to connect to the solution we just created. To do this, we first need to install and import pyTigerGraph. In a Colab notebook, type:
!pip install pyTigerGraph
If you’re running a normal Python file, type this into your terminal:
pip install pyTigerGraph
Once it’s installed, import it.
import pyTigerGraph as tg
Finally, we’ll create a TigerGraph connection. Replace SUBDOMAIN and PASSWORD with your subdomain and password.
conn = tg.TigerGraphConnection(host="https://SUBDOMAIN.i.tgcloud.io/", password="PASSWORD")
Since I used the default password, my connection would look like this:
conn = tg.TigerGraphConnection(host="https://spotify.i.tgcloud.io/", password="tigergraph")
After running this, congrats! You’re now connected to your graph. Next, let’s add a schema.
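To keep credentials out of a notebook, one option is to assemble the host URL from the subdomain and read the password from an environment variable. The sketch below is only an illustration: the helper names and the TG_SUBDOMAIN and TG_PASSWORD variable names are assumptions made for this example, not part of pyTigerGraph.

```python
import os


def build_host(subdomain):
    """Return the TG Cloud host URL for a solution subdomain."""
    return f"https://{subdomain}.i.tgcloud.io/"


def connection_settings(env=os.environ):
    """Collect keyword arguments for tg.TigerGraphConnection.

    TG_SUBDOMAIN and TG_PASSWORD are hypothetical variable names
    chosen for this sketch; set them before calling this function.
    """
    return {
        "host": build_host(env["TG_SUBDOMAIN"]),
        "password": env["TG_PASSWORD"],
    }
```

With both variables set, the connection could then be created with conn = tg.TigerGraphConnection(**connection_settings()).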
Step II: Create a Schema
Looking at the Kaggle Spotify dataset, I decided to create four vertices: Genre, Song, Playlist, and Artist. I’ll connect these with edges like the following:
print(conn.gsql('''
CREATE VERTEX Genre(PRIMARY_ID name STRING) WITH PRIMARY_ID_AS_ATTRIBUTE="true"
CREATE VERTEX Song(PRIMARY_ID id STRING, name STRING, popularity INT, dancibility DOUBLE, energy_level DOUBLE, energy DOUBLE, key_id INT, loudness DOUBLE, mode INT, speechiness DOUBLE, acousticness DOUBLE, instrumentalness DOUBLE, liveness DOUBLE, valence DOUBLE, tempo DOUBLE, uri STRING, track_href STRING, analysis_url STRING, duration_ms INT, time_signature INT) WITH PRIMARY_ID_AS_ATTRIBUTE="true"
CREATE VERTEX Playlist(PRIMARY_ID name STRING) WITH PRIMARY_ID_AS_ATTRIBUTE="true"
CREATE VERTEX Artist(PRIMARY_ID name STRING) WITH PRIMARY_ID_AS_ATTRIBUTE="true"
CREATE UNDIRECTED EDGE SONG_ARTIST(FROM Song, TO Artist)
CREATE UNDIRECTED EDGE SONG_PLAYLIST(FROM Song, TO Playlist)
CREATE UNDIRECTED EDGE SONG_GENRE(FROM Song, TO Genre)
'''))
Fantastic! Finally, I’ll create the graph, calling it SpotifyGraph and passing as a parameter all the vertices and edges we just created.
print(conn.gsql('''CREATE GRAPH SpotifyGraph(Genre, Song, Playlist, Artist,SONG_ARTIST, SONG_PLAYLIST, SONG_GENRE)'''))
With this, you should be able to see your graph on GraphStudio! Nice job!
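Because CREATE GRAPH refers to vertex and edge types by name, a single misspelled name in the schema will make graph creation fail. As a rough sanity check, the statements can be compared before they are sent to the server. The function below is a hypothetical helper written for this post, and its regular expressions only cover the simple statement forms used here; the sample strings use a deliberately misspelled edge name for illustration.

```python
import re


def undefined_types(schema_gsql, graph_gsql):
    """Return names listed in CREATE GRAPH that the schema never defines."""
    defined = set(re.findall(r"CREATE\s+VERTEX\s+(\w+)", schema_gsql))
    defined |= set(
        re.findall(r"CREATE\s+(?:UN)?DIRECTED\s+EDGE\s+(\w+)", schema_gsql)
    )
    match = re.search(r"CREATE\s+GRAPH\s+\w+\s*\(([^)]*)\)", graph_gsql)
    listed = [n.strip() for n in match.group(1).split(",")] if match else []
    return [n for n in listed if n and n not in defined]


# Sample input with a misspelled edge name (SONG_SGENRE instead of SONG_GENRE).
sample_schema = """
CREATE VERTEX Song(PRIMARY_ID id STRING)
CREATE VERTEX Genre(PRIMARY_ID name STRING)
CREATE UNDIRECTED EDGE SONG_SGENRE(FROM Song, TO Genre)
"""
sample_graph = "CREATE GRAPH SpotifyGraph(Genre, Song, SONG_GENRE)"

print(undefined_types(sample_schema, sample_graph))  # ['SONG_GENRE']
```

An empty list means every name passed to CREATE GRAPH has a matching CREATE VERTEX or CREATE EDGE statement.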
Step III: Updating Credentials
Before we proceed, we need to update the connection credentials, adding the graph name and the API token.
conn.graphname = "SpotifyGraph"
conn.apiToken = conn.getToken(conn.createSecret())
Great! Now we’re set.
Part III: Load Data
First, let’s read the CSVs located in the train folder using pandas.
import pandas as pd
alternative = pd.read_csv("train/alternative_music_data.csv")
indie_alt = pd.read_csv("train/indie_alt_music_data.csv")
rock = pd.read_csv("train/rock_music_data.csv")
blues = pd.read_csv("train/blues_music_data.csv")
metal = pd.read_csv("train/metal_music_data.csv")
hiphop = pd.read_csv("train/hiphop_music_data.csv")
pop = pd.read_csv("train/pop_music_data.csv")
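Since the seven files follow the same naming pattern, the reads above could also be driven from a list of genre names. This is a minimal sketch, assuming the CSVs sit in a train folder next to the notebook; GENRES, csv_paths, and load_frames are names invented for this example.

```python
from pathlib import Path

# Genre prefixes taken from the file names used in this post.
GENRES = ["alternative", "indie_alt", "rock", "blues", "metal", "hiphop", "pop"]


def csv_paths(folder="train"):
    """Map each genre to the path of its CSV file."""
    return {genre: Path(folder) / f"{genre}_music_data.csv" for genre in GENRES}


def load_frames(folder="train"):
    """Read every genre CSV into a dict of DataFrames keyed by genre."""
    import pandas as pd  # imported here so csv_paths works without pandas

    return {genre: pd.read_csv(path) for genre, path in csv_paths(folder).items()}
```

Calling frames = load_frames() would then give frames["rock"], frames["pop"], and so on, in place of the seven separate variables.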
Next, we’ll upsert the data using pyTigerGraph’s upsertVertexDataFrame and upsertEdgeDataFrame methods.
# Playlist vertices
conn.upsertVertexDataFrame(alternative, "Playlist", "Playlist", attributes={"name": "Playlist"})
conn.upsertVertexDataFrame(indie_alt, "Playlist", "Playlist", attributes={"name": "Playlist"})
conn.upsertVertexDataFrame(rock, "Playlist", "Playlist", attributes={"name": "Playlist"})
conn.upsertVertexDataFrame(blues, "Playlist", "Playlist", attributes={"name": "Playlist"})
conn.upsertVertexDataFrame(metal, "Playlist", "Playlist", attributes={"name": "Playlist"})
conn.upsertVertexDataFrame(hiphop, "Playlist", "Playlist", attributes={"name": "Playlist"})
conn.upsertVertexDataFrame(pop, "Playlist", "Playlist", attributes={"name": "Playlist"})
# Artist vertices
conn.upsertVertexDataFrame(alternative, "Artist", "Artist Name", attributes={"name": "Artist Name"})
conn.upsertVertexDataFrame(indie_alt, "Artist", "Artist Name", attributes={"name": "Artist Name"})
conn.upsertVertexDataFrame(rock, "Artist", "Artist Name", attributes={"name": "Artist Name"})
conn.upsertVertexDataFrame(blues, "Artist", "Artist Name", attributes={"name": "Artist Name"})
conn.upsertVertexDataFrame(metal, "Artist", "Artist Name", attributes={"name": "Artist Name"})
conn.upsertVertexDataFrame(hiphop, "Artist", "Artist Name", attributes={"name": "Artist Name"})
conn.upsertVertexDataFrame(pop, "Artist", "Artist Name", attributes={"name": "Artist Name"})
# Song-to-artist edges
conn.upsertEdgeDataFrame(alternative, "Song", "SONG_ARTIST", "Artist", "id", "Artist Name", attributes={})
conn.upsertEdgeDataFrame(indie_alt, "Song", "SONG_ARTIST", "Artist", "id", "Artist Name", attributes={})
conn.upsertEdgeDataFrame(rock, "Song", "SONG_ARTIST", "Artist", "id", "Artist Name", attributes={})
conn.upsertEdgeDataFrame(blues, "Song", "SONG_ARTIST", "Artist", "id", "Artist Name", attributes={})
conn.upsertEdgeDataFrame(metal, "Song", "SONG_ARTIST", "Artist", "id", "Artist Name", attributes={})
conn.upsertEdgeDataFrame(hiphop, "Song", "SONG_ARTIST", "Artist", "id", "Artist Name", attributes={})
conn.upsertEdgeDataFrame(pop, "Song", "SONG_ARTIST", "Artist", "id", "Artist Name", attributes={})
# Song-to-playlist edges
conn.upsertEdgeDataFrame(alternative, "Song", "SONG_PLAYLIST", "Playlist", "id", "Playlist", attributes={})
conn.upsertEdgeDataFrame(indie_alt, "Song", "SONG_PLAYLIST", "Playlist", "id", "Playlist", attributes={})
conn.upsertEdgeDataFrame(rock, "Song", "SONG_PLAYLIST", "Playlist", "id", "Playlist", attributes={})
conn.upsertEdgeDataFrame(blues, "Song", "SONG_PLAYLIST", "Playlist", "id", "Playlist", attributes={})
conn.upsertEdgeDataFrame(metal, "Song", "SONG_PLAYLIST", "Playlist", "id", "Playlist", attributes={})
conn.upsertEdgeDataFrame(hiphop, "Song", "SONG_PLAYLIST", "Playlist", "id", "Playlist", attributes={})
conn.upsertEdgeDataFrame(pop, "Song", "SONG_PLAYLIST", "Playlist", "id", "Playlist", attributes={})
# Song-to-genre edges
conn.upsertEdgeDataFrame(alternative, "Song", "SONG_GENRE", "Genre", "id", "Genre", attributes={})
conn.upsertEdgeDataFrame(indie_alt, "Song", "SONG_GENRE", "Genre", "id", "Genre", attributes={})
conn.upsertEdgeDataFrame(rock, "Song", "SONG_GENRE", "Genre", "id", "Genre", attributes={})
conn.upsertEdgeDataFrame(blues, "Song", "SONG_GENRE", "Genre", "id", "Genre", attributes={})
conn.upsertEdgeDataFrame(metal, "Song", "SONG_GENRE", "Genre", "id", "Genre", attributes={})
conn.upsertEdgeDataFrame(hiphop, "Song", "SONG_GENRE", "Genre", "id", "Genre", attributes={})
conn.upsertEdgeDataFrame(pop, "Song", "SONG_GENRE", "Genre", "id", "Genre", attributes={})
# Song vertices: map each Song attribute to its CSV column
song_attributes = {"id": "id", "name": "Track Name", "popularity": "Popularity", "dancibility": "danceability", "energy": "energy", "key_id": "key", "loudness": "loudness", "mode": "mode", "speechiness": "speechiness", "acousticness": "acousticness", "instrumentalness": "instrumentalness", "liveness": "liveness", "valence": "valence", "tempo": "tempo", "uri": "uri", "track_href": "track_href", "analysis_url": "analysis_url", "duration_ms": "duration_ms", "time_signature": "time_signature"}
conn.upsertVertexDataFrame(alternative, "Song", "id", attributes=song_attributes)
conn.upsertVertexDataFrame(indie_alt, "Song", "id", attributes=song_attributes)
conn.upsertVertexDataFrame(rock, "Song", "id", attributes=song_attributes)
conn.upsertVertexDataFrame(blues, "Song", "id", attributes=song_attributes)
conn.upsertVertexDataFrame(metal, "Song", "id", attributes=song_attributes)
conn.upsertVertexDataFrame(hiphop, "Song", "id", attributes=song_attributes)
conn.upsertVertexDataFrame(pop, "Song", "id", attributes=song_attributes)
Great! Once this is completed, our graph is officially set up! We can now start exploring visualisations on top of our graph.
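The upsert calls repeat the same pattern for every genre dataframe, so they could also be wrapped in one function and run in a loop. The sketch below is one possible arrangement, not the only way to do it. upsert_frame and RecordingConnection are names made up for this example, the SONG_PLAYLIST call assumes the Playlist column identifies the playlist each song belongs to, and RecordingConnection is a stand-in that only records calls so the function can be dry-run without a TigerGraph server.

```python
# Song attributes whose names differ from the CSV column names.
RENAMED = {"name": "Track Name", "popularity": "Popularity",
           "dancibility": "danceability", "key_id": "key"}
# Song attributes that share their name with the CSV column.
SAME_NAME = ["id", "energy", "loudness", "mode", "speechiness", "acousticness",
             "instrumentalness", "liveness", "valence", "tempo", "uri",
             "track_href", "analysis_url", "duration_ms", "time_signature"]
SONG_ATTRIBUTES = {**{name: name for name in SAME_NAME}, **RENAMED}


def upsert_frame(conn, df):
    """Upsert the vertices and edges contained in one genre dataframe."""
    conn.upsertVertexDataFrame(df, "Playlist", "Playlist", attributes={"name": "Playlist"})
    conn.upsertVertexDataFrame(df, "Artist", "Artist Name", attributes={"name": "Artist Name"})
    conn.upsertEdgeDataFrame(df, "Song", "SONG_ARTIST", "Artist", "id", "Artist Name", attributes={})
    # Assumption: the Playlist column links each song to its playlist.
    conn.upsertEdgeDataFrame(df, "Song", "SONG_PLAYLIST", "Playlist", "id", "Playlist", attributes={})
    conn.upsertEdgeDataFrame(df, "Song", "SONG_GENRE", "Genre", "id", "Genre", attributes={})
    conn.upsertVertexDataFrame(df, "Song", "id", attributes=SONG_ATTRIBUTES)


class RecordingConnection:
    """Hypothetical stand-in that records calls instead of contacting a server."""

    def __init__(self):
        self.calls = []

    def upsertVertexDataFrame(self, df, vertexType, v_id=None, attributes=None):
        self.calls.append(("vertex", vertexType))

    def upsertEdgeDataFrame(self, df, sourceVertexType, edgeType, targetVertexType,
                            from_id=None, to_id=None, attributes=None):
        self.calls.append(("edge", edgeType))
```

With the real connection, the whole load would then become: for df in (alternative, indie_alt, rock, blues, metal, hiphop, pop): upsert_frame(conn, df).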
Part IV: Congrats!
Congrats! Great work on creating this Spotify Graph. Look out for the next blogs to create some awesome visualisations on top of it!
If you have any questions or would like to learn more, feel free to join the TigerGraph Discord.
Good luck with your TigerGraph adventures!