SpotifyData

Data Engineering Starter Project - Spotify Service

A Python service that exposes an API simulating some Spotify features. It saves Spotify data in a relational database, which can be queried either through the API routes or through a direct connection to the database.

Author

Juan Sebastián Vargas C.

Description

The data source is the public Spotify Web API. The level of detail and the data features are described below in the Model section. To access the API data, a Python library called Spotipy was used. The reasons for this are the following:

PostgreSQL was chosen as well, given that it is one of the best relational databases, is well supported by the community, and is frequently used in Data Engineering projects.

The Flask framework was used because it is lightweight and works seamlessly with SQLAlchemy, a tool that was mandatory for this project.

A REST API was built using Marshmallow to define JSON schemas that serialize the DB objects fetched by SQLAlchemy.

Bonuses

Here you can check some!

  - Artist Most Popular Track
  - Followers per Artist
  - Tracks per album

Running the application

  1. First, get your Spotify credentials at https://developer.spotify.com/dashboard/login. Register your app, making note of your Client ID and your App Secret. If you don’t already have a Spotify account, you will need to create one. If you want to know more about the Spotify Web API, check this article.

  2. Open the Docker Compose file and go to the pythonapp section. Under environment, fill in the SPOTIPY_CLIENT_ID and SPOTIPY_CLIENT_SECRET variables; you can leave SPOTIPY_REDIRECT_URI as it is.

  3. Now go to your terminal and build the containers.

docker-compose build

  4. Now you can run them all detached.

docker-compose up -d

  5. Finally, run docker-compose exec pythonapp python3 make_pgdb.py to create the database. You will then be able to check the database. If you want to reset it, you can also use docker-compose exec pythonapp python3 drop_pgdb.py.

You will see three containers:

You can change this configuration in the docker-compose.yml file.

What you will see

First go to the browser or use a tool such as Postman to check the API. Make the GET requests in this order (just for the very first time):

  1. Countries
  2. Playlists
  3. Update Tracks Popularity
http://127.0.0.1:4000/countries
http://127.0.0.1:4000/playlists
http://127.0.0.1:4000/tracks/update-popularity

Downloading and saving to the database may take a few minutes for some objects.
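The first-run order above can also be scripted instead of being clicked through by hand. The following is a minimal sketch that assumes the API is reachable at the address used in the URLs above. The names `BOOTSTRAP_ROUTES`, `http_get` and `bootstrap` are illustrative and are not part of the project.

```python
import urllib.request

# Host and port taken from the URLs listed above; adjust if your setup differs.
BASE_URL = "http://127.0.0.1:4000"

# First-run order: countries, then playlists, then the popularity update.
BOOTSTRAP_ROUTES = ["/countries", "/playlists", "/tracks/update-popularity"]


def http_get(url, timeout=600):
    """GET a URL and return (status_code, body_text).

    The timeout is generous because some loads can take minutes.
    urlopen raises an exception on 4xx/5xx responses.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8")


def bootstrap(base_url=BASE_URL, fetch=http_get):
    """Call the bootstrap routes in order, stopping at the first non-200 status.

    `fetch` is injectable so the ordering logic can be exercised
    without a running server.
    """
    results = []
    for route in BOOTSTRAP_ROUTES:
        status, _body = fetch(base_url + route)
        results.append((route, status))
        if status != 200:
            break
    return results
```

With the containers running, calling `bootstrap()` and printing the returned list shows each route next to its HTTP status.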

Then you can use the rest of the routes to pull tracks, artists, albums, countries and playlists. You can use the service as a backend for a frontend, or connect a data visualization tool to it to build visualizations.

Docker Container

Link to Docker File

To run in a local environment

Requirements

# capture requirements to install
pip freeze > requirements.txt

# install requirements from requirements.txt
pip install -r requirements.txt

Environment Variables - Flask

export FLASK_ENV=development
export FLASK_APP=run.py

export SPOTIPY_CLIENT_ID=''
export SPOTIPY_CLIENT_SECRET=''
export SPOTIPY_REDIRECT_URI='https://localhost:8888/callback/'
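Since the credentials above are exported as empty strings until you fill them in, a quick check before starting the app can save a confusing authentication error later. This is a minimal sketch, not part of the project; the helper name `missing_spotify_vars` is hypothetical.

```python
import os

# The three variables exported above; Spotipy reads them from the environment.
REQUIRED_VARS = (
    "SPOTIPY_CLIENT_ID",
    "SPOTIPY_CLIENT_SECRET",
    "SPOTIPY_REDIRECT_URI",
)


def missing_spotify_vars(env=None):
    """Return the names of required variables that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
```

Calling `missing_spotify_vars()` with no arguments inspects the current environment; an empty list means all three variables are set.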

Database Migrations in Flask

from application import *

db.drop_all()  # Drop all tables; needed when columns change or new models are added
db.create_all()  # Create all tables defined by the models

Tree Structure

.
├── README.md
├── SpotipyTests
│   ...
├── __pycache__
│   ...
├── application
│   ├── __init__.py
│   ├── __pycache__
│   │   ├── __init__.cpython-39.pyc
│   │   ├── iso_country_codes.cpython-39.pyc
│   │   ├── models.cpython-39.pyc
│   │   ├── routes.cpython-39.pyc
│   │   └── spotipy_methods.cpython-39.pyc
│   ├── iso_country_codes.py
│   ├── models.py
│   ├── routes.py
│   └── spotipy_methods.py
├── documents
│   ├── Database ER diagram GENERAL.png
│   ├── Drafts
│   │   ├── Blank diagram - UML Class.pdf
│   │   └── UML-Class Diagram 1.png
│   └── UML Class Diagram.png
├── img
│   ...
├── requirements.txt
├── run.py
└── spotify_playlist.txt

8 directories, 49 files