Basic data analysis with the Spotify song attributes dataset

Cabral Juan Andrés

In this tutorial, we go through the basic commands for a first analysis of a dataset. The dataset we explore contains Spotify song attributes and was sourced from Kaggle (GeorgeMcIntire).

Load dataset
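As a minimal sketch of this step, the snippet below reads CSV data with `pd.read_csv`. The rows are made up so the example runs on its own, and the column names (`song_title`, `artist`, `danceability`, `energy`, `loudness`, `tempo`) are assumptions about the file rather than something confirmed here. With the downloaded file, we would pass its path instead of the in-memory text.

```python
import io

import pandas as pd

# Made-up rows standing in for the Kaggle file (hypothetical values).
# With the real download, use its path: pd.read_csv("path/to/file.csv").
csv_text = """song_title,artist,danceability,energy,loudness,tempo
Song A,Artist X,0.83,0.43,-8.8,150.1
Song B,Artist Y,0.74,0.36,-10.4,160.1
Song C,Artist X,0.68,0.56,-7.2,99.3
"""

df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)   # (number of rows, number of columns)
print(df.head())  # first rows, to check that the columns look right
```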

Check for missing values
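One common way to do this, sketched here on a small made-up frame, is to combine `isnull()` with `sum()` to count the missing entries per column. The column names and values are illustrative only.

```python
import numpy as np
import pandas as pd

# Made-up frame with two deliberately missing entries (hypothetical data).
df = pd.DataFrame({
    "artist": ["Artist X", "Artist Y", None],
    "energy": [0.43, np.nan, 0.56],
    "tempo": [150.1, 160.1, 99.3],
})

# isnull() marks each missing cell as True; sum() counts them per column.
missing_per_column = df.isnull().sum()
total_missing = int(missing_per_column.sum())

print(missing_per_column)
print("Total missing values:", total_missing)
```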

Check for duplicates
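A possible sketch of this check uses `duplicated()`, which flags rows that repeat an earlier row exactly, and `drop_duplicates()` to remove them. The frame below is made up, with one repeated row on purpose.

```python
import pandas as pd

# Made-up frame where the third row repeats the first (hypothetical data).
df = pd.DataFrame({
    "song_title": ["Song A", "Song B", "Song A"],
    "artist": ["Artist X", "Artist Y", "Artist X"],
    "tempo": [150.1, 160.1, 150.1],
})

# duplicated() is True for every row identical to an earlier one.
n_duplicates = int(df.duplicated().sum())
print("Duplicated rows:", n_duplicates)

# Keep the first occurrence of each row and renumber the index.
deduplicated = df.drop_duplicates().reset_index(drop=True)
print(deduplicated)
```

Whether duplicates should be dropped depends on the analysis; here we only show how to find and remove them.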

Data types
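The `dtypes` attribute lists the type pandas inferred for each column, and `select_dtypes` picks out columns by type. The sketch below uses a made-up frame with assumed column names.

```python
import pandas as pd

# Made-up frame mixing text, integer and float columns (hypothetical data).
df = pd.DataFrame({
    "song_title": ["Song A", "Song B", "Song C"],
    "key": [2, 1, 5],
    "tempo": [150.1, 160.1, 99.3],
})

# One inferred type per column.
print(df.dtypes)

# Names of the numeric columns, useful for the steps that follow.
numeric_columns = df.select_dtypes(include="number").columns.tolist()
print(numeric_columns)
```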

Generate summary statistics for numeric columns
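By default, `describe()` reports count, mean, standard deviation, minimum, quartiles and maximum for numeric columns and skips the rest. A minimal sketch on made-up values:

```python
import pandas as pd

# Made-up frame; the text column is skipped by describe() by default.
df = pd.DataFrame({
    "artist": ["Artist X", "Artist Y", "Artist X", "Artist Z"],
    "energy": [0.2, 0.4, 0.6, 0.8],
    "tempo": [90.0, 110.0, 130.0, 150.0],
})

# Summary table: one column per numeric variable, one row per statistic.
summary = df.describe()
print(summary)
```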

Unique artists in the dataset
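Assuming the artist name is stored in a column called `artist` (an assumption about the file), `nunique()` counts the distinct artists and `value_counts()` shows how many songs each one has. The values below are made up.

```python
import pandas as pd

# Made-up artist column (hypothetical data).
df = pd.DataFrame({
    "artist": ["Artist X", "Artist Y", "Artist X", "Artist Z"],
})

# Number of distinct artists.
n_artists = df["artist"].nunique()
print("Unique artists:", n_artists)

# Songs per artist, most frequent first.
songs_per_artist = df["artist"].value_counts()
print(songs_per_artist.head(10))
```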

Histograms for numeric variables
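One way to draw these, sketched under assumptions, is `DataFrame.hist`, which makes one histogram per numeric column. The data below is randomly generated as a stand-in, so the shapes will not match the real dataset, and the `Agg` backend is selected only so the sketch runs without a display (in a notebook that line can be dropped).

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Randomly generated stand-in data (not the real song attributes).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "danceability": rng.beta(5, 2, size=200),
    "energy": rng.uniform(0, 1, size=200),
    "tempo": rng.normal(120, 25, size=200),
})

# One histogram per numeric column, each titled with the column name.
axes = df.hist(bins=20, figsize=(10, 6))
plt.tight_layout()
```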

Some comments about the variables:

Danceability: The distribution seems to be left-skewed, indicating that most songs are considered quite danceable.

Energy: The distribution is fairly symmetrical, suggesting a balance in the dataset between low-energy and high-energy songs.

Tempo: The distribution appears roughly normal, with a peak around 120-130 beats per minute.
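Skewness read off a histogram can also be checked numerically. As a small sketch, `Series.skew()` returns a negative value when the longer tail is on the left. The numbers below are made up to illustrate the sign and are not taken from the dataset.

```python
import pandas as pd

# Made-up values: most are high, with one low value forming a left tail.
danceability = pd.Series([0.20, 0.60, 0.70, 0.75, 0.80, 0.85], name="danceability")

# Negative skewness means left-skewed, positive means right-skewed.
skewness = danceability.skew()
print(f"Skewness: {skewness:.2f}")
```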

Loudness and energy show a strong positive correlation, which makes sense as louder songs typically sound more energetic.
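A correlation like this one can be computed with `DataFrame.corr()`, which returns pairwise Pearson coefficients by default. The sketch below uses made-up values chosen to rise together, so it shows the mechanics but does not reproduce the result from the real data.

```python
import pandas as pd

# Made-up values in which louder songs also have higher energy.
df = pd.DataFrame({
    "loudness": [-12.0, -10.0, -8.0, -6.0, -4.0],
    "energy": [0.30, 0.45, 0.50, 0.70, 0.85],
})

# Full correlation matrix (Pearson by default).
corr = df.corr()
print(corr)

# Coefficient for a single pair of columns.
r = df["loudness"].corr(df["energy"])
print(f"Correlation between loudness and energy: {r:.2f}")
```

A coefficient close to 1 indicates a strong positive linear relationship, and a value close to 0 indicates little linear relationship.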