Unique Presentation Identifier:

87

Program Type

Honors

Faculty Advisor

Dr. Herbert Brown

Document Type

Presentation

Location

Online

Start Date

April 9, 2026, 8:00 AM

Abstract

This project examines whether Spotify audio features can be used to predict song popularity using data analytics techniques. With the growth of music streaming platforms, large datasets of audio characteristics provide new opportunities to analyze patterns in listener engagement. The research question guiding this study asks whether measurable musical attributes can meaningfully explain variation in Spotify’s popularity score. To investigate this question, a dataset containing Spotify audio features was obtained from Kaggle and analyzed using Python. After cleaning the data and removing missing values, 1,486 songs remained for analysis. Categorical variables such as genre and musical key were converted into numerical variables using one-hot encoding so they could be included in statistical models.
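The cleaning and encoding steps described in the abstract could look roughly like the following sketch. It assumes pandas was the library used, which the abstract does not state. The toy rows and the column names (`popularity`, `danceability`, `genre`, `key`) are hypothetical stand-ins for the Kaggle dataset, not the project's actual data.

```python
import pandas as pd

# Hypothetical toy rows standing in for the Kaggle dataset.
# Column names are assumptions, not taken from the project.
raw = pd.DataFrame({
    "popularity": [72, 55, None, 40, 63],
    "danceability": [0.81, 0.45, 0.60, None, 0.70],
    "genre": ["pop", "rock", "pop", "jazz", "rock"],
    "key": [0, 7, 2, 5, 7],
})


def prepare(df: pd.DataFrame) -> pd.DataFrame:
    """Drop rows with missing values, then one-hot encode the categoricals."""
    # Remove every row that has at least one missing value.
    cleaned = df.dropna().reset_index(drop=True)
    # Replace each categorical column with one 0/1 column per observed category.
    return pd.get_dummies(cleaned, columns=["genre", "key"], dtype=int)


prepared = prepare(raw)
print(prepared)
```

Musical key is stored as an integer here but is encoded as a category, because the numeric distance between two keys has no meaning for a regression model.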

Two predictive approaches were applied: multiple linear regression and a Random Forest regression model. The linear regression model tested whether linear relationships between the predictors (audio features and one-hot encoded variables) and popularity could explain variation in the popularity score. The Random Forest model was used to capture potential nonlinear relationships and produced feature importance rankings for variables such as acousticness, liveness, loudness, and valence.
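A minimal sketch of the two modeling approaches, assuming scikit-learn (the abstract names only Python). The data below are synthetic and mostly noise, generated only so the example runs; they are not the study's data and the scores they produce are not its results. The train/test split and the hyperparameters (`n_estimators`, `random_state`, `test_size`) are likewise assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption, not the Kaggle dataset).
rng = np.random.default_rng(0)
feature_names = ["acousticness", "liveness", "loudness", "valence"]
X = rng.uniform(0, 1, size=(300, len(feature_names)))
# Target is dominated by noise, with a weak dependence on the last feature.
y = 50 + 5 * X[:, 3] + rng.normal(0, 15, size=300)

# Hold out a quarter of the rows to score both models on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Multiple linear regression: captures only linear relationships.
linear = LinearRegression().fit(X_train, y_train)
# Random Forest regression: can also capture nonlinear relationships.
forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)

linear_r2 = r2_score(y_test, linear.predict(X_test))
forest_r2 = r2_score(y_test, forest.predict(X_test))
print(f"Linear regression R^2: {linear_r2:.3f}")
print(f"Random Forest R^2:     {forest_r2:.3f}")

# Impurity-based importances from the fitted forest, ranked high to low.
importances = dict(zip(feature_names, forest.feature_importances_))
ranking = sorted(importances, key=importances.get, reverse=True)
for name in ranking:
    print(f"{name}: {importances[name]:.3f}")
```

Comparing held-out R^2 for the two models is one way to judge how much variation in popularity the features explain. The importance ranking shows only which features the forest relied on, not whether its predictions are accurate.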

Overall, the findings suggest that audio characteristics alone are not strong predictors of song popularity. Instead, popularity likely depends on additional factors such as marketing exposure, playlist placement, and artist recognition. This project demonstrates how machine learning methods can be applied to entertainment datasets while highlighting the challenges of predicting cultural outcomes using limited data.

Apr 9th, 8:00 AM

Predicting Song Popularity with Data Analytics
