Unique Presentation Identifier:
87
Program Type
Honors
Faculty Advisor
Dr. Herbert Brown
Document Type
Presentation
Location
Online
Start Date
9-4-2026 8:00 AM
Abstract
This project examines whether Spotify audio features can be used to predict song popularity using data analytics techniques. With the growth of music streaming platforms, large datasets of audio characteristics provide new opportunities to analyze patterns in listener engagement. The research question guiding this study asks whether measurable musical attributes can meaningfully explain variation in Spotify’s popularity score. To investigate this question, a dataset containing Spotify audio features was obtained from Kaggle and analyzed using Python. After cleaning the data and removing missing values, 1,486 songs remained for analysis. Categorical variables such as genre and musical key were converted into numerical variables using one-hot encoding so they could be included in statistical models.
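The cleaning and encoding steps described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the column names and sample values are hypothetical stand-ins for the Kaggle dataset, and the use of pandas with `drop_first=True` is an assumption rather than something stated in the abstract.

```python
import pandas as pd

# Tiny made-up sample standing in for the Kaggle dataset (hypothetical values).
raw = pd.DataFrame({
    "popularity":   [72, 35, None, 58, 64, 41],
    "acousticness": [0.12, 0.80, 0.45, None, 0.30, 0.66],
    "loudness":     [-5.2, -11.0, -7.3, -6.1, -4.8, -9.4],
    "genre":        ["pop", "folk", "rock", "pop", "rock", "folk"],
    "key":          ["C", "G", "A", "C", "A", "C"],
})

# Remove every row that has at least one missing value.
clean = raw.dropna().reset_index(drop=True)

# One-hot encode the categorical columns. drop_first=True (an assumption here)
# drops one level per variable so the dummy columns are not perfectly
# collinear, which matters for linear regression.
encoded = pd.get_dummies(clean, columns=["genre", "key"], drop_first=True, dtype=int)

# Separate the predictors from the target.
X = encoded.drop(columns="popularity")
y = encoded["popularity"]

print(f"{len(raw)} rows before cleaning, {len(clean)} after")
print(list(X.columns))
```

Each categorical variable becomes a set of 0/1 indicator columns, which is what allows genre and key to enter the same model matrix as the numeric audio features.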
Two predictive approaches were applied: multiple linear regression and Random Forest regression. The linear regression model tested whether popularity could be explained by linear relationships with the audio features and the one-hot encoded variables. The Random Forest model was used to capture potential nonlinear relationships and produced feature importance rankings for variables such as acousticness, liveness, loudness, and valence.
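A comparison of the two approaches might look something like the sketch below. It assumes scikit-learn, a held-out test split, and R² as the evaluation metric, none of which are stated in the abstract. The data are synthetic (random features on a 0 to 1 scale with a mostly noisy target), so the printed numbers say nothing about the study's actual results.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
feature_names = ["acousticness", "liveness", "loudness", "valence"]

# Synthetic stand-in data (NOT the study's data): random features and a
# target that depends only weakly on one of them, plus heavy noise.
X = rng.uniform(0.0, 1.0, size=(500, len(feature_names)))
y = 50 + 5 * X[:, 3] + rng.normal(0.0, 15.0, size=500)

# Hold out 20% of the songs to evaluate both models on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Multiple linear regression: captures linear relationships only.
linear = LinearRegression().fit(X_train, y_train)

# Random Forest: an ensemble of decision trees that can capture
# nonlinear relationships and interactions between features.
forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)

for name, model in [("linear regression", linear), ("random forest", forest)]:
    print(f"{name}: test R^2 = {r2_score(y_test, model.predict(X_test)):.3f}")

# Rank features by the forest's impurity-based importance scores.
importances = sorted(
    zip(feature_names, forest.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for feature, score in importances:
    print(f"{feature}: {score:.3f}")
```

The importance scores sum to 1 and describe how much each feature contributed to the forest's splits, not whether the model as a whole predicts well. A feature can rank first even when the overall fit is weak.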
Overall, the findings suggest that audio characteristics alone are not strong predictors of song popularity. Instead, popularity likely depends on additional factors such as marketing exposure, playlist placement, and artist recognition. This project demonstrates how machine learning methods can be applied to entertainment datasets while highlighting the challenges of predicting cultural outcomes using limited data.
Recommended Citation
Smith, Natalie, "Predicting Song Popularity with Data Analytics" (2026). ATU Scholars Symposium. 9.
https://orc.library.atu.edu/atu_rs/2026/2026/9