STA141B Project: Lyrics Analysis
For the analysis, visit https://jiahtan.github.io/STA141B/
Files:
Analysis Part I Files
- Tasks0.ipynb, Tasks0.py - Scrape Billboard Top 100 Songs of All Time for Song, Artist and Year.
- Tasks2.ipynb, Tasks2.py - Retrieve lyrics for all 100 entries by scraping songlyrics.com, and handle a few missing entries.
- NGRAMS.ipynb, NGRAMS.py - Extracting bigrams and trigrams from all songs.
- ContentDensity.ipynb, ContentDesnity.py - Lexical density (number of content words: nouns, verbs, adjectives, adverbs).
- ExploratoryBokeh.ipynb - Bokeh plot of number of songs by year and notebook widget displaying different Spotify audio features.
- AudioFeaturePlot.py - Same plot as the Spotify audio feature plot, but generates the HTML file for the web.
- Sentiment Analysis.ipynb - Conducts sentiment analysis on the Top 100 Songs of All Time.
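
As a rough illustration of the bigram and trigram extraction listed above, here is a minimal sketch using only the Python standard library. It is not the code in NGRAMS.py: the `ngrams` helper, the whitespace tokenization, and the placeholder lyric string are all assumptions made for this example.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all runs of n consecutive tokens as tuples (hypothetical helper)."""
    return list(zip(*(tokens[i:] for i in range(n))))


# Placeholder text, not taken from the scraped data.
lyrics = "la la la love me la la"

# Naive tokenization: lowercase and split on whitespace (an assumption).
tokens = lyrics.lower().split()

bigram_counts = Counter(ngrams(tokens, 2))
trigram_counts = Counter(ngrams(tokens, 3))

# Most frequent bigram in the placeholder text.
print(bigram_counts.most_common(1))
```

Counting tuples with `Counter` keeps the sketch short; the actual notebook may well use a library tokenizer and different cleaning steps.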
Analysis Part II Files
- LovetoCuss.ipynb - Examines the number of times 'love' or cuss words appear in the Year-End Top 100 songs.
- GoldenEra.ipynb, GoldenEra.py - Data exploration - finding a “Golden Period”
- TimeAnalysis.ipynb, TimeAnalysis.py - number of repeat artists, word clouds of lexical words per decade, lexical density per decade
- Similarity.ipynb - Computes and visualizes the similarity between annual/decadal top 100 and all-time top 100
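
One possible way to compare two sets of lyrics, as the similarity item above describes, is cosine similarity over word counts. The sketch below is only an assumption about how such a comparison could work; this file list does not say which measure Similarity.ipynb uses, and the `cosine_similarity` function and the sample strings are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two word-count Counters (hypothetical helper)."""
    shared = set(a) & set(b)
    dot = sum(a[w] * b[w] for w in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # define similarity with an empty text as 0
    return dot / (norm_a * norm_b)


# Placeholder texts standing in for a yearly chart and the all-time chart.
yearly = Counter("love me love you".split())
all_time = Counter("love you baby".split())

score = cosine_similarity(yearly, all_time)
print(round(score, 3))
```

The score ranges from 0 (no shared words) to 1 (identical word proportions), which makes annual or decadal comparisons easy to plot on a common scale.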
Other
- images - Directory containing plots as images
- Lyric Generation.ipynb - Sourced from another GitHub repo for fun; not part of the analysis