Team 3 COVID-19 Data Challenge
This repo contains Team 3's work on the socqe data challenge.
Original dataset obtained through webhose.io: DATASET3: ENGLISH NEWS ARTICLES THAT MENTION "CORONA VIRUS" OR "CORONAVIRUS" OR "COVID" (BY WEBHOSE.IO) Link: https://webhose.io/free-datasets/news-articles-that-mention-corona-virus/
AllSides data (bias data) obtained using work by harry-wood and sautumn: https://github.com/harry-wood/AllSides-Scraper/tree/update-scraper
NOTE: You must update the file paths, including the input and output paths, for this to work.
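One way to avoid editing paths by hand in every script is to read them from environment variables. The sketch below is hypothetical: the variable names `INPUT_PATH` and `OUTPUT_PATH`, the `data_paths` helper, and the default directories are assumptions, not something the existing scripts define.

```ruby
# Hypothetical defaults, used only when the environment variables are unset.
DEFAULT_INPUT = "data/input"
DEFAULT_OUTPUT = "data/output"

# Return the input and output paths, preferring the (assumed) INPUT_PATH and
# OUTPUT_PATH environment variables over the defaults above.
def data_paths(env = ENV)
  {
    input: env.fetch("INPUT_PATH", DEFAULT_INPUT),
    output: env.fetch("OUTPUT_PATH", DEFAULT_OUTPUT)
  }
end

paths = data_paths
puts "Reading from #{paths[:input]}, writing to #{paths[:output]}"
```

If the scripts were changed to call a helper like this, they could be run as `INPUT_PATH=/path/to/input OUTPUT_PATH=/path/to/output ruby clean.rb` without editing the source.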
Order of the scripts used to clean the data:
- rename.rb
- clean.rb
- Create a Postgres database:
  - I ran the code in create_table.sql command by command in the psql UNIX utility.
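The cleaning steps can also be chained in one driver script. This is a hypothetical sketch, not part of the repo: it assumes the scripts are run from the repository root, that a Postgres database named `covid_news` (an assumed name) already exists, and it runs `create_table.sql` in one go with `psql -f` instead of command by command.

```ruby
# Steps in the order listed above. The database name "covid_news" is an assumption.
PIPELINE = [
  ["ruby", "rename.rb"],
  ["ruby", "clean.rb"],
  # -f runs every command in the file; ON_ERROR_STOP=1 makes psql exit
  # with a non-zero status on the first SQL error instead of continuing.
  ["psql", "-v", "ON_ERROR_STOP=1", "-d", "covid_news", "-f", "create_table.sql"]
].freeze

# Run each step in order and stop at the first one that fails.
# Kernel#system returns true on success, false on a non-zero exit,
# and nil if the command could not be started.
def run_pipeline(steps = PIPELINE)
  steps.each do |cmd|
    puts "Running: #{cmd.join(' ')}"
    ok = system(*cmd)
    abort("Step failed: #{cmd.join(' ')}") unless ok
  end
end
```

Calling `run_pipeline` then executes the three steps; stopping early keeps later steps from running on half-cleaned data.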
To access the transformed main data, you should use the CSV data.
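A minimal sketch of loading the CSV data in Ruby with the standard `csv` library, assuming the file has a header row. The helper name `load_transformed` and the example path are hypothetical; the actual file name and columns depend on your output path.

```ruby
require "csv"

# Load a CSV file with a header row into an array of hashes,
# one hash per row, keyed by the header names.
def load_transformed(path)
  CSV.read(path, headers: true).map(&:to_h)
end
```

For example, `rows = load_transformed("output/articles.csv")` (hypothetical path) gives one hash per article, so `rows.first` shows the available columns.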