A simple scraper, developed using Scala 2.12.1, ruippeixotog/scala-scraper and ScalaTest.
Tests are written with ScalaTest's FlatSpec, following a TDD workflow.
It scrapes the website http://craftcans.com/db.php?search=all&sort=beerid&ord=desc&view=text.
The data cleaning follows the tutorial http://blog.kaggle.com/2017/01/31/scraping-for-craft-beers-a-dataset-creation-tutorial/ (work in progress).
git clone https://github.com/ColinLeverger/beer-scraper-scala
sbt test
sbt run
- Connect to the website and download the HTML
- Parse it with the scala-scraper library
- Create a list of Beers case class objects
- Write CSV
- Done!
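
The last two steps (building case class objects and writing the CSV) can be sketched with the standard library alone. This is a minimal illustration, not the repository's actual code: the `Beer` fields, the `BeerCsv` helper and the three-column row layout are all assumptions, and the input is taken to be the cell texts that scala-scraper has already extracted from each table row.

```scala
import java.io.PrintWriter
import scala.util.Try

// Hypothetical model: these field names are assumptions, not the project's real case class.
final case class Beer(name: String, style: String, abv: Option[Double])

// Hypothetical helper object illustrating the "list of beers -> CSV" steps.
object BeerCsv {
  // Parse values such as "5.0%" into 5.0; anything unparseable becomes None.
  def parseAbv(raw: String): Option[Double] =
    Try(raw.trim.stripSuffix("%").trim.toDouble).toOption

  // Turn the cell texts of one table row into a Beer.
  // Rows that do not have exactly three cells are skipped.
  def fromCells(cells: List[String]): Option[Beer] = cells.map(_.trim) match {
    case name :: style :: abv :: Nil => Some(Beer(name, style, parseAbv(abv)))
    case _                           => None
  }

  // Quote a field only when it contains a comma, a quote or a line break.
  def escape(field: String): String =
    if (field.exists(c => c == ',' || c == '"' || c == '\n' || c == '\r'))
      "\"" + field.replace("\"", "\"\"") + "\""
    else field

  def toCsvLine(beer: Beer): String =
    List(beer.name, beer.style, beer.abv.map(_.toString).getOrElse(""))
      .map(escape)
      .mkString(",")

  // Header line followed by one line per beer.
  def toCsv(beers: List[Beer]): String =
    ("name,style,abv" :: beers.map(toCsvLine)).mkString("\n")

  def write(path: String, beers: List[Beer]): Unit = {
    val out = new PrintWriter(path, "UTF-8")
    try out.write(toCsv(beers)) finally out.close()
  }
}
```

With made-up sample rows, `rows.flatMap(BeerCsv.fromCells)` yields the list of beers and `BeerCsv.write("beers.csv", beers)` writes the file. Keeping `fromCells` and `toCsv` as pure functions makes them easy to cover with FlatSpec tests without touching the network.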
See this file. Variables cleaned.