This crawler scrapes a popular Indian baby names website and dumps all names into a single file. It is written in Python using the Scrapy framework.
It currently crawls only boy names. For girl names, edit the URL in bachpandotcom/spiders/bachpan.py and replace indian-boy-names with indian-girl-names.
- scrapy
Run: scrapy crawl bachpan -o bachpan.json
This crawls the website and dumps the names into bachpan.json, which can then be processed using parsejson.py.
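As a rough illustration of the post-processing step, the sketch below flattens a Scrapy JSON export into a sorted list of unique names, one per line. It is not the contents of parsejson.py. The field name `name` and the helper names `extract_names` and `write_names` are assumptions made for this example, so adjust them to match the items the spider actually yields.

```python
import json
from pathlib import Path


def extract_names(json_path, field="name"):
    """Return a sorted list of unique names from a Scrapy JSON feed export.

    `field` is an assumed item key; change it to whatever the spider emits.
    """
    # Scrapy's `-o file.json` export writes a single JSON array of item objects.
    items = json.loads(Path(json_path).read_text(encoding="utf-8"))
    names = set()
    for item in items:
        value = item.get(field)
        if value is None:
            continue
        # Accept either a single string or a list of strings per item.
        values = value if isinstance(value, list) else [value]
        for v in values:
            v = str(v).strip()
            if v:
                names.add(v)
    return sorted(names)


def write_names(names, txt_path):
    """Write one name per line to a UTF-8 text file."""
    Path(txt_path).write_text("\n".join(names) + "\n", encoding="utf-8")


# Example usage (hypothetical output path; run after the crawl has finished):
# write_names(extract_names("bachpan.json"), "names.txt")
```

Collecting into a set removes duplicates that can appear when the same name is listed on more than one page.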
The results of my scrape are in bachpan.json, and the text file dump is in bachpan.txt.