Feeding a Spider from Redis
Jeremy Chou edited this page May 30, 2023 · 2 revisions
The class scrapy_redis.spiders.RedisSpider enables a spider to read its start URLs from Redis. The URLs in the Redis queue are processed one after another: if the first request yields more requests, the spider processes those requests before fetching another URL from Redis.
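The ordering described above can be illustrated with a small toy model. This is only a sketch and does not use Scrapy or Redis: `simulate_crawl`, the `links` mapping, and the example URLs are all hypothetical, and a plain `deque` stands in for the Redis queue.

```python
from collections import deque


def simulate_crawl(start_urls, links):
    """Toy model of the processing order; not the scrapy-redis implementation.

    start_urls: URLs waiting in the simulated Redis queue, in fetch order.
    links: hypothetical mapping from a URL to the URLs its response yields.
    Returns the order in which URLs are processed.
    """
    redis_queue = deque(start_urls)  # stands in for the Redis queue
    pending = deque()                # requests the spider already holds
    seen = set()
    processed = []

    while redis_queue or pending:
        if not pending:
            # Nothing left to do locally, so fetch one more start URL.
            pending.append(redis_queue.popleft())
        url = pending.popleft()
        if url in seen:
            continue
        seen.add(url)
        processed.append(url)
        # Follow-up requests are handled before any further Redis fetch.
        pending.extend(links.get(url, []))

    return processed


if __name__ == "__main__":
    order = simulate_crawl(
        ["http://a.example", "http://b.example"],
        {"http://a.example": ["http://a.example/1", "http://a.example/2"]},
    )
    print(order)
```

In this model the second start URL is only taken from the queue once every request that came from the first one has been handled.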
For example, create a file myspider.py with the code below:
    from scrapy_redis.spiders import RedisSpider


    class MySpider(RedisSpider):
        name = 'myspider'

        def parse(self, response):
            # do stuff
            pass

Then:
- run the spider:

      scrapy runspider myspider.py

- push urls to redis:

      redis-cli lpush myspider:start_urls http://google.com

These spiders rely on the spider idle signal to fetch start URLs, so there may be a delay of a few seconds between the time you push a new URL and the time the spider starts crawling it.
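URLs can also be pushed from Python instead of redis-cli. The following is a minimal sketch, assuming the redis-py client: `push_start_urls` is a hypothetical helper, and the key simply follows the `myspider:start_urls` pattern used in the command above.

```python
def push_start_urls(client, spider_name, urls):
    """Push URLs onto the '<spider_name>:start_urls' list; return the key used.

    client: any object with a redis-py style lpush(name, *values) method,
    for example a redis.Redis instance.
    """
    key = f"{spider_name}:start_urls"
    urls = list(urls)
    if urls:
        # LPUSH accepts several values in one call.
        client.lpush(key, *urls)
    return key


# Usage against a real server (assumes redis-py is installed and Redis
# listens on localhost:6379, the redis-py defaults):
#
#     import redis
#     push_start_urls(redis.Redis(), "myspider", ["http://google.com"])
```

Passing the client in as an argument keeps the helper independent of connection details, so it can be exercised with a stub object in tests.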