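Copyright notice: this is the blogger's original article and may not be reproduced without the blogger's permission. https://blog.csdn.net/zhao_5352269/article/details/86710646
I hit this problem while crawling Zhihu with scrapy-redis, once a large amount of data had accumulated in Redis. Reference for the fix: https://blog.csdn.net/song19890528/article/details/38536871 For this kind of workload a Redis cluster is probably the better choice; see Cui Qingcai's blog: https://cuiqingcai.com/6058.html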
2019-01-31 01:11:46 [twisted] CRITICAL: Unhandled error in Deferred:
2019-01-31 01:11:46 [twisted] CRITICAL:
Traceback (most recent call last):
File "D:\Python3\lib\site-packages\twisted\internet\task.py", line 517, in _oneWorkUnit
result = next(self._iterator)
File "D:\Python3\lib\site-packages\scrapy\utils\defer.py", line 63, in <genexpr>
work = (callable(elem, *args, **named) for elem in iterable)
File "D:\Python3\lib\site-packages\scrapy\core\scraper.py", line 183, in _process_spidermw_output
self.crawler.engine.crawl(request=output, spider=spider)
File "D:\Python3\lib\site-packages\scrapy\core\engine.py", line 210, in crawl
self.schedule(request, spider)
File "D:\Python3\lib\site-packages\scrapy\core\engine.py", line 216, in schedule
if not self.slot.scheduler.enqueue_request(request):
File "D:\Python3\lib\site-packages\scrapy_redis\scheduler.py", line 162, in enqueue_request
if not request.dont_filter and self.df.request_seen(request):
File "D:\Python3\lib\site-packages\scrapy_redis\dupefilter.py", line 100, in request_seen
added = self.server.sadd(self.key, fp)
File "D:\Python3\lib\site-packages\redis\client.py", line 1821, in sadd
return self.execute_command('SADD', name, *values)
File "D:\Python3\lib\site-packages\redis\client.py", line 755, in execute_command
return self.parse_response(connection, command_name, **options)
File "D:\Python3\lib\site-packages\redis\client.py", line 768, in parse_response
response = connection.read_response()
File "D:\Python3\lib\site-packages\redis\connection.py", line 638, in read_response
raise response
redis.exceptions.ResponseError: MISCONF Redis is configured to save RDB snapshots, but is currently not able to persist on disk. Commands that may modify the data set are disabled. Please check Redis logs for details about the error.
2019-01-31 01:11:46 [scrapy.core.scraper] ERROR: Spider error processing <GET https://zhihu.com/people/leonora-yu/following> (referer: https://zhihu.com/people/shui-shuo-cheng-xu-yuan-bu-hui-xie-wen-zhang/following)
Traceback (most recent call last):
File "D:\Python3\lib\site-packages\scrapy\utils\defer.py", line 102, in iter_errback
yield next(it)
GeneratorExit
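In short: Redis is configured to save RDB snapshots but currently cannot persist them to disk, so commands that modify the data set are disabled. The Redis log has the details of the error.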
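This happens when the Redis snapshot (the background RDB save) is forcibly stopped or fails, so the data cannot be persisted. Run the info command to check the snapshot status; the Persistence section reports it in fields such as rdb_last_bgsave_status.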
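The same check can be scripted from Python. The sketch below assumes the redis-py client that already appears in the traceback and a Redis server on localhost:6379. The function names summarize_rdb_status and check_local_redis are hypothetical helpers made up for illustration, not part of redis-py or scrapy-redis.

```python
def summarize_rdb_status(persistence):
    """Summarize the RDB fields of an INFO persistence reply (a plain dict)."""
    status = persistence.get("rdb_last_bgsave_status", "unknown")
    return {
        # "ok" means the last background save succeeded; "err" means it failed
        "last_bgsave_ok": status == "ok",
        "status": status,
        # number of writes that have not been saved to disk yet
        "unsaved_changes": persistence.get("rdb_changes_since_last_save", 0),
    }


def check_local_redis(host="localhost", port=6379):
    """Fetch INFO persistence from a live server (host and port are assumptions)."""
    import redis  # imported lazily so the helper above works without redis-py

    client = redis.Redis(host=host, port=port)
    return summarize_rdb_status(client.info("persistence"))
```

If check_local_redis() returns a summary whose last_bgsave_ok is False, the last snapshot failed, which matches the MISCONF error above.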
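The fix is as follows:
Run the command config set stop-writes-on-bgsave-error no (for example in redis-cli). This only lets Redis accept writes again; the failed snapshot itself still has to be diagnosed from the Redis log.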
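The setting can also be applied from Python instead of redis-cli. This is a minimal sketch that assumes a redis-py client object is passed in. The function name allow_writes_despite_bgsave_error is hypothetical, while config_set and config_get are the redis-py wrappers for CONFIG SET and CONFIG GET.

```python
def allow_writes_despite_bgsave_error(client):
    """Turn off stop-writes-on-bgsave-error and return the value now in effect.

    `client` is assumed to be a redis-py client, e.g.
    redis.Redis(host="localhost", port=6379).
    """
    option = "stop-writes-on-bgsave-error"
    # Equivalent to: CONFIG SET stop-writes-on-bgsave-error no
    client.config_set(option, "no")
    # Read the option back to confirm the server accepted the change
    return client.config_get(option).get(option)
```

A change made with CONFIG SET applies to the running server only. To keep it across restarts, the same option also has to be set in redis.conf.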