On Scrapy: How to Read settings.py Parameters from a Spider, a Middleware, and a Pipeline

Environment

PyCharm 2018.1
Scrapy 1.5.1
Python 3.5.4
Windows 10


Settings precedence

According to the official documentation, Scrapy settings are populated at four levels of precedence:

  1. Command line options (highest precedence)
  2. Project settings module
  3. Default settings per-command
  4. Default global settings (lowest precedence)
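
For example, a value defined in settings.py (level 2) can be overridden for a single run via the -s command line option (level 1). A minimal sketch, assuming a hypothetical spider named myspider and a custom key CONFIG_KEY:

scrapy crawl myspider -s CONFIG_KEY=from_cmdline

Every retrieval technique shown below, except the last one, sees the overridden value.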


Spider

  1. Read settings inside parse():
def parse(self, response):
	# self.settings is available on any spider that is bound to a crawler
	print(self.settings.get('CONFIG_KEY'))
  2. Read settings when the spider is instantiated:
import scrapy

class MySpider(scrapy.Spider):
	name = 'myspider'  # required spider name; any identifier works

	def __init__(self, settings, *args, **kwargs):
		super(MySpider, self).__init__(*args, **kwargs)
		print(settings.get('CONFIG_KEY'))

	@classmethod
	def from_crawler(cls, crawler, *args, **kwargs):
		# pass the crawler's settings into __init__ explicitly
		spider = cls(crawler.settings, *args, **kwargs)
		spider._set_crawler(crawler)
		return spider
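
Note that in Scrapy 1.x the base scrapy.Spider.from_crawler already binds the crawler to the spider, so self.settings works in any callback without overriding anything; the explicit pattern above is only needed when a setting must be read inside __init__, where self.crawler has not been set yet.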


Middleware & Pipeline

  1. Via the spider argument passed into the processing hooks:
    For example, the process_spider_input method of a spider middleware (a pipeline equivalent is sketched right after it):
def process_spider_input(self, response, spider):
	# every middleware hook receives the running spider
	print(spider.settings.get('CONFIG_KEY'))
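    The same works in an item pipeline, since its hooks also receive the running spider; a minimal sketch, assuming the same custom key CONFIG_KEY:
class MyPipeline:
	def process_item(self, item, spider):
		# spider.settings points at the crawler's settings
		print(spider.settings.get('CONFIG_KEY'))
		return item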
  2. Read settings at instantiation time:
class MyMiddleware:
	def __init__(self, settings):
		print(settings.get('CONFIG_KEY'))

	@classmethod
	def from_crawler(cls, crawler):
		# Scrapy calls this factory method and hands it the crawler
		return cls(crawler.settings)
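
Item pipelines support the same from_crawler factory hook, so the identical pattern carries over; a minimal sketch (config_value is just an illustrative attribute name):

class MyPipeline:
	def __init__(self, settings):
		# keep the value for later use in process_item (illustrative name)
		self.config_value = settings.get('CONFIG_KEY')

	@classmethod
	def from_crawler(cls, crawler):
		return cls(crawler.settings)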


A clean and simple but risky approach: get_project_settings()

from scrapy.utils.project import get_project_settings
...
def parse(self, response):
	settings = get_project_settings()
	print(settings.get('CONFIG_KEY'))
  • Pros: simple and straightforward
  • Cons: it does not see settings passed on the command line, even though those have the highest precedence; see the sketch below
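
To see the difference, run the spider with a command line override (hypothetical key and value) and compare the two lookups inside a callback:

# run with:  scrapy crawl myspider -s CONFIG_KEY=from_cmdline
from scrapy.utils.project import get_project_settings
...
def parse(self, response):
	print(self.settings.get('CONFIG_KEY'))           # 'from_cmdline'
	print(get_project_settings().get('CONFIG_KEY'))  # value from settings.py only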


Reposted from blog.csdn.net/weixin_40841752/article/details/82900326