Normally we run a Scrapy spider like this: scrapy crawl spiderName
With arguments added, the command becomes:
scrapy crawl spiderName -a parameter1=value1 -a parameter2=value2
Inside the spider these arguments are available as instance attributes:
class MySpider(Spider):
    name = 'myspider'
    ...
    def parse(self, response):
        ...
        # -a values always arrive as strings, so compare against 'value1'
        if self.parameter1 == 'value1':
            ...  # this is True
        # or, equivalently, via getattr (attribute name must be a string)
        if getattr(self, 'parameter2') == 'value2':
            ...  # this is also True
Using -a attaches each value as an attribute on the spider class, so reading that attribute inside the class is all it takes to receive a custom argument.
Run it with: scrapy crawl quotes -a num=7
The variables passed via -a actually arrive through the spider's initializer, so you can also define that method yourself:
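Under the hood, Scrapy's base Spider stores every -a keyword as an instance attribute, and every value is delivered as a string. A minimal pure-Python sketch of that behavior (SpiderSketch is a hypothetical stand-in, not the real scrapy.Spider):

```python
class SpiderSketch:
    """Hypothetical stand-in mimicking how scrapy.Spider.__init__
    turns -a command-line arguments into instance attributes."""

    def __init__(self, **kwargs):
        # Each keyword becomes an attribute, which is why self.num
        # works after `scrapy crawl quotes -a num=7`.
        for key, value in kwargs.items():
            setattr(self, key, value)


# `scrapy crawl quotes -a num=7` is roughly equivalent to:
spider = SpiderSketch(num='7')  # note: the value is the string '7'
print(spider.num)               # → '7'
print(getattr(spider, 'missing', False))  # → False
```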
import scrapy

class QuotesSpider(scrapy.Spider):
    name = 'quotes'
    allowed_domains = ['quotes.com']

    def __init__(self, num='', *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.num = num
        self.start_urls = [f'http://quotes.com/{self.num}']
You can also read the attribute with getattr, which lets you supply a default when the argument was not passed:
import scrapy

class QuotesSpider(scrapy.Spider):
    name = 'quotes'
    allowed_domains = ['quotes.com']

    def start_requests(self):
        # attribute name must be quoted; False is returned if -a num was omitted
        num = getattr(self, 'num', False)
        if num:
            url = f'http://quotes.com/{num}'
            yield scrapy.Request(url)
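One pitfall worth remembering: every -a value reaches the spider as a string, so a numeric argument like num=7 needs explicit conversion before any arithmetic. A small sketch of a defensive conversion helper (parse_num is a hypothetical name, shown outside Scrapy so it runs standalone):

```python
def parse_num(raw, default=1):
    """Convert a -a style string argument to int, falling back on bad
    or missing input (None, empty string, non-numeric text)."""
    try:
        return int(raw)
    except (TypeError, ValueError):
        return default


# -a num=7 delivers the string '7', never the integer 7
print(parse_num('7'))   # → 7
print(parse_num(None))  # → 1 (argument was omitted)
```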