Python Scrapy framework: fixing the case where yield scrapy.Request(next_url, callback=...) fails to follow the next page

Incorrect code:


import scrapy

class XXSpider(scrapy.Spider):
    name = 'xxspider'
    # Wrong: allowed_domains contains a full URL instead of a bare domain name
    allowed_domains = ['https://www.xx.com']
    start_urls = ['https://www.xx.com/ask/highlight/']

Correct code:

import scrapy

class XXSpider(scrapy.Spider):
    name = 'xxspider'
    # Correct: a bare domain name, so follow-up requests are not treated as off-site
    allowed_domains = ['www.xx.com']
    start_urls = ['https://www.xx.com/ask/highlight/']

The problem here is the allowed_domains setting: it expects a list of domain names, not a list of URLs. When a full URL is given, Scrapy's offsite filtering no longer recognizes the follow-up Requests as belonging to an allowed domain and silently drops them, so the next-page request is never scheduled.
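For context, here is a minimal sketch of how the next-page request is typically issued from parse; the CSS selector below is an illustrative assumption, not taken from the actual site:

import scrapy

class XXSpider(scrapy.Spider):
    name = 'xxspider'
    allowed_domains = ['www.xx.com']   # domain only, so follow-up requests pass the offsite check
    start_urls = ['https://www.xx.com/ask/highlight/']

    def parse(self, response):
        # ... extract items from the current listing page ...

        # hypothetical "next page" selector; adjust to the real page structure
        next_href = response.css('a.next::attr(href)').get()
        if next_href:
            # build an absolute URL and hand it back to parse
            yield scrapy.Request(response.urljoin(next_href), callback=self.parse)

With allowed_domains set to 'www.xx.com', this request is scheduled normally; with 'https://www.xx.com' it is dropped as off-site.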

Another situation can also cause yield scrapy.Request() to have no effect:

    The scheduler's duplicate filter drops the URL (requests are de-duplicated by default, because dont_filter defaults to False).

Solution:

yield scrapy.Request(next_url, callback=self.parse, dont_filter=True)
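In context, a sketch of the parse callback with the fix applied (the selector is again a hypothetical placeholder):

    def parse(self, response):
        # ... extract items ...
        next_href = response.css('a.next::attr(href)').get()   # hypothetical selector
        if next_href:
            # dont_filter=True tells the scheduler to skip the duplicate filter for this request
            yield scrapy.Request(response.urljoin(next_href),
                                 callback=self.parse,
                                 dont_filter=True)

Note that dont_filter=True disables de-duplication for that one request, so a page that links back to an already-crawled URL can send the spider into a loop; use it only when the duplicate filter is actually the cause. To confirm that it is, setting DUPEFILTER_DEBUG = True in settings.py makes Scrapy log every request dropped by the duplicate filter.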


Reposted from blog.csdn.net/Li_G_yuan/article/details/81589556