scrapy | downloader middleware

1. User-Agent

By default, Scrapy's UserAgentMiddleware sets the request header to "User-Agent": "Scrapy/1.5.1 (+https://scrapy.org)" (the version number matches the installed Scrapy release).

I. Set a fixed value with the USER_AGENT setting in settings.py

USER_AGENT = 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.71 Safari/537.36'

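II. Write a custom middleware that picks a random User-Agent. Once it is written, enable it in settings.py by uncommenting DOWNLOADER_MIDDLEWARES and registering the class.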

from random import choice


class RandomMiddlewares(object):
    def __init__(self):
        # Pool of User-Agent strings to choose from
        self.user_agent = [
            'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11',
            'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.16 (KHTML, like Gecko) Chrome/10.0.648.133 Safari/534.16',
            'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36',
            'Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)',
            'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)',
        ]

    def process_request(self, request, spider):
        # Overwrite the User-Agent header with a random entry on every request
        request.headers['User-Agent'] = choice(self.user_agent)
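Enabling the RandomMiddlewares class in settings.py might look roughly like the sketch below. The module path `myproject.middlewares` is a hypothetical project name, not something from the original post, and 543 is simply the priority that Scrapy's project template suggests. Setting the built-in UserAgentMiddleware to None is optional; it only switches off the default header logic.

```python
# settings.py (sketch; "myproject" is a hypothetical project name)
DOWNLOADER_MIDDLEWARES = {
    # Register the custom middleware; lower numbers sit closer to the engine
    "myproject.middlewares.RandomMiddlewares": 543,
    # Optional: disable Scrapy's built-in User-Agent middleware
    "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": None,
}
```

With this in place, each outgoing request should carry one of the User-Agent strings from the list instead of the Scrapy default.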

Reposted from www.cnblogs.com/404NooFound/p/10383604.html