Sometimes, in order to test an XPath expression, you need to download a page temporarily, and testing from the command line is the most convenient way. However, many pages that require authentication cannot be fetched directly with the scrapy shell command, so the request has to be reconstructed with the proper cookies and headers.
First, install IPython into the same Python environment as Scrapy:

# in a plain Python environment
pip install ipython
# in a conda environment
conda install ipython
Then launch the Scrapy shell, which will automatically use IPython:
scrapy shell
Next, convert the cookies into dictionary format.
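A cookie string copied from the browser's developer tools arrives as a single `key=value; key=value` line. A small helper can split it into the dictionary that `scrapy.Request` expects (a minimal sketch; the cookie string below is a placeholder, not a real session):

```python
def cookie_string_to_dict(cookie_string):
    """Convert a raw Cookie header string into a dict for scrapy.Request."""
    cookies = {}
    for pair in cookie_string.split(';'):
        if '=' in pair:
            # partition() keeps any '=' that appears inside the value
            key, _, value = pair.strip().partition('=')
            cookies[key] = value
    return cookies

# Example with placeholder values:
print(cookie_string_to_dict("key_1=value_1; key_2=value_2; key_3=value_3"))
# → {'key_1': 'value_1', 'key_2': 'value_2', 'key_3': 'value_3'}
```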
# target URL for the request
url = 'https://novel18.syosetu.com/n7016er/31/'

# custom request headers (setting a custom UA while debugging is generally
# recommended, to get past the most basic User-Agent checks)
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36'}

# cookies dictionary to attach to the request
cookies = {"key_1": "value_1", "key_2": "value_2", "key_3": "value_3"}

# build the Request object
req = scrapy.Request(url, cookies=cookies, headers=headers)

# send the request
fetch(req)

# open the fetched page in the system's default browser
# (mainly to check that the inner page was actually crawled)
view(response)

# response body as bytes
response.body

# response body as str
response.text

# XPath selector
response.xpath()
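As a sanity check outside the Scrapy shell, the same request can be reproduced with Python's standard library (a sketch: the request object is only built here, not sent; the URL and cookie values are the placeholders from above). Note that, unlike `scrapy.Request` with its `cookies=` dictionary, urllib takes cookies as a single `Cookie` header string:

```python
from urllib.request import Request

url = 'https://novel18.syosetu.com/n7016er/31/'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36',
    # cookies go into one "key=value; ..." header string here,
    # not a dictionary as with scrapy.Request
    'Cookie': 'key_1=value_1; key_2=value_2; key_3=value_3',
}

# construct the request without sending it
req = Request(url, headers=headers)

# urllib normalizes stored header names to capitalized form
print(req.get_header('User-agent'))
print(req.get_header('Cookie'))
```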
Original article: https://blog.csdn.net/u010741500/article/details/100974510