风火编程: Crawler Resources and Utility Functions for headers, UA, and cookies

Copyright notice: 风火编程. Corrections are welcome. https://blog.csdn.net/weixin_42620314/article/details/83592662

Setting the User-Agent request header

headers = {'User-Agent': 'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)'}
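
As a minimal sketch of how such a headers dictionary might be attached to a request, the example below uses the standard library's urllib.request. The URL is a placeholder chosen for illustration, and no request is actually sent.

```python
import urllib.request

# Hypothetical target URL, used only for illustration.
url = "https://example.com/"
headers = {"User-Agent": "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)"}

# Build the request object; nothing is sent until urlopen(req) is called.
req = urllib.request.Request(url, headers=headers)

# urllib stores header names in capitalized form, hence "User-agent".
print(req.get_header("User-agent"))
```

Third-party HTTP clients generally accept a similar headers mapping, so the same dictionary can usually be reused with them.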

UA list

import random

user_agent_list = [
    "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)",
    "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0)",
    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)",
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
    "Mozilla/5.0 (Windows NT 6.1; rv:2.0.1) Gecko/20100101 Firefox/4.0.1",
    "Opera/9.80 (Windows NT 6.1; U; en) Presto/2.8.131 Version/11.11",
    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Maxthon 2.0)",
    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; TencentTraveler 4.0)",
    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)",
    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; The World)"
]
# Pick one User-Agent at random
user_agent = random.choice(user_agent_list)
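
One way to use the list is to pick a new User-Agent for every request. The sketch below wraps that in a helper; the names `USER_AGENTS` and `random_headers` are hypothetical and not part of the original post, and the pool is shortened for brevity.

```python
import random

# A shortened pool for illustration; in practice the full list would be used.
USER_AGENTS = [
    "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0)",
    "Mozilla/5.0 (Windows NT 6.1; rv:2.0.1) Gecko/20100101 Firefox/4.0.1",
    "Opera/9.80 (Windows NT 6.1; U; en) Presto/2.8.131 Version/11.11",
]

def random_headers(extra=None):
    """Return a fresh headers dict with a randomly chosen User-Agent."""
    headers = {"User-Agent": random.choice(USER_AGENTS)}
    if extra:
        headers.update(extra)  # caller-supplied headers take precedence
    return headers

h = random_headers({"Accept-Language": "en-US"})
print(h)
```

Calling the helper once per request, rather than once per program run, is what actually varies the User-Agent across requests.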

Converting headers to key-value form

Convert headers copied directly from the browser's F12 developer tools, for example:

s="""Host: blog.csdn.net
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36"""

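def make_headers(s):
    """
    Convert headers in string form into a dictionary.
    :param s: one "key: value" pair per line
    :return: {"key": "value"}
    """
    # Split on the first ': ' only, so values that contain ': ' stay intact;
    # blank lines are skipped
    r = {k: j for k, j in [i.split(': ', 1) for i in s.split('\n') if i.strip()]}
    return r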
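
Copied headers are not always tidy: there may be blank lines, uneven spacing, or HTTP/2 pseudo-headers such as ":authority". The following is a more tolerant sketch of the same idea; `parse_headers` is a hypothetical name, and the sample input is made up for illustration.

```python
def parse_headers(raw):
    """Parse 'Key: value' lines into a dict, tolerating blank lines and odd spacing."""
    result = {}
    for line in raw.splitlines():
        line = line.strip()
        # Skip blank lines and HTTP/2 pseudo-headers such as ":authority"
        if not line or line.startswith(":"):
            continue
        # partition splits on the first colon only, so values such as
        # URLs ("https://...") keep their own colons
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip lines without a colon
        result[key.strip()] = value.strip()
    return result

raw = """Host: blog.csdn.net
Upgrade-Insecure-Requests: 1

Referer: https://blog.csdn.net/"""
parsed = parse_headers(raw)
print(parsed)
```

Skipping malformed lines silently is a design choice; raising an error instead may be preferable when the input is expected to be clean.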

Converting cookies to key-value form

Convert a cookie string copied directly from the browser's F12 developer tools, for example:

s="""__yadk_uid=xR1OjgKQ2CeqmfVrRH1DIkO73khKET08; Hm_ct_6bcd52f51e9b3dce32bec4a3997715ac=1788*1*PC_VC; smidV2=201807251019067decffaff952b0bb01f47a768824572e00a4a65cfd6889290; UN=weixin_42620314; ARK_ID=JS475ff4d4717c679c604192c471bb0153475f; __utma=17226283.973575484.1539159222.1539159222.1539159222.1; __utmz=17226283.1539159222.1.1.utmcsr=baidu|utmccn=(organic)|utmcmd=organic; uuid_tt_dd=10_28867322940-1540532715013-534135; bdshare_firstime=1540540501498; dc_session_id=10_1541137469803.249467; TY_SESSION_ID=466d7b7c-6d1e-4ea0-99b5-3ec79bf848fa; Hm_lvt_6bcd52f51e9b3dce32bec4a3997715ac=1540975415,1541137470,1541159297,1541159379; UserName=weixin_42620314; UserInfo=uZH9oCALKs49rvxo2JZfV9eMAs6epxR1sAi3jPOHS2oocX1epaEo9HeZ1hO4oZ6y%2Brx6oW5e4yFXxOUE4WChVmwoAn1%2BpWKxdBPoPNv4C2DiS%2B21ZjEdkpaAy%2Bc3M6ra45f2OHJeOCFBcA4ThtSuXQ%3D%3D; UserNick=%E9%A3%8E%E7%81%AB%E7%BC%96%E7%A8%8B; AU=878; BT=1541163196688; UserToken=uZH9oCALKs49rvxo2JZfV9eMAs6epxR1sAi3jPOHS2oocX1epaEo9HeZ1hO4oZ6y%2Brx6oW5e4yFXxOUE4WChVmwoAn1%2BpWKxdBPoPNv4C2DiS%2B21ZjEdkpaAy%2Bc3M6rajz2U%2FLxS3fcudt0W6LpqSLuiGoGR2SxS35B4xPXy7TvibWr3xYp%2F1dTamuVKz%2FjO; aliyungf_tc=AQAAAOPWZS7v5gEADFl0cdBgnaTzsfWs; dc_tos=php6u6; Hm_lpvt_6bcd52f51e9b3dce32bec4a3997715ac=1541383135"""

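def make_cookie(s):
    """
    Convert a cookie in string form into a dictionary.
    :param s: "key=value; key=value"
    :return: the cookie as a dictionary
    """
    # Split each pair on the first '=' only, since values may contain '=';
    # strip() removes the space that follows each ';'
    r = {k: j for k, j in [i.strip().split('=', 1) for i in s.split(';') if i.strip()]}
    return r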
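
Sometimes the dictionary needs to go back into a request as a single Cookie header. The sketch below shows a round trip under that assumption; `parse_cookie_string` and `to_cookie_header` are hypothetical names, and the sample string is a shortened, made-up variant of the one above.

```python
def parse_cookie_string(raw):
    """Parse 'k1=v1; k2=v2' into a dict, splitting each pair on the first '='."""
    cookies = {}
    for pair in raw.split(";"):
        pair = pair.strip()
        if not pair:
            continue  # tolerate a trailing ';'
        name, _, value = pair.partition("=")
        cookies[name] = value
    return cookies

def to_cookie_header(cookies):
    """Serialize a dict back into the value of a Cookie request header."""
    return "; ".join(f"{name}={value}" for name, value in cookies.items())

raw = "UN=weixin_42620314; __utmz=1.utmcsr=baidu|utmcmd=organic; AU=878;"
cookies = parse_cookie_string(raw)
headers = {"Cookie": to_cookie_header(cookies)}
print(headers)
```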

Logging in with selenium and saving the cookies to cookies.json

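import json

# driver is assumed to be an already created selenium WebDriver instance
driver.get("login_url")
# Enter the account and password in the browser to log in, then press Enter here
input()
# After a successful login, the cookies are available
cookies = driver.get_cookies()
cookies_json = json.dumps(cookies)
with open("cookies.json", "w") as f:
    f.write(cookies_json)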
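
The file handling can be tried without a browser. The sketch below assumes the cookies are a list of plain dicts, which is the shape `get_cookies()` returns; the helper names, the sample cookie, and the temporary path are all made up for illustration.

```python
import json
import os
import tempfile

def save_cookies(cookies, path):
    """Write a list of cookie dicts to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(cookies, f, ensure_ascii=False, indent=2)

def load_cookies(path):
    """Read the list of cookie dicts back from the JSON file."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)

# Made-up sample in the shape of a selenium cookie dict
sample = [{"name": "UN", "value": "demo_user", "domain": ".example.com", "path": "/"}]
path = os.path.join(tempfile.mkdtemp(), "cookies.json")
save_cookies(sample, path)
restored = load_cookies(path)
print(restored)
```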

Reading the cookies from the JSON file and adding them to the driver

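import json

with open("cookies.json", "r") as f:
    cookies = json.load(f)
# The driver must already be on a page of the cookies' domain before add_cookie is called
for cookie in cookies:
    # add_cookie expects a dict with "name" and "value" keys
    driver.add_cookie({"name": cookie["name"], "value": cookie["value"]})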
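
The saved cookies may also be reused outside the browser, for instance with an HTTP client that takes a plain name-to-value mapping. The sketch below flattens the list for that purpose. It assumes that each cookie dict may carry an optional `expiry` key holding seconds since the epoch; `cookies_to_dict` is a hypothetical helper and the sample data is made up.

```python
import time

def cookies_to_dict(cookies, now=None):
    """Flatten selenium-style cookie dicts to {name: value}, skipping expired ones."""
    now = time.time() if now is None else now
    result = {}
    for cookie in cookies:
        expiry = cookie.get("expiry")  # assumed absent for session cookies
        if expiry is not None and expiry <= now:
            continue  # drop cookies that have already expired
        result[cookie["name"]] = cookie["value"]
    return result

sample = [
    {"name": "UN", "value": "demo_user"},
    {"name": "old", "value": "x", "expiry": 1000},
    {"name": "fresh", "value": "y", "expiry": 3000},
]
flat = cookies_to_dict(sample, now=2000)
print(flat)
```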

Reposted from blog.csdn.net/weixin_42620314/article/details/83592662