Python爬虫 —— 知乎之selenium模拟登陆+requests.Session()获取cookies

代码如下:

 1 # coding:utf-8
 2 from selenium import webdriver
 3 import requests
 4 import sys
 5 import time
 6 from lxml import etree
 7 # reload(sys)
 8 # sys.setdefaultencoding('utf-8')
 9 
class Zhihu:
    """Crawl a Zhihu page by logging in with a Selenium-driven Chrome,
    then reusing the browser cookies in a requests.Session."""

    def __init__(self, homeurl, username="13060882373", password="XXXXXX"):
        # homeurl: page whose answer feed will be crawled.
        # username / password: credentials typed into the login form.
        # Defaults preserve the values that were previously hard-coded.
        self.homeurl = homeurl
        self.username = username
        self.password = password

    def GetCookies(self):
        """Log in through a Chrome window and return the session cookies.

        Returns the list of cookie dicts produced by Selenium's
        get_cookies() (each has at least 'name' and 'value' keys).
        """
        # Local import: find_element_by_* was removed in Selenium 4;
        # find_element(By.CSS_SELECTOR, ...) works on Selenium 3 and 4.
        from selenium.webdriver.common.by import By

        browser = webdriver.Chrome()
        try:
            browser.get("https://www.zhihu.com/signin")
            browser.find_element(
                By.CSS_SELECTOR, ".SignFlow-accountInput.Input-wrapper input"
            ).send_keys(self.username)
            browser.find_element(
                By.CSS_SELECTOR, ".SignFlow-password input"
            ).send_keys(self.password)
            browser.find_element(
                By.CSS_SELECTOR, ".Button.SignFlow-submitButton"
            ).click()
            # Crude wait for the post-login redirect to complete.
            # TODO(review): prefer WebDriverWait on a post-login element.
            time.sleep(3)
            return browser.get_cookies()
        finally:
            # Always release the browser, even if a selector lookup raises.
            browser.quit()

    def Crawl(self):
        """Fetch homeurl with the logged-in cookies and print
        '<author>回答了:<title>' for every answer item found on the page."""
        import json  # local so this edit needs no change to the file header

        session = requests.Session()
        session.headers.clear()
        for cookie in self.GetCookies():
            session.cookies.set(cookie["name"], cookie["value"])
        html = session.get(self.homeurl).text
        tree = etree.HTML(html)
        # Each matched node's data-zop attribute holds a JSON object with
        # (at least) authorName and title fields.
        items = tree.xpath(
            '//*[@id="root"]/div/main/div/div/div[1]/div[2]/div'
            '//div[@class="ContentItem AnswerItem"]/@data-zop'
        )
        for item in items:
            # json.loads instead of eval: the attribute is page-controlled
            # text, so eval() would execute arbitrary Python expressions.
            content = json.loads(item)
            authorName = content['authorName']
            title = content['title']
            print(authorName + "回答了:" + title)
42 
43 
if __name__ == "__main__":
    # Guard the crawl behind __main__ so importing this module does not
    # launch a browser and hit the network as a side effect.
    zhihu = Zhihu('https://www.zhihu.com/')
    zhihu.Crawl()

猜你喜欢

转载自www.cnblogs.com/DOLFAMINGO/p/9170429.html