Logging in to Zhihu with Selenium

Copyright notice: this is an original article by [onefine]. Please credit the source when reposting: https://blog.csdn.net/jiduochou963/article/details/88199654

Working around Zhihu's anti-crawling detection of selenium + chromedriver [1]

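When you use selenium to crawl Taobao or other sites, or to simulate a login, a slider CAPTCHA appears. Whether you drag it with ActionChains or by hand, the site politely tells you something like "Oops, network error, please refresh". Why?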

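After searching online and going through a lot of material, I found that a selenium-driven browser exposes some telltale properties, for example: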

window.navigator.webdriver
window.navigator.languages
window.navigator.plugins.length

The most important of these is the webdriver property.

partial interface Navigator {
	readonly attribute boolean webdriver;
 };

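The webdriver IDL attribute of the Navigator interface must return the value of the webdriver-active flag, which is initially false. In browsers that do not implement the attribute, navigator.webdriver evaluates to undefined.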

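This property lets websites determine that the user agent is under WebDriver control, and it can be used to help mitigate denial-of-service attacks.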

How to check:

Inspect → Console → type window.navigator.webdriver
In a normal, non-automated browser the result is false or undefined (depending on the browser).
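
The same check can be scripted instead of typed into the console. The following is a minimal sketch, not part of the original article: read_fingerprint and looks_automated are hypothetical helper names, and the only assumption about the driver is that it has an execute_script(js) method, as selenium WebDriver objects do. FakeDriver and its values are made up so the example runs without a browser.

```python
# Property expressions listed in the article; each is read with a JavaScript `return`.
FINGERPRINT_EXPRESSIONS = (
    "window.navigator.webdriver",
    "window.navigator.languages",
    "window.navigator.plugins.length",
)


def read_fingerprint(driver):
    """Return {expression: value} for each property, read via execute_script.

    Hypothetical helper: `driver` is anything with an execute_script(js) method.
    """
    # Without `return`, execute_script always yields None on the Python side.
    return {
        expr: driver.execute_script("return " + expr)
        for expr in FINGERPRINT_EXPRESSIONS
    }


def looks_automated(fingerprint):
    """True only when navigator.webdriver was reported as exactly True."""
    return fingerprint.get("window.navigator.webdriver") is True


class FakeDriver:
    """Stand-in for a WebDriver so the sketch runs without a browser."""

    def __init__(self, values):
        self._values = values

    def execute_script(self, script):
        # Strip the leading "return " and look the expression up.
        return self._values.get(script[len("return "):])


# Made-up values imitating an automated browser.
automated = FakeDriver({
    "window.navigator.webdriver": True,
    "window.navigator.languages": ["en-US", "en"],
    "window.navigator.plugins.length": 0,
})
print(looks_automated(read_fingerprint(automated)))
```

With a real browser, passing the selenium driver object to read_fingerprint in place of FakeDriver should give the values the page itself would see.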

OK, here is what we do next.

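Getting past the detection: override the webdriver property. The idea is to inject the following JavaScript into the page. This can be done through mitmproxy; the complete code further down injects the same snippet with selenium's execute_script: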

Object.defineProperties(navigator,{webdriver:{get:() => false}});
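
For the mitmproxy route, a minimal sketch of what such an injection script might look like is given here. It is an illustration under assumptions, not the author's code: inject_override is a hypothetical helper, and the choice to place the script right after the opening head tag (so it runs before the page's own scripts) is also an assumption. The response(flow) function follows mitmproxy's addon hook convention and only touches flow.response.headers and flow.response.text.

```python
import re

# The override from the article, wrapped in a <script> tag.
OVERRIDE_JS = "Object.defineProperties(navigator,{webdriver:{get:() => false}});"
SCRIPT_TAG = "<script>" + OVERRIDE_JS + "</script>"

# Matches <head> or <head attr="...">, but not <header>.
HEAD_OPEN = re.compile(r"<head(\s[^>]*)?>", re.IGNORECASE)


def inject_override(html):
    """Insert the override right after the opening <head> tag, if there is one."""
    match = HEAD_OPEN.search(html)
    if match is None:
        return html  # no <head>: leave the page untouched
    return html[:match.end()] + SCRIPT_TAG + html[match.end():]


def response(flow):
    """mitmproxy hook: rewrite HTML responses on their way to the browser."""
    content_type = flow.response.headers.get("content-type", "")
    if "text/html" in content_type:
        flow.response.text = inject_override(flow.response.text)
```

Saved as, say, inject.py (a made-up file name), the script would be loaded with mitmdump -s inject.py, and Chrome would be pointed at the proxy with a --proxy-server option. HTTPS pages additionally require the browser to trust mitmproxy's certificate.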

Complete code

# -*- coding: utf-8 -*-
# @Time    : 2019/3/5 15:11
# @Author  : One Fine


from time import sleep

import requests
from selenium import webdriver
from selenium.webdriver.common.by import By

try:
    import http.cookiejar as cookielib
except ImportError:
    import cookielib  # Python 2.x fallback


class ZhihuAccount(object):
    """
    Entry point: check_login()
    True: the session is logged in
    False: login failed
    """
    def __init__(self):
        # Selenium 3 style; Selenium 4 takes the driver path through a Service object instead.
        self.browser = webdriver.Chrome(executable_path='D:/selenium/chromedriver.exe')
        self.session = requests.session()
        self.session.cookies = cookielib.LWPCookieJar(filename='zhihu_cookie.text')
        self.headers = {
            'Referer': 'https://www.zhihu.com/signup?next=%2F',
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
                          'Chrome/72.0.3626.121 Safari/537.36',
        }
        # Load saved cookies; if that fails, load_cookies() falls back to a browser login.
        self.load_cookies()

    def login(self, username='', password=''):
        if username == '' or password == '':
            username = input('Username: ')
            password = input('Password: ')

        self.browser.get('https://www.zhihu.com/signup?next=%2F')
        try:
            self.browser.find_element(By.XPATH, '//*[@id="root"]/div/main/div/div/div/div[2]/div[2]/span').click()  # click
            self.browser.find_element(By.XPATH, '//*[@id="root"]//input[@name="username"]').send_keys(username)
            sleep(2)
            self.browser.find_element(By.XPATH, '//*[@id="root"]//input[@name="password"]').send_keys(password)

            # Override navigator.webdriver before submitting the form.
            self.browser.execute_script('Object.defineProperties(navigator,{webdriver:{get:() => false}});')
            # `return` is required; without it execute_script always yields None.
            status = self.browser.execute_script('return window.navigator.webdriver')
            if status:
                print("Warning: navigator.webdriver is still true")

            self.browser.find_element(By.XPATH, '//*/form/button').click()  # click the submit button

            sleep(1)
            # Copy the browser cookies into the requests session and save them.
            for cookie in self.browser.get_cookies():
                self.session.cookies.set_cookie(
                    cookielib.Cookie(version=0, name=cookie['name'], value=cookie['value'],
                                     port=None, port_specified=False, domain=cookie['domain'],
                                     domain_specified=True,
                                     domain_initial_dot=cookie['domain'].startswith('.'),
                                     path=cookie['path'], path_specified=True,
                                     secure=cookie['secure'], rest={},
                                     expires=cookie['expiry'] if "expiry" in cookie else None,
                                     discard=False, comment=None, comment_url=None, rfc2109=False))

            self.session.cookies.save()
            return True
        except Exception as e:
            print("Login failed", e)
            return False

    def load_cookies(self):
        try:
            self.session.cookies.load(ignore_discard=True)
            return True
        except Exception as e:
            print("Could not load zhihu_cookie", e)
            print("Logging in again...")
            # First login attempt:
            if self.login():
                print("Cookies loaded successfully")
                return True
            else:
                print("Failed to load cookies")
                return False

    def check_login(self):
        # Use the status code returned by the settings page to decide whether we are logged in.
        inbox_url = 'https://www.zhihu.com/settings/account'
        response = self.session.get(inbox_url, headers=self.headers, allow_redirects=False)
        status = True
        if response.status_code != 200:
            # Second login attempt:
            # print("Logging in again...")
            if not self.login():
                status = False

        # Close the browser and the session:
        self.browser.quit()
        self.session.close()

        return status


if __name__ == '__main__':
    account = ZhihuAccount()
    if account.check_login():
        print("Login succeeded")
    else:
        print("Login failed")
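
The cookie hand-off from the browser to the saved cookie file is the least obvious part of the script, so it is worth trying in isolation. This is a small sketch using only the standard library: to_cookiejar_cookie is a hypothetical helper name, and the sample cookie is made up in the dict shape that selenium's get_cookies() returns. It converts one cookie, saves it with LWPCookieJar, and loads it back from disk.

```python
import http.cookiejar as cookielib
import os
import tempfile


def to_cookiejar_cookie(cookie):
    """Convert one dict from driver.get_cookies() into an http.cookiejar.Cookie."""
    domain = cookie["domain"]
    return cookielib.Cookie(
        version=0, name=cookie["name"], value=cookie["value"],
        port=None, port_specified=False,
        domain=domain, domain_specified=True,
        domain_initial_dot=domain.startswith("."),
        path=cookie.get("path", "/"), path_specified=True,
        secure=cookie.get("secure", False),
        expires=cookie.get("expiry"),  # None marks a session cookie
        discard=False, comment=None, comment_url=None, rest={}, rfc2109=False,
    )


# Made-up cookie; name, value, and expiry are placeholders.
sample = {"name": "token", "value": "abc", "domain": ".zhihu.com",
          "path": "/", "secure": True, "expiry": 4102444800}

# Save to a temporary file, then load it back as a fresh jar.
path = os.path.join(tempfile.mkdtemp(), "zhihu_cookie.txt")
jar = cookielib.LWPCookieJar(filename=path)
jar.set_cookie(to_cookiejar_cookie(sample))
jar.save(ignore_discard=True)

reloaded = cookielib.LWPCookieJar(filename=path)
reloaded.load(ignore_discard=True)
print([c.name for c in reloaded])
```

Assigning a jar like reloaded to session.cookies on a requests session is what lets later requests reuse the browser login, which is the role self.session.cookies plays in the script.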

References:
Detecting selenium and getting past the detection https://zhuanlan.zhihu.com/p/55956954
Bypassing selenium detection to simulate a login https://zhuanlan.zhihu.com/p/56040461
Common ways to call JavaScript from the selenium web automation testing framework https://blog.csdn.net/cxx654/article/details/79949366
Getting cookies with selenium to skip the login https://blog.csdn.net/weixin_40444270/article/details/80593058
Using LWPCookieJar https://blog.csdn.net/nimade511/article/details/52540437


[1] Reference: Detecting selenium and getting past the detection https://zhuanlan.zhihu.com/p/55956954
