A workaround for captchas that come back wrong when fetched by a direct request

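A while ago I scraped data from a website that requires a simulated login with an account name, a password, and a very simple captcha. The captcha image you get by requesting it directly is not the same as the one shown on the page, so to crack it successfully you need to take a screenshot with Selenium, read the captcha from that screenshot, and then enter the account name, password, and captcha to log in.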
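The cropping step comes down to arithmetic on the captcha element's position and size. Here is a minimal Python 3 sketch of that calculation, with no browser involved. The function name crop_box and the scale parameter are hypothetical and not part of the original code. The scale parameter is an assumption for high-DPI displays, where the screenshot may be larger than the coordinates Selenium reports.

```python
def crop_box(location, size, scale=1.0):
    """Return (left, top, right, bottom) in screenshot pixels.

    location and size mimic the dictionaries Selenium exposes as
    element.location and element.size. scale is a hypothetical factor
    for screens where one page pixel spans several screenshot pixels.
    """
    left = int(location['x'] * scale)
    top = int(location['y'] * scale)
    right = int((location['x'] + size['width']) * scale)
    bottom = int((location['y'] + size['height']) * scale)
    return (left, top, right, bottom)


# Made-up numbers for illustration: a 90x30 captcha whose top-left
# corner sits at (620, 310) on the page.
box = crop_box({'x': 620, 'y': 310}, {'width': 90, 'height': 30})
print(box)  # (620, 310, 710, 340)
```

The resulting tuple has the order that PIL's Image.crop expects: left, top, right, bottom.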

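1. First use Selenium to take a screenshot of the login page, locate the captcha element, crop it out of the screenshot, and then recognize the cropped captcha. The code is as follows: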

 

# -*- coding:utf-8 -*-
# Written for Python 2 and the Selenium 3 API (find_element_by_id).
import sys
reload(sys)
sys.setdefaultencoding('utf-8')  # Python 2 only
import urllib
from PIL import Image, ImageEnhance
import pytesseract
import requests
from selenium import webdriver


def get_image(driver):
    """Screenshot the login page, crop out the captcha, return the file name."""
    driver.set_window_size(1400, 900)
    # Take a screenshot of the whole login page
    driver.get_screenshot_as_file('.//1.png')
    # Get the position and size of the captcha element
    element = driver.find_element_by_id('codePic')
    left = int(element.location['x'])
    top = int(element.location['y'])
    right = int(element.location['x'] + element.size['width'])
    bottom = int(element.location['y'] + element.size['height'])
    print left, top, right, bottom
    # Crop the captcha region out of the screenshot with PIL
    im = Image.open('.//1.png')
    im = im.crop((left, top, right, bottom))
    filename = "2.png"
    im.save(filename)

    return filename


# Lookup table for binarization: gray values below the threshold map to 0,
# all others map to 1
threshold = 150
table = []
for i in range(256):
    if i < threshold:
        table.append(0)
    else:
        table.append(1)

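def getverify1(name):
    """Convert the cropped captcha to black and white and run Tesseract on it."""
    im = Image.open(name)
    # Convert to grayscale
    imgry = im.convert('L')
    imgry.save('g' + name)
    # Binarize with the lookup table
    out = imgry.point(table, '1')
    out.save('b' + name)
    # Location of the Tesseract language data on this machine
    string = '--tessdata-dir "C:\\Program Files (x86)\\Tesseract-OCR\\tessdata"'
    im = Image.open('b' + name)
    # enhancer = ImageEnhance.Contrast(im)
    # im = enhancer.enhance(6)
    text = pytesseract.image_to_string(im, config=string)
    # Drop surrounding whitespace and newlines from the OCR output
    text = text.strip()
    text = text.upper()
    return text


def main(driver):
    """Crop the captcha from the current page and return the recognized text."""
    im = get_image(driver)
    code = getverify1(im)
    print '-----', code
    return code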
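
The lookup table passed to point() is what turns the grayscale captcha into a two-level image. The following Python 3 sketch shows the same mapping on a tiny hand-made grid of gray values, using only plain lists instead of PIL. The helper names binarize and render and the sample values are made up for illustration. Only the threshold of 150 comes from the code above.

```python
THRESHOLD = 150  # same cut-off as in the code above

# table[v] is the output value for grayscale input v (0..255)
table = [0 if v < THRESHOLD else 1 for v in range(256)]


def binarize(rows):
    """Map every gray value in a 2D list through the lookup table."""
    return [[table[v] for v in row] for row in rows]


def render(rows):
    """Draw 0 (dark) as '#' and 1 (light) as '.' for a quick visual check."""
    return '\n'.join(''.join('#' if p == 0 else '.' for p in row) for row in rows)


# Hypothetical 3x4 grayscale patch: small numbers are dark strokes,
# large numbers are light background.
gray = [
    [250, 240, 30, 245],
    [235, 20, 25, 149],
    [150, 40, 230, 255],
]
bw = binarize(gray)
print(render(bw))
```

Values of 149 and below end up dark and values of 150 and above end up light, so raising or lowering the threshold decides how much faint noise survives into the image handed to Tesseract.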

2. Simulate the login:

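import time


def login():
    driver = webdriver.Chrome()
    driver.get("url")  # replace "url" with the address of the login page
    time.sleep(10)
    admin = driver.find_element_by_id("j_username")
    root = driver.find_element_by_id("j_password_show")
    captch = driver.find_element_by_id("j_validation_code")
    admin.send_keys('username')  # placeholder: the real account name
    root.send_keys('password')   # placeholder: the real password
    # Screenshot the page and recognize the captcha
    date = main(driver)

    time.sleep(20)
    captch.send_keys(date)
    time.sleep(10)
    # u"登录" is the text of the site's login link
    driver.find_element_by_link_text(u"登录").click()
    time.sleep(5)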
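
The fixed time.sleep() calls in login() wait the full duration even when the page is ready sooner. Selenium's WebDriverWait, which step 3 uses, polls a condition instead. The Python 3 sketch below shows that polling idea with the standard library only. It is not Selenium's implementation, and wait_until, page_ready, and the timing values are hypothetical names and numbers chosen for the example.

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.5):
    """Call predicate until it returns a truthy value or timeout seconds pass.

    Returns the truthy value; raises TimeoutError if the deadline passes.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError('condition not met within %.1f s' % timeout)
        time.sleep(interval)


# Stand-in for "is the captcha element on the page yet?":
# this fake check succeeds on the third call.
calls = {'n': 0}


def page_ready():
    calls['n'] += 1
    return 'ready' if calls['n'] >= 3 else None


print(wait_until(page_ready, timeout=2.0, interval=0.01))
```

With a real driver, the predicate would be a function that tries to find the element and returns it, which is roughly what the expected_conditions helpers do for WebDriverWait.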
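
3. If the script runs on a CentOS server, use PhantomJS as the browser for the simulated login. Once PhantomJS is downloaded onto the CentOS machine, the same simple cracking approach works:
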
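import time
from get_captach import main
import requests
from PIL import Image, ImageEnhance
import pytesseract
import json
from lxml import etree
import update_config
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC


def login():
    # Path to the PhantomJS binary. This is a Windows path; on the server,
    # change it to wherever PhantomJS is installed.
    driver = webdriver.PhantomJS(executable_path=r'D:\phantomjs-2.1.1-windows\phantomjs-2.1.1-windows\bin\phantomjs.exe')
    driver.get("url")  # replace "url" with the address of the login page
    time.sleep(10)
    # Wait up to 30 seconds for the captcha image to be present
    element = WebDriverWait(driver, 30).until(EC.presence_of_element_located((By.ID, 'codePic')))
    admin = driver.find_element_by_id("j_username")
    root = driver.find_element_by_id("j_password_show")
    captch = driver.find_element_by_id("j_validation_code")
    admin.send_keys('username')  # placeholder: the real account name
    root.send_keys('password')   # placeholder: the real password
    date = main(driver)
    # Finish the login as in step 2
    captch.send_keys(date)
    driver.find_element_by_link_text(u"登录").click()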
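
Because the same script may run on a Windows desktop and on a CentOS server, the hard-coded executable_path is the part most likely to need changing. One possible way to avoid editing it by hand is to look the binary up on the PATH. The Python 3 sketch below uses only the standard library. The function resolve_phantomjs, its parameters, and the example path are hypothetical and not part of the original code.

```python
import os
import shutil


def resolve_phantomjs(explicit_path=None, which=shutil.which):
    """Return a path to the PhantomJS binary, or raise if none is found.

    explicit_path: optional path from your own configuration (hypothetical).
    which: lookup function, injectable for testing; defaults to shutil.which,
           which searches the directories listed in PATH.
    """
    if explicit_path and os.path.isfile(explicit_path):
        return explicit_path
    found = which('phantomjs')
    if found:
        return found
    raise RuntimeError('phantomjs not found; install it or pass explicit_path')


# Demonstration with a fake lookup so the example runs without PhantomJS.
print(resolve_phantomjs(which=lambda name: '/usr/local/bin/phantomjs'))
```

The result could then be passed as executable_path when creating the driver, assuming PhantomJS has been placed in a directory on the server's PATH.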


Reposted from blog.csdn.net/xxy_yang/article/details/79540968