Python Web Scraping with Selenium

Copyright notice: this is an original article by the author and may not be reproduced without the author's permission. https://blog.csdn.net/u011262253/article/details/78549185

Download and Installation


Installing selenium

pip install selenium

chromedriver download links

https://sites.google.com/a/chromium.org/chromedriver/downloads
https://chromedriver.storage.googleapis.com/index.html

Note: the chromedriver version must correspond to the version of Chrome you have installed. See:

http://blog.csdn.net/huilan_same/article/details/51896672
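
As a quick sanity check before launching the browser, the two version strings can be compared. The sketch below assumes the newer chromedriver numbering, in which the driver shares Chrome's major version number. Older 2.x drivers, such as the 2.33 used later in this article, do not follow that scheme, so the check does not apply to them. The names `major_version` and `same_major_version` and the sample version strings are illustrative only.

```python
def major_version(version):
    """Return the leading number of a dotted version string as an int."""
    return int(version.strip().split(".")[0])


def same_major_version(chrome_version, driver_version):
    """True when Chrome and chromedriver report the same major version."""
    return major_version(chrome_version) == major_version(driver_version)


# Sample strings for illustration only; read the real ones from
# chrome://version and from `chromedriver --version`.
print(same_major_version("114.0.5735.199", "114.0.5735.90"))
print(same_major_version("114.0.5735.199", "113.0.5672.63"))
```

The first call prints True and the second prints False, since only the leading number is compared.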

Usage Examples


Opening a URL

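from selenium import webdriver

# Start Chrome using the chromedriver executable at this path
dr = webdriver.Chrome(r"D:\Tools\BrowserDriver\chromedriver\2.33\chromedriver.exe")
url = 'http://www.baidu.com'
print("now access %s" % url)
dr.get(url)   # load the page
dr.quit()     # close the browser and stop chromedriver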

Note: prefix the chromedriver path with r (a raw string) so that the backslashes in it are not interpreted as escape sequences.
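
To see what the r prefix changes, here is a small sketch in plain Python. The short path is hypothetical and was chosen because `\t` and `\2` are both valid escape sequences, so the difference is easy to spot.

```python
# Hypothetical short path, chosen because "\t" and "\2" are valid escapes.
plain = "D:\tools\2.33"   # "\t" becomes a tab, "\2" becomes character 0x02
raw = r"D:\tools\2.33"    # backslashes are kept exactly as typed

print(repr(plain))
print(repr(raw))
print(len(plain), len(raw))
```

The plain string ends up two characters shorter than the raw one, because each escape sequence collapses into a single character. Writing each backslash as `\\` has the same effect as the r prefix.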

Locating Elements

An element on the page

<input type="text" name="passwd" id="passwd-id" />

Locating it with selenium

element = driver.find_element_by_id("passwd-id")
element = driver.find_element_by_name("passwd")
elements = driver.find_elements_by_tag_name("input")  # plural form: returns a list
element = driver.find_element_by_xpath("//input[@id='passwd-id']")

Note: when several elements match an XPath expression, find_element_by_xpath returns only the first match. If nothing matches, it raises NoSuchElementException.
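
When a missing element is an expected case rather than an error, one common pattern is to use the plural lookup, which returns an empty list instead of raising. The sketch below is duck-typed: it only assumes the driver exposes `find_elements(by, value)`, as Selenium's WebDriver does. `first_or_none` and the `FakeDriver` stand-in are hypothetical names, used so the example runs without a browser.

```python
def first_or_none(driver, xpath):
    """Return the first element matching xpath, or None if nothing matches."""
    # "xpath" is the string value of selenium's By.XPATH constant.
    matches = driver.find_elements("xpath", xpath)
    return matches[0] if matches else None


class FakeDriver:
    """Stand-in for a WebDriver, backed by a dict of xpath -> elements."""

    def __init__(self, page):
        self.page = page

    def find_elements(self, by, value):
        return list(self.page.get(value, []))


driver = FakeDriver({"//input": ["username-box", "password-box"]})
print(first_or_none(driver, "//input"))
print(first_or_none(driver, "//select"))
```

With a real driver you would pass the WebDriver instance in place of `FakeDriver` and get a WebElement or None back.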

Finding a single element

  • find_element_by_id
  • find_element_by_name
  • find_element_by_xpath
  • find_element_by_link_text
  • find_element_by_partial_link_text
  • find_element_by_tag_name
  • find_element_by_class_name
  • find_element_by_css_selector

Finding multiple elements

  • find_elements_by_name
  • find_elements_by_xpath
  • find_elements_by_link_text
  • find_elements_by_partial_link_text
  • find_elements_by_tag_name
  • find_elements_by_class_name
  • find_elements_by_css_selector
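
Recent Selenium 4 releases no longer provide the `find_element_by_*` helpers listed above. The same lookups are written as `find_element(By.<STRATEGY>, value)` and `find_elements(...)`. The sketch below maps each legacy name to the string value of the matching `By` constant. `LEGACY_TO_STRATEGY`, `find_legacy` and `RecordingDriver` are hypothetical names for illustration, and the fake driver only echoes the call so the example runs without a browser.

```python
# Suffix of the legacy method -> string value of the matching By constant.
LEGACY_TO_STRATEGY = {
    "id": "id",                                # By.ID
    "name": "name",                            # By.NAME
    "xpath": "xpath",                          # By.XPATH
    "link_text": "link text",                  # By.LINK_TEXT
    "partial_link_text": "partial link text",  # By.PARTIAL_LINK_TEXT
    "tag_name": "tag name",                    # By.TAG_NAME
    "class_name": "class name",                # By.CLASS_NAME
    "css_selector": "css selector",            # By.CSS_SELECTOR
}


def find_legacy(driver, legacy_name, value):
    """Run a lookup named like find_element_by_id via the generic API."""
    plural, single = "find_elements_by_", "find_element_by_"
    if legacy_name.startswith(plural):
        method, suffix = driver.find_elements, legacy_name[len(plural):]
    elif legacy_name.startswith(single):
        method, suffix = driver.find_element, legacy_name[len(single):]
    else:
        raise ValueError("not a legacy locator name: %s" % legacy_name)
    return method(LEGACY_TO_STRATEGY[suffix], value)


class RecordingDriver:
    """Stand-in for a WebDriver that returns the arguments it was given."""

    def find_element(self, by, value):
        return ("one", by, value)

    def find_elements(self, by, value):
        return [("many", by, value)]


driver = RecordingDriver()
print(find_legacy(driver, "find_element_by_id", "passwd-id"))
print(find_legacy(driver, "find_elements_by_tag_name", "input"))
```

In real code you would normally import `By` from `selenium.webdriver.common.by` and call `driver.find_element(By.ID, "passwd-id")` directly. The table is only meant to show which strategy each old helper corresponds to.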

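Working with Elements

Text boxes

from selenium.webdriver.common.keys import Keys

element.send_keys("some text")  # type text
element.send_keys("and some", Keys.ARROW_DOWN)  # type text, then press the Down arrow key
element.clear()  # clear the text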

Drop-down lists

# Method 1: iterate over the option elements
element = driver.find_element_by_xpath("//select[@name='name']")
all_options = element.find_elements_by_tag_name("option")
for option in all_options:
    print("Value is: %s" % option.get_attribute("value"))
    option.click()

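# Method 2: use the Select helper class
# (index and value below are placeholders for your own values)
from selenium.webdriver.support.ui import Select
select = Select(driver.find_element_by_name('name'))
select.select_by_index(index)
select.select_by_visible_text("text")
select.select_by_value(value)

select.deselect_all() # deselect everything (multi-select elements only)
all_selected_options = select.all_selected_options # currently selected options
options = select.options # all available options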

Buttons

btn = driver.find_element_by_id("submit")
btn.click()

Drag and drop

element = driver.find_element_by_name("source")
target = driver.find_element_by_name("target")

from selenium.webdriver import ActionChains
action_chains = ActionChains(driver)
action_chains.drag_and_drop(element, target).perform()

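Browser Operations

Switching windows and frames

# Switch to the window named windowName
driver.switch_to.window("windowName")

# Visit every open window in turn
for handle in driver.window_handles:
    driver.switch_to.window(handle)

# Switch to a frame (here addressed by the path frameName.0.child)
driver.switch_to.frame("frameName.0.child")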
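
A frequent use of the handle loop above is to move to a window that a click has just opened. The sketch below compares the handles seen before and after. It relies only on `driver.window_handles` and `driver.switch_to.window(handle)`, both part of Selenium's WebDriver. `switch_to_new_window` and the fake classes are hypothetical and exist so the example runs without a browser.

```python
def switch_to_new_window(driver, known_handles):
    """Switch to the first handle not in known_handles and return it.

    Returns None, without switching, if no new window has appeared.
    """
    for handle in driver.window_handles:
        if handle not in known_handles:
            driver.switch_to.window(handle)
            return handle
    return None


class FakeSwitchTo:
    def __init__(self, driver):
        self._driver = driver

    def window(self, handle):
        self._driver.current_window_handle = handle


class FakeDriver:
    """Stand-in for a WebDriver that tracks window handles."""

    def __init__(self):
        self.window_handles = ["main"]
        self.current_window_handle = "main"
        self.switch_to = FakeSwitchTo(self)


driver = FakeDriver()
before = list(driver.window_handles)   # remember handles before the click
driver.window_handles.append("popup")  # simulate a link opening a window
print(switch_to_new_window(driver, before))
print(driver.current_window_handle)
```

Taking the snapshot before the click matters: without it there is no reliable way to tell which handle belongs to the new window.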

Pop-up dialogs (alerts)

# Get the alert object
alert = driver.switch_to.alert
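
Once you have the alert object, you will usually read its message and then confirm or cancel it. The sketch below uses `alert.text`, `alert.accept()` and `alert.dismiss()`, which Selenium's Alert class provides, and reaches the alert through the `driver.switch_to.alert` property. `handle_alert` and the fake classes are hypothetical stand-ins so the example runs without a browser.

```python
def handle_alert(driver, accept=True):
    """Return the alert's message after accepting or dismissing it."""
    alert = driver.switch_to.alert
    message = alert.text       # read the text before the dialog closes
    if accept:
        alert.accept()         # same as clicking OK
    else:
        alert.dismiss()        # same as clicking Cancel
    return message


class FakeAlert:
    def __init__(self, text):
        self.text = text
        self.result = None

    def accept(self):
        self.result = "accepted"

    def dismiss(self):
        self.result = "dismissed"


class FakeSwitchTo:
    def __init__(self, alert):
        self.alert = alert


class FakeDriver:
    def __init__(self, alert):
        self.switch_to = FakeSwitchTo(alert)


alert = FakeAlert("Delete this item?")
print(handle_alert(FakeDriver(alert), accept=False))
print(alert.result)
```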

History navigation

driver.forward()
driver.back()

Cookies

# Add a cookie (a page on the cookie's domain must already be loaded)
cookie = {'name': 'foo', 'value': 'bar'}
driver.add_cookie(cookie)

# Get all cookies
driver.get_cookies()
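
`get_cookies()` returns a list of dictionaries, one per cookie, each with at least `name` and `value` keys. If you only need name/value pairs, for example to hand them to another HTTP client, they can be flattened as sketched below. `cookies_to_dict` and the sample cookie data are hypothetical.

```python
def cookies_to_dict(cookies):
    """Collapse selenium-style cookie dicts into a {name: value} mapping."""
    return {cookie["name"]: cookie["value"] for cookie in cookies}


# Sample data shaped like the return value of driver.get_cookies().
sample = [
    {"name": "foo", "value": "bar", "path": "/", "secure": False},
    {"name": "session", "value": "abc123", "path": "/", "secure": True},
]
print(cookies_to_dict(sample))
```

Note that this drops attributes such as path and expiry, so it suits read-only reuse rather than restoring a full browser session.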

Implicit wait

driver = webdriver.Chrome()
driver.implicitly_wait(1) # seconds; applies to every later element lookup

Waiting for an element to load

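# Method 1:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Wait up to 10 seconds for the element to become clickable
wait = WebDriverWait(driver, 10)
element = wait.until(EC.element_to_be_clickable((By.ID,'someid')))

# Method 2 (complete script):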
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("http://somedomain/url_that_delays_loading")
try:
    element = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.ID, "myDynamicElement"))
    )
finally:
    driver.quit()
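
Conceptually, an explicit wait calls the condition repeatedly until it returns something truthy or the timeout expires. The sketch below is a simplified illustration of that polling idea in plain Python, not Selenium's actual implementation. `wait_until` is a hypothetical helper, it raises the built-in `TimeoutError` rather than Selenium's `TimeoutException`, and it does not swallow any exceptions raised by the condition.

```python
import time


def wait_until(condition, timeout, poll_interval=0.5):
    """Call condition() until it returns a truthy value or timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %s seconds" % timeout)
        time.sleep(poll_interval)


# Simulated element that only "appears" on the third check.
attempts = {"count": 0}


def element_present():
    attempts["count"] += 1
    return "myDynamicElement" if attempts["count"] >= 3 else None


print(wait_until(element_present, timeout=2, poll_interval=0.01))
print(attempts["count"])
```

The call returns as soon as the condition succeeds, so a generous timeout costs nothing when the element loads quickly. That is the main advantage over a fixed sleep.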

