Using selenium + chromedriver to fetch dynamic web page data, including data that only appears after simulated mouse actions - 代码天地

Category: Other, posted 2019-05-03 09:51:18

1. Download chromedriver. Keep in mind that each chromedriver version corresponds to specific Chrome browser versions, so the two must match.

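2. Load the page so that its dynamic content renders, then simulate mouse actions to obtain dynamically loaded data that only appears after specific interactions such as clicking or hovering.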

3. Source code:

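import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By


# Path to the downloaded chromedriver (the Windows build in my case)
CHROME_DRIVER_PATH = r'D:\Code\imgageRecognition\site_scrapy\chromedriver.exe'


# Load a dynamic page, hover over each navigation item, and return the
# resulting HTML parsed by BeautifulSoup4
def get_dynamic_html(site_url):
    print('Start loading dynamic page', site_url)
    chrome_options = webdriver.ChromeOptions()
    # Disable the sandbox
    chrome_options.add_argument('--no-sandbox')
    chrome_options.add_argument('--disable-dev-shm-usage')
    # Uncomment to run headless (no visible browser window)
    # chrome_options.add_argument('--headless')
    chrome_options.add_argument('--disable-gpu')
    chrome_options.add_argument('--ignore-ssl-errors')
    # Selenium 4 API; the Selenium 3 call was
    # webdriver.Chrome(executable_path=..., chrome_options=...)
    driver = webdriver.Chrome(service=Service(CHROME_DRIVER_PATH), options=chrome_options)
    try:
        driver.set_page_load_timeout(100)
        # driver.set_script_timeout(100)
        try:
            driver.get(site_url)
        except TimeoutException as e:
            driver.execute_script('window.stop()')  # stop loading once the timeout is hit
            print(e, 'dynamic web load timeout')
        # Selenium 4 locator style (find_element_by_css_selector was removed in 4.3)
        women_nav_tag = driver.find_element(By.CSS_SELECTOR, '.navigation-bar.second-level.clearfix.p_15.active')
        nav_tag_list = women_nav_tag.find_elements(By.CSS_SELECTOR, '.navigation-bar-item')
        for tag in nav_tag_list:
            print(tag.text)
            # Hover over the item so the page loads its dynamic content;
            # a fresh ActionChains per item avoids replaying earlier moves
            ActionChains(driver).move_to_element(tag).perform()
            time.sleep(5)  # give the dynamically loaded content time to render
        data = driver.page_source
    finally:
        # Always close the browser, even if locating or hovering fails
        driver.quit()
    return BeautifulSoup(data, 'html.parser')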
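Step 1 warns that the chromedriver and Chrome versions must correspond. For ChromeDriver 73 and later, the driver's major version is meant to equal the browser's major version (older 2.x drivers used a separate numbering scheme), so a quick sanity check is to compare the two. The helpers below are a hypothetical sketch, not part of Selenium; the version strings stand in for what you would read from chromedriver --version and from chrome://version. (Since Selenium 4.6, Selenium Manager can also fetch a matching driver automatically when no driver path is given.)

```python
import re


def major_version(version_text):
    """Return the major version from a string such as 'ChromeDriver 74.0.3729.6'."""
    # Hypothetical helper: take the first "<digits>.<digits>" run and keep the leading number
    match = re.search(r'(\d+)\.\d+', version_text)
    if match is None:
        raise ValueError(f'no version number found in {version_text!r}')
    return int(match.group(1))


def driver_matches_browser(chrome_version_text, driver_version_text):
    """True when Chrome and ChromeDriver share a major version (ChromeDriver 73+ scheme assumed)."""
    return major_version(chrome_version_text) == major_version(driver_version_text)


if __name__ == '__main__':
    # Hypothetical example strings
    chrome = 'Google Chrome 74.0.3729.169'
    driver = 'ChromeDriver 74.0.3729.6 (hash-refs/branch-heads/3729@{#29})'
    print(driver_matches_browser(chrome, driver))  # True
```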
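The code above waits a fixed five seconds after each hover with time.sleep(5), which may be too long or too short depending on the page. A common alternative is to poll until the dynamically loaded content actually exists. Selenium ships WebDriverWait for this, for example WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, selector))) with EC imported from selenium.webdriver.support.expected_conditions. The stdlib-only sketch below shows the polling idea on its own; wait_until and its parameters are hypothetical names, not a Selenium API.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value or timeout seconds pass.

    Returns the truthy value; raises TimeoutError if it never appears.
    (Hypothetical helper illustrating the explicit-wait idea.)
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f'condition not met within {timeout} seconds')
        time.sleep(poll_interval)


if __name__ == '__main__':
    # Simulate content that "loads" on the third check
    checks = iter([None, None, ['item loaded']])
    print(wait_until(lambda: next(checks), timeout=5, poll_interval=0.1))  # ['item loaded']
```

With a real driver, the condition could be lambda: driver.find_elements(By.CSS_SELECTOR, '.some-loaded-item'), since find_elements returns an empty (falsy) list until matching elements appear; the selector there is only a placeholder.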
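Once get_dynamic_html returns, the soup can be queried like a static page. The sketch below assumes the entries of interest are the .navigation-bar-item elements inside the active second-level navigation bar targeted in the code above; nav_item_texts and the sample HTML are hypothetical, and a real page's markup may differ.

```python
from bs4 import BeautifulSoup


def nav_item_texts(soup):
    """Return the stripped text of every .navigation-bar-item inside the active second-level bar."""
    items = soup.select('.navigation-bar.second-level.active .navigation-bar-item')
    return [item.get_text(strip=True) for item in items]


if __name__ == '__main__':
    # Hypothetical HTML standing in for driver.page_source
    html = '''
    <div class="navigation-bar second-level clearfix p_15 active">
      <a class="navigation-bar-item"> Dresses </a>
      <a class="navigation-bar-item">Shoes</a>
    </div>
    '''
    soup = BeautifulSoup(html, 'html.parser')
    print(nav_item_texts(soup))  # ['Dresses', 'Shoes']
```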

Reposted from blog.csdn.net/huangmengfeng/article/details/89762025