Using Selenium and PhantomJS in a crawler to fetch dynamic data

Category: Other | 2018-09-04 00:09:59

Create a Scrapy project: run the following commands in a terminal, then open the zhilian project generated on the Desktop in PyCharm.

cd Desktop

scrapy startproject zhilian

cd zhilian

scrapy genspider Zhilian sou.zhilian.com

Add the following code to middlewares.py:

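from scrapy.http.response.html import HtmlResponse


class PhantomjsMiddleware(object):
    """Downloader middleware that renders pages with the spider's Selenium driver."""

    def process_request(self, request, spider):
        # Only intercept requests from the Zhilian spider. For any other
        # spider this returns None and Scrapy downloads the page as usual.
        if spider.name == 'Zhilian':
            # Load the page in the headless browser so its JavaScript runs.
            spider.driver.get(request.url)
            # Let later element lookups on this driver retry for up to 10 s.
            spider.driver.implicitly_wait(10)

            # Returning a response here makes Scrapy skip its own downloader.
            response = HtmlResponse(url=spider.driver.current_url,
                                    request=request,
                                    body=spider.driver.page_source,
                                    encoding='utf-8')
            return response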
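One caveat about the `implicitly_wait(10)` call in the middleware: an implicit wait only controls how long element lookups keep retrying, so it does not by itself delay the read of `page_source` until Ajax content has arrived. A common alternative is to poll for a marker that only appears once the page is rendered (with Selenium, `WebDriverWait` from `selenium.webdriver.support.ui` plays this role). The sketch below shows the polling idea using only the standard library. `wait_until`, `rendered_source`, `FakeDriver` and the `job-item` marker are hypothetical names for illustration, not part of the original article or of Selenium.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Raises TimeoutError if nothing truthy comes back within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError('condition not met within %.1f s' % timeout)
        time.sleep(interval)


class FakeDriver:
    """Stand-in for a Selenium driver: the page is fully rendered on the third read."""

    def __init__(self):
        self.reads = 0

    @property
    def page_source(self):
        self.reads += 1
        if self.reads < 3:
            return '<html><body></body></html>'
        return '<html><body><div class="job-item">Python developer</div></body></html>'


def rendered_source(driver, marker):
    """Return the page source once it contains `marker`, otherwise None."""
    source = driver.page_source
    return source if marker in source else None


driver = FakeDriver()  # assumption: replaces spider.driver for this sketch
html = wait_until(lambda: rendered_source(driver, 'job-item'),
                  timeout=2.0, interval=0.01)
print(driver.reads)
```

In the middleware, the same pattern would sit between `spider.driver.get(request.url)` and the construction of the `HtmlResponse`, with a marker chosen from the actual target page.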

Add the following code to settings.py:

DOWNLOADER_MIDDLEWARES = {
    # 'zhilian.middlewares.ZhilianDownloaderMiddleware': 543,
    'zhilian.middlewares.PhantomjsMiddleware': 1,
}
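The number in this setting is a priority. Scrapy merges `DOWNLOADER_MIDDLEWARES` with its built-in `DOWNLOADER_MIDDLEWARES_BASE`, drops entries set to `None`, and calls `process_request` in ascending order of priority. With a value of 1, `PhantomjsMiddleware` sees each request first, and because it returns a response, the remaining `process_request` methods and the normal download are skipped for that request. The following is a rough illustration of that ordering rule, not Scrapy's actual implementation. `request_order` is a hypothetical helper, and only two of the built-in entries are listed.

```python
# Two built-in downloader middlewares with their default priorities
# (a partial list, chosen only for illustration).
BASE = {
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': 500,
    'scrapy.downloadermiddlewares.retry.RetryMiddleware': 550,
}

# The project setting from settings.py in this article.
CUSTOM = {
    'zhilian.middlewares.PhantomjsMiddleware': 1,
}


def request_order(base, custom):
    """Return middleware paths in the order their process_request would run."""
    merged = {**base, **custom}  # project entries override built-in ones
    enabled = {path: prio for path, prio in merged.items() if prio is not None}
    return sorted(enabled, key=enabled.get)


order = request_order(BASE, CUSTOM)
for path in order:
    print(path)
```

Setting a built-in entry to `None` in `CUSTOM` would remove it from the resulting list, which is how a built-in middleware is disabled.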

Add the following code to zhilian.py:

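from selenium import webdriver

    # Add this method inside the ZhilianSpider class.
    def __init__(self, *args, **kwargs):
        # Keep Scrapy's own spider initialisation.
        super().__init__(*args, **kwargs)
        # webdriver.PhantomJS needs Selenium 3.x; it was removed in Selenium 4.
        self.driver = webdriver.PhantomJS()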
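The article creates the driver but never shuts it down, which can leave a PhantomJS process running after the crawl. Scrapy calls a spider's `closed(reason)` method when the spider finishes, so that is one place to call `driver.quit()`. The sketch below shows the pairing with stand-in classes so that it runs without Scrapy or a browser installed. `FakeDriver` and `ZhilianSpiderSketch` are hypothetical, and in the real spider the driver would be `webdriver.PhantomJS()`.

```python
class FakeDriver:
    """Stand-in for webdriver.PhantomJS(); records whether quit() was called."""

    def __init__(self):
        self.quit_called = False

    def quit(self):
        self.quit_called = True


class ZhilianSpiderSketch:
    """Plain class standing in for the scrapy.Spider subclass in zhilian.py."""

    name = 'Zhilian'

    def __init__(self, driver_factory=FakeDriver):
        # Assumption: the real spider would create webdriver.PhantomJS() here.
        self.driver = driver_factory()

    def closed(self, reason):
        # Scrapy invokes closed(reason) once the spider has finished.
        self.driver.quit()


spider = ZhilianSpiderSketch()
spider.closed('finished')
print(spider.driver.quit_called)
```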

Reposted from blog.csdn.net/qq_41949802/article/details/81738744
