Learning Scrapy response.css

Programming Languages 2018-08-10 00:47:56

Excerpted from the official Scrapy documentation

You can try selecting elements using CSS with the response object:

>>> response.css('title')
[<Selector xpath='descendant-or-self::title' data='<title>Quotes to Scrape</title>'>]

The result of running response.css('title') is a list-like object called SelectorList, which represents a list of Selector objects that wrap around XML/HTML elements and allow you to run further queries to fine-grain the selection or extract the data.

To extract the text from the title above, you can do:

>>> response.css('title::text').extract()
['Quotes to Scrape']

There are two things to note here. One is that we’ve added ::text to the CSS query, which means we want to select only the text nodes directly inside the <title> element. If we don’t specify ::text, we get the full title element, including its tags:

>>> response.css('title').extract()
['<title>Quotes to Scrape</title>']

The other thing is that the result of calling .extract() is a list, because we’re dealing with an instance of SelectorList. When you know you just want the first result, as in this case, you can do:

>>> response.css('title::text').extract_first()
'Quotes to Scrape'

As an alternative, you could’ve written:

>>> response.css('title::text')[0].extract()
'Quotes to Scrape'

However, using .extract_first() avoids an IndexError and returns None when it doesn’t find any element matching the selection.

There’s a lesson here: for most scraping code, you want it to be resilient to errors due to things not being found on a page, so that even if some parts fail to be scraped, you can at least get some data.

Example

for href in response.css('a::attr(href)').extract():
    ...  # loop body not given in the original; handle each href here
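- Note: with a CSS selector, the response returns whole elements including their tags; use ::text, ::attr(name), and similar to select only the text or an attribute.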

Reposted from blog.csdn.net/sspmii/article/details/81459375
