Learning XPath

Other 2018-06-22 05:14:51

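XPath does one job: locating nodes. The goal is to locate elements quickly and precisely by whatever route works. Two Firefox tools are very helpful for this: Firebug and XPath Checker.

Locating

1. Locating by the element's own attributes or text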

//td[text()='xxx']

//div[contains(@class,'xxx')]

//div[@class='xxx' and @type='xxx']

2. Locating via the parent node

//div[@class='xxx']/div

//div[@id='xxx']/div

3. Locating via a child node

//div[div[@id='xxx']]

//div[div[@name='xxx']]

4. Mixed

//div[div[@name='xxx']]/img

//td[a/font[contains(text(),'xxx')]]//input[@type='xxx']
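The four locating styles above can be tried out with a small sketch. It assumes Python's standard xml.etree.ElementTree, which implements only a subset of XPath (no text(), contains() or nested predicates), so the expressions are adapted slightly. The sample document and its ids and class names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page; ids and class names are invented for this sketch.
DOC = """
<html><body>
  <div id="outer" class="panel">
    <div id="inner"><img src="a.png"/></div>
  </div>
  <table><tr><td>price</td><td>42</td></tr></table>
</body></html>
"""
root = ET.fromstring(DOC)

# 1. Own attribute / own text. ElementTree has no text();
#    [.='price'] compares the element's whole text content instead.
by_attr = root.findall(".//div[@class='panel']")
by_text = root.findall(".//td[.='price']")

# 2. Via the parent: div children of the div whose class is 'panel'.
via_parent = root.findall(".//div[@class='panel']/div")

# 3. Via a child: any div that has a div child.
via_child = root.findall(".//div[div]")

# 4. Mixed: find the outer div through its child, then walk down to the img.
mixed = root.findall(".//div[div]/div/img")

print([e.get("id") for e in by_attr])
print([e.text for e in by_text])
print([e.get("id") for e in via_parent])
print([e.get("id") for e in via_child])
print([e.get("src") for e in mixed])
```

In a browser or a full XPath 1.0 engine the original forms such as //td[text()='xxx'] can be used unchanged.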

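Learning XPath: Extensions

1. following-sibling

following-sibling means "select all siblings after the current node". The axes without the sibling keyword (following, preceding) drop the same-level restriction and search every node after or before the current one in document order. For example: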

<div>

<input id="123">

<input>

</div>

To locate the second input: //input[@id='123']/following-sibling::input

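2. preceding-sibling

preceding-sibling means "select all siblings before the current node"; it is the exact mirror of following-sibling. As before, the axis without the sibling keyword (preceding) ignores the same-level restriction. For example: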

<div>

<span>text</span>

<input id="123">

</div>

To locate the span in front of the input: //input[@id='123']/preceding-sibling::span
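What the two sibling axes select can be sketched in plain Python. ElementTree does not implement axes, so the helper below (siblings, a hypothetical name) walks the tree by hand; the sample document merges the two HTML snippets above into one well-formed fragment.

```python
import xml.etree.ElementTree as ET

def siblings(root, target):
    """Return (preceding, following) siblings of target, in document order."""
    for parent in root.iter():
        children = list(parent)
        if target in children:
            i = children.index(target)
            return children[:i], children[i + 1:]
    return [], []  # target is the root or is not in this tree

# The two snippets above, merged and written as well-formed XML.
root = ET.fromstring('<div><span>text</span><input id="123"/><input/></div>')
target = root.find("input[@id='123']")

before, after = siblings(root, target)
print([e.tag for e in before])  # what preceding-sibling::* would select
print([e.tag for e in after])   # what following-sibling::* would select
```

A full XPath engine does the same walk internally; the point is only that both axes stay inside the parent of the current node.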

3. contains

Exactly what the name says: the value contains the given string. For example: //div[contains(@class,'xxx')]

4. starts-with

Exactly what the name says: the value starts with the given string. For example: //input[starts-with(@class,'xxx')]

5. not

Negation.

For example, to find an input whose id is not 123: //input[not(@id='123')]

Or to find a span whose text does not contain xxx: //span[not(contains(text(),'xxx'))]
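The semantics of contains, starts-with and not can be mirrored with Python string operations. This is only a sketch: ElementTree does not implement these functions, so each XPath predicate is rewritten as a list comprehension, and the sample document is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document.
DOC = """
<body>
  <div class="box main">a</div>
  <div class="sidebar">b</div>
  <input id="123" class="btn-ok"/>
  <input id="456" class="link"/>
  <input class="btn-cancel"/>
  <span>has xxx inside</span>
  <span>plain</span>
</body>
"""
root = ET.fromstring(DOC)

# //div[contains(@class,'main')]
contains_main = [e for e in root.iter("div") if "main" in e.get("class", "")]

# //input[starts-with(@class,'btn')]
starts_btn = [e for e in root.iter("input")
              if e.get("class", "").startswith("btn")]

# //input[not(@id='123')]  (note: this also matches inputs with no id at all)
not_123 = [e for e in root.iter("input") if e.get("id") != "123"]

# //span[not(contains(text(),'xxx'))]
no_xxx = [e for e in root.iter("span") if "xxx" not in (e.text or "")]

print([e.text for e in contains_main])
print([e.get("class") for e in starts_btn])
print([e.get("id") for e in not_123])
print([e.text for e in no_xxx])
```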

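Learning XPath: Supplement

Absolute path: /html/body/div/span[2]/input[2]. It breaks as soon as the structure in between changes.

Relative path: starts with // and searches the entire HTML source, wherever the element sits.

Index [x]: //div/input[2] is the second input under a div (indexes start at 1).

position()=2, position()>3, position()<3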

Example HTML:

<div id="positions">

<input>

<span>test position()1</span>

<span>test position()2</span>

<span>test position()3</span>

<span>test position()4</span>

<span>test position()5</span>

</div>

To get the first span, use either //div[@id='positions']/span[1] or //div[@id='positions']/span[position()=1]

//div[@id='positions']/span[position()>3] selects test position()4 and test position()5

//div[@id='positions']/span[position()<3] selects test position()1 and test position()2

last(), last()-1

Using the HTML above, get the last span: //div[@id='positions']/span[last()]

Using the HTML above, get the second-to-last span: //div[@id='positions']/span[last()-1]
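The index and last() forms can be checked with a short sketch, again assuming Python's standard xml.etree.ElementTree. It supports [n], [last()] and [last()-n] directly; position()>3 and position()<3 are not supported there, so the sketch mirrors them with list slices. The HTML is the example above, written as well-formed XML.

```python
import xml.etree.ElementTree as ET

DOC = """
<body>
  <div id="positions">
    <input/>
    <span>test position()1</span>
    <span>test position()2</span>
    <span>test position()3</span>
    <span>test position()4</span>
    <span>test position()5</span>
  </div>
</body>
"""
root = ET.fromstring(DOC)
base = ".//div[@id='positions']/span"

first = root.find(base + "[1]")                # span[1], same as position()=1
last = root.find(base + "[last()]")            # span[last()]
second_last = root.find(base + "[last()-1]")   # span[last()-1]

spans = root.findall(base)
after_third = spans[3:]    # mirrors span[position()>3]
before_third = spans[:2]   # mirrors span[position()<3]

print(first.text, last.text, second_last.text)
print([e.text for e in after_third])
print([e.text for e in before_third])
```

Note the off-by-one between the two worlds: XPath positions start at 1, Python list indexes start at 0.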

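Attribute locating with @class: //div[@class] matches every div that has a class attribute

Attribute value locating, already covered above: //div[@class='xxx']

Functions and keywords

1. Common

and / [][]: for example, //span[@name='xxx' and text()='xxx'] can also be written as //span[@name='xxx'][text()='xxx']

or: using the HTML above, to locate the spans whose text is test position()5 or test position()4: //div[@id='positions']/span[text()='test position()5' or text()='test position()4']

not, contains, starts-with

ends-with: not available in XPath 1.0 (it was only added in XPath 2.0). A common workaround is substring(@class, string-length(@class) - string-length('xxx') + 1) = 'xxx'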
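The and / stacked-predicate and or forms above can be sketched with Python's standard xml.etree.ElementTree. It has no and / or keywords, so this sketch uses stacked predicates for and, and merges two separate queries for or. The name attributes on the spans are added here purely for illustration.

```python
import xml.etree.ElementTree as ET

# Based on the positions example; the name attributes are hypothetical.
DOC = """
<body>
  <div id="positions">
    <span name="a">test position()4</span>
    <span name="b">test position()5</span>
    <span name="a">other</span>
  </div>
</body>
"""
root = ET.fromstring(DOC)
base = ".//div[@id='positions']/span"

# and: stacking predicates, like //span[@name='a'][text()='other']
both = root.findall(base + "[@name='a'][.='other']")

# or: run each alternative on its own, then keep the matches in document order
wanted = set(root.findall(base + "[.='test position()5']"))
wanted |= set(root.findall(base + "[.='test position()4']"))
either = [e for e in root.findall(base) if e in wanted]

print([e.text for e in both])
print([e.text for e in either])
```

An engine with full XPath 1.0 support can evaluate the single or expression from the text directly; the merge step is only needed because of ElementTree's limited syntax.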

2. Less common

substring, substring-before, substring-after

substring(str, start-position, length), where positions start at 1. For example, given the HTML:

<div id="xxx">

<span name="?-xxxxx-09">text</span>

</div>

To locate the span in the HTML above: //div[@id='xxx']/span[substring(@name,3,5)='xxxxx']

Using substring-before, given the HTML:

<div id="xxx">

<span class="spanclass1-789">text</span>

</div>

To locate the span in the HTML above: //div[@id="xxx"]/span[substring-before(@class,"-")="spanclass1"]

Using substring-after, given the HTML:

<div id="xxx">

<span class="789-spanclass2">text</span>

</div>

To locate the span in the HTML above: //div[@id="xxx"]/span[substring-after(@class,"-")="spanclass2"]
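The three substring functions are easy to get wrong because XPath counts characters from 1. The sketch below re-implements them in plain Python to make the arithmetic visible. The function names are hypothetical, only integer arguments are handled (real XPath also rounds fractional ones), and this is an illustration of the semantics rather than a replacement for an XPath engine.

```python
def xpath_substring(s, start, length=None):
    """Mirror XPath 1.0 substring() for integer arguments (1-based start)."""
    first = max(start, 1)          # positions below 1 do not exist
    if length is None:
        return s[first - 1:]
    end = start + length           # first position NOT included, 1-based
    if end <= first:
        return ""
    return s[first - 1:end - 1]

def xpath_substring_before(s, sep):
    """Text before the first occurrence of sep, or '' if sep is absent."""
    i = s.find(sep)
    return s[:i] if i >= 0 else ""

def xpath_substring_after(s, sep):
    """Text after the first occurrence of sep, or '' if sep is absent."""
    i = s.find(sep)
    return s[i + len(sep):] if i >= 0 else ""

# The attribute values from the three HTML snippets above.
print(xpath_substring("?-xxxxx-09", 3, 5))
print(xpath_substring_before("spanclass1-789", "-"))
print(xpath_substring_after("789-spanclass2", "-"))
```

For the first snippet, characters 1 and 2 are ? and -, so starting at position 3 and taking 5 characters yields the run of x characters that the predicate compares against.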

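Wildcard *

For example, //span[@*="xxx"] locates spans that have any attribute whose value equals xxx

For example, //*[@*="xxx"] locates any element on the page that has any attribute whose value equals xxx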
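A rough Python equivalent of the two wildcard expressions, as a sketch: ElementTree accepts * for element names but not @*, so the attribute wildcard is mirrored by checking every attribute value. The sample document is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document.
DOC = ('<body><span title="xxx">a</span><span class="yyy">b</span>'
       '<div name="xxx">c</div></body>')
root = ET.fromstring(DOC)

# //span[@*="xxx"]: spans where some attribute equals xxx
spans = [e for e in root.iter("span") if "xxx" in e.attrib.values()]

# //*[@*="xxx"]: any element where some attribute equals xxx
anything = [e for e in root.iter() if "xxx" in e.attrib.values()]

# The element wildcard alone is supported directly: //*[@name='xxx']
named = root.findall(".//*[@name='xxx']")

print([e.text for e in spans])
print([e.text for e in anything])
print([e.text for e in named])
```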

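Axes

parent: the parent node

ancestor: ancestor nodes, including the parent, going up one level at a time

descendant: all descendant nodes, at any depth. The // seen in XPath expressions is shorthand for /descendant-or-self::node()/, for example //div[@class="xxx"]//input

following-sibling: siblings after the current element

preceding-sibling: siblings before the current element

following: all elements after the current element, up to </html> (its own descendants excluded)

preceding: all elements before the current element, back to <html> (its own ancestors excluded)

ancestor-or-self: the ancestors plus the current node itself

descendant-or-self: the descendants plus the current node itself

Remember to put :: after the axis name when using one, for example //input[@id='123']/parent::div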
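The parent, ancestor and descendant axes can be illustrated with a small sketch. Python's standard xml.etree.ElementTree has no axis syntax, so the sketch uses its .. and // shortcuts where they exist and a hand-built parent map (parent_of, a hypothetical name) for ancestor. The sample document is made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document.
root = ET.fromstring(
    '<html><body><div class="xxx"><p><input id="i"/></p></div></body></html>'
)

# ElementTree elements do not know their parent, so build a child -> parent map.
parent_of = {child: parent for parent in root.iter() for child in parent}

def ancestors(node):
    """Nearest ancestor first, like walking the ancestor axis upward."""
    chain = []
    while node in parent_of:
        node = parent_of[node]
        chain.append(node)
    return chain

node = root.find(".//input")

parent = root.find(".//input/..")                          # parent::*
up = ancestors(node)                                       # ancestor::*
down = root.findall(".//div[@class='xxx']//input")         # descendant search
up_or_self = [node] + up                                   # ancestor-or-self::*

print(parent.tag)
print([e.tag for e in up])
print([e.get("id") for e in down])
print([e.tag for e in up_or_self])
```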


Reposted from blog.csdn.net/u012111923/article/details/80704515
