XPath, Beautiful Soup

Using XPath:

Common matching rules:

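/     Selects the direct child nodes of the current node
//    Selects all descendant nodes of the current node
.     Selects the current node
..    Selects the parent of the current node
@     Selects attributes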
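To see the five rules side by side, here is a small sketch against a made-up snippet (the markup and variable names are illustrative assumptions, not from the original post). A leading `.` makes a path relative to the element it is called on. Without it, `//a` searches the whole document even when called on `div`.

```python
from lxml import etree

# Hypothetical sample markup, used only to illustrate the five rules above
html = '<div id="box"><p><a href="/one">one</a></p><a href="/two">two</a></div>'
root = etree.HTML(html)  # lxml wraps the fragment in <html><body>

div = root.xpath('//div')[0]          # absolute search: any <div> in the document

children = div.xpath('./a')           # /   : only <a> elements that are direct children of <div>
descendants = div.xpath('.//a')       # //  : every <a> anywhere below <div>
itself = div.xpath('.')               # .   : the <div> itself
parent = div.xpath('..')              # ..  : the parent of <div> (the <body> lxml added)
ids = div.xpath('@id')                # @   : the value of <div>'s id attribute

print([a.text for a in children])     # ['two']
print([a.text for a in descendants])  # ['one', 'two']
print(itself[0].tag, parent[0].tag)   # div body
print(ids)                            # ['box']
```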

Getting attributes:

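from lxml import etree
html = '<div><a class="du" href="http://www.baidu.com">百度</a></div>'
parser = etree.HTML(html)  # parse the string; returns the root element
result = parser.xpath('//a[@class="du"]/@href')  # href of every <a> whose class is exactly "du"
print(result)  # ['http://www.baidu.com']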
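Besides the `/@href` step, lxml also lets you select the element itself and read its attributes from Python. A minimal sketch reusing the snippet above: `Element.get()` and `Element.attrib` are lxml's ElementTree-style attribute accessors, and `@*` is the standard XPath wildcard for all attributes.

```python
from lxml import etree

html = '<div><a class="du" href="http://www.baidu.com">百度</a></div>'
root = etree.HTML(html)

# Select the element (no /@href step), then read attributes in Python
link = root.xpath('//a[@class="du"]')[0]
href = link.get('href')            # returns None if the attribute is missing
all_attrs = dict(link.attrib)      # every attribute as a plain dict

# @* returns the values of all attributes in a single XPath call
values = root.xpath('//a[@class="du"]/@*')

print(href)       # http://www.baidu.com
print(all_attrs)  # {'class': 'du', 'href': 'http://www.baidu.com'}
print(values)     # ['du', 'http://www.baidu.com']
```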

Getting text:

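from lxml import etree
html = '<div><a class="du" href="http://www.baidu.com">百度</a></div>'
parser = etree.HTML(html)  # parse the string; returns the root element
result = parser.xpath('//a[@class="du"]/text()')  # text nodes directly inside the matching <a>
print(result)  # ['百度']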
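`text()` only returns text nodes that are direct children of the matched element, so nested tags can hide part of the text. The sketch below uses a hypothetical link whose text is split by a `<b>` tag to compare `/text()`, `//text()`, and the XPath `string()` function.

```python
from lxml import etree

# Hypothetical markup: part of the link text sits inside a nested <b>
html = '<div><a class="du" href="http://www.baidu.com">Bai<b>du</b></a></div>'
root = etree.HTML(html)

direct = root.xpath('//a[@class="du"]/text()')    # text nodes that are direct children of <a>
nested = root.xpath('//a[@class="du"]//text()')   # text nodes at any depth inside <a>
joined = root.xpath('string(//a[@class="du"])')   # concatenated text of the first matching <a>

print(direct)  # ['Bai']
print(nested)  # ['Bai', 'du']
print(joined)  # Baidu
```

Note that `string()` returns a single string rather than a list.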

Matching an attribute that has multiple values:

from lxml import etree
html = '<div><a class="du baidu" href="http://www.baidu.com">百度</a></div>'
parser = etree.HTML(html)
result = parser.xpath('//a[contains(@class,"du")]/text()')  # class string contains "du"
print(result)  # ['百度']
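One caveat: `contains()` is a plain substring test, so `contains(@class,"du")` also matches a class such as `dudu`. A common XPath 1.0 idiom for matching a whole class name is sketched below; the second `<a>` is hypothetical markup added only to show the difference.

```python
from lxml import etree

# Hypothetical markup: only the first <a> really has the class "du"
html = ('<div><a class="du baidu">one</a>'
        '<a class="dudu">two</a></div>')
root = etree.HTML(html)

# contains() is a substring test, so "dudu" matches as well
loose = root.xpath('//a[contains(@class, "du")]/text()')

# Pad the normalized class list with spaces and look for " du " as a whole word
strict = root.xpath(
    '//a[contains(concat(" ", normalize-space(@class), " "), " du ")]/text()'
)

print(loose)   # ['one', 'two']
print(strict)  # ['one']
```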

Matching multiple attributes:

from lxml import etree
html = '<div><a name="item" class="du baidu" href="http://www.baidu.com">百度</a></div>'
parser = etree.HTML(html)
result = parser.xpath('//a[contains(@class,"du") and @name="item"]/text()')  # both conditions must hold
print(result)  # ['百度']
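`and` requires both conditions; `or` accepts either one. For attribute tests like these, chaining two predicates `[...][...]` gives the same result as `and`. A small sketch with hypothetical markup:

```python
from lxml import etree

# Hypothetical markup with three links that satisfy different conditions
html = ('<div><a name="item" class="du baidu">one</a>'
        '<a name="other" class="du">two</a>'
        '<a name="item" class="x">three</a></div>')
root = etree.HTML(html)

both = root.xpath('//a[contains(@class, "du") and @name="item"]/text()')
either = root.xpath('//a[contains(@class, "du") or @name="item"]/text()')
stacked = root.xpath('//a[contains(@class, "du")][@name="item"]/text()')  # same as "and" here

print(both)     # ['one']
print(either)   # ['one', 'two', 'three']
print(stacked)  # ['one']
```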

Selecting by position:

from lxml import etree
html = """
        <li>item1</li>
        <li>item2</li>
        <li>item3</li>
        <li>item4</li>
        <li>item5</li>
"""
parser = etree.HTML(html)
result = parser.xpath('//li[1]/text()')  # first <li> (positions start at 1)
print(result)
result = parser.xpath('//li[last()]/text()')  # last <li>
print(result)
result = parser.xpath('//li[position()<3]/text()')  # first and second <li>
print(result)
result = parser.xpath('//li[last()-2]/text()')  # third <li> from the end
print(result)
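Note that `//li[1]` means "the first `<li>` child of each parent", not "the first `<li>` in the document". With a single parent, as in the example above, the two coincide; with several lists they differ. Wrapping the path in parentheses applies the position to the whole result set. A sketch with hypothetical markup:

```python
from lxml import etree

# Hypothetical markup with two separate lists
html = """
<ul><li>a1</li><li>a2</li></ul>
<ul><li>b1</li><li>b2</li></ul>
"""
root = etree.HTML(html)

per_parent = root.xpath('//li[1]/text()')          # first <li> within each parent <ul>
first_overall = root.xpath('(//li)[1]/text()')     # first <li> in the whole document
last_overall = root.xpath('(//li)[last()]/text()') # last <li> in the whole document

print(per_parent)     # ['a1', 'b1']
print(first_overall)  # ['a1']
print(last_overall)   # ['b2']
```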

More XPath functions: http://www.w3school.com.cn/xpath/xpath_functions.asp
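A few other XPath 1.0 functions that lxml supports, sketched on hypothetical markup. Numeric expressions such as `count()` come back as a Python `float`, and string functions come back as a single string rather than a list.

```python
from lxml import etree

# Hypothetical markup; the first item has extra whitespace on purpose
html = """
<ul>
  <li class="item-0">  first   item  </li>
  <li class="item-1">second</li>
  <li class="other">third</li>
</ul>
"""
root = etree.HTML(html)

prefixed = root.xpath('//li[starts-with(@class, "item")]/@class')  # class values starting with "item"
total = root.xpath('count(//li)')                                  # number of <li> elements, as a float
cleaned = root.xpath('normalize-space(//li[1])')                   # trims and collapses whitespace

print(prefixed)  # ['item-0', 'item-1']
print(total)     # 3.0
print(cleaned)   # first item
```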


Reposted from www.cnblogs.com/py-peng/p/12014687.html