Python's XML Module (Create, Delete, Update, Query)

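Contents

I. XML Overview

1. Overview
2. Syntax

II. Working with XML in Python

1. Reading XML
2. (Query) Getting the root node's tag, attrib, and text
3. (Query) Iterating over child nodes (elements)
4. (Query) Extracting attribute values with ElementTree
5. (Query) Finding nodes by index
6. (Query) Three ways to find nodes (find, findall, iter)
7. (Query) Searching with XPath expressions
8. (Update) Modifying nodes
9. (Create) Adding nodes
10. (Delete) Removing nodes
11. Creating a new XML file (xml.etree.ElementTree approach)
12. Creating a new XML file (xml.dom.minidom approach)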

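I. XML Overview

1. Overview

XML (Extensible Markup Language) is a common file format, used mainly for storing and transmitting data and for configuration files. It fills much the same role as JSON, although JSON is simpler.

2. Syntax

2.1 Document declaration: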

<?xml version="1.0"?>
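'''
1. The declaration must start with <?xml and end with ?>, with no space between <? and xml.
2. If present, the declaration must be the very first thing in the document (line 1, column 1); not even whitespace may precede it.
3. The declaration accepts these attributes:
	version: the XML version of the document. Required; normally 1.0.
	encoding: the encoding of the document. Optional; UTF-8 is assumed when it is omitted.
	standalone: whether the document depends on external declarations. Optional and rarely needed.

'''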
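
To make rule 2 concrete, here is a minimal sketch (the two sample strings are made up for illustration) that parses a document whose declaration sits at the very start, and then one with a single leading space. With the standard library parser, the second one should be rejected with a ParseError.

```python
import xml.etree.ElementTree as ET

# Declaration at the very start of the document: parses fine.
good = '<?xml version="1.0"?><data/>'
root = ET.fromstring(good)
print(root.tag)

# One leading space before the declaration makes the document ill-formed.
bad = ' <?xml version="1.0"?><data/>'
try:
    ET.fromstring(bad)
    declaration_error = None
except ET.ParseError as exc:
    declaration_error = exc
    print('ParseError:', exc)
```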

2.2 Elements and tags

<country name="Liechtenstein">
      <rank updated="yes">2</rank>
      <year>2008</year>
      <gdppc>141100</gdppc>
      <neighbor name="Austria" direction="E"/>
</country>
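'''
1. Tags: country, rank, and neighbor in the snippet above are all tags. An element may have no attributes or data, but it must be closed, either with an end tag such as </country> or as a self-closing tag such as <neighbor ... />.
2. Attributes: in <rank updated="yes">, updated is an attribute. Attributes are part of the element and must appear in its start tag. An element can have zero or more attributes, but no two of them may share the same name.
3. Data: in <year>2008</year>, 2008 is the data (text) stored in the XML.
'''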
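
The three notes above map directly onto the tag, attrib, and text members of an ElementTree element. The following is a small sketch using snippets copied from the example; the duplicated updated attribute in the last string is deliberately invalid, to show that the parser refuses it.

```python
import xml.etree.ElementTree as ET

# Tag, attributes, and data of a single element.
rank = ET.fromstring('<rank updated="yes">2</rank>')
print(rank.tag, rank.attrib, rank.text)

# A self-closing element carries attributes but no text.
neighbor = ET.fromstring('<neighbor name="Austria" direction="E"/>')
print(neighbor.attrib, neighbor.text)

# Two attributes with the same name are not well-formed XML.
try:
    ET.fromstring('<rank updated="yes" updated="no">2</rank>')
    duplicate_rejected = False
except ET.ParseError:
    duplicate_rejected = True
print(duplicate_rejected)
```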

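II. Working with XML in Python

The xml package is part of Python's standard library, so nothing needs to be installed. This article covers parsing, traversing, searching, adding to, and deleting from XML files (tags, attributes, and data) using the xml.etree.ElementTree module. See the official xml.etree.ElementTree documentation for details. Everything below operates on the following a.xml:

'''a.xml'''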

<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank updated="yes">2</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank updated="yes">5</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank updated="yes">69</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>

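1. Reading XML

ElementTree represents the whole XML document as a tree, so an existing XML file can be loaded in tree form.

from xml.etree import ElementTree as ET

tree = ET.parse('a.xml')   # open the XML document and get an ElementTree
root = tree.getroot()  # get the root node; a well-formed XML document has exactly one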
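
When the XML is already in memory as a string, ET.fromstring() can be used instead of ET.parse(). The sketch below assumes a trimmed-down copy of a.xml (the SAMPLE string is for illustration only). Note that fromstring() returns the root element directly, not an ElementTree.

```python
import xml.etree.ElementTree as ET

# A shortened stand-in for a.xml, kept in memory instead of on disk.
SAMPLE = """<data>
    <country name="Liechtenstein"><rank>2</rank></country>
    <country name="Singapore"><rank>5</rank></country>
</data>"""

root = ET.fromstring(SAMPLE)   # returns the root Element directly
tree = ET.ElementTree(root)    # wrap it when an ElementTree is needed, e.g. for write()

print(root.tag, len(root))     # tag of the root and number of direct children
```

Wrapping the root in ET.ElementTree is only needed for operations that live on the tree object, such as write().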

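2. (Query) Getting the root node's tag, attrib, and text

# 2. Get the root node's tag, attrib, and text
print(root.tag)  # data
print(root.attrib) # {} (an empty dict when the element has no attributes)
print(root.text)  # the data tag has no text of its own, only a newline and indentation, so whitespace is printed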

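3. (Query) Iterating over child nodes (elements)

# 3. Iterate over the child elements
for child in root:
    print(child.tag,child.attrib,child.text)
'''Output:
country {'name': 'Liechtenstein'} 
        
country {'name': 'Singapore'} 
        
country {'name': 'Panama'} 

None of the three child elements has real text, only a newline and indentation, which is why blank lines appear in the output.
'''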

4. (Query) Extracting attribute values with ElementTree

# 4. Get attribute values with ElementTree
for child in root:
    print(child.tag,child.attrib['name'])

'''Output:
country Liechtenstein
country Singapore
country Panama
'''
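
Indexing attrib with a name that does not exist raises KeyError. As a small sketch of a more forgiving lookup, Element.get() accepts a default value. The sample string and the 'unknown' placeholder below are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

# The second country deliberately has no name attribute.
root = ET.fromstring('<data><country name="Panama"/><country/></data>')

# get() returns the default instead of raising KeyError.
names = [child.get('name', 'unknown') for child in root]
print(names)
```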

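5. (Query) Finding nodes by index

# 5. Look up child elements by index
print(root[0][0].text)  # 2, the text of <rank updated="yes">2</rank>
print(root[1][2].tag)   # gdppc, the tag of <gdppc>59900</gdppc>
print(root[3][1])       # raises IndexError: child index out of range (root has only three children)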
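
Elements behave much like lists, so len(), negative indexes, and slices all work. The sketch below uses a tiny made-up document, and child_at is a hypothetical helper (not part of ElementTree) that returns None instead of raising IndexError.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring('<data><a/><b/><c/></data>')


def child_at(element, index):
    """Hypothetical helper: the child at index, or None if out of range."""
    if -len(element) <= index < len(element):
        return element[index]
    return None


print(len(root))                          # number of direct children
last_tag = root[-1].tag                   # negative indexes count from the end
first_two = [e.tag for e in root[:2]]     # slicing returns a list of elements
missing = child_at(root, 3)               # out of range, so None
print(last_tag, first_two, missing)
```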

6. (Query) Three ways to find nodes (find, findall, iter)

# 6.1 root.find(): search the direct children by tag and return the first match (None if nothing matches)
print(root.find('country').attrib)
# >>> {'name': 'Liechtenstein'}

# 6.2 root.findall(): search the direct children by tag and return a list of all matches

for country in root.findall('country'):
    print(country.attrib)
'''
{'name': 'Liechtenstein'}
{'name': 'Singapore'}
{'name': 'Panama'}
'''
# 6.3 root.iter(): walk the whole subtree (root itself and all of its descendants) by tag and return an iterator over the matches
print(root.iter('country'))  # <_elementtree._element_iterator object at 0x000001D592F7C3B0>
for i in root.iter('country'):
    print(i.tag)  # prints country three times, one per line
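
The difference between the three methods is easiest to see with a tag that is not a direct child of the root. The following sketch uses a shortened copy of a.xml (an assumption for brevity): findall('neighbor') on the root finds nothing, while iter('neighbor') reaches the nested elements.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<data>'
    '<country name="Singapore"><year>2011</year>'
    '<neighbor name="Malaysia"/></country>'
    '<country name="Panama"><year>2011</year>'
    '<neighbor name="Costa Rica"/><neighbor name="Colombia"/></country>'
    '</data>'
)

direct = root.findall('neighbor')                        # only direct children are searched
nested = [n.get('name') for n in root.iter('neighbor')]  # the whole subtree is searched
first_year = root.find('country').findtext('year')       # text of the first matching child
absent = root.find('continent')                          # no match, so None

print(direct, nested, first_year, absent)
```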

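7. (Query) Searching with XPath expressions

For more on XPath usage, see this blog post (https://blog.csdn.net/weixin_33847182/article/details/92515678). Note that ElementTree supports only a subset of XPath.

# Path lookups with XPath (XML Path) expressions
'''
/   Selects from the root node.
//  Selects matching nodes anywhere below the current node, regardless of their position.
.   Selects the current node.
..  Selects the parent of the current node.
@   Selects attributes.

'''
# findall() always returns a list
print(root.findall('.//rank'))  # rank elements at any depth
print(root.findall('country/*'))  # all grandchildren of root (children of each country)
print(root.findall('.//rank/..'))   # the parents of the rank elements, i.e. the country elements
print(root.findall('country[@name]'))   # country elements that have a name attribute
print(root.findall('country[@name="Singapore"]'))  # country elements whose name attribute is Singapore
print(root.findall('country[year="2008"]'))  # country elements with a year child whose text is 2008
print(root.findall('country[1]'))      # the first country
print(root.findall('country[last()-1]'))  # the second-to-last country
print(root.findall('country[last()]'))    # the last country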
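
Printing the raw result lists only shows element objects, so here is a sketch that turns a few of these expressions into readable values. It assumes a shortened copy of a.xml, and names is a hypothetical helper introduced only for this example.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<data>'
    '<country name="Liechtenstein"><rank>2</rank><year>2008</year></country>'
    '<country name="Singapore"><rank>5</rank><year>2011</year></country>'
    '<country name="Panama"><rank>69</rank><year>2011</year></country>'
    '</data>'
)


def names(path):
    """Hypothetical helper: name attributes of the elements matched by path."""
    return [element.get('name') for element in root.findall(path)]


in_2011 = names("country[year='2011']")   # filter on a child's text
second = names('country[2]')              # positions start at 1
last = names('country[last()]')
parents = names('.//rank/..')             # parents of every rank element
ranks = [e.text for e in root.findall('.//rank')]

print(in_2011, second, last, parents, ranks)
```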

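8. (Update) Modifying nodes

# Modify a node
head = root.find('country')        # get a node (the first country)
head.text = 'YCY2'              # change the text; the value should be a string, otherwise serializing the tree fails
head.attrib = {'name':'Head'}   # replace the attributes (overwrites all existing ones)
head.set('age','18')            # add an attribute (existing ones are kept)

print(head.tag,head.text,head.attrib)  # the change exists only in memory; call tree.write() to save it to the file
#>>> country YCY2 {'name': 'Head', 'age': '18'}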
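
A more typical update touches many nodes at once. As a sketch under the assumption of a shortened a.xml, the loop below increments every rank by one and marks it as updated. Because element text is always a string, it has to be converted to int and back.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<data>'
    '<country name="Liechtenstein"><rank>2</rank></country>'
    '<country name="Singapore"><rank>5</rank></country>'
    '</data>'
)

for rank in root.iter('rank'):
    rank.text = str(int(rank.text) + 1)   # text is a string: convert, add, convert back
    rank.set('updated', 'yes')            # add or overwrite a single attribute

result = ET.tostring(root, encoding='unicode')  # serialize to a str for inspection
print(result)
```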

9. (Create) Adding nodes

# Add a node to the XML

head1 = root.find('country')    # get the first country node
body = ET.Element('months')     # create a months node
body.attrib = {"name":"12"}     # set its attributes
body.text = '24day'             # set its text
head1.append(body)              # append it as the last child of country

tree.write("a.xml",encoding="utf-8")  # save to the file
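
Besides Element plus append(), ElementTree offers ET.SubElement() to create and attach a child in one step, and insert() to place a child at a specific position. The sketch below works on a single in-memory country element; the capital and region tags are invented for the illustration.

```python
import xml.etree.ElementTree as ET

country = ET.fromstring(
    '<country name="Singapore"><rank>5</rank><year>2011</year></country>'
)

# SubElement creates the child and appends it to the parent in one call.
capital = ET.SubElement(country, 'capital', {'code': 'SG'})
capital.text = 'Singapore'

# insert() places a child at a given index instead of at the end.
region = ET.Element('region')
region.text = 'Asia'
country.insert(0, region)

order = [child.tag for child in country]
print(order)
```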

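10. (Delete) Removing nodes

# Delete the months node under the first country node
tree = ET.parse('a.xml')            # read the .xml file
root = tree.getroot()               # get the root node

head = root.find('country')         # get the parent node
months = head.find('months')        # the node added in step 9 (None if it does not exist)
head.remove(months)                 # remove() only accepts direct children, so call it on the parent
tree.write('a.xml')                 # save the file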
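
When several nodes have to be removed based on a condition, it is safer to loop over the list returned by findall() than over the element itself, since removing children while iterating the element directly can skip items. A sketch on a shortened a.xml, with a rank threshold of 50 chosen arbitrarily:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<data>'
    '<country name="Liechtenstein"><rank>2</rank></country>'
    '<country name="Singapore"><rank>5</rank></country>'
    '<country name="Panama"><rank>69</rank></country>'
    '</data>'
)

# findall() returns a separate list, so removing from root inside the loop is safe.
for country in root.findall('country'):
    if int(country.findtext('rank')) > 50:
        root.remove(country)

remaining = [c.get('name') for c in root]
print(remaining)
```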

11. Creating a new XML file (xml.etree.ElementTree approach)

import xml.etree.ElementTree as ET

new_xml = ET.Element("namelist")
name = ET.SubElement(new_xml, "name", attrib={"enrolled": "yes"})
age = ET.SubElement(name, "age", attrib={"checked": "no"})
sex = ET.SubElement(name, "sex")
sex.text = '33'
name2 = ET.SubElement(new_xml, "name", attrib={"enrolled": "no"})
age = ET.SubElement(name2, "age")
age.text = '19'

et = ET.ElementTree(new_xml)  # build the document object
et.write("b.xml", encoding="utf-8", xml_declaration=True)

ET.dump(new_xml)  # print the generated XML to stdout
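
On Python 3.9 or later, ET.indent() can add indentation to a tree in place before it is written or dumped. The sketch below assumes that Python version and uses a smaller namelist than the one above; the four-space indent is an arbitrary choice.

```python
import xml.etree.ElementTree as ET

new_xml = ET.Element('namelist')
name = ET.SubElement(new_xml, 'name', attrib={'enrolled': 'yes'})
age = ET.SubElement(name, 'age')
age.text = '19'

# Requires Python 3.9+: inserts newlines and indentation into the tree in place.
ET.indent(new_xml, space='    ')

pretty = ET.tostring(new_xml, encoding='unicode')
print(pretty)
```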

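12. Creating a new XML file (xml.dom.minidom approach)

By default, xml.etree.ElementTree writes the nodes it creates without any indentation, which is not very readable. xml.dom.minidom handles this well.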


from xml.dom import minidom

file = minidom.Document()  # create the XML document object

node_1 = file.createElement('root')  # create a node
file.appendChild(node_1)  # make it the root node

node_2 = file.createElement('Maple')  # create the second node
node_2.setAttribute('TestName','aaa')  # set an attribute

node_3 = file.createElement('Job1')  # create the third node
node_3.setAttribute('Jobname1','job1')  # set an attribute
node_3.appendChild(file.createTextNode('100'))  # set the text

node_4 = file.createElement('Job2')  # create the fourth node
node_4.setAttribute('Jobname2','job2')  # set an attribute
node_4.appendChild(file.createTextNode('200'))  # set the text

node_1.appendChild(node_2)  # add the Maple node under root
node_2.appendChild(node_3)  # add the Job1 node under Maple
node_3.appendChild(node_4)  # add the Job2 node under Job1


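with open('c.xml', 'w', encoding='utf-8') as f:  # match the encoding named in the declaration
    file.writexml(f, indent='', addindent='\t', newl='\n', encoding='UTF-8')
'''
# indent: the indentation written before the current node
# addindent: the extra indentation added for each nested level
# newl: the string written at the end of each line
# encoding: only sets the encoding attribute in the XML declaration
'''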

The generated XML file:

<?xml version="1.0" encoding="UTF-8"?>
<root>
	<Maple TestName="aaa">
		<Job1 Jobname1="job1">
			100
			<Job2 Jobname2="job2">200</Job2>
		</Job1>
	</Maple>
</root>
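
The two approaches can also be combined: build the tree with ElementTree, then hand the serialized bytes to minidom just for pretty-printing. This is only a sketch of that idea, reusing the tag names from the example above; note that an element with a single text child, such as Job1 here, stays on one line.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Build a small tree with ElementTree.
root = ET.Element('root')
maple = ET.SubElement(root, 'Maple', {'TestName': 'aaa'})
job1 = ET.SubElement(maple, 'Job1', {'Jobname1': 'job1'})
job1.text = '100'

# Serialize it, re-parse it with minidom, and let minidom add the indentation.
raw = ET.tostring(root, encoding='utf-8')
pretty = minidom.parseString(raw).toprettyxml(indent='\t')
print(pretty)
```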

金鞍少年
