First, greedy and non-greedy matching
1. Greedy: match as much as possible; a plain quantifier such as (*) is greedy
2. Non-greedy: match as little as possible while still satisfying the pattern; appending (?) to a quantifier makes it non-greedy
3. Regular expressions are greedy by default
import re

title = "<div>name</div><div>age</div>"
p1 = re.compile(r"<div>.*</div>")   # greedy mode
p2 = re.compile(r"<div>.*?</div>")  # non-greedy mode

m1 = p1.search(title)
print(m1.group())  # the whole string, from the first <div> to the last </div>
m2 = p2.search(title)
print(m2.group())  # only <div>name</div>
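To see the difference across every match rather than only the first one, the following minimal sketch runs both patterns through re.findall on the same string. The variable names greedy and lazy are illustrative only.

```python
import re

title = "<div>name</div><div>age</div>"

# Greedy: .* runs to the last </div>, so the whole string is one match.
greedy = re.findall(r"<div>.*</div>", title)

# Non-greedy: .*? stops at the first </div>, so each div is its own match.
lazy = re.findall(r"<div>.*?</div>", title)

print(greedy)  # ['<div>name</div><div>age</div>']
print(lazy)    # ['<div>name</div>', '<div>age</div>']
```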
Second, XPath
1. Definition: XPath is a set of rules (a language) for finding information in an XML document by navigating its elements and attributes
Documentation help: http://www.w3cshool.com.cn/xpath/index.asp
2. XPath development tools
Open-source XPath expression editor: XMLQuire
Chrome plug-in: XPath Helper
Firefox plug-in: XPath Checker
3. How to select nodes in an XML file
(1) nodename: selects the child nodes of the current node that have this name
(2) /: selects starting from the root node
Examples: /Student: no results, because the root element is School
/School: selects the School node
(3) //: selects matching nodes anywhere in the document, regardless of their position
Examples: //Age: selects the two Age nodes in the sample file (Age_1 is a different name); the result is generally returned as a list
(4) .: selects the current node
(5) ..: selects the parent node of the current node
(6) @: selects attributes
(7) In general, XPath finds nodes by following a path:
School/Teacher: returns the Teacher node
School/Student: returns the two Student nodes
//Student: selects all Student nodes, regardless of location
School//Age: selects all Age nodes that are descendants of School
//@Other: selects all attributes named Other
//Age[@Details]: selects the Age elements that have a Details attribute
<?xml version="1.0" encoding="utf-8" ?>
<School>
    <Teacher desc="PythonTeacher" score="good">
        <name>LiuDana</name>
        <Age_1 Details="Age for year 2010">18</Age_1>
        <Mobile>13260446055</Mobile>
    </Teacher>
    <Student>
        <Name Other="He is a squad leader">zhangsan</Name>
        <Age Details="The youngest boy in class">14</Age>
    </Student>
    <Student>
        <Name>LiSi</Name>
        <Age>19</Age>
        <Mobile>15578875040</Mobile>
    </Student>
</School>
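The path expressions above can be tried from Python. The following is a minimal sketch using the standard-library xml.etree.ElementTree module, which implements only a subset of XPath. Paths are evaluated relative to an element, so School/Student becomes Student relative to the School root and //Age becomes .//Age. Attribute nodes such as //@Other cannot be selected directly, so the sketch reads the attribute from the matching elements instead. The sample data is inlined, and the first student's Name is written as <Name Other="He is a squad leader">zhangsan</Name>, which is an assumed reading of the sample file.

```python
import xml.etree.ElementTree as ET

# Inlined copy of the sample document (the first Name element is an assumed reading).
SCHOOL_XML = """<School>
    <Teacher desc="PythonTeacher" score="good">
        <name>LiuDana</name>
        <Age_1 Details="Age for year 2010">18</Age_1>
        <Mobile>13260446055</Mobile>
    </Teacher>
    <Student>
        <Name Other="He is a squad leader">zhangsan</Name>
        <Age Details="The youngest boy in class">14</Age>
    </Student>
    <Student>
        <Name>LiSi</Name>
        <Age>19</Age>
        <Mobile>15578875040</Mobile>
    </Student>
</School>"""

root = ET.fromstring(SCHOOL_XML)  # root is the School element

# School/Student: direct Student children of School
students = root.findall("Student")

# //Age: every Age element, at any depth (Age_1 does not match)
all_ages = root.findall(".//Age")

# //Age[@Details]: Age elements that carry a Details attribute
ages_with_details = root.findall(".//Age[@Details]")

# Stand-in for //@Other: find elements that have the attribute, then read it
other_values = [el.get("Other") for el in root.findall(".//*[@Other]")]

print(len(students))                     # 2
print([age.text for age in all_ages])    # ['14', '19']
print(len(ages_with_details))            # 1
print(other_values)                      # ['He is a squad leader']
```

A full XPath 1.0 engine would accept the absolute forms (/School, //@Other) as written; the relative forms here are a concession to ElementTree.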
4. Predicates
/School/Student[1]: selects the first Student node under School
/School/Student[last()]: selects the last Student node under School
/School/Student[last()-1]: selects the second-to-last Student node under School
/School/Student[position()<3]: selects the first two Student nodes under School
//Student[@score]: selects the Student nodes that have a score attribute
//Student[@score="99"]: selects the Student nodes whose score attribute equals 99
//Student[@score]/Age: selects the Age child nodes of Student nodes that have a score attribute
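The predicates above can be sketched with the standard-library xml.etree.ElementTree module, under two assumptions worth stating. First, ElementTree supports [n], [last()], [last()-n], [@attr] and [@attr='value'], but not position(), so a list slice stands in for position()<3. Second, in the sample data only Teacher has a score attribute, so the Student[@score] queries return empty lists; the sketch uses Teacher to show a predicate that matches. The inlined XML is a trimmed copy of the sample file.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample document; root is the School element.
root = ET.fromstring("""<School>
    <Teacher desc="PythonTeacher" score="good"><name>LiuDana</name></Teacher>
    <Student><Name>zhangsan</Name><Age>14</Age></Student>
    <Student><Name>LiSi</Name><Age>19</Age></Student>
</School>""")

first = root.findall("Student[1]")               # /School/Student[1]
last = root.findall("Student[last()]")           # /School/Student[last()]
second_last = root.findall("Student[last()-1]")  # /School/Student[last()-1]

# position()<3 is not supported by ElementTree; slicing is a stand-in.
first_two = root.findall("Student")[:2]

students_with_score = root.findall(".//Student[@score]")   # no Student has score
ages_of_scored = root.findall(".//Student[@score]/Age")    # therefore also empty
teachers_with_score = root.findall(".//Teacher[@score]")   # attribute exists
good_teachers = root.findall(".//Teacher[@score='good']")  # attribute equals value

print(first[0].find("Name").text)        # zhangsan
print(last[0].find("Name").text)         # LiSi
print(second_last[0].find("Name").text)  # zhangsan (only two students)
print(len(students_with_score), len(teachers_with_score))  # 0 1
```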
5. Some XPath operators
(1) |: union (or)
For example: //Student[@score] | //Teacher: selects the Student nodes that have a score attribute together with all Teacher nodes
(2) Other, less common XPath operators: +, -, *, div (division), >, <
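The standard-library xml.etree.ElementTree module does not implement the | operator, so the sketch below only approximates the union by running the two queries separately and merging the results. The helper name union and the score="99" attribute on the first Student are hypothetical, added for illustration. Unlike a real XPath union, the merged list follows query order rather than document order.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature document; score="99" is added for illustration only.
root = ET.fromstring(
    "<School>"
    '<Teacher score="good"><name>LiuDana</name></Teacher>'
    '<Student score="99"><Name>zhangsan</Name></Student>'
    "<Student><Name>LiSi</Name></Student>"
    "</School>"
)

def union(*result_lists):
    """Merge several findall() results, dropping nodes already seen."""
    seen = set()
    merged = []
    for nodes in result_lists:
        for node in nodes:
            if id(node) not in seen:
                seen.add(id(node))
                merged.append(node)
    return merged

# Approximates //Student[@score] | //Teacher
selected = union(root.findall(".//Student[@score]"), root.findall(".//Teacher"))
print([el.tag for el in selected])  # ['Student', 'Teacher']
```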
Third, source code
D31_2_GreedMatch.py
D32_1_School.xml
https://github.com/ruigege66/Python_learning/blob/master/D31_2_GreedMatch.py
https://github.com/ruigege66/Python_learning/blob/master/D32_1_School.xml
2. CSDN: https://blog.csdn.net/weixin_44630050 (Xi Jun Jun Moods do not know - Rui)
3. Cnblogs: https://www.cnblogs.com/ruigege0000/