1. Introduction to XML parsing
* XML is a markup document.
* How does JavaScript use the DOM to parse a markup document?
- Following the hierarchical structure of the HTML, it allocates a tree structure in memory and encapsulates the HTML tags, attributes, and text as objects.
- These are the document object, element objects, attribute objects, and text objects; all of them are Node objects.
* XML parsing methods (technologies): DOM and SAX
* The difference between DOM parsing and SAX parsing:
** DOM parsing
*** Allocates a tree structure in memory according to the hierarchical structure of the XML, and encapsulates the XML tags, attributes, and text as objects
*** Disadvantage: if the file is too large, it can cause an out-of-memory error
*** Advantage: add, delete, and modify operations are very convenient
** SAX parsing
*** Event-driven: it parses while reading, from top to bottom, and each time it reaches a node it reports that node (for example, the element name) to a handler
*** Disadvantage: add, delete, and modify operations cannot be implemented
*** Advantage: a large file does not cause an out-of-memory error, and query operations are convenient
* To parse XML, you first need a parser
** Different companies and organizations provide parsers for the DOM and SAX modes, exposed through APIs
*** Sun provides JAXP for DOM and SAX parsing
*** The dom4j organization provides dom4j for DOM and SAX parsing (the one used most in actual development)
*** The jdom organization provides jdom for DOM and SAX parsing
2. JAXP
** JAXP is part of Java SE.
** The JAXP parser classes are in the JDK's javax.xml.parsers package.
** Four classes are used for DOM and SAX parsing. The two for DOM are:
*** DocumentBuilder: the parser class
- It is an abstract class and cannot be created with new. An instance is obtained from DocumentBuilderFactory.newDocumentBuilder().
- parse("xml path") parses the XML and returns the whole document as a Document.
- Document is an interface whose parent interface is Node. If you cannot find the method you want on Document, look for it on Node.
- Methods on the Document interface:
getElementsByTagName(String tagname) -- gets the matching tags, returned as a NodeList collection
createElement(String tagName) -- creates a tag
createTextNode(String data) -- creates text
- Methods inherited from Node:
appendChild(Node newChild) -- adds a child node, for example text under a tag
removeChild(Node oldChild) -- deletes a child node
getParentNode() -- gets the parent node
*** NodeList: the list interface
- getLength() gets the length of the collection
- item(int index) gets the node at the given index
for (int i = 0; i < list.getLength(); i++) {
    list.item(i); // returns a Node
}
*** Node interface
- getTextContent() gets the text content inside the tag
*** DocumentBuilderFactory: the parser factory
- It is also an abstract class and cannot be created with new.
- newInstance() returns an instance of DocumentBuilderFactory.
3. Use JAXP to implement query operations
*** Query the values of all name elements in the XML
* Steps
// Query the values of all name elements
/*
 * 1. Create a parser factory
 *    DocumentBuilderFactory.newInstance();
 * 2. Create a parser from the parser factory
 *    builderFactory.newDocumentBuilder();
 * 3. Parse the XML and return a document
 *    Document document = builder.parse("src/Person.xml");
 * 4. Get all name elements
 *    document.getElementsByTagName("name");
 * 5. The result is a collection; traverse it to get each name element
 *    - traverse with getLength() and item()
 *    - get the value inside each element with getTextContent()
 */
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document d = db.parse("src/Person.xml");
NodeList nl = d.getElementsByTagName("name");
for (int i = 0; i < nl.getLength(); i++) {
    Node n = nl.item(i);
    System.out.println(n.getTextContent());
}
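The fragment above depends on src/Person.xml, whose contents are not shown in these notes. As a minimal sketch, the same query can be run against an in-memory string. The class name QueryNames and the sample XML layout are assumptions made for illustration only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class QueryNames {
    // Hypothetical stand-in for src/Person.xml; the real file layout is an assumption.
    public static final String XML =
        "<person>"
      + "<student><name>Tom</name><age>20</age></student>"
      + "<student><name>Ann</name><age>22</age></student>"
      + "</person>";

    // Returns the text of every <name> element, in document order.
    public static List<String> names(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document document = builder.parse(new InputSource(new StringReader(xml)));
        NodeList list = document.getElementsByTagName("name");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < list.getLength(); i++) {
            result.add(list.item(i).getTextContent());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(names(XML)); // prints [Tom, Ann]
    }
}
```

Parsing from an InputSource instead of a path keeps the example independent of the file system; the lookup and traversal steps are the same as in the fragment above.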
4. Use JAXP to add a node
*** Add <sex>male</sex> under the first student (at the end)
** Steps
/*
 * 1. Create a parser factory
 * 2. Create a parser from the parser factory
 * 3. Parse the XML and return a document
 * 4. Get all student elements and use the item method to take the first one by index
 * 5. Create the sex tag with createElement
 * 6. Create the text with createTextNode
 * 7. Add the text under sex with appendChild
 * 8. Add sex under the first student with appendChild
 * 9. Write back the XML
 */
DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = builderFactory.newDocumentBuilder();
Document document = builder.parse("src/Person.xml");
// Take only the first student, as the task requires
NodeList nodeList = document.getElementsByTagName("student");
Node student = nodeList.item(0);
// Build <sex>male</sex> and append it as the last child of that student
Node sex = document.createElement("sex");
Node text = document.createTextNode("male");
sex.appendChild(text);
student.appendChild(sex);
// Write the modified document back to the file
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
transformer.transform(new DOMSource(document), new StreamResult("src/Person.xml"));
To write the XML back, use the abstract class Transformer. An instance is obtained through the newTransformer() method of the TransformerFactory class.
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
Method for the write-back:
transform(Source xmlSource, Result outputTarget)
Parameters:
xmlSource: the XML input to be converted. Pass a DOMSource, whose constructor takes a Node (here, the Document).
outputTarget: the Result that receives the converted xmlSource. Pass a StreamResult, whose constructor takes the path to write to.
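To see what transform() produces without overwriting a file, the Result can wrap a StringWriter instead of a path. This is a minimal sketch; the class name WriteBackDemo and its helper methods are hypothetical, and OutputKeys.OMIT_XML_DECLARATION is set only to keep the output short.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class WriteBackDemo {
    // Parses an XML string into a DOM Document.
    public static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Serializes a Document to a string with an identity Transformer.
    public static String serialize(Document document) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        // Leave out the <?xml ...?> header so the output is only the element tree
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(document), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Document document = parse("<person><name>Tom</name></person>");
        System.out.println(serialize(document));
    }
}
```

Swapping the StringWriter for a path or a File gives the file write-back used in these notes.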
5. Use JAXP to modify a node
*** Change the content of sex under the first student to nan
** Steps
/*
 * 1. Create a parser factory
 * 2. Create a parser from the parser factory
 * 3. Parse the XML and return a document
 *
 * 4. Get the sex element with getElementsByTagName and the item method
 * 5. Modify the value in sex
 *    - use the setTextContent method
 *
 * 6. Write back the XML
 */
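The notes give no code for these steps, so here is a minimal sketch of them. It works on an in-memory string rather than src/Person.xml, and the class name ModifySex, the method name, and the sample XML are assumptions for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class ModifySex {
    // Replaces the text of the first <sex> element and returns the resulting XML.
    public static String modifyFirstSex(String xml, String newValue) throws Exception {
        // Steps 1-3: factory, parser, document
        Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        // Step 4: get the first sex element (item returns null if there is none)
        Node sex = document.getElementsByTagName("sex").item(0);
        if (sex == null) {
            throw new IllegalArgumentException("no <sex> element found");
        }
        // Step 5: modify its value
        sex.setTextContent(newValue);
        // Step 6: write back (to a string here instead of a file)
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(document), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample data
        String xml = "<person><student><name>Tom</name><sex>male</sex></student></person>";
        System.out.println(modifyFirstSex(xml, "nan"));
    }
}
```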
6. Use JAXP to delete a node
*** Delete the <sex>nan</sex> node
** Steps
/*
 * 1. Create a parser factory
 * 2. Create a parser from the parser factory
 * 3. Parse the XML and return a document
 *
 * 4. Get the sex element
 * 5. Use the getParentNode method to get the parent node of sex
 * 6. Delete sex by calling the removeChild method on its parent node
 *
 * 7. Write back the XML
 */
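A minimal sketch of the deletion steps follows. The key point is that removeChild is called on the parent and is passed the node to delete. The class name DeleteSex and the sample XML are assumptions, and the result is written to a string rather than back to src/Person.xml.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class DeleteSex {
    // Removes the first <sex> element and returns the resulting XML.
    public static String deleteFirstSex(String xml) throws Exception {
        // Steps 1-3: factory, parser, document
        Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        // Step 4: get the sex element
        Node sex = document.getElementsByTagName("sex").item(0);
        if (sex == null) {
            throw new IllegalArgumentException("no <sex> element found");
        }
        // Step 5: get its parent
        Node parent = sex.getParentNode();
        // Step 6: the parent removes the child
        parent.removeChild(sex);
        // Step 7: write back (to a string here instead of a file)
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(document), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample data
        String xml = "<person><student><name>Tom</name><sex>nan</sex></student></person>";
        System.out.println(deleteFirstSex(xml));
    }
}
```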
7. Use JAXP to traverse nodes
** Print the names of all elements in the XML
** Steps
/*
 * 1. Create a parser factory
 * 2. Create a parser from the parser factory
 * 3. Parse the XML and return a document
 *
 * ==== Implemented with recursion ====
 * 4. Get the root node
 * 5. Get the child nodes of the root node
 * 6. Get the child nodes of each of those child nodes, and so on
 */
public static void main(String[] args) throws Exception {
    DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
    DocumentBuilder builder = builderFactory.newDocumentBuilder();
    Document document = builder.parse("src/Person.xml");
    list(document);
}

private static void list(Node node) {
    // Print the name only when the node is an element
    if (node.getNodeType() == Node.ELEMENT_NODE) {
        System.out.println(node.getNodeName());
    }
    // Get the child nodes of this node
    NodeList list = node.getChildNodes();
    for (int i = 0; i < list.getLength(); i++) {
        // Get each child node
        Node node1 = list.item(i);
        // Recursive call
        list(node1);
    }
}
8. How SAX parsing works
* Recap of DOM parsing: allocate a tree structure in memory according to the hierarchical structure of the XML, and encapsulate the tags, attributes, and text in the XML as objects
* SAX parsing: event-driven, parsing while reading
* The classes are in the javax.xml.parsers package
** SAXParser
- an instance is obtained from the SAXParserFactory.newSAXParser() method
- parse(File f, DefaultHandler dh) takes two parameters
*** first parameter: the XML file (an overload also accepts the path as a String)
*** second parameter: the event handler
** SAXParserFactory
- an instance is obtained from its newInstance() method
* SAX execution process
** When a start tag is parsed, the startElement method is executed automatically
startElement(String uri, String localName, String qName, Attributes attributes)
** When text is parsed, the characters method is executed automatically
characters(char[] ch, int start, int length)
** When an end tag is parsed, the endElement method is executed automatically
endElement(String uri, String localName, String qName)
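The callback order described above can be observed by recording each event as it fires. This is a minimal sketch; the class name SaxEventTrace and the sample XML are assumptions. The sample has no whitespace between tags, because the parser also reports whitespace through characters.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxEventTrace {
    // Parses the XML and returns the sequence of handler callbacks that fired.
    public static List<String> trace(String xml) throws Exception {
        List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes attributes) {
                events.add("start:" + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                events.add("chars:" + new String(ch, start, length));
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("end:" + qName);
            }
        };
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return events;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample data
        for (String event : trace("<person><name>Tom</name></person>")) {
            System.out.println(event);
        }
    }
}
```

Printing each event in order, with the tag brackets added back, is essentially how the whole document can be printed in SAX mode.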
9. Use JAXP's SAX mode to parse XML
* Print the entire document
** Call the parse method: the first parameter is the XML path and the second is the event handler
** Create a class that extends the event handler class (DefaultHandler)
** Override its three methods: startElement, characters, and endElement
* Get the values of all the name elements
** Define a member variable flag = false (named boo in the code below)
** In startElement, check whether the element is name; if it is, set flag to true
** In characters, print the content if flag is true
** In endElement, set flag back to false
* Get the value of the first name element
** Define a member variable idx = 1
** In endElement, increment idx (idx++) each time a name element ends
** In characters, print the content only when flag is true and idx == 1
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SaxDemo {
    public static void main(String[] args) throws Exception {
        SAXParserFactory parserFactory = SAXParserFactory.newInstance();
        SAXParser saxParser = parserFactory.newSAXParser();
        MyDefault dh = new MyDefault();
        saxParser.parse("src/Person.xml", dh);
    }
}

// A class that extends the event handler class and overrides its three methods
class MyDefault extends DefaultHandler {
    // The flag: true while the parser is inside a name element
    boolean boo = false;

    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes attributes) throws SAXException {
        // If the start tag is a name element, set the flag to true
        if (qName.equals("name")) {
            boo = true;
            System.out.print("<" + qName + ">");
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        // When the name element ends, set the flag back to false
        if (qName.equals("name")) {
            boo = false;
            System.out.println("</" + qName + ">");
        }
    }

    @Override
    public void characters(char[] ch, int start, int length)
            throws SAXException {
        // Print the text only while the flag is true
        if (boo) {
            System.out.print(new String(ch, start, length));
        }
    }
}
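The code above covers all name elements. The idx variant for the first name element can be sketched as follows. The class name FirstNameHandler, the helper firstName, and the sample XML are assumptions, and the text is collected into a buffer rather than printed so the result can be returned.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class FirstNameHandler extends DefaultHandler {
    private boolean flag = false;   // true while inside a name element
    private int idx = 1;            // 1-based index of the name element being read
    private final StringBuilder first = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes attributes) {
        if ("name".equals(qName)) {
            flag = true;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Keep the text only for the first name element
        if (flag && idx == 1) {
            first.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("name".equals(qName)) {
            flag = false;
            idx++; // the next name element will have a higher index
        }
    }

    // Parses the XML and returns the text of the first <name> element ("" if none).
    public static String firstName(String xml) throws Exception {
        FirstNameHandler handler = new FirstNameHandler();
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.first.toString();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample data
        String xml = "<person><student><name>Tom</name></student>"
                   + "<student><name>Ann</name></student></person>";
        System.out.println(firstName(xml)); // prints Tom
    }
}
```

Appending to a StringBuilder also handles the case where the parser delivers one text node in several characters calls.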
10. Schema constraints
* DTD syntax, for comparison: <!ELEMENT element-name constraint>
** A schema is itself written in XML syntax, so it begins with an XML declaration
** One XML document can use multiple schemas; they are distinguished by namespace (similar to Java package names)
** DTD has only the PCDATA type, whereas a schema supports many more data types
*** For example, if age may only be an integer, a schema can declare it with an integer type directly
*** Schema syntax is more complicated, and schema has not yet fully replaced DTD
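The data-type point can be checked from Java with the javax.xml.validation API, which these notes do not otherwise cover. The following is a minimal sketch under that assumption; the class name AgeTypeCheck and the one-element schema are hypothetical.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class AgeTypeCheck {
    // Hypothetical schema: a single age element that must be an int
    public static final String XSD =
        "<schema xmlns='http://www.w3.org/2001/XMLSchema'>"
      + "<element name='age' type='int'/>"
      + "</schema>";

    // Returns true if the XML is valid against XSD, false if validation fails.
    public static boolean isValid(String xml) throws Exception {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // With no ErrorHandler set, a validation error is thrown as a SAXException
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isValid("<age>20</age>"));     // prints true
        System.out.println(isValid("<age>twenty</age>")); // prints false
    }
}
```

A DTD could only declare age as #PCDATA, so it would accept both documents.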
11. Schema quick start
* Create a schema file with the suffix .xsd
** The root node of the schema file is <schema>
** Attributes on <schema>:
** xmlns="http://www.w3.org/2001/XMLSchema"
- indicates that the current XML file is a constraint file
** targetNamespace="http://www.example.org/1"
- the namespace that identifies this schema; a constrained file references the constraint file through this address
** elementFormDefault="qualified"
- elements declared in the schema must be namespace-qualified in the constrained file
Steps
(1) Count the elements in the XML; each one is declared with an
<element>
(2) Decide which are simple elements and which are complex elements
* A complex element is declared as
<complexType>
    <sequence>
        sub-elements
    </sequence>
</complexType>
(3) Simple elements are written inside the complex element
<element name="person">
    <complexType>
        <sequence>
            <element name="name" type="string"></element>
            <element name="age" type="int"></element>
        </sequence>
    </complexType>
</element>
(4) Reference the constraint file from the constrained file
<person xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xmlns="http://www.example.org/1"
        xsi:schemaLocation="http://www.example.org/1 1.xsd">
** xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
-- indicates that this XML file is a constrained file
** xmlns="http://www.example.org/1"
-- must equal the targetNamespace declared in the constraint file
** xsi:schemaLocation="http://www.example.org/1 1.xsd"
-- the targetNamespace, a space, then the path of the constraint file
* <sequence>: the child elements must appear in the listed order
* <all>: each child element can appear only once, in any order
* <choice>: only one of the listed child elements can appear
* maxOccurs="unbounded": the element can appear any number of times
* <any></any>: any element is allowed
* Attributes can also be constrained
** attribute declarations are written inside the complex element
*** placed just before </complexType>
<attribute name="id1" type="int" use="required"></attribute>
- name: the attribute name
- type: the attribute type, such as int or string
- use: whether the attribute must appear; required means it must
* Complex schema constraints
** A data file can reference multiple schema files, and each namespace can be given an alias (prefix)
Data file:
<?xml version="1.0" encoding="UTF-8"?>
<!-- The data file references multiple schemas -->
<company xmlns="http://www.example.org/company"
         xmlns:dept="http://www.example.org/department"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.example.org/company company.xsd http://www.example.org/department department.xsd">
    <employee age="30">
        <!-- Department name -->
        <dept:name>Human Resources</dept:name>
        <!-- Employee name -->
        <name>Wang Xiaoxiao</name>
    </employee>
</company>
company.xsd:
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        targetNamespace="http://www.example.org/company"
        elementFormDefault="qualified">
    <element name="company">
        <complexType>
            <sequence>
                <element name="employee">
                    <complexType>
                        <sequence>
                            <!-- Refers to any element -->
                            <any></any>
                            <!-- Employee name -->
                            <element name="name"></element>
                        </sequence>
                        <!-- Add an attribute to the employee element -->
                        <attribute name="age" type="int"></attribute>
                    </complexType>
                </element>
            </sequence>
        </complexType>
    </element>
</schema>
department.xsd:
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        targetNamespace="http://www.example.org/department"
        elementFormDefault="qualified">
    <!-- Department name -->
    <element name="name" type="string"></element>
</schema>
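One way to check a data file against several schemas at once is to pass all of them to SchemaFactory.newSchema(Source[]). This is a rough sketch using the javax.xml.validation API, which the notes themselves do not cover. The class name MultiSchemaCheck and the helper company are hypothetical; the schemas are condensed copies of the two above, and the xsi:schemaLocation hint is left out because the schemas are supplied in code.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class MultiSchemaCheck {
    // Condensed copy of company.xsd from the notes
    public static final String COMPANY_XSD =
        "<schema xmlns='http://www.w3.org/2001/XMLSchema'"
      + " targetNamespace='http://www.example.org/company'"
      + " elementFormDefault='qualified'>"
      + "<element name='company'><complexType><sequence>"
      + "<element name='employee'><complexType><sequence>"
      + "<any></any><element name='name'></element>"
      + "</sequence><attribute name='age' type='int'></attribute>"
      + "</complexType></element>"
      + "</sequence></complexType></element></schema>";

    // Condensed copy of department.xsd from the notes
    public static final String DEPARTMENT_XSD =
        "<schema xmlns='http://www.w3.org/2001/XMLSchema'"
      + " targetNamespace='http://www.example.org/department'"
      + " elementFormDefault='qualified'>"
      + "<element name='name' type='string'></element></schema>";

    // Builds a sample data document with the given age attribute value.
    public static String company(String age) {
        return "<company xmlns='http://www.example.org/company'"
             + " xmlns:dept='http://www.example.org/department'>"
             + "<employee age='" + age + "'>"
             + "<dept:name>Human Resources</dept:name>"
             + "<name>Wang Xiaoxiao</name>"
             + "</employee></company>";
    }

    // Returns true if the XML is valid against both schemas together.
    public static boolean isValid(String xml) throws Exception {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new Source[] {
            new StreamSource(new StringReader(DEPARTMENT_XSD)),
            new StreamSource(new StringReader(COMPANY_XSD))
        });
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isValid(company("30")));
        System.out.println(isValid(company("thirty")));
    }
}
```

The dept prefix plays the role of the alias: it lets the department name element sit next to the company name element without a clash.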