Parsing HTML tags with the Jsoup API

Jsoup official site: http://jsoup.org/

I. Maven dependency (latest version at the time of writing)

<!-- https://mvnrepository.com/artifact/org.jsoup/jsoup -->
<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.13.1</version>
</dependency>

1. Parse a single input element by ID

Document doc = Jsoup.parse( responseStr );
Element inputTag = doc.getElementById("dataVal");
String dataVal = inputTag.attr("value");
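getElementById returns null when the document has no element with the requested id, so the snippet above throws a NullPointerException on such input. A minimal null-safe sketch follows; the class name InputById, the helper valueById, and the sample HTML are made up for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class InputById {
    // Returns the value attribute of the element with the given id,
    // or null when the document contains no such element.
    static String valueById(String html, String id) {
        Document doc = Jsoup.parse(html);
        Element el = doc.getElementById(id);
        return el == null ? null : el.attr("value");
    }

    public static void main(String[] args) {
        // Sample markup invented for this example
        String html = "<form><input id=\"dataVal\" value=\"42\"></form>";
        System.out.println(valueById(html, "dataVal")); // prints 42
        System.out.println(valueById(html, "missing")); // prints null
    }
}
```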

2. Parse a single input element

String html = "<p><input align=\"top\" src=\"/项目名/userfiles/image/yiyiren.jpg\" width=\"60%\" type=\"image\" longdesc=\"undefined\" /></p>";
Document doc = Jsoup.parse(html);
Element inputTag = doc.select("input").first();
String src = inputTag.attr("src"); // this input has no value attribute, so attr("value") would return ""
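When only the first match is needed, a narrower selector combined with selectFirst avoids building the full Elements list. The sketch below assumes the goal is the src of the first image-type input; FirstImageInput and firstImageInputSrc are hypothetical names.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

public class FirstImageInput {
    // Returns the src of the first <input type="image">,
    // or an empty string when the document has none.
    static String firstImageInputSrc(String html) {
        Element input = Jsoup.parse(html).selectFirst("input[type=image]");
        return input == null ? "" : input.attr("src");
    }

    public static void main(String[] args) {
        // Sample markup invented for this example
        String html = "<p><input type=\"text\" value=\"x\" />"
                + "<input type=\"image\" src=\"/a.jpg\" /></p>";
        System.out.println(firstImageInputSrc(html)); // prints /a.jpg
    }
}
```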

3. Parse multiple input elements

String html = "<p><input src=\"/项目名/userfiles/image/QQ图片20130618085610.jpg\" width=\"200\" height=\"99\" type=\"image\" longdesc=\"undefined\" /><strong>名称</strong>:薏苡仁<br /><input align=\"top\" src=\"/项目名/userfiles/image/yiyiren.jpg\" width=\"60%\" type=\"image\" longdesc=\"undefined\" /><br /></p>";
Document doc = Jsoup.parse(html);
Elements elements = doc.select("input");
for(Element inputTag : elements){
    String imgUrl = inputTag.attr("src");
    System.out.println("imgUrl=====" + imgUrl);
}
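The src values printed by the loop above are relative paths. If the page's address is known, it can be passed to Jsoup.parse as a base URI so that absUrl resolves them. A sketch under that assumption; the class InputSrcUrls, the helper absoluteSrcs, and the example.com base address are placeholders.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class InputSrcUrls {
    // Collects the src of every input that has one, resolved against baseUri.
    static List<String> absoluteSrcs(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        List<String> urls = new ArrayList<>();
        for (Element input : doc.select("input[src]")) {
            urls.add(input.absUrl("src"));
        }
        return urls;
    }

    public static void main(String[] args) {
        // Sample markup and base address invented for this example
        String html = "<p><input type=\"image\" src=\"/app/a.jpg\" />"
                + "<input type=\"image\" src=\"img/b.png\" /></p>";
        for (String url : absoluteSrcs(html, "http://example.com/")) {
            System.out.println("imgUrl=====" + url);
        }
    }
}
```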

4. Fetch and process HTML directly from a URL

Document doc = Jsoup.connect("http://www.baidu.com").get();

// getElementById returns null when the fetched page has no element with this id
Element inputTag = doc.getElementById("dataVal");
String dataVal = inputTag.attr("value");


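String text = doc.body().text();        // text content of the body
Element link = doc.select("a").first(); // first a element (null if the page has none)
String linkText = link.text();          // text inside the link
String href = link.attr("href");        // the link's href attribute

String linkOuterH = link.outerHtml();   // HTML of the link, including the a tag itself
String linkInnerH = link.html();        // HTML inside the link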
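For real pages it often helps to set a user agent and a timeout on the connection and to guard against a page with no links. The following is a sketch, not a definitive setup: the class FetchFirstLink, its helpers, the user-agent string, and the 10-second timeout are all assumptions chosen for illustration.

```java
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class FetchFirstLink {
    // Fetches a page with an explicit user agent and timeout (both values are assumptions).
    static Document fetch(String url) throws IOException {
        return Jsoup.connect(url)
                .userAgent("Mozilla/5.0")
                .timeout(10_000) // milliseconds
                .get();
    }

    // Returns the absolute URL of the first link that has an href,
    // or null when the document has no such link.
    static String firstLinkHref(Document doc) {
        Element link = doc.select("a[href]").first();
        return link == null ? null : link.absUrl("href");
    }

    public static void main(String[] args) throws IOException {
        Document doc = fetch("http://www.baidu.com");
        System.out.println(firstLinkHref(doc));
    }
}
```

Documents returned by Jsoup.connect carry the fetched address as their base URI, which is what lets absUrl resolve relative links here.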

5. Select by class

Document doc = Jsoup.parse( responseStr );
// Select the target elements; [class=wea] matches only when the class attribute is exactly "wea"
Elements elements4 = doc.select("[class=wea]");
String text = elements4.get(0).text();
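The attribute selector [class=wea] and the class selector .wea behave differently once an element carries more than one class. A small comparison sketch; the class ClassSelectorDemo, the helper count, and the sample markup are invented for illustration.

```java
import org.jsoup.Jsoup;

public class ClassSelectorDemo {
    // Returns how many elements in the HTML match the CSS query.
    static int count(String html, String cssQuery) {
        return Jsoup.parse(html).select(cssQuery).size();
    }

    public static void main(String[] args) {
        // Sample markup invented for this example
        String html = "<div class=\"wea\">Sunny</div>"
                + "<div class=\"wea big\">Cloudy</div>";
        System.out.println(count(html, "[class=wea]")); // prints 1: exact attribute value only
        System.out.println(count(html, ".wea"));        // prints 2: any element with class wea
    }
}
```

If the target element may gain extra classes later, .wea is usually the more robust choice.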


Reposted from blog.csdn.net/yexiaomodemo/article/details/106693278