Java jsoup web scraper

Recently I have been doing some page-parsing work in Java, using jsoup to fetch a URL and pull out the data we want.

Example

Let's use my blog address, https://blog.csdn.net/weixin_45906830, and extract some information from it.

First: add the jsoup package.

I use Gradle, so I add the following under dependencies in build.gradle. (With Maven it is much the same: add the corresponding dependency.)

implementation group: 'org.jsoup', name: 'jsoup', version: '1.13.1'

Let's just grab the username.

    import org.jsoup.Connection;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.select.Elements;
    import org.junit.Test; // JUnit 4 assumed; with JUnit 5 use org.junit.jupiter.api.Test

    @Test
    public void test1() throws Exception {
        // URL of my blog
        String url = "https://blog.csdn.net/weixin_45906830";
        // Set headers so the request looks like it comes from a browser.
        // No Host header is set here: it is derived from the URL.
        Connection connect = Jsoup.connect(url).timeout(2500).ignoreContentType(true);
        connect.header("Connection", "keep-alive");
        connect.header("User-Agent", "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.83 Safari/537.36");
        // The full HTML of the page
        String html = connect.execute().body();
        // System.out.println(html);
        Document document = Jsoup.parse(html);
        // Inspect the page with F12 (browser dev tools) to find the element, then call the matching method.
        Elements oneAbc = document.getElementsByClass("name");
        String name = oneAbc.text();
        System.out.println("Username: " + name);
    }


Approach

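Basically, you fetch a page's HTML (or an API response) from a URL and then filter out the parts that are useful to you. You can also turn the fetched page into a String and split it; whatever gets you the data works.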
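
As a rough illustration of the split-the-String idea, here is a minimal sketch that uses only the JDK. The class name SplitExtract, the helper between, and the sample markup are all made up for this example; they are not part of jsoup or of the real CSDN page. For anything beyond a quick one-off, jsoup's own element lookups are likely to be sturdier than plain string matching.

```java
import java.util.Optional;

// Hypothetical helper for illustration only; not part of jsoup.
final class SplitExtract {

    // Returns the trimmed text between the first occurrence of `open` and the
    // next occurrence of `close` after it, or empty if either marker is missing.
    static Optional<String> between(String html, String open, String close) {
        int start = html.indexOf(open);
        if (start < 0) {
            return Optional.empty();
        }
        start += open.length();
        int end = html.indexOf(close, start);
        if (end < 0) {
            return Optional.empty();
        }
        return Optional.of(html.substring(start, end).trim());
    }

    public static void main(String[] args) {
        // Made-up markup standing in for a fetched page.
        String html = "<div class=\"profile\"><span class=\"name\"> demo_user </span></div>";
        String name = between(html, "<span class=\"name\">", "</span>").orElse("(not found)");
        System.out.println("Username: " + name); // prints "Username: demo_user"
    }
}
```

This only works while the markers match the page exactly, so a small change in the markup (an extra attribute, different quoting) will make it return empty.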
