Java: A Web Page Image Crawler Based on URL Streams

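Tip

In a page's HTML, an element that begins with img is an image element, and the string inside src="..." is the address of the image resource.
For example:
Right-clicking an image on the page and choosing Inspect Element shows something like <img data-v-0d738edb="" src="https://avatar.csdn.net/9/9/A/1_preyhard.jpg?1543834708" alt="" class="head">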
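
The tip above can be turned into a small helper. What follows is a minimal sketch, assuming a regular expression is good enough for the page at hand (it is not a full HTML parser). The class name ImgSrcExtractor and the method extractSrc are hypothetical names introduced for illustration; they are not part of the original code. The pattern accepts single- or double-quoted src values and skips look-alike attributes such as data-src.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original post.
class ImgSrcExtractor {
    // "<img", then any attributes, then whitespace + src = "value" or 'value'.
    // Requiring whitespace before "src" keeps data-src and similar attributes out.
    private static final Pattern IMG_SRC = Pattern.compile(
            "<img\\b[^>]*?\\ssrc\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE);

    // Returns the src value of every <img> tag found in the given HTML text.
    static List<String> extractSrc(String html) {
        List<String> result = new ArrayList<>();
        Matcher m = IMG_SRC.matcher(html);
        while (m.find()) {
            result.add(m.group(2)); // group 2 is the text between the quotes
        }
        return result;
    }

    public static void main(String[] args) {
        String tag = "<img data-v-0d738edb=\"\" "
                + "src=\"https://avatar.csdn.net/9/9/A/1_preyhard.jpg?1543834708\" "
                + "alt=\"\" class=\"head\">";
        // prints [https://avatar.csdn.net/9/9/A/1_preyhard.jpg?1543834708]
        System.out.println(extractSrc(tag));
    }
}
```

Compared with chained indexOf/substring calls, the regular expression copes with several tags on one line without extra bookkeeping, at the cost of being harder to read.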

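Steps

1. Open a URL stream and read the full content of the web page.
2. Pick out the image resource addresses from that content, then open a URL stream for each address and write the image data to a new image file.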
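
The "URL stream to file" part of step 2 can be sketched in isolation. This is a minimal sketch, assuming Java 7 or later so that Files.copy is available; the class name UrlToFile and the method download are hypothetical names introduced for illustration. The demo in main copies from a local file: URL so that it runs without network access; an http or https URL would be passed the same way.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of the original post.
class UrlToFile {
    // Copies everything readable from the URL stream into target
    // and returns the number of bytes written.
    static long download(URL source, Path target) throws IOException {
        // try-with-resources closes the stream even if the copy fails
        try (InputStream in = source.openStream()) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        // Local stand-in for a remote resource (an assumption made for the demo).
        Path src = Files.createTempFile("page", ".html");
        Files.write(src, "<img src=\"a.png\">".getBytes());
        Path dst = Files.createTempFile("copy", ".html");
        long n = download(src.toUri().toURL(), dst);
        System.out.println(n + " bytes copied to " + dst);
    }
}
```

Files.copy takes care of the read/write loop and buffer handling, so the manual byte[] loop is only needed when the data has to be inspected or transformed on the way through.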

Code

package westos2;

import java.io.BufferedReader;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Random;

public class client {
    public static void main(String[] args) throws IOException {
        Random random = new Random();
        HttpURLConnection connection = (HttpURLConnection)
                new URL("https://tieba.baidu.com/p/2256306796?red_tag=1781367364").openConnection();
        // Read the page line by line; try-with-resources closes the stream when done.
        try (BufferedReader buffer = new BufferedReader(new InputStreamReader(connection.getInputStream()))) {
            String s;
            while ((s = buffer.readLine()) != null) {
                if (s.contains("<img")) {
                    show(s, random);
                }
            }
        }
    }

    // Prints the src of every <img> tag on the line and downloads each absolute URL.
    private static void show(String s, Random random) throws IOException {
        int imgindex = s.indexOf("<img");
        while (imgindex != -1) {
            String s1 = s.substring(imgindex);
            int srcindex = s1.indexOf("src=\"");
            if (srcindex == -1) {
                return; // no double-quoted src attribute left on this line
            }
            String s2 = s1.substring(srcindex + 5); // skip past src="
            int yinindex = s2.indexOf("\"");        // closing quote of the src value
            if (yinindex == -1) {
                return; // the attribute is cut off by the line break
            }
            String s3 = s2.substring(0, yinindex);
            System.out.println(s3);
            if (s3.startsWith("http")) {
                download(s3, random);
            }
            // Continue with the rest of the line, which may hold more <img> tags.
            s = s2.substring(yinindex);
            imgindex = s.indexOf("<img");
        }
    }

    // Saves the image under a random name. The target directory must already exist,
    // and every file gets a .png extension regardless of its real format.
    private static void download(String address, Random random) throws IOException {
        HttpURLConnection url = (HttpURLConnection) new URL(address).openConnection();
        String i = random.nextInt() + "";
        try (InputStream in = url.getInputStream();
             FileOutputStream out = new FileOutputStream("C:\\Users\\Administrator\\Desktop\\jpg\\" + i + ".png")) {
            byte[] bytes = new byte[1024 * 8];
            int read;
            while ((read = in.read(bytes)) != -1) {
                out.write(bytes, 0, read);
            }
        }
    }
}
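
The code above names each file with a random integer and a fixed .png extension, so a downloaded .jpg ends up with the wrong extension and two images could in principle collide. One alternative is to reuse the last path segment of the image URL. This is a minimal sketch, assuming the address is a syntactically valid URI (URI.create throws IllegalArgumentException otherwise); the class name ImageFileName and the method fileNameFor are hypothetical names introduced for illustration.

```java
import java.net.URI;

// Hypothetical helper, not part of the original post.
class ImageFileName {
    // Returns the last path segment of the URL, ignoring the query string.
    static String fileNameFor(String address) {
        String path = URI.create(address).getPath(); // null for opaque URIs
        String name = (path == null) ? "" : path.substring(path.lastIndexOf('/') + 1);
        if (name.isEmpty()) {
            // No usable segment: fall back to a name derived from the whole address.
            name = "image-" + Integer.toHexString(address.hashCode());
        }
        return name;
    }

    public static void main(String[] args) {
        // prints 1_preyhard.jpg
        System.out.println(fileNameFor("https://avatar.csdn.net/9/9/A/1_preyhard.jpg?1543834708"));
    }
}
```

Two different URLs can still share the same last segment, so a real crawler would also need to handle name clashes, for example by appending a counter.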

Result

[Two screenshots appeared here in the original post.]

Reposted from blog.csdn.net/PreyHard/article/details/84781907