Java: extract the src URLs of img tags from a string, batch-download the images locally, and update part of a field's content

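Problem description:

To save effort, we had been writing copy in the 135 rich-text editor (135editor) and had uploaded a large number of images to it. We later found that requests for those images were being blocked with HTTP 403.

The fix: download the images hosted on 135editor to a local disk, upload them to the project's own image server under the same paths that 135editor uses, and then replace the image domain with the domain of the project's image server.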
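The "same path" idea can be sketched as a small helper that maps an image URL to a local file while keeping the URL's path. This is only an illustration: the class name `MirrorPath` and the method `toLocalPath` are hypothetical, and it assumes the image URLs are well-formed (no spaces or other characters that `java.net.URI` rejects).

```java
import java.net.URI;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of the original post.
final class MirrorPath {

    // Maps an image URL to a file below localRoot that keeps the URL's path.
    static Path toLocalPath(String imageUrl, Path localRoot) {
        // For https://host/files/users/x.png this is /files/users/x.png
        String urlPath = URI.create(imageUrl).getPath();
        // Drop the leading slash so resolve() treats the path as relative
        String relative = urlPath.startsWith("/") ? urlPath.substring(1) : urlPath;
        Path target = localRoot.resolve(relative).normalize();
        // Refuse paths that would escape localRoot (for example via "..")
        if (!target.startsWith(localRoot.normalize())) {
            throw new IllegalArgumentException("Path escapes the local root: " + imageUrl);
        }
        return target;
    }

    public static void main(String[] args) {
        // The file name a.png is made up for this example
        System.out.println(toLocalPath(
                "https://image.135editor.com/files/users/a.png",
                Paths.get("E:/htmlcontent")));
    }
}
```

Because the local directory tree mirrors the server's path, uploading the whole tree to the new image server keeps every path after the domain unchanged, so only the domain has to be rewritten later.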

First, query the database fields that contain 135editor images and store the data in a temporary table, aaa.

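	// Imports needed by the methods below; all of them are members of one class.
	import java.io.File;
	import java.io.FileNotFoundException;
	import java.io.FileOutputStream;
	import java.io.IOException;
	import java.io.InputStream;
	import java.net.URL;
	import java.net.URLConnection;
	import java.sql.Connection;
	import java.sql.DriverManager;
	import java.sql.PreparedStatement;
	import java.sql.ResultSet;
	import java.sql.SQLException;
	import java.util.ArrayList;
	import java.util.LinkedHashSet;
	import java.util.List;
	import java.util.regex.Matcher;
	import java.util.regex.Pattern;

	public static void main(String[] args) throws Exception {
		List<String> imgs = new ArrayList<String>();
		String sql = "select ff result from aaa";
		// try-with-resources closes the statement and the result set
		try (PreparedStatement ps = DataBase.getPreparedStatement(sql);
				ResultSet rs = ps.executeQuery()) {
			while (rs.next()) {
				String htmlcontent = rs.getString("result");
				if (htmlcontent == null || htmlcontent.isEmpty()) {
					continue;
				}
				imgs.addAll(getImgSrc(htmlcontent));
			}
		}
		System.out.println("Image URLs found: " + imgs.size());
		// Remove duplicates while keeping first-seen order
		List<String> unique = new ArrayList<String>(new LinkedHashSet<String>(imgs));
		System.out.println("Unique image URLs: " + unique.size());
		// Download each distinct image once
		for (String imgurl : unique) {
			downloadimg(imgurl);
		}
	}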
	/**
	 * @description Extracts the src URLs of the img tags in an HTML string,
	 *              keeping only images hosted on the 135editor image server.
	 */
	public static List<String> getImgSrc(String htmlStr) {
		List<String> pics = new ArrayList<String>();
		if (htmlStr == null) {
			return pics; // empty list, so callers can pass the result to addAll()
		}
		// Matches one whole <img ...> tag; [^>]* cannot run past the end of the tag
		Pattern p_image = Pattern.compile("<img\\b[^>]*>", Pattern.CASE_INSENSITIVE);
		// Captures the src value, whether it is quoted or not
		Pattern p_src = Pattern.compile("src\\s*=\\s*[\"']?([^\"'\\s>]+)", Pattern.CASE_INSENSITIVE);
		Matcher m_image = p_image.matcher(htmlStr);
		while (m_image.find()) {
			// Search only the current tag so each occurrence is collected once
			Matcher m = p_src.matcher(m_image.group());
			while (m.find()) {
				String src = m.group(1);
				if (src.contains("https://image.135editor.com/files/users/")) {
					pics.add(src);
				}
			}
		}
		return pics;
	}
	public static void downloadimg(String imgurl) throws Exception {
		// Download one remote file
		String host = "https://image.135editor.com";
		String mainpath = "E:/htmlcontent";
		// Keep the server-side path (it starts with "/") below mainpath
		String realpath = mainpath + imgurl.substring(imgurl.indexOf(host) + host.length());
		// Create the same directory structure as on the 135editor image server
		File dir = new File(realpath.substring(0, realpath.lastIndexOf("/")));
		if (!dir.exists()) {
			dir.mkdirs();
		}
		URLConnection conn = new URL(imgurl).openConnection();
		// try-with-resources closes both streams, which also flushes the file
		try (InputStream inStream = conn.getInputStream();
				FileOutputStream fs = new FileOutputStream(realpath)) {
			byte[] buffer = new byte[51200];
			int bytesum = 0;
			int byteread;
			while ((byteread = inStream.read(buffer)) != -1) {
				bytesum += byteread;
				fs.write(buffer, 0, byteread);
			}
			// Report once per file, after the whole file has been written
			System.out.println("Downloaded: " + imgurl + " (" + bytesum + " bytes)");
		} catch (IOException e) {
			// Log and continue so that one failed image does not stop the batch
			e.printStackTrace();
		}
	}
    static class DataBase{
		// Legacy Connector/J 5.x driver class; Connector/J 8 uses com.mysql.cj.jdbc.Driver
		private static final String className = "com.mysql.jdbc.Driver";
		// Fill in the JDBC URL and credentials of your own database
		private static final String url = "";
		private static final String username = "";
		private static final String password = "";

		private static Connection conn = null;
		static{
			try{
				Class.forName(className);
				// Obtain the database connection
				conn=DriverManager.getConnection(url, username, password);
			}catch(Exception e){
				e.printStackTrace();
			}
		}
		
		public static Connection getConnection() throws ClassNotFoundException, SQLException{
			return conn;
		}
		
		public static PreparedStatement getPreparedStatement(String sql) throws ClassNotFoundException, SQLException{
			return getConnection().prepareStatement(sql);
		}
		
	}

The code above downloads all the required images to the local disk. If there are a lot of images, process them in batches.
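One way to batch the work is to split the list of image URLs into fixed-size chunks and download one chunk at a time. The following is a minimal sketch under that assumption; the class name `Batches`, the method `partition`, and the batch size are hypothetical and not part of the original code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the original post.
final class Batches {

    // Splits items into consecutive sublists of at most batchSize elements.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the view so each batch is independent of the source list
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> urls = Arrays.asList("u1", "u2", "u3", "u4", "u5");
        for (List<String> batch : partition(urls, 2)) {
            // In the real job, each URL in the batch would be passed to downloadimg
            System.out.println(batch);
        }
    }
}
```

Running the batches one after another, possibly with a pause or a progress log between them, makes it easier to resume after a failure than a single loop over every URL.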

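Next, manually copy the images to the image server of your own project.

Finally, change the image domain from the 135 image domain to the domain of the project's image server.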

-- yyyyyyyyy: the 135 image domain; xxxxxxxxx: the project's image server domain
UPDATE aaa
SET content = REPLACE(
    content,
    'yyyyyyyyy',
    'xxxxxxxxx'
)
WHERE
content LIKE '%yyyyyyyyy%'
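Before running the UPDATE on real data, it can help to check the same substitution on a sample value in Java. This is a minimal sketch: the class `DomainRewrite`, the file name a.png, and the target domain img.example.com are hypothetical stand-ins, not values from the project.

```java
// Hypothetical helper, not part of the original post.
final class DomainRewrite {

    // Replaces every literal occurrence of oldDomain with newDomain.
    static String rewrite(String html, String oldDomain, String newDomain) {
        if (html == null) {
            return null;
        }
        // String.replace works on literal text, so dots in the domain are not regex wildcards
        return html.replace(oldDomain, newDomain);
    }

    public static void main(String[] args) {
        String before = "<img src=\"https://image.135editor.com/files/users/a.png\">";
        // img.example.com is a placeholder for the project's image server
        System.out.println(rewrite(before,
                "https://image.135editor.com", "https://img.example.com"));
    }
}
```

Like SQL's REPLACE, this changes only the domain and leaves the path after it untouched, which is why the images have to be uploaded under the same paths.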

Problem solved.

半世晨晓17
