Filtering Styles When Exporting Rich Text

Preface

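On the Java backend, EasyPoi is a commonly used library for exports, and occasionally an export also has to include images or rich text.

For exporting images with EasyPoi, see this earlier post of mine: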

https://blog.csdn.net/qq_41057885/article/details/108736913

Main Text

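When exporting rich text, the styling inside it has to be filtered out. This mainly means removing HTML tags, CSS styles, and escape characters.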

String htmlStr = "<h1>我是标题H1</h1><div style=\"color:red; font-size:20px;\">我是DIV <span>HELLO WORLD</span></div>";

// Result: 我是标题H1我是DIVHELLOWORLD (all whitespace is removed, including spaces between words)
htmlStr = htmlStr.replaceAll("\\<.*?>|\\s*|\t|\r|\n", "");

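\\<.*?> removes HTML tags

\\s* removes whitespace (spaces, and also tabs, carriage returns, and newlines)

\t removes tab characters

\r removes carriage returns

\n removes newlines

Because \\s* already matches tabs, carriage returns, and newlines, the last three alternatives are redundant. They are harmless and only make the intent explicit.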
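The one-liner above deletes every whitespace character, so English words run together in the exported cell. If that is not wanted, one possible variant is to collapse whitespace instead of deleting it. The following is a minimal sketch under that assumption; the class `TagStripper` and its method names are hypothetical and not part of EasyPoi or of the original code.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
final class TagStripper {

    // Anything shaped like a tag: '<', any run of non-'>' characters, then '>'.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");
    // One or more whitespace characters (space, tab, CR, LF, ...).
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Removes tags and every whitespace character, roughly like the one-liner above.
    static String stripAll(String html) {
        String noTags = TAG.matcher(html).replaceAll("");
        return WHITESPACE.matcher(noTags).replaceAll("");
    }

    // Replaces each tag with a space, then collapses whitespace runs to one space.
    static String stripKeepSpaces(String html) {
        String noTags = TAG.matcher(html).replaceAll(" ");
        return WHITESPACE.matcher(noTags).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        String html = "<h1>Title</h1><div style=\"color:red;\">Hello <span>big world</span></div>";
        System.out.println(stripAll(html));        // prints TitleHellobigworld
        System.out.println(stripKeepSpaces(html)); // prints Title Hello big world
    }
}
```

Replacing each tag with a space keeps block-level content such as headings and paragraphs apart, at the cost of also splitting a word that has an inline tag in the middle of it. Which trade-off is acceptable depends on the exported content.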

Utility Class for Stripping HTML Tags

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HtmlUtils {

    public static String getText(String htmlStr) {
        if (htmlStr == null || "".equals(htmlStr)) {
            return "";
        }
        String textStr = "";
        Pattern pattern;
        Matcher matcher;

        try {
            // Regex for HTML comments
            String regexRemark = "<!--.+?-->";
            // Regex for script blocks (or: <script[^>]*?>[\\s\\S]*?<\\/script>)
            String regexScript = "<[\\s]*?script[^>]*?>[\\s\\S]*?<[\\s]*?\\/[\\s]*?script[\\s]*?>";
            // Regex for style blocks (or: <style[^>]*?>[\\s\\S]*?<\\/style>)
            String regexStyle = "<[\\s]*?style[^>]*?>[\\s\\S]*?<[\\s]*?\\/[\\s]*?style[\\s]*?>";
            // Regex for complete HTML tags
            String regexHtml = "<[^>]+>";
            // Regex for a leftover tag fragment: a '<' with no closing '>'
            String regexHtml1 = "<[^>]+";

            // Remove line breaks, tabs, and non-breaking space entities first
            htmlStr = htmlStr.replaceAll("\n", "");
            htmlStr = htmlStr.replaceAll("\t", "");
            htmlStr = htmlStr.replaceAll("\r", "");
            htmlStr = htmlStr.replaceAll("&nbsp;", "");

            // Remove HTML comments
            pattern = Pattern.compile(regexRemark);
            matcher = pattern.matcher(htmlStr);
            htmlStr = matcher.replaceAll("");

            // Remove script blocks together with their contents
            pattern = Pattern.compile(regexScript, Pattern.CASE_INSENSITIVE);
            matcher = pattern.matcher(htmlStr);
            htmlStr = matcher.replaceAll("");

            // Remove style blocks together with their contents
            pattern = Pattern.compile(regexStyle, Pattern.CASE_INSENSITIVE);
            matcher = pattern.matcher(htmlStr);
            htmlStr = matcher.replaceAll("");

            // Remove the remaining complete HTML tags
            pattern = Pattern.compile(regexHtml, Pattern.CASE_INSENSITIVE);
            matcher = pattern.matcher(htmlStr);
            htmlStr = matcher.replaceAll("");

            // Remove any leftover unclosed tag fragment
            pattern = Pattern.compile(regexHtml1, Pattern.CASE_INSENSITIVE);
            matcher = pattern.matcher(htmlStr);
            htmlStr = matcher.replaceAll("");

            textStr = htmlStr.trim();

        } catch (Exception e) {
            e.printStackTrace();
        }
        // Return the plain-text string
        return textStr;
    }
}
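The utility class handles only one escape sequence, `&nbsp;`, and it deletes it rather than decoding it. If the exported text should show the literal characters for other common entities, a small decoding step could run on the result of `getText`. The sketch below assumes a handful of entities are enough; `EntityDecoder` and `decode` are hypothetical names, and the entity list is illustrative rather than complete.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; covers a few common entities.
final class EntityDecoder {

    // LinkedHashMap keeps insertion order, which matters for "&amp;" below.
    private static final Map<String, String> ENTITIES = new LinkedHashMap<>();

    static {
        ENTITIES.put("&nbsp;", " ");
        ENTITIES.put("&lt;", "<");
        ENTITIES.put("&gt;", ">");
        ENTITIES.put("&quot;", "\"");
        ENTITIES.put("&#39;", "'");
        // "&amp;" goes last so that "&amp;lt;" decodes to "&lt;" rather than "<".
        ENTITIES.put("&amp;", "&");
    }

    // Replaces each known entity with its literal character, in map order.
    static String decode(String text) {
        String result = text;
        for (Map.Entry<String, String> entry : ENTITIES.entrySet()) {
            result = result.replace(entry.getKey(), entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(decode("a &lt; b &amp;&amp; c&nbsp;&gt; d")); // prints a < b && c > d
    }
}
```

Decoding is best done after the tags have been stripped. Decoding first would turn `&lt;b&gt;` into `<b>`, which the tag regex would then delete as if it were markup.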
