PHP: stripping extra styles from rich-text editor output to extract the plain content and images

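Our project uses a rich-text editor. When its content is submitted to the database, the raw value contains HTML-encoded styling. We have no use for that styling, so we need to extract just the content.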

Before any processing, the raw value looks like this:

<p><span style="color: rgb(25, 25, 25); font-family: &quot;PingFang SC&quot;, 
Arial, 微软雅黑, 宋体, simsun, sans-serif; font-size: 16px; font-style: normal; 
font-variant-ligatures: normal; font-variant-caps: normal; font-weight: normal; 
letter-spacing: normal; orphans: 2; text-align: justify; text-indent: 0px;
 text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; 
-webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); 
display: inline !important; float: none;">爱上一个人最明显的特征就是会习惯他的习惯，
即便哪天分开了，也还是在继续着他的习惯。分开后的第一、二个月之所以那么痛苦，也是因为习惯。
</span><strong style="font-weight: 700; border: 0px; margin: 0px; padding: 0px;
font-size: 16px; color: rgb(25, 25, 25); font-family: 
orphans: 2; text-align: justify; text-indent: 0px;-webkit-text-stroke-width: 0px;
 background-color: rgb(255, 255, 255);">习惯了两个人，突然要一个人独处，自然是难受，
这也是有些人为什么愿意放下所有的尊严和底线去苦苦哀求一个人，因为她害怕离开了某人自己就无
法生活下去了，毕竟有他的时候，一切都有依赖。</strong></p>

[Extracting the content] I know of two ways to do this:

1. Use the following lines of code

$data["content"] = I("content");
$content_01 = $data['content']; // the rich-text content to clean (read via I() above)
$content_02 = htmlspecialchars_decode($content_01); // convert the predefined HTML entities (&lt;, &gt;, &amp;, &quot;) back into characters
$content_03 = str_replace("&nbsp;", "", $content_02); // remove non-breaking-space entities
$contents = strip_tags($content_03); // strip HTML, XML and PHP tags, leaving only the plain text
$data['content'] = $contents;

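// Collect the images
$imgs = $content_03;
$imgs = strip_tags($imgs, '<img>'); // keep only the <img> tags

// Note: this pattern only matches when src is the first attribute and the path
// contains nothing but letters, digits, underscores, colons, slashes and dots.
preg_match_all('/\<img\s+src\=\"([\w:\/\.]+)\"/', $imgs, $matches); // $matches[1] is the array of image paths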

With the approach above, $data['content'] ends up holding the plain content (the image-collecting part is not used for now).
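The regular expression in the image step only recognises fairly simple `<img>` tags. As a possible alternative, the sketch below parses the fragment with PHP's `DOMDocument` and reads each `src` attribute directly. This is a swapped-in technique, not the method used in this post. The function name `extract_image_sources` and the sample URLs are made up for illustration, and the code assumes the `dom` extension is enabled and the content is UTF-8.

```php
<?php
// Hypothetical helper (not part of the original post): collect every <img> src
// from an HTML fragment by parsing it instead of pattern-matching it.
function extract_image_sources(string $html): array
{
    if (trim($html) === '') {
        return [];
    }

    $doc = new DOMDocument();
    // Editor output is a fragment, not a full document; collect parser
    // warnings internally instead of letting them be emitted.
    $previous = libxml_use_internal_errors(true);
    // Prefixing an XML encoding declaration is a common way to make
    // loadHTML() treat the input as UTF-8 (assumption: content is UTF-8).
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $sources = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        $src = $img->getAttribute('src');
        if ($src !== '') {
            $sources[] = $src;
        }
    }
    return $sources;
}

// Made-up sample: src is not the first attribute and the URL has a hyphen
// and a query string, both of which the regular expression would miss.
$html = '<p>text<img alt="a" src="https://example.com/up-load/1.jpg?v=2"></p>'
      . '<p><img src="/static/2.png" style="width: 100px;"></p>';
$sources = extract_image_sources($html);
print_r($sources);
```

Parsing costs a little more than a single `preg_match_all()` call, but it does not depend on attribute order or on which characters appear in the path.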

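2. Use PHP's htmlspecialchars_decode() function to turn the escaped entities in the value (such as &lt;) back into HTML tags, then remove those tags with strip_tags() to get the clean content.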

$data["content"] = strip_tags(htmlspecialchars_decode($data["content"]));
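To see what this one-liner does to a concrete value, here is a minimal self-contained sketch. The function name `rich_text_to_plain` and the sample string are made up for illustration. The `&nbsp;` removal is borrowed from the first method and the final `trim()` is an extra; neither is part of the one-liner itself.

```php
<?php
// Hypothetical helper name; the core is the decode-then-strip step of method two.
function rich_text_to_plain(string $escaped): string
{
    $html = htmlspecialchars_decode($escaped);  // &lt;p&gt; becomes <p>
    $html = str_replace('&nbsp;', '', $html);   // drop non-breaking-space entities (from method one)
    return trim(strip_tags($html));             // remove every tag, keep the text
}

// Made-up input, escaped the way it might arrive after form filtering.
$stored = '&lt;p&gt;&lt;span style=&quot;color: rgb(25, 25, 25);&quot;&gt;Hello, &nbsp;'
        . '&lt;/span&gt;&lt;strong&gt;world&lt;/strong&gt;&lt;/p&gt;';

$plain = rich_text_to_plain($stored);
echo $plain, PHP_EOL; // prints "Hello, world"
```

Note that htmlspecialchars_decode() only handles a handful of predefined entities, which is why `&nbsp;` has to be dealt with separately.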
