PHP Crawler: Scraping Rule-Violating Threads from the Baidu Tieba Front Page

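This is my first time writing one of these, so it may feel a bit redundant. Still, this article is mainly aimed at friends who have no idea what a crawler is yet. o(∩_∩)o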

<?php
$url = 'http://tieba.baidu.com/f?ie=utf-8&kw=php&fr=search'; // Address of the "php" forum's front page
$html = file_get_contents($url); // Fetch the page content
if ($html === false) {
    exit('Failed to fetch the page');
}

$dom = new DOMDocument();
libxml_use_internal_errors(true); // Real-world HTML triggers parser warnings; collect them instead of printing them
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$condition = "php|小白"; // The keywords to search for, separated by | (小白 means "newbie")
$ex_condition = explode('|', $condition);

// Build the XPath predicate: contains(@title, 'php') or contains(@title, '小白')
$parts = [];
foreach ($ex_condition as $value) {
    $parts[] = "contains(@title, '" . $value . "')";
}
$str = implode(' or ', $parts);

$path = "//div[@class='threadlist_lz clearfix']/div/a[" . $str . "]";
$elements = [];
$elements['title'] = $xpath->query($path);           // Matching thread title links
$elements['href'] = $xpath->query($path . "/@href"); // Their href attributes, in the same order
if ($elements['title'] !== false && $elements['href'] !== false) {
    foreach ($elements['title'] as $key => $title) {
        // Print each match as a clickable link to the thread
        echo "<a href='http://tieba.baidu.com" . $elements['href']->item($key)->textContent . "'>" . $title->textContent . "</a><br>";
    }
}
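The live page needs network access and its markup may have changed since this was written, so here is a minimal offline sketch of the same keyword filter run against a hand-written HTML snippet. The function name `find_threads`, the sample markup and the `<?xml encoding="utf-8" ?>` prefix (a common workaround so `loadHTML` reads the bytes as UTF-8, which matters for Chinese keywords) are my assumptions, not part of the original script. It also reads `title` and `href` from the same `<a>` node instead of running two parallel queries, and it assumes no keyword contains a single quote, because XPath 1.0 string literals cannot escape one. The arrow function needs PHP 7.4 or later.

```php
<?php
// Hypothetical helper (not from the original post): return the thread links whose
// title attribute contains any of the given keywords. Assumes the same markup the
// crawler above targets and that no keyword contains a single quote.
function find_threads(string $html, array $keywords): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    // Encoding hint so libxml treats the input bytes as UTF-8 rather than Latin-1
    $dom->loadHTML('<?xml encoding="utf-8" ?>' . $html);
    libxml_clear_errors();

    // Builds: contains(@title, 'k1') or contains(@title, 'k2') ...
    $predicate = implode(' or ', array_map(
        fn(string $k): string => "contains(@title, '" . $k . "')",
        $keywords
    ));

    $xpath = new DOMXPath($dom);
    $links = $xpath->query("//div[@class='threadlist_lz clearfix']/div/a[" . $predicate . "]");

    $result = [];
    if ($links === false) {
        return $result; // malformed expression
    }
    foreach ($links as $a) {
        // Title and href come from the same node, so they cannot get out of sync
        $result[] = ['title' => $a->getAttribute('title'), 'href' => $a->getAttribute('href')];
    }
    return $result;
}

// Hypothetical sample page with three threads
$sample = <<<HTML
<html><body>
<div class="threadlist_lz clearfix"><div><a href="/p/1" title="php 新手求助">php 新手求助</a></div></div>
<div class="threadlist_lz clearfix"><div><a href="/p/2" title="今天天气不错">今天天气不错</a></div></div>
<div class="threadlist_lz clearfix"><div><a href="/p/3" title="小白问个问题">小白问个问题</a></div></div>
</body></html>
HTML;

foreach (find_threads($sample, ['php', '小白']) as $t) {
    echo "<a href='http://tieba.baidu.com" . $t['href'] . "'>" . $t['title'] . "</a><br>" . PHP_EOL;
}
```

Running it should print links for the first and third sample threads (`/p/1` and `/p/3`), since their titles contain `php` and `小白` respectively, while the second thread is skipped.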

The result looks like this:
[Figure: screenshot of the script's output]


Reposted from blog.csdn.net/weikaixxxxxx/article/details/83577586