Python 3 Web Scraping (with requests and BeautifulSoup4): Hands-on Project (Part 2)

Category: Other | Posted: 2018-07-30 05:11:04

Picking up where the previous installment left off, this post explains in more detail how to use BeautifulSoup.

soup=BeautifulSoup(res.text,'html.parser')
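To get a feel for what the soup object holds without making a network request, here is a minimal sketch that parses a small inline HTML string. The HTML content and the variable names are made up for illustration; in the real script the string would come from res.text.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for res.text from a real request.
html = "<html><head><title>Demo page</title></head><body><p>Hello</p></body></html>"

# 'html.parser' is the parser that ships with the Python standard library.
soup = BeautifulSoup(html, "html.parser")

page_title = soup.title.string   # text inside the <title> tag
first_paragraph = soup.p.text    # text of the first <p> tag
print(page_title, first_paragraph)
```

Attribute-style access such as soup.title only reaches the first matching tag, which is why the selector methods described next are more convenient for real pages.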

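Once we have the soup object, how do we pull out exactly the content we want?
Here are a few methods:
1. soup.select('.class'):
This method returns a list of every element whose class attribute matches, for example the content of a div with a given class.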

import requests
from bs4 import BeautifulSoup

def getInfo(url):
    # Download the page and decode the body as UTF-8
    res = requests.get(url)
    res.encoding = 'utf-8'
    soup = BeautifulSoup(res.text, 'html.parser')
    # select() returns a list of all elements whose class is "intim"
    results = soup.select('.intim')
    for result in results:
        print(result.text)

if __name__ == '__main__':
    url = 'http://jwc.tyut.edu.cn/'
    getInfo(url)

Written this way, the script prints the content of every element whose class is intim.
[Screenshot: partial output]
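The class selector can also be tried offline. The sketch below assumes a small made-up HTML fragment (only the class name intim is taken from the example above) and shows that select() matches any tag carrying the class, not just div elements.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; only the class name "intim" comes from the example above.
html = """
<div class="intim">Notice A</div>
<p class="intim">Notice B</p>
<div class="other">Unrelated</div>
"""

soup = BeautifulSoup(html, "html.parser")

# select() always returns a list, with one entry per matching element.
matches = soup.select(".intim")
texts = [tag.get_text(strip=True) for tag in matches]
print(texts)
```

Because the result is a list, an empty list (rather than an exception) comes back when nothing matches, so the for loop in the script simply prints nothing in that case.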

2. soup.select('#id'):
This method returns the element with the given id, for example the content of a particular div.

results=soup.select('#select')

[Screenshot: partial output]
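As a rough illustration of the id selector, the sketch below runs against a made-up HTML fragment (the id select matches the line above; the rest is assumed). It also uses select_one(), a BeautifulSoup method not mentioned in the original post, which returns the first match directly instead of a list.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; only the id "select" comes from the example above.
html = """
<div id="select"><span>Option 1</span><span>Option 2</span></div>
<div id="footer">Footer</div>
"""

soup = BeautifulSoup(html, "html.parser")

as_list = soup.select("#select")       # still a list, even for a unique id
single = soup.select_one("#select")    # the first matching tag, or None
missing = soup.select_one("#nope")     # no such id in this fragment

options = [span.text for span in single.find_all("span")]
print(len(as_list), options, missing)
```

Since an id should be unique within a page, select_one() often reads more naturally here, but checking for None before using the result is a good habit.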

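3. What if I want to go a step further and get the content of a particular tag inside a particular div?
BeautifulSoup selectors support nesting.
For example, to get the title attribute of the a tags under the intmc class: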

results=soup.select('.intmc a')
for result in results:
    print(result['title'])

[Screenshot: partial output]
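One thing to watch for with result['title'] is that it raises KeyError when a link has no title attribute. The sketch below, on a made-up HTML fragment (only the class name intmc comes from the example above), shows two possible ways around that: reading the attribute with get(), or narrowing the selector to a[title].

```python
from bs4 import BeautifulSoup

# Hypothetical markup; only the class name "intmc" comes from the example above.
html = """
<div class="intmc">
  <a href="/news/1" title="First headline">First...</a>
  <a href="/news/2">No title attribute</a>
</div>
<a href="/about" title="Outside the div">About</a>
"""

soup = BeautifulSoup(html, "html.parser")

# ".intmc a" matches only <a> tags nested inside an element with class "intmc".
links = soup.select(".intmc a")

# tag.get("title") returns None instead of raising KeyError when it is missing.
titles = [a.get("title") for a in links]

# Alternatively, let the selector keep only links that have a title attribute.
with_title = [a["title"] for a in soup.select(".intmc a[title]")]
print(titles, with_title)
```

Which variant fits depends on whether links without a title should be skipped or reported.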

That's all for today's hands-on session. See you next time!

Reposted from blog.csdn.net/weixin_38168694/article/details/81270938
