Scraping Qiushibaike with Python 3 - 代码天地

Other 2018-08-01 05:13:44

1. Prerequisites:

Python 3.6

Packages needed: re, urllib.request, BeautifulSoup (from bs4)
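Of the packages listed, `re` and `urllib` ship with Python, while `BeautifulSoup` comes from the third-party `bs4` package. Below is a minimal sketch for checking that everything is importable before running the script. The helper name `missing_modules` is hypothetical and not part of the original post.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be found."""
    missing = []
    for name in names:
        try:
            spec = importlib.util.find_spec(name)
        except ModuleNotFoundError:
            # Raised when the parent package of a dotted name is absent
            spec = None
        if spec is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Modules the script imports; bs4 is the only third-party one
    print(missing_modules(["re", "urllib.request", "bs4"]))
```

An empty list means all modules were found. Any name that is printed still has to be installed.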

2. The code:

# -*- coding: utf-8 -*-
from urllib import request
from bs4 import BeautifulSoup

# Listing page of posts; %d is the page number
articleUrl = "https://www.qiushibaike.com/textnew/page/%d"
# Single post page, used to fetch the comments; %s is the post id
commentUrl = "https://www.qiushibaike.com/article/%s"
page = 0

Url = articleUrl % page

# 1. Fetch the HTML source of a URL
def getContentOrComment(Url):
    # Send a browser-like User-Agent header with the request
    user_agent = 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36'
    headers = {'User-Agent': user_agent}
    req = request.Request(url=Url, headers=headers)
    # Open the URL and close the connection once the body has been read
    with request.urlopen(req) as response:
        # Read the whole response body and decode it as UTF-8
        content = response.read().decode('utf-8')
    return content

articlePage = getContentOrComment(Url)

# 2. Extract the post content
soup = BeautifulSoup(articlePage, 'html.parser')

floor = 1
commentId = None
# Each post is an element whose class attribute is "article block untagged mb15"
for article in soup.find_all(class_="article block untagged mb15"):
    # Slice off the first 11 characters of the id attribute,
    # leaving the 9-digit post id used at the end of the post URL
    commentId = str(article.get('id')).strip()[11:]
    print('\n')
    # Print the running number followed by the post text
    print(floor, '.', article.find(class_="content").get_text().strip())
    floor += 1

# 3. Fetch the comments (commentId holds the id of the last post in the loop)
if commentId:
    commentPage = getContentOrComment(commentUrl % commentId)
    soup = BeautifulSoup(commentPage, 'html.parser')
    Cfloor = 1
    for comment in soup.find_all(class_="body"):
        print("\n  Reply", Cfloor, ":", comment.get_text().strip())
        Cfloor += 1
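
The parsing in step 2 can be tried without touching the network. The following is a minimal sketch that runs the same class-based lookup and id slicing against a hand-written HTML fragment. The fragment, the `qiushi_tag_` prefix, the ids, and the helper name `extract_posts` are illustrative assumptions, not data captured from the site.

```python
from bs4 import BeautifulSoup

# Hand-written sample that imitates the structure the script expects.
# The ids and the "qiushi_tag_" prefix are assumptions for illustration.
SAMPLE_HTML = """
<div class="article block untagged mb15" id="qiushi_tag_120000001">
  <div class="content"><span> first post </span></div>
</div>
<div class="article block untagged mb15" id="qiushi_tag_120000002">
  <div class="content"><span> second post </span></div>
</div>
"""

PREFIX_LENGTH = 11  # len("qiushi_tag_"), the same slice offset as in the script


def extract_posts(html):
    """Return a list of (post_id, text) tuples found in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    posts = []
    # A multi-word class_ string matches the exact value of the class attribute
    for article in soup.find_all(class_="article block untagged mb15"):
        post_id = str(article.get("id")).strip()[PREFIX_LENGTH:]
        text = article.find(class_="content").get_text().strip()
        posts.append((post_id, text))
    return posts


if __name__ == "__main__":
    for number, (post_id, text) in enumerate(extract_posts(SAMPLE_HTML), start=1):
        print(number, ".", text, "(id:", post_id + ")")
```

Returning the posts as a list instead of printing them inside the loop would also make it possible to fetch comments for every post rather than only the last one.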

Reposted from blog.csdn.net/qq_34777982/article/details/81316948
