Learning Progress 10: Python Web Crawler

Posted: 2020-02-10 18:34:06

The first crawler example I studied is a novel scraper.

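A novel scraper starts by parsing the source of the book's table-of-contents page. That page source contains a link to the content of every chapter.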
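To make the link-extraction step concrete, here is a minimal offline sketch that runs the same two regular expressions against a small, hypothetical index-page fragment (the real site's markup may differ). Two small changes are assumptions of this sketch, not part of the original script: `re.S` (DOTALL) is passed so that `.` also matches newlines if the `<dl>` block spans several lines, and `urllib.parse.urljoin` is used to turn relative hrefs into absolute URLs instead of string concatenation.

```python
import re
from typing import List, Tuple
from urllib.parse import urljoin

# Hypothetical table-of-contents fragment; the live page may be structured differently.
SAMPLE_INDEX = """
<dl><dt><i class="icon"></i>正文</dt>
<dd><a href="/69509/1.html">第一章</a></dd>
<dd><a href="/69509/2.html">第二章</a></dd>
</dl>
"""


def extract_chapters(html: str, base_url: str) -> List[Tuple[str, str]]:
    """Return (absolute_url, title) pairs for every chapter link in the TOC block."""
    # re.S makes "." match newlines, so the block is found even when it spans lines.
    block = re.search(r'<dl><dt><i class="icon"></i>正文</dt>(.*?)</dl>', html, re.S)
    if block is None:
        return []  # TOC block not found: markup changed or wrong page
    pairs = re.findall(r'<dd><a href="(.*?)">(.*?)</a></dd>', block.group(1))
    # urljoin resolves site-relative ("/69509/1.html") and absolute hrefs alike.
    return [(urljoin(base_url, href), title) for href, title in pairs]


chapters = extract_chapters(SAMPLE_INDEX, "http://www.92kshu.cc/69509/")
for url, title in chapters:
    print(title, url)
```

Returning an empty list instead of indexing `[0]` keeps a changed page layout from crashing the crawler with an `IndexError`.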

The crawler code:

import re
import requests

url = 'http://www.92kshu.cc/69509/'
response = requests.get(url, timeout=10)
response.encoding = 'gbk'  # the site serves GBK-encoded pages
html = response.text
# Book title from the Open Graph meta tag; used as the output file name
title = re.findall(r'<meta property="og:novel:book_name" content="(.*?)"/>', html)[0]
# Table-of-contents block; re.S lets "." match newlines in case the block spans lines
dl = re.findall(r'<dl><dt><i class="icon"></i>正文</dt>(.*?)</dl>', html, re.S)[0]
# Each entry is a (relative_url, chapter_title) pair
chapter_info_list = re.findall(r'<dd><a href="(.*?)">(.*?)</a></dd>', dl)

# "with" guarantees the file is flushed and closed, even if a request fails midway
with open('%s.txt' % title, 'w', encoding='utf-8') as fb:
    for chapter_url, chapter_title in chapter_info_list:
        # The hrefs are site-relative, so prepend the domain
        chapter_url = 'http://www.92kshu.cc%s' % chapter_url
        chapter_response = requests.get(chapter_url, timeout=10)
        chapter_response.encoding = 'gbk'
        chapter_html = chapter_response.text
        # Chapter body text
        chapter_content = re.findall(r'<div class="chapter">(.*?)><br>', chapter_html, re.S)[0]
        # Drop paragraph tags
        chapter_content = chapter_content.replace('<p>', '')
        chapter_content = chapter_content.replace('</p>', '')
        # Put the title on its own line, then the content
        fb.write(chapter_title + '\n')
        fb.write(chapter_content + '\n')
        print(chapter_url)  # progress indicator
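The script above only strips `<p>` and `</p>`, so HTML entities such as `&nbsp;` and any `<br>` tags can end up in the text file. Below is a minimal sketch of a fuller cleanup step, assuming the chapter body uses `<p>`/`<br>` markup and HTML entities; `clean_chapter` is a hypothetical helper, not part of the original post, and the sample string is made up.

```python
import html
import re


def clean_chapter(fragment: str) -> str:
    """Turn a raw chapter HTML fragment into plain text (assumes <p>/<br> markup)."""
    # Treat <br>, <br/> and </p> as line breaks instead of silently dropping them.
    text = re.sub(r'<br\s*/?>|</p>', '\n', fragment, flags=re.I)
    # Remove any remaining tags, e.g. <p> or <div ...>.
    text = re.sub(r'<[^>]+>', '', text)
    # Decode entities such as &nbsp; (becomes U+00A0) and &amp;.
    text = html.unescape(text)
    # strip() also removes U+00A0; empty lines are dropped.
    lines = [line.strip() for line in text.splitlines()]
    return '\n'.join(line for line in lines if line)


raw = '<p>&nbsp;&nbsp;天色已晚。</p><p>他推开了门。<br/></p>'
print(clean_chapter(raw))
```

In the crawler loop, `chapter_content = clean_chapter(chapter_content)` could replace the two `replace()` calls.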

Reposted from www.cnblogs.com/zhaoxinhui/p/12291944.html