Python crawler: an attempt at scraping a novel

An attempt at scraping a full novel, chapter by chapter, starting from its table-of-contents page.

#!/usr/bin/python
# -*- coding:utf-8 -*-
import requests
from bs4 import BeautifulSoup

# Fetch the novel's table-of-contents page
url = 'http://www.zanghaihua.org/nanbudangan/'
req = requests.get(url=url)
req.encoding = req.apparent_encoding  # guess the real encoding from the body
soup = BeautifulSoup(req.text, 'html.parser')

# Every chapter link sits in a <span> inside the booklist <div>
div = soup.find(name='div', attrs={'class': 'booklist'})
span_list = div.find_all('span')

for span in span_list:
    a = span.find('a')
    if not a:  # some spans carry no link; skip them
        continue
    a_url = a.get('href')

    # Fetch and parse one chapter page
    response = requests.get(url=a_url)
    response.encoding = response.apparent_encoding
    soup = BeautifulSoup(response.text, 'html.parser')

    bookname = soup.find(name='h1', attrs={'align': 'center'}).text
    chapter_title = soup.find(name='div', attrs={'class': 'chaptertitle'}).text
    # get_text('\n') joins the text fragments with '\n', so the
    # <br/><br/> breaks in the chapter body come out as newlines
    content = soup.find(name='div', attrs={'id': 'BookText'}).get_text('\n')

    # Append each chapter to a single file named after the book
    with open(bookname, 'ab') as f:
        # '关于南部档案馆的研究' is the opening chapter, so write the
        # book name once at the top of the file before it
        if chapter_title == '关于南部档案馆的研究':
            f.write(bookname.encode('utf-8'))
            f.write(b'\n')
        f.write(chapter_title.encode('utf-8'))
        f.write(b'\n')
        f.write(content.encode('utf-8'))
        f.write(b'\n')
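One subtlety worth calling out: the original attempt extracted the body with get_text('\n', '<br/><br/>'), but get_text's second positional argument is strip (a boolean), not a replacement string. get_text('\n') alone already does what the comment intended, joining the text fragments with newlines. A minimal check against a made-up fragment of the chapter markup:

from bs4 import BeautifulSoup

# Hypothetical snippet mimicking the BookText markup
html = '<div id="BookText">First paragraph.<br/><br/>Second paragraph.</div>'
soup = BeautifulSoup(html, 'html.parser')

# The <br/> tags contribute no text of their own, so the two text
# nodes are simply joined by the separator
print(soup.find(id='BookText').get_text('\n'))
# First paragraph.
# Second paragraph.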
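As written, the script also fires one request per chapter as fast as it can and aborts on the first network hiccup. Below is a sketch of a slightly sturdier fetch loop, assuming the same page structure; the Session reuse, the 10-second timeout, and the one-second pause are my own choices, not part of the original:

import time
import requests
from bs4 import BeautifulSoup

BASE_URL = 'http://www.zanghaihua.org/nanbudangan/'
session = requests.Session()  # reuse one connection across requests

def fetch_soup(url):
    # A timeout plus raise_for_status keeps the crawl from hanging
    # forever or from silently parsing an error page
    resp = session.get(url, timeout=10)
    resp.raise_for_status()
    resp.encoding = resp.apparent_encoding
    return BeautifulSoup(resp.text, 'html.parser')

toc = fetch_soup(BASE_URL)
for span in toc.find(name='div', attrs={'class': 'booklist'}).find_all('span'):
    a = span.find('a')
    if not a:
        continue
    chapter = fetch_soup(a.get('href'))
    # ... extract and save the chapter exactly as above ...
    time.sleep(1)  # pause between requests to be polite to the site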

Reposted from www.cnblogs.com/xiaoyujuan/p/11098668.html