Writing a crawler with Python's Requests and BeautifulSoup libraries to scrape classical Chinese poems and save them to a text file

The following Python crawler fetches three classic poems from Gushiwen (古诗词网) and writes them to a text file on the desktop. It uses the Requests and BeautifulSoup libraries:

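# Import the required libraries
import os
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

# Target URL for the crawler
url = 'https://www.gushiwen.org/'

# Send a GET request to the target URL and fail early on an HTTP error
response = requests.get(url, timeout=10)
response.raise_for_status()

# Parse the HTML
soup = BeautifulSoup(response.content, 'html.parser')

# Select the poem links with a CSS selector
poem_list = soup.select('.main3 .left .sons .cont a')

# Collect the titles and contents of the first three poems
poem_titles = []
poem_contents = []

# Slicing avoids an IndexError when fewer than three links are found
for poem_link in poem_list[:3]:
    # Get the poem title
    poem_title = poem_link.text.strip()
    poem_titles.append(poem_title)

    # Build the poem URL; urljoin handles both relative and absolute hrefs
    poem_url = urljoin(url, poem_link.get('href'))

    # Send a GET request to the poem URL
    poem_response = requests.get(poem_url, timeout=10)
    poem_response.raise_for_status()

    # Parse the HTML
    poem_soup = BeautifulSoup(poem_response.content, 'html.parser')

    # Get the poem content
    poem_content = poem_soup.select('.main3 .left .sons .contson')[0].text

    poem_contents.append(poem_content.strip())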

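# Write the poems to a text file on the desktop
desktop_path = os.path.join(os.path.expanduser('~'), 'Desktop')
file_path = os.path.join(desktop_path, 'poems.txt')

with open(file_path, 'w', encoding='utf-8') as f:
    # zip pairs each title with its content and stops at the shorter list
    for title, content in zip(poem_titles, poem_contents):
        f.write(title + '\n\n')
        f.write(content + '\n\n\n')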

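This code first sends a GET request to Gushiwen and parses the returned HTML with BeautifulSoup. It then uses a CSS selector to get the list of poem links, takes the title from each of the first three links, and fetches each linked page to extract the poem's content. Finally, it writes the three poems to a text file saved on the desktop.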
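
Two steps in this flow can be checked without touching the network: turning an href into a full URL, and laying out the titles and contents for the output file. The following is a minimal sketch using only the standard library. The helper names (resolve_link, format_poems, save_poems) and the sample href are hypothetical and chosen for illustration; they are not part of the original script.

```python
from pathlib import Path
from urllib.parse import urljoin

BASE_URL = 'https://www.gushiwen.org/'


def resolve_link(base_url, href):
    """Turn an href taken from a page into an absolute URL."""
    return urljoin(base_url, href)


def format_poems(titles, contents):
    """Pair titles with contents; zip stops at the shorter list."""
    return ''.join(f'{title}\n\n{content}\n\n\n'
                   for title, content in zip(titles, contents))


def save_poems(path, titles, contents):
    """Write the formatted poems to path as UTF-8 text."""
    Path(path).write_text(format_poems(titles, contents), encoding='utf-8')


if __name__ == '__main__':
    # Hypothetical href, for illustration only
    print(resolve_link(BASE_URL, '/example.aspx'))
    # Plain string concatenation would double the slash instead
    print(BASE_URL + '/example.aspx')
```

Using urljoin rather than string concatenation keeps the URL valid whether the href is relative, root-relative, or already absolute, and pairing the lists with zip means a failed fetch cannot cause an index error at write time.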


Reposted from blog.csdn.net/ximu__l/article/details/131696952