Holiday Study [7]: Scraping 首都之窗 (Capital Window) Letters (Single Page)

Today I continued studying Python web crawling and finished scraping a single citizen-letter page from 首都之窗 (Capital Window). I plan to finish the full crawling task tomorrow (a rough sketch of that extension follows the code listing below).

The source code is as follows:

import requests
from bs4 import BeautifulSoup

# Send a simple user-agent so the site does not reject the default one used by requests
headers = {'user-agent': 'Mozilla/5.0'}
letter_id = "AH20020400088"  # originalId of a single letter (renamed from `id` to avoid shadowing the built-in)
url = "http://www.beijing.gov.cn/hudong/hdjl/com.web.consult.consultDetail.flow?originalId=" + letter_id

def parser(url):
    try:
        r = requests.get(url, headers=headers, timeout=10)
        r.raise_for_status()
        print(r.status_code)
        soup = BeautifulSoup(r.text, "html.parser")
        # print(soup.prettify())  # uncomment to dump the whole page while debugging
        # Each field on the detail page sits in a <div> with a fixed Bootstrap class string;
        # label text such as "来信人:" is removed with replace() rather than lstrip(),
        # since lstrip() strips a set of characters, not a prefix string.
        print("标题:", soup.find("strong").get_text().strip())
        print("来信人:", soup.find("div", {"class": "col-xs-10 col-lg-3 col-sm-3 col-md-4 text-muted"}).get_text().replace('来信人:', '').strip())
        print("时间:", soup.find("div", {"class": "col-xs-5 col-lg-3 col-sm-3 col-md-3 text-muted"}).get_text().replace('时间:', '').strip())
        print("网友同问:", soup.find("div", {"class": "col-xs-4 col-lg-3 col-sm-3 col-md-3 text-muted"}).get_text().replace("网友同问:", "").strip())
        print("问题:", soup.find("div", {"class": "col-xs-12 col-md-12 column p-2 text-muted mx-2"}).get_text().strip())
        print("官方:", soup.find("div", {"class": "col-xs-9 col-sm-7 col-md-5 o-font4 my-2"}).get_text().strip())
        print("回答时间:", soup.find("div", {"class": "col-xs-12 col-sm-3 col-md-3 my-2"}).get_text().replace('答复时间:', '').strip())
        print("回答:", soup.find("div", {"class": "col-xs-12 col-md-12 column p-4 text-muted my-3"}).get_text().strip())
    except Exception as e:
        print("爬取失败!", e)

if __name__ == "__main__":
    parser(url)
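Since the plan is to extend this to the full set of letters, here is a rough sketch (not from the original post) of how the same extraction logic could be wrapped in a function that returns a dict and run over a list of originalId values, saving the results to a CSV file. The letter IDs, the letters.csv filename, and the reduced field set are placeholders for illustration; in the real crawler the IDs would first have to be collected from the site's letter-list pages.

import csv
import requests
from bs4 import BeautifulSoup

headers = {'user-agent': 'Mozilla/5.0'}

# Placeholder list of originalId values; the full crawler would gather these
# from the 首都之窗 letter-list pages instead of hard-coding them.
letter_ids = ["AH20020400088"]

def fetch_letter(letter_id):
    """Fetch one letter detail page and return a few extracted fields as a dict."""
    page_url = ("http://www.beijing.gov.cn/hudong/hdjl/"
                "com.web.consult.consultDetail.flow?originalId=" + letter_id)
    r = requests.get(page_url, headers=headers, timeout=10)
    r.raise_for_status()
    soup = BeautifulSoup(r.text, "html.parser")
    return {
        "id": letter_id,
        "标题": soup.find("strong").get_text().strip(),
        "问题": soup.find("div", {"class": "col-xs-12 col-md-12 column p-2 text-muted mx-2"}).get_text().strip(),
        "回答": soup.find("div", {"class": "col-xs-12 col-md-12 column p-4 text-muted my-3"}).get_text().strip(),
    }

with open("letters.csv", "w", newline="", encoding="utf-8-sig") as f:
    writer = csv.DictWriter(f, fieldnames=["id", "标题", "问题", "回答"])
    writer.writeheader()
    for lid in letter_ids:
        try:
            writer.writerow(fetch_letter(lid))
        except Exception as e:
            print("爬取失败:", lid, e)

Returning a dict instead of printing makes it easy to swap the CSV writer for a database insert or another output format later.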

Run result: (the console output screenshot from the original post is not preserved here)


Reposted from www.cnblogs.com/zlc364624/p/12264011.html