Python: scraping the href of <a> tags, and a problem encountered - 代码天地

Other · 2019-05-06 20:31:22

Original blog post:

https://www.cnblogs.com/dengyg200891/p/6060010.html

# -*- coding:utf-8 -*-
# Python 3 (urllib.request and print() are Python 3 only)
# XiaoDeng
# http://tieba.baidu.com/p/2460150866
# Working with tags


from bs4 import BeautifulSoup
import urllib.request
import re


# To parse a live URL, the page can be read like this:
# html_doc = "http://tieba.baidu.com/p/2460150866"
# req = urllib.request.Request(html_doc)
# webpage = urllib.request.urlopen(req)
# html = webpage.read()


html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="xiaodeng"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
<a href="http://example.com/lacie" class="sister" id="xiaodeng">Lacie</a>
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""
soup = BeautifulSoup(html, 'html.parser')   # the parsed document object


# soup.a returns only the first <a> tag in the document
# print(soup.a)  # <a class="sister" href="http://example.com/elsie" id="xiaodeng"><!-- Elsie --></a>

for k in soup.find_all('a'):
    print(k)
    print(k['class'])   # the class attribute of the <a> tag (returned as a list)
    print(k['id'])      # the id attribute of the <a> tag
    print(k['href'])    # the href attribute of the <a> tag
    print(k.string)     # the string inside the <a> tag
    # k.get('class') achieves the same result as k['class']

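When I used k['href'] from the listing above to read links from a real web page, the interpreter raised an error:

 KeyError: 'href'

Indexing a tag with square brackets raises KeyError when the attribute is missing, which happens as soon as the page contains an <a> tag without an href. Changing the access to:

 k.get('href')

ran successfully and extracted the links; for a tag without an href, get() returns None instead of raising.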
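
To make the difference concrete, here is a minimal sketch that reproduces the error on a small page and shows two ways around it. The sample HTML and the variable names (SAMPLE_HTML, strict_links, lenient_links, present_links) are made up for illustration and do not come from the original post; the second <a> is a named anchor with no href, which is one common way real pages trigger the KeyError.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page: the second <a> is a named anchor without href.
SAMPLE_HTML = """
<p>
<a href="http://example.com/elsie" id="link1">Elsie</a>
<a name="top" id="link2">Back to top</a>
<a href="http://example.com/tillie" id="link3">Tillie</a>
</p>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Square-bracket indexing raises KeyError on the tag that has no href.
try:
    strict_links = [a["href"] for a in soup.find_all("a")]
except KeyError as exc:
    strict_links = None
    print("KeyError:", exc)

# .get() returns None for a missing attribute instead of raising.
lenient_links = [a.get("href") for a in soup.find_all("a")]
print(lenient_links)

# href=True limits the search to tags that actually carry an href,
# so indexing is safe and no None values need filtering afterwards.
present_links = [a["href"] for a in soup.find_all("a", href=True)]
print(present_links)
```

Which variant fits depends on what the crawler needs: get() keeps one entry per tag and leaves the None check to the caller, while find_all('a', href=True) drops the tags without a link up front.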

Reposted from www.cnblogs.com/zhouya1/p/10821779.html
