Problem:
UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("html.parser"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently. The code that caused this warning is on line 10 of the file D:\python_work\test\test.py. To get rid of this warning, pass the additional argument 'features="html.parser"' to the BeautifulSoup constructor.
  noStarchSoup = bs4.BeautifulSoup(res.text)
Solution: name the parser explicitly with the features argument of the BeautifulSoup constructor:
noStarchSoup = bs4.BeautifulSoup(res.text, features='html.parser')
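To see the difference, here is a minimal sketch that records warnings with the standard warnings module, once without and once with features. The inline HTML string and the helper name parser_warnings are hypothetical stand-ins for res.text; the sketch assumes only that bs4 is installed and matches on the warning text quoted above.

```python
import warnings

import bs4

# Hypothetical markup standing in for res.text.
html = "<p>Hello</p>"

def parser_warnings(**kwargs):
    """Build a soup and return only the 'no parser specified' warnings it raised (hypothetical helper)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeated ones
        bs4.BeautifulSoup(html, **kwargs)
    return [w for w in caught if "No parser was explicitly specified" in str(w.message)]

implicit = parser_warnings()                       # parser guessed: warning expected
explicit = parser_warnings(features="html.parser")  # parser named: no warning expected
print(len(implicit), len(explicit))
```

Naming the parser also makes the script behave the same on every machine, since Beautiful Soup would otherwise pick whichever parser (for example lxml) happens to be installed.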
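Table: Examples of CSS selectors. The select() method returns a list of Tag objects.

| Selector passed to select() | Will match... |
|---|---|
| soup.select('div') | All elements named <div> |
| soup.select('#author') | The element with an id attribute of author |
| soup.select('.notice') | All elements that use a CSS class attribute named notice |
| soup.select('div span') | All elements named <span> that are anywhere within an element named <div> |
| soup.select('div > span') | All elements named <span> that are directly within an element named <div>, with no other element in between |
| soup.select('input[name]') | All elements named <input> that have a name attribute with any value |
| soup.select('input[type="button"]') | All elements named <input> that have a type attribute with the value button |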
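One way to check the table is to run every selector against a small page. The markup below is made up for illustration (it is not example.html); the match counts in the comments follow from that markup, in particular the difference between the descendant selector 'div span' and the child selector 'div > span'.

```python
import bs4

# Hypothetical page that exercises every selector in the table.
html = """
<div>
  <span id="author">Al Sweigart</span>
  <p><span class="notice">nested span</span></p>
</div>
<span class="notice">outside span</span>
<input name="q" type="text">
<input type="button" value="Go">
"""
soup = bs4.BeautifulSoup(html, features="html.parser")

results = {
    "div": soup.select("div"),                                  # 1 match
    "#author": soup.select("#author"),                          # 1 match
    ".notice": soup.select(".notice"),                          # 2 matches
    "div span": soup.select("div span"),                        # 2: any depth inside <div>
    "div > span": soup.select("div > span"),                    # 1: direct children only
    "input[name]": soup.select("input[name]"),                  # 1 match
    'input[type="button"]': soup.select('input[type="button"]'),  # 1 match
}
for selector, tags in results.items():
    print(selector, "->", len(tags))
```

The nested <span> inside <p> is what separates the two combinators: it is a descendant of the <div> but not a direct child.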
File: example.html
<!-- This is the example.html example file. -->
<html><head><title>The Website Title</title></head>
<body>
<p>Download my <strong>Python</strong> book from <a href='http://inventwithpython.com'>learn Python the easy way!</a>.</p>
<p>By <span id='author'>Al Sweigart</span></p>
</body></html>
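# -*- coding: utf-8 -*-
# 2. Finding elements with the select() method
# Tip: in Firefox, press Ctrl+Shift+C to open the developer tools and inspect the page's HTML.
import bs4

with open('example.html', encoding='utf-8') as examFile:
    exampleSoup = bs4.BeautifulSoup(examFile.read(), features="html.parser")
elems = exampleSoup.select('#author')  # list of Tag objects whose id is "author"
print(type(elems))
print(len(elems))
print(type(elems[0]))
print(elems[0].getText())  # text inside the element
print(elems[0].attrs)      # the element's attributes as a dict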
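Output (recent Beautiful Soup releases may print <class 'bs4.element.ResultSet'>, a subclass of list, on the first line):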
<class 'list'>
1
<class 'bs4.element.Tag'>
Al Sweigart
{'id': 'author'}
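The script above indexes elems[0] directly, which raises IndexError if the selector matches nothing, because select() then returns an empty list rather than None. A minimal sketch of guarding against that, assuming a trimmed copy of example.html inlined as a string so it runs without the file; first_text is a hypothetical helper name:

```python
import bs4

# Trimmed copy of example.html, inlined so the sketch needs no file on disk.
html = (
    "<html><head><title>The Website Title</title></head>"
    "<body><p>By <span id='author'>Al Sweigart</span></p></body></html>"
)
soup = bs4.BeautifulSoup(html, features="html.parser")

def first_text(soup, selector):
    """Return the text of the first match, or None when select() finds nothing (hypothetical helper)."""
    matches = soup.select(selector)  # always a list, possibly empty
    return matches[0].getText() if matches else None

print(first_text(soup, "#author"))     # one element matches
print(first_text(soup, "#publisher"))  # no such id, so select() returned []
```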
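# -*- coding: utf-8 -*-
# 3. Getting data from an element's attributes
import bs4

with open('example.html', encoding='utf-8') as examFile:
    exampleSoup = bs4.BeautifulSoup(examFile.read(), features="html.parser")
spanElem = exampleSoup.select('span')[0]  # first <span> in the page
print(str(spanElem))                                  # the element's HTML as a string
print(spanElem.get('id'))                             # value of the id attribute
print(spanElem.get('some_nonexistent_addr') is None)  # get() returns None for a missing attribute
print(spanElem.attrs)                                 # all attributes as a dict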
Output:
<span id="author">Al Sweigart</span>
author
True
{'id': 'author'}
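get() is the forgiving way to read an attribute; subscripting the Tag like a dict is stricter. A short sketch contrasting the two on a one-line snippet modeled on example.html; the fallback string "n/a" is an arbitrary choice for illustration:

```python
import bs4

soup = bs4.BeautifulSoup("<p>By <span id='author'>Al Sweigart</span></p>", features="html.parser")
span = soup.select("span")[0]

print(span.get("id"))                            # attribute present: its value
print(span.get("some_nonexistent_addr"))         # attribute missing: None
print(span.get("some_nonexistent_addr", "n/a"))  # attribute missing, with a fallback default
print(span["id"])                                # subscript access works for present attributes...
try:
    span["some_nonexistent_addr"]                # ...but raises KeyError for missing ones
except KeyError:
    print("KeyError")
print(span.attrs)
```

Use get() when an attribute is optional, and subscripting when its absence should be treated as an error.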