Python 3 Web Scraping: urllib + BeautifulSoup

Copyright notice: if this post infringes on any rights, please get in touch; corrections are welcome, and so is reposting. https://blog.csdn.net/qq_29630271/article/details/79265797

urllib

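  • In Python 2, two libraries, urllib and urllib2, could be used to send requests. Python 3 no longer has urllib2; its functionality is unified under urllib. The Python 3 urllib package consists of four modules: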
  • urllib.request for opening and reading URLs
  • urllib.error containing the exceptions raised by urllib.request
  • urllib.parse for parsing URLs
  • urllib.robotparser for parsing robots.txt files
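
To make the four modules concrete, here is a minimal sketch that touches each one. The helper name `fetch`, its default encoding and timeout, the chapter filename `5403177.html`, and the robots.txt rules are all hypothetical, chosen only for illustration; nothing here is taken from the target site.

```python
import urllib.error
import urllib.parse
import urllib.request
import urllib.robotparser


def fetch(url, encoding="utf-8", timeout=10):
    """Return the decoded body of url, or None if the request fails.

    Hypothetical helper: the name and defaults are assumptions.
    """
    try:
        # urllib.request opens the URL; the response is a context manager.
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read().decode(encoding)
    except urllib.error.HTTPError as exc:
        # HTTPError is a subclass of URLError, so it has to be caught first.
        print("HTTP error:", exc.code, exc.reason)
    except urllib.error.URLError as exc:
        print("Could not reach the server:", exc.reason)
    return None


# urllib.parse: split a URL into components and resolve a relative link.
parts = urllib.parse.urlparse("http://www.biqukan.com/1_1094/")
chapter_url = urllib.parse.urljoin(
    "http://www.biqukan.com/1_1094/", "5403177.html"  # hypothetical filename
)

# urllib.robotparser: evaluate robots.txt rules. The rules are fed in as a
# literal list of lines here instead of being downloaded from a server.
robots = urllib.robotparser.RobotFileParser()
robots.parse(["User-agent: *", "Disallow: /private/"])
allowed = robots.can_fetch("*", "http://example.com/index.html")
blocked = robots.can_fetch("*", "http://example.com/private/page.html")
```

Nothing in this sketch touches the network until `fetch` is called, for example `fetch("http://www.biqukan.com/1_1094/", encoding="gbk")`. Catching the exceptions from `urllib.error` keeps a single failed request from ending the whole crawl.
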
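import urllib.request
from bs4 import BeautifulSoup

# Download the novel's table-of-contents page; the bytes are decoded as GBK.
response = urllib.request.urlopen("http://www.biqukan.com/1_1094/")
html = response.read().decode("gbk")

# Name the parser explicitly; without it bs4 guesses one and emits a warning.
div_bf = BeautifulSoup(html, "html.parser")
# The chapter links sit inside <div class="listmain">.
div = div_bf.find_all('div', class_='listmain')
# A Tag can be searched directly, so re-parsing str(div[0]) is unnecessary.
a = div[0].find_all('a')
for each in a:
    # Print each link's text and its href attribute.
    print(each.string, each.get('href'))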
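
The extraction step can also be tried without touching the network. The sketch below runs the same "find the `listmain` div, then collect its links" logic on a small hand-written HTML sample and turns each href into an absolute URL with `urllib.parse.urljoin`. The sample markup, the function name `extract_chapters`, and the chapter filenames are assumptions for illustration; the real page's structure may differ.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "http://www.biqukan.com/1_1094/"

# Hypothetical stand-in for the downloaded page, not the site's real markup.
SAMPLE_HTML = """
<div class="listmain">
  <dl>
    <dd><a href="/1_1094/1.html">Chapter 1</a></dd>
    <dd><a href="/1_1094/2.html">Chapter 2</a></dd>
  </dl>
</div>
<div class="footer"><a href="/about.html">About</a></div>
"""


def extract_chapters(html, base_url):
    """Return (title, absolute_url) pairs for links inside div.listmain."""
    soup = BeautifulSoup(html, "html.parser")
    listing = soup.find("div", class_="listmain")
    if listing is None:
        # No chapter list on this page, so there is nothing to extract.
        return []
    return [
        (a.get_text(strip=True), urljoin(base_url, a["href"]))
        for a in listing.find_all("a", href=True)
    ]


chapters = extract_chapters(SAMPLE_HTML, BASE_URL)
for title, url in chapters:
    print(title, url)
```

Returning an empty list when the div is missing avoids the `IndexError` that `div[0]` would raise on a page without a chapter list, and absolute URLs can be passed straight to `urllib.request.urlopen` in a follow-up request.
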

Reposted from blog.csdn.net/qq_29630271/article/details/79265797