A Brief Introduction to the Requests Library

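The three core libraries for web scraping: Requests, Lxml, and BeautifulSoup
The official Requests documentation describes the library as "HTTP for Humans". Its job is to send an HTTP request to a website and return the page data.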

(I) Printing the page source

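# Fetch the source code of a web page
import requests

res = requests.get('http://bj.xiaozhu.com/')  # Xiaozhu short-term rentals, Beijing listings
print(res)
# In PyCharm this prints <Response [200]> when the request succeeds;
# a 4xx status such as 400 or 404 means the request failed
print(res.text)  # the response body, i.e. the HTML source of the page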

Output (partial)

Open http://bj.xiaozhu.com/ in Chrome and view the page source: the text the program prints is the source code of that page.
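Printing the response only shows the status code; it does not stop the script when the request fails. As a minimal sketch of checking the status in code, the helper below relies on raise_for_status(), which Requests provides to turn 4xx and 5xx responses into exceptions. The names check_response and fetch_page_source, and the 10-second timeout, are illustrative assumptions rather than part of the original example.

```python
from typing import Optional

import requests


def check_response(res: requests.Response) -> bool:
    """Return True if the response status is below 400, False otherwise."""
    try:
        # raise_for_status() raises HTTPError for 4xx and 5xx status codes
        res.raise_for_status()
    except requests.exceptions.HTTPError as exc:
        print(f"Request failed: {exc}")
        return False
    return True


def fetch_page_source(url: str, timeout: float = 10.0) -> Optional[str]:
    """Fetch a page and return its HTML source, or None on an error status."""
    # A timeout keeps the script from waiting forever on a slow server
    res = requests.get(url, timeout=timeout)
    return res.text if check_response(res) else None


if __name__ == "__main__":
    html = fetch_page_source('http://bj.xiaozhu.com/')
    if html is not None:
        print(html[:500])  # print only the first 500 characters
```

Returning None on failure is only one possible design; letting the HTTPError propagate to the caller would work just as well for a small script.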

(II) Using request headers

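A crawler sometimes needs to send request headers that make it look like a browser, because some sites block or alter responses for clients that do not identify as one.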
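To see why a header is needed at all, it may help to look at what Requests sends when none is given. The sketch below prints the library's default User-Agent and then sets a browser-like one on a Session so that every request made through it carries the same header. BROWSER_UA is a hypothetical constant name, and its value is simply the example string used in this post.

```python
import requests

# Hypothetical constant holding the browser User-Agent string from this post
BROWSER_UA = ('Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 '
              '(KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36')

# The User-Agent Requests sends by default has the form python-requests/<version>
default_ua = requests.utils.default_user_agent()
print(default_ua)

# A Session keeps its headers for every request made through it
session = requests.Session()
session.headers.update({'User-Agent': BROWSER_UA})
print(session.headers['User-Agent'])
```

With the session configured this way, session.get(url) can be used in place of requests.get(url, headers=headers).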

How to find your browser's request headers: https://blog.csdn.net/weixin_42479293/article/details/89285821

(1) How to use request headers

import requests

# The User-Agent value must not start with a space. A pasted value with a leading space raises
# requests.exceptions.InvalidHeader: Invalid return character or leading space in header: User-Agent
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36'}
res = requests.get('http://bj.xiaozhu.com/', headers=headers)  # pass the headers to get()
print(res.text)

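Note: check the User-Agent value for a leading space after pasting it. If the value starts with a space, Requests raises requests.exceptions.InvalidHeader.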
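One way to guard against this, sketched here under the assumption that the headers are pasted into a dict by hand, is to strip whitespace from every name and value before use. The helper name clean_headers is hypothetical. The sketch also builds the request with requests.Request(...).prepare(), which validates the headers without sending anything over the network.

```python
import requests


def clean_headers(raw_headers: dict) -> dict:
    """Strip stray leading and trailing whitespace from header names and values."""
    return {name.strip(): value.strip() for name, value in raw_headers.items()}


# Header pasted from the browser, with an accidental leading space in the value
pasted = {'User-Agent': ' Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 '
                        '(KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36'}
headers = clean_headers(pasted)

# prepare() builds and validates the request locally; nothing is sent yet
prepared = requests.Request('GET', 'http://bj.xiaozhu.com/', headers=headers).prepare()
print(prepared.headers['User-Agent'])
```

The cleaned dict can then be passed to requests.get() through the headers parameter as in the example above.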
