Basic usage of the requests library for network requests ---------- learning web scraping with Python

Installing the requests library:

In the cmd command line, run pip install requests

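If you did not check "Add Python to PATH" when installing Python, the pip command cannot be found and installing requests fails. To fix this, open the Windows environment variables and add the directory that contains pip to the Path variable.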

For reference, see this article: https://blog.csdn.net/bimo123/article/details/89295896

Sending a request with requests

import requests
url="http://www.baidu.com"
rep=requests.get(url)
rep.text
'<!DOCTYPE html>\r\n<!--STATUS OK--><html> <head><meta http-equiv=content-type content=text/html;charset=utf-8><meta http-equiv=X-UA-Compatible content=IE=Edge><meta content=always name=referrer><link rel=stylesheet type=text/css href=http://s1.bdstatic.com/r/www/cache/bdorz/baidu.min.css><title>ç\x99¾åº¦ä¸\x80ä¸\x8bï¼\x8cä½\xa0å°±ç\x9f¥é\x81\x93</title></head> <body link=#0000cc> <div id=wrapper> <div id=head> <div class=head_wrapper> <div class=s_form> <div class=s_form_wrapper> <div id=lg> <img hidefocus=true src=//www.baidu.com/img/bd_logo1.png wid ...(rest omitted)

In requests, the function you call matches the HTTP method you want to send: use requests.<method>, where the method can be get, post, put, and so on.
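As a rough sketch of how the method name maps onto the call, the snippet below builds a GET and a POST request without sending them, so it runs offline. The query parameter wd and the form field key are made up for illustration and do not come from the original article.

```python
import requests

url = "http://www.baidu.com"

# Build (but do not send) one request per method, to inspect what would go out.
# The parameter name "wd" and the form field "key" are illustrative assumptions.
get_req = requests.Request("GET", url, params={"wd": "python"}).prepare()
post_req = requests.Request("POST", url, data={"key": "value"}).prepare()

print(get_req.method, get_req.url)
print(post_req.method, post_req.body)


def send(method, target, **kwargs):
    """Send a request with any method name, e.g. send("get", url)."""
    kwargs.setdefault("timeout", 10)  # avoid hanging forever on a dead server
    # requests.get / requests.post / requests.put are shortcuts for this call.
    return requests.request(method, target, **kwargs)
```

Calling send("get", url) should behave like requests.get(url) with a timeout; the helper name send is hypothetical.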

Working with the response data in requests

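Reading the response data: the text attribute and the content attribute. text returns a string decoded with an encoding that requests guesses, so when the guess is wrong the result comes out garbled. With content we get the raw data and can choose the decoding ourselves.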

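content returns bytes, which we decode with decode. The encoding to pass depends on how the page is encoded, and that is usually declared in the charset attribute of the page's meta tag.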

Example:

import requests
url="http://www.baidu.com"
rep=requests.get(url)

# Get the data with text
rep.text
...(beginning omitted)<title>ç\x99¾åº¦ä¸\x80ä¸\x8bï¼\x8cä½\xa0å°±ç\x9f¥é\x81\x93</title></head> <body link=#0000cc>

# Get the data with content and decode it
rep.content.decode('utf-8')
...(beginning omitted)<title>百度一下，你就知道</title></head> <body link=#0000cc> <div id=wrapper> <div id=head> <div class=head_wrapper> <div class=s_form> <div class=s_form_wrapper> <div id=lg> <img hidefocus=true src=//www.baidu.com/img/bd_logo1.png width=270 height=129> </div>
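Since the right encoding is usually declared in the meta tag's charset attribute, one possible approach is to read it out of the raw bytes before decoding. The following is only a minimal sketch: the helper find_meta_charset and the sample bytes are hypothetical, and a simple regular expression like this will not cover every page.

```python
import re


def find_meta_charset(raw: bytes, default: str = "utf-8") -> str:
    """Return the charset declared in the page's markup, or `default` if none is found."""
    # Search only the start of the page; the declaration normally sits in <head>.
    match = re.search(rb'charset\s*=\s*["\']?([\w-]+)', raw[:2048], re.IGNORECASE)
    return match.group(1).decode("ascii") if match else default


# Stand-in for rep.content: a shortened page with a UTF-8 encoded title.
sample = (
    b"<html> <head><meta http-equiv=content-type "
    b"content=text/html;charset=utf-8>"
    b"<title>\xe7\x99\xbe\xe5\xba\xa6</title></head>"
)

charset = find_meta_charset(sample)
print(charset)
print(sample.decode(charset))
```

With a real response, the same idea would be rep.content.decode(find_meta_charset(rep.content)).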

Checking the encoding: the encoding attribute

url="http://www.baidu.com"
rep=requests.get(url)
rep.encoding
'ISO-8859-1'

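ISO-8859-1 is only the default that requests falls back to when the response headers declare a text content type without a charset. The page itself is encoded as UTF-8, as its meta tag shows, so when encoding reports ISO-8859-1 here we should decode the content with utf-8.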
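To see why the wrong guess produces garbled text, the sketch below skips the network and decodes the same UTF-8 bytes both ways. The variable names are illustrative only.

```python
# The bytes a UTF-8 page would send for its title.
raw = "百度一下，你就知道".encode("utf-8")

wrong = raw.decode("ISO-8859-1")  # what text shows when the guess is ISO-8859-1
right = raw.decode("utf-8")       # what content.decode('utf-8') shows

print(repr(wrong))
print(right)

# ISO-8859-1 maps every byte to one character, so no data is lost:
# encoding the garbled string back recovers the bytes, which then decode correctly.
repaired = wrong.encode("ISO-8859-1").decode("utf-8")
print(repaired == right)
```

With a real response, assigning rep.encoding = 'utf-8' before reading rep.text has the same effect as decoding content by hand.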

Checking the response status code: the status_code attribute

url="http://www.baidu.com"
rep=requests.get(url)
rep.status_code
200
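A scraper usually wants to act on the status code rather than just print it. The sketch below shows one way to do that; the helper describe is hypothetical, and the responses are built by hand (an assumption made so the example runs offline) instead of coming from a real request.

```python
import requests


def describe(rep: requests.Response) -> str:
    """Summarize a response based on its status code."""
    if rep.status_code == requests.codes.ok:  # requests.codes.ok is 200
        return "success"
    try:
        rep.raise_for_status()  # raises HTTPError for 4xx and 5xx codes
    except requests.HTTPError as err:
        return f"failed: {err}"
    return f"other status: {rep.status_code}"


# Hand-built responses stand in for real ones.
ok = requests.Response()
ok.status_code = 200
missing = requests.Response()
missing.status_code = 404

print(describe(ok))
print(describe(missing))
```

In real use, describe(requests.get(url)) would take the place of the hand-built objects.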
