Python Crawler Series: The urllib Library in Detail

The urllib Library in Detail

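urllib is Python's built-in HTTP request library. It consists of four modules:
* urllib.request: the request module
* urllib.error: the exception-handling module
* urllib.parse: the URL-parsing module
* urllib.robotparser: the robots.txt-parsing module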
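
To give a feel for the two helper modules in the list above, here is a minimal offline sketch. The example URLs and the robots.txt rules are made up for illustration, and nothing is sent over the network.

```python
from urllib.parse import urlparse, urljoin
from urllib.robotparser import RobotFileParser

# urllib.parse: split a URL into its components.
parts = urlparse("http://httpbin.org/get?word=hello")
print(parts.scheme, parts.netloc, parts.path, parts.query)

# urllib.parse: resolve a relative link against a base URL.
absolute = urljoin("http://httpbin.org/docs/index.html", "/get")
print(absolute)

# urllib.robotparser: check rules supplied as lines of text.
# These rules are hypothetical; a real crawler would load the site's robots.txt.
robots = RobotFileParser()
robots.parse(["User-agent: *", "Disallow: /private/"])
print(robots.can_fetch("*", "http://example.com/private/page"))
print(robots.can_fetch("*", "http://example.com/index.html"))
```

urllib.request and urllib.error are the two modules the rest of this post concentrates on.
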
#### Changes compared with Python 2

Python 2

import urllib2
response = urllib2.urlopen('http://www.baidu.com')

Python 3

import urllib.request
response = urllib.request.urlopen('http://www.baidu.com')
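
If one script has to run under both versions, a common pattern is to try the Python 3 import first and fall back to the Python 2 one. This is only a sketch of that idea, not something the original post prescribes.

```python
try:
    # Python 3: urlopen lives in urllib.request
    from urllib.request import urlopen
except ImportError:
    # Python 2: urlopen lives in urllib2
    from urllib2 import urlopen

# Shows which module the name was actually imported from.
print(urlopen.__module__)
```
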

#### Sending a GET request with urlopen

import urllib.request
response = urllib.request.urlopen('http://www.baidu.com')
print(response.read().decode('utf-8'))
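
GET parameters travel in the URL itself, so a request with parameters can be sketched as below. The helper names `build_get_url` and `fetch_text` are hypothetical, chosen for this example. The actual network call is left commented out so the snippet also runs offline.

```python
import urllib.parse
import urllib.request


def build_get_url(base_url, params):
    """Append URL-encoded query parameters to base_url."""
    return base_url + "?" + urllib.parse.urlencode(params)


def fetch_text(url, encoding="utf-8"):
    """Fetch url and decode the body; the with-block closes the connection."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode(encoding)


url = build_get_url("http://httpbin.org/get", {"word": "hello", "page": 1})
print(url)  # http://httpbin.org/get?word=hello&page=1
# text = fetch_text(url)  # uncomment to send the request (needs network access)
```

Using `with` is optional, but it makes sure the connection is closed even if decoding fails.
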

#### Sending a POST request with urlopen

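import urllib.parse
import urllib.request

# Passing data (as bytes) makes urlopen send a POST instead of a GET
data = bytes(urllib.parse.urlencode({'word': 'hello'}), encoding='utf-8')
response = urllib.request.urlopen('http://httpbin.org/post', data=data)
print(response.read().decode('utf-8'))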
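
The switch from GET to POST can be observed without sending anything, by building `urllib.request.Request` objects and asking them which method they would use. The helper `build_form_request` is a hypothetical name for this sketch.

```python
import urllib.parse
import urllib.request


def build_form_request(url, fields):
    """Return a Request whose body is the URL-encoded form of fields."""
    body = urllib.parse.urlencode(fields).encode("utf-8")
    return urllib.request.Request(url, data=body)


get_req = urllib.request.Request("http://httpbin.org/get")
post_req = build_form_request("http://httpbin.org/post", {"word": "hello"})

print(get_req.get_method())   # GET: no data was supplied
print(post_req.get_method())  # POST: data was supplied
print(post_req.data)          # b'word=hello'
```

A Request object built this way can be passed to `urllib.request.urlopen` in place of a URL string.
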

#### Setting a timeout with urlopen

import urllib.request
response = urllib.request.urlopen('http://httpbin.org/get',timeout=1)
print(response.read())

#### Shortening the timeout to see the effect

import socket
import urllib.request
import urllib.error

try:
    response = urllib.request.urlopen('http://httpbin.org/get', timeout=0.1)
except urllib.error.URLError as e:
    # e.reason holds the underlying exception; check whether it is a timeout
    if isinstance(e.reason, socket.timeout):
        print('TIME OUT')
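
The same check can be wrapped in a small helper. This is a sketch under a few assumptions: the name `fetch_or_none` and its `opener` parameter are hypothetical, and `opener` exists only so the function can be exercised without a network. A timeout that strikes while the body is being read can surface as a bare `socket.timeout` rather than a `URLError`, so both cases are handled.

```python
import socket
import urllib.error
import urllib.request


def fetch_or_none(url, timeout=1.0, opener=urllib.request.urlopen):
    """Return the response body as bytes, or None if the request timed out."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.read()
    except urllib.error.URLError as e:
        # Timeouts while connecting are wrapped in URLError.
        if isinstance(e.reason, socket.timeout):
            return None
        raise  # any other failure is not a timeout, so let it propagate
    except socket.timeout:
        # Timeouts while reading the body can be raised directly.
        return None
```

Calling `fetch_or_none('http://httpbin.org/get', timeout=0.1)` would then return `None` where the code above prints `TIME OUT`.
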

#### Response type

import urllib.request
response = urllib.request.urlopen('https://www.python.org')
print(type(response))

<class 'http.client.HTTPResponse'>

#### Status code and response headers

import urllib.request
response = urllib.request.urlopen('https://www.python.org')
print(response.status)
print(response.getheaders())
print(response.getheader('Server'))

200
[('Server', 'nginx'), ('Content-Type', 'text/html; charset=utf-8'), ('X-Frame-Options', 'SAMEORIGIN'), ('x-xss-protection', '1; mode=block'), ('X-Clacks-Overhead', 'GNU Terry Pratchett'), ('Via', '1.1 varnish'), ('Content-Length', '50069'), ('Accept-Ranges', 'bytes'), ('Date', 'Mon, 26 Nov 2018 10:16:51 GMT'), ('Via', '1.1 varnish'), ('Age', '1872'), ('Connection', 'close'), ('X-Served-By', 'cache-iad2144-IAD, cache-tyo19943-TYO'), ('X-Cache', 'HIT, HIT'), ('X-Cache-Hits', '2, 4331'), ('X-Timer', 'S1543227412.955266,VS0,VE0'), ('Vary', 'Cookie'),
('Strict-Transport-Security', 'max-age=63072000; includeSubDomains')]
nginx
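
One practical use of the headers is picking the right encoding for the body instead of hard-coding 'utf-8'. The sketch below assumes a hypothetical helper `decode_body`. It relies on the fact that `response.headers` is an `http.client.HTTPMessage`, a subclass of `email.message.Message`, so a plain `Message` stands in for it here and no network is needed.

```python
from email.message import Message


def decode_body(body, headers, default="utf-8"):
    """Decode body using the charset from Content-Type, or default if absent."""
    charset = headers.get_content_charset() or default
    return body.decode(charset, errors="replace")


# Stand-in for response.headers, built by hand for this example.
headers = Message()
headers["Content-Type"] = "text/html; charset=utf-8"

print(headers.get_content_charset())
print(decode_body("héllo".encode("utf-8"), headers))
```

With a real response this would be called as `decode_body(response.read(), response.headers)`.
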

Reposted from www.cnblogs.com/carious/p/10021970.html