Python: Using the urllib.request Module

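Official documentation: https://docs.python.org/3/library/urllib.request.html (the third-party Requests library is documented separately at http://cn.python-requests.org/zh_CN/latest/)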

Fetching a web page with urllib.request

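The simplest way to use urllib.request is urllib.request.urlopen. Its signature is:
urllib.request.urlopen(url[, data[, timeout[, cafile[, capath[, cadefault[, context]]]]]])
According to the official documentation, urllib.request.urlopen can open URLs that use HTTP, HTTPS, or FTP; it is used mainly for HTTP.
The parameters that start with ca (cafile, capath, cadefault) deal with certificate verification and are rarely needed. They were removed in Python 3.13 in favor of the context parameter.
The data parameter carries the request body; supplying it turns the request into a POST. It is needed less often than a plain GET.
The parameters used most often are url and timeout.
url is the address to request, written in full with the scheme at the front. The port at the end is optional and defaults to the scheme's standard port, for example http://192.168.1.1:80. timeout is the time limit for the request, in seconds.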
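
To make the role of the data parameter concrete, here is a minimal sketch. The helper name build_request and the form field q are made up for illustration; only urllib.parse.urlencode, urllib.request.Request and Request.get_method come from the standard library. Nothing is sent over the network here, the sketch only shows how the request changes when a body is attached.

```python
import urllib.parse
import urllib.request


def build_request(url, form=None):
    """Hypothetical helper: return a Request, as a POST if form fields are given."""
    data = None
    if form is not None:
        # The request body must be bytes, so encode the urlencoded form string.
        data = urllib.parse.urlencode(form).encode("utf-8")
    return urllib.request.Request(url, data=data)


get_req = build_request("http://192.168.1.1:80")
post_req = build_request("http://192.168.1.1:80", {"q": "python"})
print(get_req.get_method())   # no body attached
print(post_req.get_method())  # body attached
```

Either request object can then be passed to urllib.request.urlopen(req, timeout=3) to actually send it.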

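The object returned by urlopen offers three extra methods (since Python 3.9 the attributes url, headers and status are preferred, but the methods still work):

geturl() returns the URL of the response. It is mostly useful for finding out where a redirect ended up.
info() returns the response metadata, that is, the headers.
getcode() returns the HTTP status code of the response. The most common codes are 200 (the server returned the page successfully), 404 (the requested page does not exist) and 503 (the server is temporarily unavailable). Note that urlopen raises urllib.error.HTTPError for 4xx and 5xx responses, so 404 and 503 are normally read from the code attribute of that exception.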
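
The three methods can be tried without touching a real site. The sketch below is only an illustration under some assumptions: DemoHandler and probe are made-up names, the server is a throwaway http.server instance on the loopback interface, and the request goes through an opener with an empty ProxyHandler so that a configured proxy does not intercept it. opener.open returns the same kind of response object as urlopen.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: '/' answers 200, every other path answers 404."""

    def do_GET(self):
        if self.path == "/":
            body = "hello".encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def probe(path):
    """Request `path` from a throwaway local server.

    Returns (status code, final URL, content type).
    """
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    # An empty ProxyHandler keeps the loopback request away from any proxy.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    url = "http://127.0.0.1:%d%s" % (server.server_port, path)
    try:
        with opener.open(url, timeout=3) as response:
            return (response.getcode(), response.geturl(),
                    response.info().get_content_type())
    except urllib.error.HTTPError as err:
        # 4xx/5xx answers arrive as an exception that carries the status code.
        return err.code, url, err.headers.get_content_type()
    finally:
        server.shutdown()
        server.server_close()


print(probe("/"))
print(probe("/missing"))
```

The second call shows why a bare getcode() rarely reports 404: the error status surfaces through the HTTPError branch instead.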

[Annotated example]

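import sys
import urllib.request

__author__ = 'ling.'

def linkBaidu():

    # Target URL
    url = 'http://www.baidu.com'

    # Requests sent through urllib carry a default header, User-Agent: Python-urllib/<version>,
    # which identifies urllib as the client. Sites that check the User-Agent may reject it,
    # so we supply our own headers through a urllib.request.Request object.
    # The User-Agent value below is only an illustrative placeholder.
    headers = {'User-Agent': 'Mozilla/5.0'}

    try:
        req = urllib.request.Request(url, headers=headers)

        # Send the request and get back a file-like response object. If the server
        # does not answer within `timeout` seconds, a timeout error is raised.
        response = urllib.request.urlopen(req, timeout=3)

        # response.read() returns the page body as bytes; decode it as UTF-8
        result = response.read().decode('utf-8')
    except Exception as e:
        print("Request failed: %s" % e)
        sys.exit(1)
    # Write with an explicit encoding so the result does not depend on the platform default
    with open('baidu.txt', 'w', encoding='utf-8') as fp:
        fp.write(result)
    print("URL, response.geturl(): %s" % response.geturl())
    print("Status code, response.getcode(): %s" % response.getcode())
    print("Headers, response.info(): %s" % response.info())
    print("The page content has been saved to baidu.txt in the current directory.")

if __name__ == '__main__':
    linkBaidu()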
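
The comments in the example mention the default User-Agent that urllib sends and how a Request object can override it. A small sketch can make both visible without sending anything. The function names and the User-Agent string are assumptions chosen for illustration; build_opener, its addheaders attribute, Request and get_header are standard-library names.

```python
import urllib.request


def default_user_agent():
    """Return the User-Agent that a default opener would send."""
    opener = urllib.request.build_opener()
    # addheaders is a list of (name, value) pairs applied to every request.
    return dict(opener.addheaders).get("User-agent")


def request_with_user_agent(url, user_agent):
    """Hypothetical helper: build a Request carrying a custom User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


print(default_user_agent())
req = request_with_user_agent("http://www.baidu.com", "Mozilla/5.0 (demo)")
# Request stores header names capitalized, hence the 'User-agent' spelling.
print(req.get_header("User-agent"))
```

A header set on the Request takes the place of the opener's default of the same name when the request is sent.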

Code only

import sys
import urllib.request

__author__ = 'ling.'

def linkBaidu():
    url = 'http://www.baidu.com'
    headers = {'User-Agent': 'Mozilla/5.0'}
    try:
        req = urllib.request.Request(url, headers=headers)
        response = urllib.request.urlopen(req, timeout=3)
        result = response.read().decode('utf-8')
    except Exception as e:
        print("Request failed: %s" % e)
        sys.exit(1)
    with open('baidu.txt', 'w', encoding='utf-8') as fp:
        fp.write(result)
    print("URL, response.geturl(): %s" % response.geturl())
    print("Status code, response.getcode(): %s" % response.getcode())
    print("Headers, response.info(): %s" % response.info())
    print("The page content has been saved to baidu.txt in the current directory.")

if __name__ == '__main__':
    linkBaidu()
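
The example catches every exception in one branch and always decodes as UTF-8. A possible refinement, offered only as a sketch, is to tell the failure modes apart and to use the charset the server declares. fetch_text and its return convention are made up for this illustration. The demo call uses a data: URL, which urlopen supports, so it runs without network access.

```python
import socket
import urllib.error
import urllib.request


def fetch_text(url, timeout=3, fallback_encoding="utf-8"):
    """Hypothetical helper: return (text, error); exactly one of them is None."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            raw = response.read()
            # Prefer the charset from the Content-Type header, if there is one.
            charset = response.headers.get_content_charset() or fallback_encoding
            return raw.decode(charset, errors="replace"), None
    except urllib.error.HTTPError as err:   # server answered with 4xx/5xx
        return None, "HTTP error %d" % err.code
    except urllib.error.URLError as err:    # DNS failure, refused connection, ...
        return None, "cannot reach server: %s" % err.reason
    except socket.timeout:                  # read timed out after connecting
        return None, "timed out"
    except ValueError as err:               # malformed URL, e.g. missing scheme
        return None, "bad URL: %s" % err


print(fetch_text("data:text/plain;charset=utf-8,hello"))
print(fetch_text("www.baidu.com"))  # no scheme, so urlopen rejects it
```

HTTPError is listed before URLError because it is a subclass of it; in the opposite order the more specific branch would never run.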

ゾ玖月风凌
