A few days ago I learned the urllib and json libraries.
The urllib library is mainly used to fetch a web page's source; combined with regular expressions or BeautifulSoup it can be used to write crawlers.
Another important feature is that it can submit GET and POST requests to a site.
It also ships with several submodules:
urllib.request: the request module
urllib.error: the exception-handling module
urllib.parse: the URL-parsing module
urllib.robotparser: the robots.txt-parsing module
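Of these, urllib.parse can be tried without any network access. A minimal sketch of splitting a URL and building a query string (the URL here is just an example):

```python
from urllib.parse import urlparse, urlencode

# Split a URL into its components
parts = urlparse('http://httpbin.org/get?foo=1')
print(parts.scheme)  # http
print(parts.netloc)  # httpbin.org
print(parts.path)    # /get
print(parts.query)   # foo=1

# Build a query string from a dict
print(urlencode({'q': 'python', 'page': 2}))  # q=python&page=2
```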
urlopen:response = urllib.request.urlopen('http://httpbin.org/get', timeout=0.1)
request:request = urllib.request.Request('https://python.org')
response = urllib.request.urlopen(request)  # equivalent to the previous line
add_header: add header information
request = urllib.request.Request(url)  # url is a placeholder for the target address
request.add_header('User-Agent', 'fake-client')
response = urllib.request.urlopen(request)
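A header set this way can be checked on the Request object itself, before anything is sent. Note that urllib normalizes stored header names with str.capitalize, so they are queried as 'User-agent':

```python
import urllib.request

req = urllib.request.Request('https://python.org')
req.add_header('User-Agent', 'fake-client')
# urllib stores header names capitalized, so query with 'User-agent'
print(req.has_header('User-agent'))  # True
print(req.get_header('User-agent'))  # fake-client
```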
ProxyHandler: set a proxy
import urllib.request
proxy_handler = urllib.request.ProxyHandler({
'http': 'http://127.0.0.1:9743',
'https': 'https://127.0.0.1:9743'
})
opener = urllib.request.build_opener(proxy_handler)
response = opener.open('http://httpbin.org/get')
print(response.read())
HTTPCookieProcessor: add cookies
import http.cookiejar, urllib.request
cookie = http.cookiejar.CookieJar()
handler = urllib.request.HTTPCookieProcessor(cookie)
opener = urllib.request.build_opener(handler)
response = opener.open('http://www.baidu.com')
for item in cookie:
    print(item.name + "=" + item.value)
The json library: mainly used to convert between Python objects and JSON strings.
#!/usr/bin/python
import json
data = [{'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}]
json_str = json.dumps(data)  # don't shadow the json module name itself
print(json_str)
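json.loads does the reverse, turning a JSON string back into Python objects. A quick round-trip sketch:

```python
import json

data = [{'a': 1, 'b': 2}]
text = json.dumps(data)      # serialize to a JSON string
print(text)                  # [{"a": 1, "b": 2}]
restored = json.loads(text)  # parse it back into Python objects
print(restored == data)      # True
```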
With that, I set out to write a small tool.
import urllib.request
import json
import urllib.parse
#Request URL
import io
import sys
sys.stdout = io.TextIOWrapper(sys.stdout.buffer,encoding='utf-8')
while True:
    i = input("Enter the text to translate (q to quit): ")
    if i == 'q':
        print("Exiting")
        break
    url = "http://fanyi.youdao.com/translate?smartresult=dict&smartresult=rule"
    data = {}
    header = {}
    header["User-Agent"] = "Mozilla/5.0 (Windows NT 6.1; rv:60.0) Gecko/20100101 Firefox/60.0"
    data['action'] = 'FY_BY_CLICKBUTTION'
    data['client'] = 'fanyideskweb'
    data['doctype'] = 'json'
    data['from'] = 'AUTO'
    data['i'] = i
    data['keyfrom'] = 'fanyi.web'
    data['salt'] = '1531752128194'
    data['sign'] = '88c77b4bcd6541ac488740afd5919019'
    data['smartresult'] = 'dict'
    data['to'] = 'AUTO'
    data['typoResult'] = 'false'
    data['version'] = '2.1'
    # Encoding: the data argument must be bytes; a dict is first encoded with urllib.parse.urlencode()
    data = urllib.parse.urlencode(data).encode("utf-8")
    # Open the connection
    req = urllib.request.Request(url, data, header)  # the Request carries the POST data and headers
    response = urllib.request.urlopen(req)
    # response = urllib.request.urlopen(url, data)  # urlopen alone cannot send custom headers
    # Decode to Unicode
    html = response.read().decode("utf-8")  # the body is JSON text
    # Parse the JSON
    target = json.loads(html)
    # Print the translation out of the nested result
    print(target["translateResult"][0][0]["tgt"])
Run the script above from cmd and you get a simple online interface to Youdao Translate.
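The two key steps in the script, encoding the form data and unpacking the nested translateResult, can be tried offline with a mock payload. The JSON shape here is assumed from what the script indexes, not taken from any Youdao documentation:

```python
import json
import urllib.parse

# Step 1: the POST body must be bytes, so urlencode the dict and encode it
body = urllib.parse.urlencode({'i': 'hello', 'doctype': 'json'}).encode('utf-8')
print(body)  # b'i=hello&doctype=json'

# Step 2: a mock response with the nested shape the script expects
html = '{"translateResult": [[{"src": "hello", "tgt": "hola"}]]}'
target = json.loads(html)
print(target["translateResult"][0][0]["tgt"])  # hola
```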
Still a newbie here... experts, please go easy on me!