On a Crawler's Self-Cultivation 2: Practical Exercises + Review

## On a Crawler's Self-Cultivation 2
       (This post works through two hands-on examples: the first uses Python to download a cat picture, and the second uses Python to simulate a browser and translate text online through Youdao Translate.)

## Using Python to Download a Cat
       http://placekitten.com/ is a site tailored for cat lovers: append /width/height to the address and you get a cat picture with exactly that width and height. The pictures are in JPG format, so in a browser you can simply right-click and save them wherever you like.
       For example, opening http://placekitten.com/400/600 in the browser shows a 400×600 cat picture, which you can then download to your desktop.
       (Below we use Python to do the same thing and download a cat picture from that site.)
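       A minimal sketch of such a program (the 400×600 size and the file name cat_400_600.jpg are assumptions):

import urllib.request

# open the address and fetch the content (here, the picture itself)
response = urllib.request.urlopen("http://placekitten.com/g/400/600")
cat_img = response.read()

# the data is binary, so open the file in binary write mode
with open('cat_400_600.jpg', 'wb') as f:
    f.write(cat_img)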
       (Code walkthrough: first import the request module from the urllib package, then call its urlopen() function. The url argument can be a string or a Request object; in the program above an address string is passed in, and urlopen() automatically converts the string into a Request object. That one line fetches the whole page, but you need the returned object's read() method to actually read the content, which comes back as a binary string. In the file operation, the first argument is the file name, i.e. the name of the picture, and because cat_img holds binary data the file is also opened in binary write mode. After the program runs successfully, the picture you just downloaded will be in the same folder as the .py source file.)
       (As just mentioned, the url argument can be a string or a Request object. If you pass in an address string, it is automatically converted into a Request object for you, so the code above is equivalent to:)
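       A sketch of the equivalent form with an explicit Request object (same assumptions as above):

import urllib.request

req = urllib.request.Request("http://placekitten.com/g/400/600")
response = urllib.request.urlopen(req)
cat_img = response.read()

with open('cat_400_600.jpg', 'wb') as f:
    f.write(cat_img)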
       (Furthermore, what urlopen() returns is actually an object, and a file-like one at that: it behaves very much like a file object, which is why you can read its content with the read() method.)
       (Treating it as a file-like object, besides read() you can also use the geturl(), info() and getcode() methods, as follows (run this right after the previous program):)
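       A sketch of such a session (the URL is the same assumption as above):

import urllib.request

response = urllib.request.urlopen("http://placekitten.com/g/400/600")
print(response.geturl())    # the address that was actually opened
print(response.info())      # the HTTP headers the server sent back
print(response.getcode())   # the status code; 200 means everything is fine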
       

## Using Python to Simulate a Browser and Translate Text Online via Youdao Translate
       (First, let's perform the operation in the browser.)
       (Press F12 to open the browser's inspect element panel, then click Network. When you click Translate, a lot of requests sent to the server show up below; let's analyze the content of the Headers.)
            Request URL: http://fanyi.youdao.com/translate_o?smartresult=dict&smartresult=rule
            (One might think that the address urlopen() should open is http://fanyi.youdao.com/, but in fact it is not: the translation mechanism is implemented behind this embedded address, and it is this address that actually does the translating.)

            Request Method: POST
            (The request is made with the POST method. The HTTP protocol defines several request methods, of which two are commonly used: GET and POST. To learn more about their differences, see -> HTTP request messages.)

            Status Code: 200 OK
            (Status code 200 means a normal response. 404 means the page resource cannot be found, perhaps because the URL was typed wrong or the resource simply does not exist. To learn more about status codes, see -> HTTP response messages, or just search for HTTP status codes; you will find the different situations each code corresponds to.)

            Remote Address: 103.72.47.248:80
            (This is the server's IP address, plus the port it has open.)

            Request Headers:
            (Request Headers are the request headers sent by the client (here the browser; when we use Python code, the client is our code). Servers often use them to judge whether a visit is non-human. Put bluntly, some people are rather naughty: they write a Python script and use it to fetch a site's data in bulk, nonstop, which puts a lot of pressure on the server, so servers generally do not welcome non-human visitors.
            Usually the User-Agent header inside the Request Headers is used to tell browser access from code access. What is shown here is the browser telling the server which browser version is being used for the visit. If you visit with Python, the User-Agent defaults to something like Python-urllib/3.8, so a firewall can tell at a glance that the visit comes from code rather than a browser, and it may block you. The User-Agent can be customized, though; the next post will explain how to hide it.
            For other commonly used request headers besides User-Agent, see -> HTTP request headers.)

           Form Data:
            (Form Data is the main content submitted by this POST; in the field i you can see the text that was submitted for translation. So how do we submit a POST form with Python? Time to look at the documentation.)
            (The data argument must be in the application/x-www-form-urlencoded format; we can use the urllib.parse.urlencode() function to convert our data into the required form.)
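            For example, a quick sketch of what urlencode() produces (the two fields are just for illustration):

import urllib.parse

print(urllib.parse.urlencode({'i': 'hello world', 'to': 'AUTO'}))
# i=hello+world&to=AUTO

            With that, the full program looks like this (the form data below is copied from the Form Data panel):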

import urllib.request
import urllib.parse

# copied from Request URL, with the _o removed
url = "http://fanyi.youdao.com/translate?smartresult=dict&smartresult=rule"
# data is the form data; copy the contents of Form Data over
data = {}
data['i'] = 'hello world'
data['from'] = 'AUTO'
data['to'] = 'AUTO'
data['smartresult'] = 'dict'
data['client'] = 'fanyideskweb'
data['salt'] = '15803439446390'
data['sign'] = '8e349204c5d1140741ffe43284595085'
data['ts'] = '1580343944639'
data['bv'] = 'bbb3ed55971873051bc2ff740579bb49'
data['doctype'] = 'json'
data['version'] = '2.1'
data['keyfrom'] = 'fanyi.web'
data['action'] = 'FY_BY_CLICKBUTTION'

# use urllib.parse.urlencode() to convert the form dict into the required format,
# then encode the resulting string as utf-8 bytes
data = urllib.parse.urlencode(data).encode('utf-8')
response = urllib.request.urlopen(url,data)
# decode the response with utf-8 as well
html = response.read().decode("utf-8")
print(html)

           (The result works, but output like this is meant for programmers; shown to a user, it might not be understood and could even be mistaken for an error. If you still have doubts about encodings, see: a summary of solutions to Python encoding problems. You can see that what actually gets printed is the decoded string; you could dig the result out of it with string searching and print that, but that is far too passive.

           In fact this is a JSON structure. JSON is a lightweight data-interchange format whose data is also organized as key-value pairs; put bluntly, it wraps the output up in string form. The string actually contains a dictionary, and inside that dictionary the value of "translateResult" is a list of lists of dictionaries. Since the string contains a perfectly normal structure that Python can recognize, all we need to do is strip away the string wrapper, as follows:)
           (Walkthrough: first import json and load the string from just now with json's loads() method; you can see that what you get really is a dictionary. That makes things easy: to access a value in the dictionary, use its key, and since there are several nested levels, just go down level by level. Finally tidy the output up a little. Later on you can also combine this with EasyGui to build a graphical interface, and from then on use your own translator.)

import urllib.request
import urllib.parse
import json

content = input("Please enter the text to translate: ")


# copied from Request URL, with the _o removed
url = "http://fanyi.youdao.com/translate?smartresult=dict&smartresult=rule"
# data is the form data; copy the contents of Form Data over
data = {}
data['i'] = content
data['from'] = 'AUTO'
data['to'] = 'AUTO'
data['smartresult'] = 'dict'
data['client'] = 'fanyideskweb'
data['salt'] = '15803439446390'
data['sign'] = '8e349204c5d1140741ffe43284595085'
data['ts'] = '1580343944639'
data['bv'] = 'bbb3ed55971873051bc2ff740579bb49'
data['doctype'] = 'json'
data['version'] = '2.1'
data['keyfrom'] = 'fanyi.web'
data['action'] = 'FY_BY_CLICKBUTTION'

# use urllib.parse.urlencode() to convert the form dict into the required format,
# then encode the resulting string as utf-8 bytes
data = urllib.parse.urlencode(data).encode('utf-8')
response = urllib.request.urlopen(url,data)
# decode the response with utf-8 as well
html = response.read().decode("utf-8")
target = json.loads(html)
print("翻译结果:%s"%(target["translateResult"][0][0]["tgt"]))

           (Code like this still cannot be used in real production: if you do this a lot, the server will notice a non-human User-Agent visiting frequently and block you. It may also notice that a single IP is visiting suspiciously often and blacklist you. Python has solutions to all of these problems, which the next chapter will cover.)

## Review Exercises

            0. What is the timeout parameter of urlopen() used for?
           Answer: The timeout parameter sets the connection timeout, in seconds.
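           A quick sketch (the URL and the 10-second value are just for illustration):

import urllib.request

# give up and raise an exception if the server does not respond within 10 seconds
response = urllib.request.urlopen("http://placekitten.com/200/300", timeout=10)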

            1. How do you get the HTTP status code from the object returned by urlopen()?
           Answer:


import urllib.request

response = urllib.request.urlopen(url)   # url is the address you want to check
code = response.getcode()                # 200 means the request succeeded

            2. Which two methods are most commonly used in the request-response exchange between client and server?
           Answer: GET and POST.

            3. HTTP is based on a request-response model. Is it the client that sends the request and the server that responds, or the server that sends the request and the client that responds?
           Answer: It is always the client that sends the request and always the server that makes the response.

            4. What information does the User-Agent property usually record?
           Answer: An ordinary browser uses it to identify itself to the site it visits, reporting things like the browser type, the operating system, and the browser engine.

            5. How do you use urlopen() to send a POST request to the server?
           Answer: urlopen() has a data parameter. If you give this parameter a value, the HTTP request is made with POST; if data is None, which is the default, the request is made with GET.

            6. Which string method converts other encodings into Unicode?
           Answer: decode. decode() converts a string in some other encoding into Unicode; conversely, encode() converts Unicode into a string in some other encoding.
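           A quick sketch:

# encode() turns a str into bytes in the given encoding;
# decode() turns those bytes back into a str
s = '喵'
b = s.encode('utf-8')      # b'\xe5\x96\xb5'
print(b.decode('utf-8'))   # prints 喵 again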

           7. What on earth is JSON?
           Answer: JSON is a lightweight data-interchange format. Put bluntly, here it wraps a Python data structure up as a string, which makes it convenient to store and exchange.
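           A minimal sketch (the string below just mimics the structure of the Youdao response):

import json

s = '{"translateResult": [[{"tgt": "你好,世界"}]]}'
d = json.loads(s)    # parse the JSON string into a Python dict
print(d["translateResult"][0][0]["tgt"])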

## Hands-On

            0. Together with EasyGui, add some interaction to the "download a cat" code:
               ※ let the user enter the size;
               ※ if the user does not enter a size, download the cat at the default width 400 and height 600;
               ※ let the user choose where to save it.
           The program runs as shown in the screenshots (EasyGui dialogs for entering the size and choosing the save location).
           Answer:

import easygui as g
import urllib.request

def main():
    msg = "请填写喵的尺寸"
    title = "下载一只喵"
    fieldNames = ["宽:","高:"]
    fieldValues = []
    size = width,height = 400,600
    fieldValues = g.multenterbox(msg,title,fieldNames,size)

    while 1:
        if fieldValues is None:
            break
        errmsg = ""

        try:
            width = int(fieldValues[0].strip())
        except:
            errmsg +="宽必须为整数!"

        try:
            height = int(fieldValues[1].strip())
        except:
            errmsg += "高度必须为整数!"    
 
        if errmsg == "":
            break

        fieldValues = g.multenterbox(errmsg, title, fieldNames, fieldValues)

    url = "http://placekitten.com/g/%d/%d" % (width, height)

    response = urllib.request.urlopen(url)
    cat_img = response.read()

    filepath = g.diropenbox("Please choose a folder to save the cat in")

    if filepath:
        filename = '%s/cat_%d_%d.jpg' % (filepath, width, height)
    else:
        filename = 'cat_%d_%d.jpg' % (width, height)

    with open(filename, 'wb') as f:
        f.write(cat_img)


if __name__ == "__main__":
    main()

            1. Write a client that logs in to Douban.
           This one may be hard on you, because it needs a lot of knowledge you have not learned yet!

           But I don't intend to crush your hopes either. Below is a working Python 2 code snippet; please convert it to Python 3. Some of the libraries and concepts in it you may not have learned, but with your remarkable ability to teach yourself, you can finish the task without looking at the answer, right?

           The program runs as shown in the screenshot.

           The Python 2 code:

# -*- coding: gbk -*-
import re
import urllib, urllib2, cookielib
 
loginurl = 'https://www.douban.com/accounts/login'
cookie = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookie))
 
params = {
"form_email":"your email",
"form_password":"your password",
"source":"index_nav" #没有的话登录不成功
}
 
# submit the login from the home page
response=opener.open(loginurl, urllib.urlencode(params))
 
# if the response URL is still the login page, a captcha is required
if response.geturl() == "https://www.douban.com/accounts/login":
    html=response.read()
 
    # address of the captcha image
    imgurl=re.search('<img id="captcha_image" src="(.+?)" alt="captcha" class="captcha_image"/>', html)
    if imgurl:
        url=imgurl.group(1)
        # save the image to the same directory
        res=urllib.urlretrieve(url, 'v.jpg')
        # get the captcha-id parameter
        captcha=re.search('<input type="hidden" name="captcha-id" value="(.+?)"/>' ,html)
        if captcha:
            vcode=raw_input('Please enter the captcha shown in the image: ')
            params["captcha-solution"] = vcode
            params["captcha-id"] = captcha.group(1)
            params["user_login"] = "登录"
            # submit the captcha for verification
            response=opener.open(loginurl, urllib.urlencode(params))
            # a successful login redirects to the home page
            if response.geturl() == "http://www.douban.com/":
                print 'login success ! '

           Answer: Python 3 changed quite a few things compared with Python 2.

           In this exercise:
              ※ urllib and urllib2 were merged, and most of the functionality now lives in the urllib.request module;
              ※ the old urllib.urlencode() becomes urllib.parse.urlencode(), and because of encoding you also need to append .encode('utf-8') to the result;
              ※ cookielib was renamed http.cookiejar;

           We haven't covered it in class yet, so let me take this chance to briefly explain what a cookie is:
           We said the HTTP protocol is based on a request-response model: the client sends a request and the server sends back a response, like so...

           But HTTP is stateless. That is, the client has just submitted its account and password and the server replied that verification passed, yet a second later, when the client asks to access some resource, the server replies: "Huh?? Who are you?!"

           To escape this awkward situation, someone invented the cookie. A cookie is essentially a token the server (the website) uses to verify your identity, so each time the client submits a request, the server can tell who you are by checking the cookie. To learn more about cookies, see -> an overview of session technology: Cookie. And, as you may have guessed, CookieJar is the Python object used to store cookies.

           Of course, since the Python 2 code is given above, not knowing these points does not stop you from completing the exercise. A possible Python 3 version:

import re
import urllib.request
from http.cookiejar import CookieJar
 
# Douban's login URL
loginurl = 'https://www.douban.com/accounts/login'
cookie = CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cookie))
 
data = {}
data['form_email'] = 'your email'
data['form_password'] = 'your password'
data['source'] = 'index_nav'   # login does not succeed without this
 
response = opener.open(loginurl, urllib.parse.urlencode(data).encode('utf-8'))
 
# if the response URL is still the login page, a captcha is required
if response.geturl() == "https://www.douban.com/accounts/login":
    html = response.read().decode()
    
    # address of the captcha image
    imgurl = re.search('<img id="captcha_image" src="(.+?)" alt="captcha" class="captcha_image"/>', html)
    if imgurl:
        url = imgurl.group(1)
        # save the captcha image to the same directory
        res = urllib.request.urlretrieve(url, 'v.jpg')
 
        # get the captcha-id parameter
        captcha = re.search('<input type="hidden" name="captcha-id" value="(.+?)"/>' ,html)
 
        if captcha:
            vcode = input('Please enter the captcha shown in the image: ')
            data["captcha-solution"] = vcode
            data["captcha-id"] = captcha.group(1)
            data["user_login"] = "登录"
 
            # submit the captcha for verification
            response = opener.open(loginurl, urllib.parse.urlencode(data).encode('utf-8'))
 
            # a successful login redirects to the home page
            if response.geturl() == "http://www.douban.com/":
                print('Login successful!')
