Preface

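I won't walk through every step in detail here, because my earlier post on scraping Toutiao already covered them thoroughly. This post only explains the key points of how each piece is implemented. For the step-by-step procedure, including how to locate the encrypted JS code and the API endpoints, please read that Toutiao article first.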

Scraping Toutiao (今日头条) article and video data with Python 3: a complete solution for the as, cp, and _signature encryption

QQ group chat: 855262907

Analyzing the Xunjie speech-to-text website

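The whole speech-to-text process:

1. Log in to an account (non-VIP accounts are limited to 2 minutes of audio, so I borrowed a phone number that has VIP; the screenshots from testing still show my own number)
2. Upload the audio file in chunks (why in chunks is explained later)
3. Convert the audio to text (and that is the end of it)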

Logging in

After we enter a phone number and click send, the page issues a POST request. Its Form Data contains quite a few parameters, so let's reverse them one by one.

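Searching for the key parameter phone leads straight to the code that sends the SMS, which makes things easy.

Without further ado, set a breakpoint and see how the request is built.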

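In the data object, only uuid is generated, by Uuid.get(); the other parameters are obvious at a glance, so I'll skip them. data is then passed through basicParams before the POST is sent, so let's take these one step at a time.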

Solving uuid and basicParams

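Debugging shows that uuid comes from Uuid.get(). Jumping to that function, the useful part is create: get only checks whether a uuid already exists in localStorage, returns it if so, and otherwise calls create to make a new one.

JS code: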

function create() {
    var s = [];
    var hexDigits = "0123456789abcdef";
    for (var i = 0; i < 36; i++) {
        s[i] = hexDigits.substr(Math.floor(Math.random() * 0x10), 1);
    }
    s[14] = "4";
    s[19] = hexDigits.substr((s[19] & 0x3) | 0x8, 1);
    s[8] = s[13] = s[18] = s[23] = "";
    var uuid = s.join("");
    return uuid;
}

Now let's port it to Python.

Python code:

import math
import random

def get_uuid():
    s = ['' for i in range(36)]
    hexDigits = "0123456789abcdef"
    for i in range(36):
        s[i] = hexDigits[math.floor(random.random() * 0x10)]
    s[14] = "4"
    # Mirror the JS (s[19] & 0x3) | 0x8, where a non-numeric character coerces to 0
    s[19] = hexDigits[((int(s[19]) if s[19].isdecimal() else 0) & 0x3) | 0x8]
    s[8] = s[13] = s[18] = s[23] = ""
    uuid = ''.join(s)
    print(uuid)

if __name__ == '__main__':
    get_uuid()
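
The analysis above notes that Uuid.get() reuses a uuid stored in localStorage and only calls create when none exists. A script can approximate that by caching the value in a file. This is a minimal sketch under that assumption; the file name uuid.txt and the helper names are hypothetical and not part of the site's code.

```python
import os
import random

UUID_FILE = "uuid.txt"  # hypothetical cache file standing in for localStorage


def create_uuid():
    """Port of the site's create(): 32 hex characters with the dashes removed."""
    hex_digits = "0123456789abcdef"
    s = [random.choice(hex_digits) for _ in range(36)]
    s[14] = "4"
    # JS coerces a non-numeric character to 0 before applying & 0x3
    digit = int(s[19]) if s[19].isdecimal() else 0
    s[19] = hex_digits[(digit & 0x3) | 0x8]
    for i in (8, 13, 18, 23):
        s[i] = ""
    return "".join(s)


def get_uuid(path=UUID_FILE):
    """Return the cached uuid if one exists, otherwise create and store a new one."""
    if os.path.exists(path):
        with open(path) as f:
            cached = f.read().strip()
        if cached:
            return cached
    new_uuid = create_uuid()
    with open(path, "w") as f:
        f.write(new_uuid)
    return new_uuid
```

Reusing one uuid across runs keeps device_id stable, the same way the browser keeps it stable between page loads.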

Solving basicParams

basicParams does not transform anything: it only fills in the remaining fields of our data object. There is nothing to reverse here, so we can simply hard-code those fields.
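
Since basicParams only fills in fixed fields, the same effect can be had by merging a dict of constants into the request data. The sketch below is an illustration, not the site's code: the function name is hypothetical, and which of these fields the real basicParams contributes is an assumption based on the Form Data used later in this post.

```python
def basic_params(data, device_id):
    """Merge the fixed fields into the request data (assumed behaviour of basicParams)."""
    base = {
        "client": "web",
        "source": "335",
        "soft_version": "v3.0.1.1",
        "device_id": device_id,
    }
    merged = dict(base)
    merged.update(data)  # caller-supplied values win over the defaults
    return merged
```

For example, basic_params({"phone": "13800000000", "code": ""}, uuid) returns a dict that carries both the fixed fields and the per-request ones.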

Sending the SMS

With every parameter sorted out, here is the code that sends the SMS.

Python code:

import math
import random
import requests

class xunjie():
    def __init__(self):
        self.session = requests.Session()
        self.get_uuid()
        self.send_message()

    def get_uuid(self):
        s = ['' for i in range(36)]
        hexDigits = "0123456789abcdef"
        for i in range(36):
            s[i] = hexDigits[math.floor(random.random() * 0x10)]
        s[14] = "4"
        # Mirror the JS (s[19] & 0x3) | 0x8, where a non-numeric character coerces to 0
        s[19] = hexDigits[((int(s[19]) if s[19].isdecimal() else 0) & 0x3) | 0x8]
        s[8] = s[13] = s[18] = s[23] = ""
        self.uuid = ''.join(s)

    def send_message(self):
        self.phone = int(input('Enter your phone number: '))
        url = "https://user.api.hudunsoft.com/v1/sms"
        headers = {
            'authority': 'user.api.hudunsoft.com',
            'method': 'POST',
            'path': '/v1/sms',
            'scheme': 'https',
            'accept': 'application/json, text/javascript, */*; q=0.01',
            'accept-encoding': 'gzip, deflate, br',
            'accept-language': 'zh-CN,zh;q=0.9',
            'content-length': '163',
            'content-type': 'application/x-www-form-urlencoded; charset=UTF-8',
            'origin': 'http://voice.xunjiepdf.com',
            'referer': 'http://voice.xunjiepdf.com/voice2text.html',
            'sec-fetch-dest': 'empty',
            'sec-fetch-mode': 'cors',
            'sec-fetch-site': 'cross-site',
            'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
        }
        data = {
            'client': 'web',
            'source': '335',
            'soft_version': 'v3.0.1.1',
            'device_id': self.uuid,
            'version': 'v1.0.0',
            'phone': self.phone,
            'uuid': self.uuid,
            'code': ''
        }
        while True:
            response = self.session.post(url,headers=headers,data=data)
            message = response.json().get('message')
            if "ok" in message:
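                print("SMS sent successfully",message)
                break
            else:
                print("Failed to send SMS",u'%s' % message)
                break  # nothing changes between attempts in this version, so don't resend in a loop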

if __name__ == '__main__':
    xunjie()

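Solving the image captcha under high-risk status

When the account is flagged as high risk, the site asks for an image captcha. This is also simple. I haven't been flagged yet, so I can't see the captcha or take a screenshot, and I'll go straight to the code. The idea is to download the image and type the captcha in by hand. You could also recognize it with the pytesseract library, but here I use the simplest approach.

Python code: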

import math
import random
import time
import requests

class xunjie():
    def __init__(self):
        self.session = requests.Session()
        self.get_uuid()
        self.send_message()

    # Generate the uuid
    def get_uuid(self):
        s = ['' for i in range(36)]
        hexDigits = "0123456789abcdef"
        for i in range(36):
            s[i] = hexDigits[math.floor(random.random() * 0x10)]
        s[14] = "4"
        # Mirror the JS (s[19] & 0x3) | 0x8, where a non-numeric character coerces to 0
        s[19] = hexDigits[((int(s[19]) if s[19].isdecimal() else 0) & 0x3) | 0x8]
        s[8] = s[13] = s[18] = s[23] = ""
        self.uuid = ''.join(s)

    # Send the SMS
    def send_message(self):
        self.phone = int(input('Enter your phone number: '))
        url = "https://user.api.hudunsoft.com/v1/sms"
        headers = {
            'authority': 'user.api.hudunsoft.com',
            'method': 'POST',
            'path': '/v1/sms',
            'scheme': 'https',
            'accept': 'application/json, text/javascript, */*; q=0.01',
            'accept-encoding': 'gzip, deflate, br',
            'accept-language': 'zh-CN,zh;q=0.9',
            'content-length': '163',
            'content-type': 'application/x-www-form-urlencoded; charset=UTF-8',
            'origin': 'http://voice.xunjiepdf.com',
            'referer': 'http://voice.xunjiepdf.com/voice2text.html',
            'sec-fetch-dest': 'empty',
            'sec-fetch-mode': 'cors',
            'sec-fetch-site': 'cross-site',
            'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
        }
        data = {
            'client': 'web',
            'source': '335',
            'soft_version': 'v3.0.1.1',
            'device_id': self.uuid,
            'version': 'v1.0.0',
            'phone': self.phone,
            'uuid': self.uuid,
            'code': ''
        }
        while True:
            response = self.session.post(url,headers=headers,data=data)
            message = response.json().get('message')
            if "ok" in message:
                print("SMS sent successfully",message)
                break
            else:
                print("Failed to send SMS",u'%s' % message)
                data['code'] = self.recognition_image()

    # Fetch the image captcha and read it in by hand
    def recognition_image(self):
        # time is a 13-digit millisecond timestamp
        url = 'https://user.api.hudunsoft.com/v1/captcha?uuid={uuid}&time={time}&client=web&source=335'.format(uuid=self.uuid,time=int(time.time() * 1000))
        headers = {
            'authority': 'user.api.hudunsoft.com',
            'method': 'GET',
            'path': '/v1/captcha?uuid=e884673549de432f8487c6078bc38685&time=1597927148527&client=web&source=335',
            'scheme': 'https',
            'accept': 'image/webp,image/apng,image/*,*/*;q=0.8',
            'accept-encoding': 'gzip, deflate, br',
            'accept-language': 'zh-CN,zh;q=0.9',
            'referer': 'http://voice.xunjiepdf.com/voice2text.html',
            'sec-fetch-dest': 'image',
            'sec-fetch-mode': 'no-cors',
            'sec-fetch-site': 'cross-site',
            'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
        }
        response = self.session.get(url,headers=headers)
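        with open('captcha.jpg','wb') as f:
            f.write(response.content)
        # Keep the code as a string so letters and leading zeros survive
        code = input("Open captcha.jpg and enter the code shown: ").strip()
        return code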

if __name__ == '__main__':
    xunjie()
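
The time parameter in the captcha URL (1597927148527 in the captured request path) is a 13-digit millisecond timestamp. Slicing str(time.time()) can come up short when the float prints with fewer than three decimal places, so computing the value arithmetically is safer. A small sketch; the helper names are hypothetical.

```python
import time


def millis_timestamp():
    """Current Unix time in whole milliseconds (13 digits for present-day dates)."""
    return int(time.time() * 1000)


def captcha_url(uuid):
    """Build the captcha URL used above for a given uuid."""
    return (
        "https://user.api.hudunsoft.com/v1/captcha"
        "?uuid={uuid}&time={time}&client=web&source=335"
    ).format(uuid=uuid, time=millis_timestamp())
```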

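Persistent login

The request captured on a successful login shows that all of its parameters are already known: device_id is the uuid generated the first time, phone is your phone number, and code is the SMS verification code.

I added persistent login to the code. Everything on Xunjie is token-based, so we only need to save the token returned after logging in.

Python code: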

import math
import random
import requests
import json
import os
import time

class xunjie():
    def __init__(self):
        self.session = requests.Session()
        self.get_uuid()
        # Persistent login: reuse the saved cookies and token if they exist
        if 'cookie.txt' in os.listdir('.'):
            with open('cookie.txt', 'r') as f:
                cookie_data = f.read()
                if cookie_data:
                    self.session.cookies = requests.utils.cookiejar_from_dict(json.loads(cookie_data))
                else:
                    print('cookie.txt is empty; delete it and run again')
                    return  # __init__ must return None, so use a bare return
            with open('token.txt', 'r') as f:
                token_data = f.read()
                if token_data:
                    self.token = token_data
                else:
                    print('token.txt is empty; delete it and run again')
                    return
        else:
            self.send_message()
            self.login()

    # Generate the uuid
    def get_uuid(self):
        s = ['' for i in range(36)]
        hexDigits = "0123456789abcdef"
        for i in range(36):
            s[i] = hexDigits[math.floor(random.random() * 0x10)]
        s[14] = "4"
        # Mirror the JS (s[19] & 0x3) | 0x8, where a non-numeric character coerces to 0
        s[19] = hexDigits[((int(s[19]) if s[19].isdecimal() else 0) & 0x3) | 0x8]
        s[8] = s[13] = s[18] = s[23] = ""
        self.uuid = ''.join(s)

    # Send the SMS
    def send_message(self):
        self.phone = int(input('Enter your phone number: '))
        url = "https://user.api.hudunsoft.com/v1/sms"
        headers = {
            'authority': 'user.api.hudunsoft.com',
            'method': 'POST',
            'path': '/v1/sms',
            'scheme': 'https',
            'accept': 'application/json, text/javascript, */*; q=0.01',
            'accept-encoding': 'gzip, deflate, br',
            'accept-language': 'zh-CN,zh;q=0.9',
            'content-length': '163',
            'content-type': 'application/x-www-form-urlencoded; charset=UTF-8',
            'origin': 'http://voice.xunjiepdf.com',
            'referer': 'http://voice.xunjiepdf.com/voice2text.html',
            'sec-fetch-dest': 'empty',
            'sec-fetch-mode': 'cors',
            'sec-fetch-site': 'cross-site',
            'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
        }
        data = {
            'client': 'web',
            'source': '335',
            'soft_version': 'v3.0.1.1',
            'device_id': self.uuid,
            'version': 'v1.0.0',
            'phone': self.phone,
            'uuid': self.uuid,
            'code': ''
        }
        while True:
            response = self.session.post(url,headers=headers,data=data)
            message = response.json().get('message')
            if "ok" in message:
                print("SMS sent successfully",message)
                break
            else:
                print("Failed to send SMS",u'%s' % message)
                data['code'] = self.recognition_image()

    # Fetch the image captcha and read it in by hand
    def recognition_image(self):
        # time is a 13-digit millisecond timestamp
        url = 'https://user.api.hudunsoft.com/v1/captcha?uuid={uuid}&time={time}&client=web&source=335'.format(uuid=self.uuid,time=int(time.time() * 1000))
        headers = {
            'authority': 'user.api.hudunsoft.com',
            'method': 'GET',
            'path': '/v1/captcha?uuid=e884673549de432f8487c6078bc38685&time=1597927148527&client=web&source=335',
            'scheme': 'https',
            'accept': 'image/webp,image/apng,image/*,*/*;q=0.8',
            'accept-encoding': 'gzip, deflate, br',
            'accept-language': 'zh-CN,zh;q=0.9',
            'referer': 'http://voice.xunjiepdf.com/voice2text.html',
            'sec-fetch-dest': 'image',
            'sec-fetch-mode': 'no-cors',
            'sec-fetch-site': 'cross-site',
            'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
        }
        response = self.session.get(url,headers=headers)
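        with open('captcha.jpg','wb') as f:
            f.write(response.content)
        # Keep the code as a string so letters and leading zeros survive
        code = input("Open captcha.jpg and enter the code shown: ").strip()
        return code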

    # Log in
    def login(self):
        # Keep the SMS code as a string so a leading zero is not dropped
        self.code = input('Enter the SMS verification code: ').strip()
        url = "https://user.api.hudunsoft.com/v1/user/auto_sign_in"
        headers = {
             'authority': 'user.api.hudunsoft.com',
             'method': 'POST',
             'path': '/v1/user/auto_sign_in',
             'scheme': 'https',
             'accept': 'application/json, text/javascript, */*; q=0.01',
             'accept-encoding': 'gzip, deflate, br',
             'accept-language': 'zh-CN,zh;q=0.9',
             'content-type': 'application/x-www-form-urlencoded; charset=UTF-8',
             'origin': 'http://voice.xunjiepdf.com',
             'referer': 'http://voice.xunjiepdf.com/voice2text.html',
             'sec-fetch-dest': 'empty',
             'sec-fetch-mode': 'cors',
             'sec-fetch-site': 'cross-site',
             'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
        }
        data = {
             'client': 'web',
             'source': '335',
             'soft_version': 'v3.0.1.1',
             'device_id': self.uuid,
             'phone': self.phone,
             'code': self.code
        }
        response = self.session.post(url,headers=headers,data=data)
        json_data = response.json()
        if "ok" in json_data.get('message'):
            print("Login succeeded")
            print(json_data)
            self.token = json_data.get('data').get('token')
            with open('cookie.txt','w') as f:
                f.write(json.dumps(requests.utils.dict_from_cookiejar(response.cookies)))
            with open('token.txt','w') as f:
                f.write(self.token)
        else:
            print("Login failed")

if __name__ == '__main__':
    xunjie()

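Uploading the audio file in chunks

What does chunked upload mean? A large file is split into several small pieces, each piece is uploaded, and the pieces are then merged back into one large file.

During an upload, three new POST requests appear; these are the chunked-upload requests. Their Form Data makes the roles clear: the first POST marks the start of the upload (it only tells the server that an upload is coming, so it can record it), the second POST is the one that actually uploads the file chunks, and the third POST marks the end (it only tells the server that the upload is complete).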

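Solving the POST request parameters

The first POST request:

The first and third POST requests differ only in action. Of the remaining parameters, md5 and fileName vary from file to file, so those are the two to work out.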

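Search for the keyword fileName and both md5 and fileName show up just below it, so start debugging there.

Looking a little further up brings a pleasant surprise: the chunk size is 2 MB. A file larger than 2 MB is split into 2 MB pieces; for example, a 3 MB file is uploaded as one 2 MB piece and one 1 MB piece.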
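
The 2 MB splitting described above can be sketched as a generator that reads the file piece by piece. This is only an illustration: the function name is hypothetical, and the byte offset it yields is there for bookkeeping, not a confirmed match for the server's pos field.

```python
CHUNK_SIZE = 2 * 1024 * 1024  # 2 MB, the chunk size found in the page's JS


def iter_chunks(path, chunk_size=CHUNK_SIZE):
    """Yield (offset, data) pairs that cover the file in order."""
    offset = 0
    with open(path, "rb") as f:
        while True:
            data = f.read(chunk_size)
            if not data:
                break
            yield offset, data
            offset += len(data)
```

For a 3 MB file this yields two pieces, of 2097152 and 1048576 bytes.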

Another important finding is that webUploader has an implementation class, so let's step into it. All of the processing happens in there.

In the debugger, file turns out to be of type FileInfo.

Searching for its implementation class shows ID = Guid.NewGuid().ToString("N");, and MD5 is simply the MD5 hash of the whole file.

Guid JS code:

function Guid(g) {
    var arr = new Array();
    if (typeof (g) == "string") {
        InitByString(arr, g)
    } else {
        InitByOther(arr)
    }
    this.Equals = function(o) {
        if (o && o.IsGuid) {
            return this.ToString() == o.ToString()
        } else {
            return false
        }
    };
    this.IsGuid = function() {};
    this.ToString = function(format) {
        if (typeof (format) == "string") {
            if (format == "N" || format == "D" || format == "B" || format == "P") {
                return ToStringWithFormat(arr, format)
            } else {
                return ToStringWithFormat(arr, "D")
            }
        } else {
            return ToStringWithFormat(arr, "D")
        }
    };
    function InitByString(arr, g) {
        g = g.replace(/\{|\(|\)|\}|-/g, "");
        g = g.toLowerCase();
        if (g.length != 32 || g.search(/[^0-9,a-f]/i) != -1) {
            InitByOther(arr)
        } else {
            for (var i = 0; i < g.length; i++) {
                arr.push(g[i])
            }
        }
    }
    function InitByOther(arr) {
        var i = 32;
        while (i--) {
            arr.push("0")
        }
    }
    function ToStringWithFormat(arr, format) {
        switch (format) {
        case "N":
            return arr.toString().replace(/,/g, "");
        case "D":
            var str = arr.slice(0, 8) + "-" + arr.slice(8, 12) + "-" + arr.slice(12, 16) + "-" + arr.slice(16, 20) + "-" + arr.slice(20, 32);
            str = str.replace(/,/g, "");
            return str;
        case "B":
            var str = ToStringWithFormat(arr, "D");
            str = "{" + str + "}";
            return str;
        case "P":
            var str = ToStringWithFormat(arr, "D");
            str = "(" + str + ")";
            return str;
        default:
            return new Guid()
        }
    }
}
Guid.Empty = new Guid();
Guid.NewGuid = function() {
    var g = "";
    var i = 32;
    while (i--) {
        g += Math.floor(Math.random() * 16.0).toString(16)
    }
    return new Guid(g)
};
// These two lines were added by me
var id = Guid.NewGuid().ToString("N");
console.log(id);

Save the JS code above as guid.js.
Python code:

import os

def get_guid():
    guid = os.popen('node guid.js').read().replace('\n', '')
    return guid

if __name__ == '__main__':
    print(get_guid())
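
Guid.NewGuid().ToString("N") boils down to 32 random lowercase hex characters, so the Node call can likely be replaced with pure Python. This is a sketch under the assumption that the server only cares about the format of the ID; the function name is hypothetical.

```python
import secrets


def new_guid_n():
    """32 random lowercase hex characters, the same shape as Guid.NewGuid().ToString("N")."""
    return secrets.token_hex(16)
```

This removes the dependency on having node installed and on guid.js sitting in the working directory.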

Computing the file's MD5:

Python code:

import hashlib

def get_md5():
    md5 = hashlib.md5()
    with open('1.mp3', 'rb') as f:
        md5.update(f.read())
        md5_file = md5.hexdigest()
        print(md5_file)

if __name__ == '__main__':
    get_md5()
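
Reading the whole file with f.read() is fine for short clips, but long recordings can be large. A variant that hashes the file block by block gives the same digest without holding everything in memory. A minimal sketch; the function name and block size are my own choices.

```python
import hashlib


def file_md5(path, block_size=1024 * 1024):
    """Return the hex MD5 of a file, reading it in fixed-size blocks."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b""
        for block in iter(lambda: f.read(block_size), b""):
            md5.update(block)
    return md5.hexdigest()
```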


This code (inside webUploader) is our first POST request. Its parameters are data: { action: 'Begin', fileName: currentFile.ID + "_" + currentFile.Name, md5: currentFile.MD5 }, where currentFile is the FileInfo we saw earlier.
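
Put together, the Form Data for the first POST can be built from a GUID, the file name, and the file's MD5. The helper below is a hypothetical sketch; using only the base name of the path for currentFile.Name is an assumption.

```python
import os


def begin_payload(guid, path, md5_hex):
    """Form data for the 'Begin' request: fileName is '<guid>_<file name>'."""
    return {
        "action": "Begin",
        "fileName": guid + "_" + os.path.basename(path),
        "md5": md5_hex,
    }
```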

That's enough analysis; time for the code.
Python code:

import math
import random
import requests
import json
import os
import hashlib
import time

class xunjie():
    def __init__(self):
        self.session = requests.Session()
        self.get_uuid()
        # Persistent login: reuse the saved cookies and token if they exist
        if 'cookie.txt' in os.listdir('.'):
            with open('cookie.txt', 'r') as f:
                cookie_data = f.read()
                if cookie_data:
                    self.session.cookies = requests.utils.cookiejar_from_dict(json.loads(cookie_data))
                else:
                    print('cookie.txt is empty; delete it and run again')
                    return  # __init__ must return None, so use a bare return
            with open('token.txt', 'r') as f:
                token_data = f.read()
                if token_data:
                    self.token = token_data
                else:
                    print('token.txt is empty; delete it and run again')
                    return
        else:
            self.send_message()
            self.login()
        self.start_upload_file()

    # Generate the uuid
    def get_uuid(self):
        s = ['' for i in range(36)]
        hexDigits = "0123456789abcdef"
        for i in range(36):
            s[i] = hexDigits[math.floor(random.random() * 0x10)]
        s[14] = "4"
        # Mirror the JS (s[19] & 0x3) | 0x8, where a non-numeric character coerces to 0
        s[19] = hexDigits[((int(s[19]) if s[19].isdecimal() else 0) & 0x3) | 0x8]
        s[8] = s[13] = s[18] = s[23] = ""
        self.uuid = ''.join(s)

    # Send the SMS
    def send_message(self):
        self.phone = int(input('Enter your phone number: '))
        url = "https://user.api.hudunsoft.com/v1/sms"
        headers = {
            'authority': 'user.api.hudunsoft.com',
            'method': 'POST',
            'path': '/v1/sms',
            'scheme': 'https',
            'accept': 'application/json, text/javascript, */*; q=0.01',
            'accept-encoding': 'gzip, deflate, br',
            'accept-language': 'zh-CN,zh;q=0.9',
            'content-length': '163',
            'content-type': 'application/x-www-form-urlencoded; charset=UTF-8',
            'origin': 'http://voice.xunjiepdf.com',
            'referer': 'http://voice.xunjiepdf.com/voice2text.html',
            'sec-fetch-dest': 'empty',
            'sec-fetch-mode': 'cors',
            'sec-fetch-site': 'cross-site',
            'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
        }
        data = {
            'client': 'web',
            'source': '335',
            'soft_version': 'v3.0.1.1',
            'device_id': self.uuid,
            'version': 'v1.0.0',
            'phone': self.phone,
            'uuid': self.uuid,
            'code': ''
        }
        while True:
            response = self.session.post(url,headers=headers,data=data)
            message = response.json().get('message')
            if "ok" in message:
                print("SMS sent successfully",message)
                break
            else:
                print("Failed to send SMS",u'%s' % message)
                data['code'] = self.recognition_image()

    # Fetch the image captcha and read it in by hand
    def recognition_image(self):
        # time is a 13-digit millisecond timestamp
        url = 'https://user.api.hudunsoft.com/v1/captcha?uuid={uuid}&time={time}&client=web&source=335'.format(uuid=self.uuid,time=int(time.time() * 1000))
        headers = {
            'authority': 'user.api.hudunsoft.com',
            'method': 'GET',
            'path': '/v1/captcha?uuid=e884673549de432f8487c6078bc38685&time=1597927148527&client=web&source=335',
            'scheme': 'https',
            'accept': 'image/webp,image/apng,image/*,*/*;q=0.8',
            'accept-encoding': 'gzip, deflate, br',
            'accept-language': 'zh-CN,zh;q=0.9',
            'referer': 'http://voice.xunjiepdf.com/voice2text.html',
            'sec-fetch-dest': 'image',
            'sec-fetch-mode': 'no-cors',
            'sec-fetch-site': 'cross-site',
            'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
        }
        response = self.session.get(url,headers=headers)
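        with open('captcha.jpg','wb') as f:
            f.write(response.content)
        # Keep the code as a string so letters and leading zeros survive
        code = input("Open captcha.jpg and enter the code shown: ").strip()
        return code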

    # Log in
    def login(self):
        # Keep the SMS code as a string so a leading zero is not dropped
        self.code = input('Enter the SMS verification code: ').strip()
        url = "https://user.api.hudunsoft.com/v1/user/auto_sign_in"
        headers = {
             'authority': 'user.api.hudunsoft.com',
             'method': 'POST',
             'path': '/v1/user/auto_sign_in',
             'scheme': 'https',
             'accept': 'application/json, text/javascript, */*; q=0.01',
             'accept-encoding': 'gzip, deflate, br',
             'accept-language': 'zh-CN,zh;q=0.9',
             'content-type': 'application/x-www-form-urlencoded; charset=UTF-8',
             'origin': 'http://voice.xunjiepdf.com',
             'referer': 'http://voice.xunjiepdf.com/voice2text.html',
             'sec-fetch-dest': 'empty',
             'sec-fetch-mode': 'cors',
             'sec-fetch-site': 'cross-site',
             'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
        }
        data = {
             'client': 'web',
             'source': '335',
             'soft_version': 'v3.0.1.1',
             'device_id': self.uuid,
             'phone': self.phone,
             'code': self.code
        }
        response = self.session.post(url,headers=headers,data=data)
        json_data = response.json()
        if "ok" in json_data.get('message'):
            print("Login succeeded")
            print(json_data)
            self.token = json_data.get('data').get('token')
            with open('cookie.txt','w') as f:
                f.write(json.dumps(requests.utils.dict_from_cookiejar(response.cookies)))
            with open('token.txt','w') as f:
                f.write(self.token)
        else:
            print("Login failed")

    # Get a GUID
    def get_guid(self):
        guid = os.popen('node guid.js').read().replace('\n', '')
        return guid

    # Compute the file's MD5
    def get_md5(self):
        md5 = hashlib.md5()
        with open(self.file, 'rb') as f:
            md5.update(f.read())
            self.md5_file = md5.hexdigest()

    # Start the upload
    def start_upload_file(self):
        path = '/v1/alivoice/uploadaudiofile?r=' + str(random.random())
        url = 'https://user.api.hudunsoft.com' + path
        headers = {
            'authority': 'user.api.hudunsoft.com',
            'method': 'POST',
            'path': path,
            'scheme': 'https',
            'accept': 'application/json, text/javascript, */*; q=0.01',
            'accept-encoding': 'gzip, deflate, br',
            'accept-language': 'zh-CN,zh;q=0.9',
            'content-type': 'application/x-www-form-urlencoded; charset=UTF-8',
            'origin': 'http://voice.xunjiepdf.com',
            'referer': 'http://voice.xunjiepdf.com/voice2text.html',
            'sec-fetch-dest': 'empty',
            'sec-fetch-mode': 'cors',
            'sec-fetch-site': 'cross-site',
            'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
        }
        self.file = '1.mp3'
        self.get_md5()
        self.file_name = self.get_guid() + '_' + self.file
        data = {
            'action': 'Begin',
            'fileName': self.file_name,
            'md5': self.md5_file
        }
        response = self.session.post(url,headers=headers,data=data)
        print(response.text)

if __name__ == '__main__':
    xunjie()

Uploading the same file again returns {"pos":"-1"}; a new upload returns {"pos":"0"}.
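
That response can be used to decide whether the chunks need to be sent at all. A small sketch, assuming the response body is always JSON with a pos field as in the two cases observed; the function name is hypothetical.

```python
import json


def needs_upload(begin_response_text):
    """True when the server has not seen this file yet (pos is not -1)."""
    pos = json.loads(begin_response_text).get("pos")
    return str(pos) != "-1"
```

In start_upload_file, this could be called on response.text to skip the chunk upload when the file is already on the server.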

The second POST request:

I won't walk you through this request; the earlier sections already covered the approach in detail, so here is the code.

Python code:

import math
import random
import requests
import json
import os
import hashlib
import time
from urllib3 import encode_multipart_formdata

class xunjie():
    def __init__(self):
        self.session = requests.Session()
        self.get_uuid()
        # Persistent login: reuse the saved cookies and token if they exist
        if 'cookie.txt' in os.listdir('.'):
            with open('cookie.txt', 'r') as f:
                cookie_data = f.read()
                if cookie_data:
                    self.session.cookies = requests.utils.cookiejar_from_dict(json.loads(cookie_data))
                else:
                    print('cookie.txt is empty; delete it and run again')
                    return  # __init__ must return None, so use a bare return
            with open('token.txt', 'r') as f:
                token_data = f.read()
                if token_data:
                    self.token = token_data
                else:
                    print('token.txt is empty; delete it and run again')
                    return
        else:
            self.send_message()
            self.login()
        self.start_upload_file()
        self.store_upload_file()

    # Generate the uuid
    def get_uuid(self):
        s = ['' for i in range(36)]
        hexDigits = "0123456789abcdef"
        for i in range(36):
            s[i] = hexDigits[math.floor(random.random() * 0x10)]
        s[14] = "4"
        # Mirror the JS (s[19] & 0x3) | 0x8, where a non-numeric character coerces to 0
        s[19] = hexDigits[((int(s[19]) if s[19].isdecimal() else 0) & 0x3) | 0x8]
        s[8] = s[13] = s[18] = s[23] = ""
        self.uuid = ''.join(s)

    # Send the SMS
    def send_message(self):
        self.phone = int(input('Enter your phone number: '))
        url = "https://user.api.hudunsoft.com/v1/sms"
        headers = {
            'authority': 'user.api.hudunsoft.com',
            'method': 'POST',
            'path': '/v1/sms',
            'scheme': 'https',
            'accept': 'application/json, text/javascript, */*; q=0.01',
            'accept-encoding': 'gzip, deflate, br',
            'accept-language': 'zh-CN,zh;q=0.9',
            'content-length': '163',
            'content-type': 'application/x-www-form-urlencoded; charset=UTF-8',
            'origin': 'http://voice.xunjiepdf.com',
            'referer': 'http://voice.xunjiepdf.com/voice2text.html',
            'sec-fetch-dest': 'empty',
            'sec-fetch-mode': 'cors',
            'sec-fetch-site': 'cross-site',
            'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
        }
        data = {
            'client': 'web',
            'source': '335',
            'soft_version': 'v3.0.1.1',
            'device_id': self.uuid,
            'version': 'v1.0.0',
            'phone': self.phone,
            'uuid': self.uuid,
            'code': ''
        }
        while True:
            response = self.session.post(url,headers=headers,data=data)
            message = response.json().get('message')
            if "ok" in message:
                print("SMS sent successfully",message)
                break
            else:
                print("Failed to send SMS",u'%s' % message)
                data['code'] = self.recognition_image()

    # Fetch the image captcha and read it in by hand
    def recognition_image(self):
        # time is a 13-digit millisecond timestamp
        url = 'https://user.api.hudunsoft.com/v1/captcha?uuid={uuid}&time={time}&client=web&source=335'.format(uuid=self.uuid,time=int(time.time() * 1000))
        headers = {
            'authority': 'user.api.hudunsoft.com',
            'method': 'GET',
            'path': '/v1/captcha?uuid=e884673549de432f8487c6078bc38685&time=1597927148527&client=web&source=335',
            'scheme': 'https',
            'accept': 'image/webp,image/apng,image/*,*/*;q=0.8',
            'accept-encoding': 'gzip, deflate, br',
            'accept-language': 'zh-CN,zh;q=0.9',
            'referer': 'http://voice.xunjiepdf.com/voice2text.html',
            'sec-fetch-dest': 'image',
            'sec-fetch-mode': 'no-cors',
            'sec-fetch-site': 'cross-site',
            'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
        }
        response = self.session.get(url,headers=headers)
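        with open('captcha.jpg','wb') as f:
            f.write(response.content)
        # Keep the code as a string so letters and leading zeros survive
        code = input("Open captcha.jpg and enter the code shown: ").strip()
        return code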

    # Log in
    def login(self):
        # Keep the SMS code as a string so a leading zero is not dropped
        self.code = input('Enter the SMS verification code: ').strip()
        url = "https://user.api.hudunsoft.com/v1/user/auto_sign_in"
        headers = {
             'authority': 'user.api.hudunsoft.com',
             'method': 'POST',
             'path': '/v1/user/auto_sign_in',
             'scheme': 'https',
             'accept': 'application/json, text/javascript, */*; q=0.01',
             'accept-encoding': 'gzip, deflate, br',
             'accept-language': 'zh-CN,zh;q=0.9',
             'content-type': 'application/x-www-form-urlencoded; charset=UTF-8',
             'origin': 'http://voice.xunjiepdf.com',
             'referer': 'http://voice.xunjiepdf.com/voice2text.html',
             'sec-fetch-dest': 'empty',
             'sec-fetch-mode': 'cors',
             'sec-fetch-site': 'cross-site',
             'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
        }
        data = {
             'client': 'web',
             'source': '335',
             'soft_version': 'v3.0.1.1',
             'device_id': self.uuid,
             'phone': self.phone,
             'code': self.code
        }
        response = self.session.post(url,headers=headers,data=data)
        json_data = response.json()
        if "ok" in json_data.get('message'):
            print("Login succeeded")
            print(json_data)
            self.token = json_data.get('data').get('token')
            with open('cookie.txt','w') as f:
                f.write(json.dumps(requests.utils.dict_from_cookiejar(response.cookies)))
            with open('token.txt','w') as f:
                f.write(self.token)
        else:
            print("Login failed")

    # Get a GUID
    def get_guid(self):
        guid = os.popen('node guid.js').read().replace('\n', '')
        return guid

    # Compute the file's MD5
    def get_md5(self):
        md5 = hashlib.md5()
        with open(self.file, 'rb') as f:
            md5.update(f.read())
            self.md5_file = md5.hexdigest()

    # Start the upload
    def start_upload_file(self):
        path = '/v1/alivoice/uploadaudiofile?r=' + str(random.random())
        url = 'https://user.api.hudunsoft.com' + path
        headers = {
            'authority': 'user.api.hudunsoft.com',
            'method': 'POST',
            'path': path,
            'scheme': 'https',
            'accept': 'application/json, text/javascript, */*; q=0.01',
            'accept-encoding': 'gzip, deflate, br',
            'accept-language': 'zh-CN,zh;q=0.9',
            'content-type': 'application/x-www-form-urlencoded; charset=UTF-8',
            'origin': 'http://voice.xunjiepdf.com',
            'referer': 'http://voice.xunjiepdf.com/voice2text.html',
            'sec-fetch-dest': 'empty',
            'sec-fetch-mode': 'cors',
            'sec-fetch-site': 'cross-site',
            'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
        }
        self.file = '1.mp3'
        self.get_md5()
        self.file_name = self.get_guid() + '_' + self.file
        data = {
            'action': 'Begin',
            'fileName': self.file_name,
            'md5': self.md5_file
        }
        response = self.session.post(url,headers=headers,data=data)
        print(response.text)

    # Upload the file content in chunks (action "Store")
    def store_upload_file(self):
        path = "/v1/alivoice/uploadaudiofile?r=" + str(random.random())
        url = "https://user.api.hudunsoft.com" + path
        headers = {
            'authority': 'user.api.hudunsoft.com',
            'method': 'POST',
            'path': path,
            'scheme': 'https',
            'accept': 'application/json, text/javascript, */*; q=0.01',
            'accept-encoding': 'gzip, deflate, br',
            'accept-language': 'zh-CN,zh;q=0.9',
            # content-type (with the multipart boundary) is set per chunk below; requests computes content-length itself
            'origin': 'http://voice.xunjiepdf.com',
            'referer': 'http://voice.xunjiepdf.com/voice2text.html',
            'sec-fetch-dest': 'empty',
            'sec-fetch-mode': 'cors',
            'sec-fetch-site': 'cross-site',
            'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
        }
        data = {
            'action': 'Store',
            'pos': '0',
            'size': '2097152',
            'md5': self.md5_file
        }
        with open(self.file, 'rb') as f:
            while True:
                pos = f.tell()  # byte offset of this chunk within the file
                chunk = f.read(2 * 1024 * 1024)  # read up to 2 MiB
                if not chunk:
                    print('Upload finished')
                    break
                data['pos'] = str(pos)
                data['size'] = str(len(chunk))
                data['file'] = (self.file, chunk)
                # encode_multipart_formdata returns (body, content_type)
                body, content_type = encode_multipart_formdata(data)
                headers['content-type'] = content_type
                response = self.session.post(url, data=body, headers=headers)
                print(response.text)

if __name__ == '__main__':
    xunjie()
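
The chunking logic in store_upload_file can be looked at in isolation. The sketch below is not part of the original script: iter_chunks is a hypothetical helper name, and it only assumes what the listing above shows, namely that each "Store" request carries a byte offset (pos) and a slice of at most 2 MiB.

```python
import io

CHUNK_SIZE = 2 * 1024 * 1024  # 2 MiB, the slice size used in the listing above


def iter_chunks(stream, chunk_size=CHUNK_SIZE):
    """Yield (pos, chunk) pairs, where pos is the byte offset of chunk."""
    pos = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield pos, chunk
        pos += len(chunk)


if __name__ == "__main__":
    # Tiny in-memory "file" so the offsets are easy to follow
    demo = io.BytesIO(b"x" * 10)
    for pos, chunk in iter_chunks(demo, chunk_size=4):
        print(pos, len(chunk))
```

Each yielded pair maps onto the pos, size and file fields of one "Store" request, so the network code never has to track file offsets itself.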


The third POST request:

I won't walk through this request in detail, since the earlier sections already explain the approach; here is the code directly.

Python code:

import math
import random
import requests
import json
import os
import hashlib
import time
from urllib3 import encode_multipart_formdata

class xunjie():
    def __init__(self):
        self.session = requests.Session()
        self.get_uuid()
        # Persistent login: reuse the saved cookie and token if they exist
        if 'cookie.txt' in os.listdir('.'):
            with open('cookie.txt', 'r') as f:
                cookie_data = f.read()
                if cookie_data:
                    self.session.cookies = requests.utils.cookiejar_from_dict(json.loads(cookie_data))
                else:
                    print('cookie.txt is empty; delete it and run again')
                    return  # __init__ must return None, so no "return True"
            with open('token.txt', 'r') as f:
                token_data = f.read()
                if token_data:
                    self.token = token_data
                else:
                    print('token.txt is empty; delete it and run again')
                    return
        else:
            self.send_message()
            self.login()
        self.start_upload_file()
        self.store_upload_file()
        self.end_upload_file()

    # Generate the uuid (port of the site's Uuid.create())
    def get_uuid(self):
        s = ['' for i in range(36)]
        hexDigits = "0123456789abcdef"
        for i in range(36):
            s[i] = hexDigits[math.floor(random.random() * 0x10)]
        s[14] = "4"
        # Mirror the JS `(s[19] & 0x3) | 0x8`: non-digit characters count as 0
        s[19] = hexDigits[((int(s[19]) if s[19].isdecimal() else 0) & 0x3) | 0x8]
        s[8] = s[13] = s[18] = s[23] = ""
        self.uuid = ''.join(s)

    # Send the SMS verification code
    def send_message(self):
        self.phone = int(input('Enter your phone number: '))
        url = "https://user.api.hudunsoft.com/v1/sms"
        headers = {
            'authority': 'user.api.hudunsoft.com',
            'method': 'POST',
            'path': '/v1/sms',
            'scheme': 'https',
            'accept': 'application/json, text/javascript, */*; q=0.01',
            'accept-encoding': 'gzip, deflate, br',
            'accept-language': 'zh-CN,zh;q=0.9',
            'content-type': 'application/x-www-form-urlencoded; charset=UTF-8',
            'origin': 'http://voice.xunjiepdf.com',
            'referer': 'http://voice.xunjiepdf.com/voice2text.html',
            'sec-fetch-dest': 'empty',
            'sec-fetch-mode': 'cors',
            'sec-fetch-site': 'cross-site',
            'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
        }
        data = {
            'client': 'web',
            'source': '335',
            'soft_version': 'v3.0.1.1',
            'device_id': self.uuid,
            'version': 'v1.0.0',
            'phone': self.phone,
            'uuid': self.uuid,
            'code': ''
        }
        while True:
            response = self.session.post(url,headers=headers,data=data)
            message = response.json().get('message')
            if "ok" in message:
                print("SMS sent:", message)
                break
            else:
                print("Failed to send SMS:", message)
                # The server wants an image captcha; fetch one and retry
                data['code'] = self.recognition_image()

    # Download the image captcha and ask the user to read it
    def recognition_image(self):
        # time is a 13-digit millisecond timestamp
        url = 'https://user.api.hudunsoft.com/v1/captcha?uuid={uuid}&time={time}&client=web&source=335'.format(uuid=self.uuid,time=int(time.time() * 1000))
        headers = {
            'authority': 'user.api.hudunsoft.com',
            'method': 'GET',
            'path': '/v1/captcha?uuid=e884673549de432f8487c6078bc38685&time=1597927148527&client=web&source=335',
            'scheme': 'https',
            'accept': 'image/webp,image/apng,image/*,*/*;q=0.8',
            'accept-encoding': 'gzip, deflate, br',
            'accept-language': 'zh-CN,zh;q=0.9',
            'referer': 'http://voice.xunjiepdf.com/voice2text.html',
            'sec-fetch-dest': 'image',
            'sec-fetch-mode': 'no-cors',
            'sec-fetch-site': 'cross-site',
            'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
        }
        response = self.session.get(url,headers=headers)
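        with open('captcha.jpg','wb') as f:
            f.write(response.content)
        # Keep the code as a string so letters and leading zeros survive
        code = input("Open captcha.jpg and enter the code shown: ").strip()
        return code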

    # Log in
    def login(self):
        # Keep the SMS code as a string so a leading zero is not dropped
        self.code = input('Enter the SMS verification code: ').strip()
        url = "https://user.api.hudunsoft.com/v1/user/auto_sign_in"
        headers = {
             'authority': 'user.api.hudunsoft.com',
             'method': 'POST',
             'path': '/v1/user/auto_sign_in',
             'scheme': 'https',
             'accept': 'application/json, text/javascript, */*; q=0.01',
             'accept-encoding': 'gzip, deflate, br',
             'accept-language': 'zh-CN,zh;q=0.9',
             'content-type': 'application/x-www-form-urlencoded; charset=UTF-8',
             'origin': 'http://voice.xunjiepdf.com',
             'referer': 'http://voice.xunjiepdf.com/voice2text.html',
             'sec-fetch-dest': 'empty',
             'sec-fetch-mode': 'cors',
             'sec-fetch-site': 'cross-site',
             'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
        }
        data = {
             'client': 'web',
             'source': '335',
             'soft_version': 'v3.0.1.1',
             'device_id': self.uuid,
             'phone': self.phone,
             'code': self.code
        }
        response = self.session.post(url,headers=headers,data=data)
        json_data = response.json()
        if "ok" in json_data.get('message'):
            print("Login succeeded")
            print(json_data)
            self.token = json_data.get('data').get('token')
            # Save the cookie and token so later runs can skip the SMS login
            with open('cookie.txt','w') as f:
                f.write(json.dumps(requests.utils.dict_from_cookiejar(response.cookies)))
            with open('token.txt','w') as f:
                f.write(self.token)
        else:
            print("Login failed")

    # Get a GUID by running guid.js with Node.js
    def get_guid(self):
        guid = os.popen('node guid.js').read().replace('\n', '')
        return guid

    # Compute the MD5 digest of the file
    def get_md5(self):
        md5 = hashlib.md5()
        with open(self.file, 'rb') as f:
            md5.update(f.read())
            self.md5_file = md5.hexdigest()

    # Start the upload (action "Begin")
    def start_upload_file(self):
        path = '/v1/alivoice/uploadaudiofile?r=' + str(random.random())
        url = 'https://user.api.hudunsoft.com' + path
        headers = {
            'authority': 'user.api.hudunsoft.com',
            'method': 'POST',
            'path': path,
            'scheme': 'https',
            'accept': 'application/json, text/javascript, */*; q=0.01',
            'accept-encoding': 'gzip, deflate, br',
            'accept-language': 'zh-CN,zh;q=0.9',
            'content-type': 'application/x-www-form-urlencoded; charset=UTF-8',
            'origin': 'http://voice.xunjiepdf.com',
            'referer': 'http://voice.xunjiepdf.com/voice2text.html',
            'sec-fetch-dest': 'empty',
            'sec-fetch-mode': 'cors',
            'sec-fetch-site': 'cross-site',
            'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
        }
        self.file = '1.mp3'
        self.get_md5()
        self.file_name = self.get_guid() + '_' + self.file
        data = {
            'action': 'Begin',
            'fileName': self.file_name,
            'md5': self.md5_file
        }
        response = self.session.post(url,headers=headers,data=data)
        print(response.text)

    # Upload the file content in chunks (action "Store")
    def store_upload_file(self):
        path = "/v1/alivoice/uploadaudiofile?r=" + str(random.random())
        url = "https://user.api.hudunsoft.com" + path
        headers = {
            'authority': 'user.api.hudunsoft.com',
            'method': 'POST',
            'path': path,
            'scheme': 'https',
            'accept': 'application/json, text/javascript, */*; q=0.01',
            'accept-encoding': 'gzip, deflate, br',
            'accept-language': 'zh-CN,zh;q=0.9',
            # content-type (with the multipart boundary) is set per chunk below; requests computes content-length itself
            'origin': 'http://voice.xunjiepdf.com',
            'referer': 'http://voice.xunjiepdf.com/voice2text.html',
            'sec-fetch-dest': 'empty',
            'sec-fetch-mode': 'cors',
            'sec-fetch-site': 'cross-site',
            'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
        }
        data = {
            'action': 'Store',
            'pos': '0',
            'size': '2097152',
            'md5': self.md5_file
        }
        with open(self.file, 'rb') as f:
            while True:
                pos = f.tell()  # byte offset of this chunk within the file
                chunk = f.read(2 * 1024 * 1024)  # read up to 2 MiB
                if not chunk:
                    print('Upload finished')
                    break
                data['pos'] = str(pos)
                data['size'] = str(len(chunk))
                data['file'] = (self.file, chunk)
                # encode_multipart_formdata returns (body, content_type)
                body, content_type = encode_multipart_formdata(data)
                headers['content-type'] = content_type
                response = self.session.post(url, data=body, headers=headers)
                print(response.text)

    # Finish the upload (action "End")
    def end_upload_file(self):
        path = '/v1/alivoice/uploadaudiofile?r=' + str(random.random())
        url = 'https://user.api.hudunsoft.com' + path
        headers = {
            'authority': 'user.api.hudunsoft.com',
            'method': 'POST',
            'path': path,
            'scheme': 'https',
            'accept': 'application/json, text/javascript, */*; q=0.01',
            'accept-encoding': 'gzip, deflate, br',
            'accept-language': 'zh-CN,zh;q=0.9',
            'content-type': 'application/x-www-form-urlencoded; charset=UTF-8',
            'origin': 'http://voice.xunjiepdf.com',
            'referer': 'http://voice.xunjiepdf.com/voice2text.html',
            'sec-fetch-dest': 'empty',
            'sec-fetch-mode': 'cors',
            'sec-fetch-site': 'cross-site',
            'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
        }
        data = {
            'action': 'End',
            'fileName': self.file_name,
            'md5': self.md5_file
        }
        response = self.session.post(url,headers=headers,data=data)
        print(response.text)

if __name__ == '__main__':
    xunjie()
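
The get_md5 method in the listing reads the whole file into memory before hashing it, which is fine for a short clip but wasteful for long recordings. As a hedged alternative, the sketch below hashes a stream block by block and produces the same digest; md5_of_stream and block_size are illustrative names, not something taken from the site's code.

```python
import hashlib
import io


def md5_of_stream(stream, block_size=1024 * 1024):
    """Return the hex MD5 digest of a binary stream, read block by block."""
    digest = hashlib.md5()
    # iter(callable, sentinel) keeps calling read() until it returns b""
    for block in iter(lambda: stream.read(block_size), b""):
        digest.update(block)
    return digest.hexdigest()


if __name__ == "__main__":
    # In the script this would be an open file: open(path, "rb")
    print(md5_of_stream(io.BytesIO(b"hello"), block_size=2))
```

Because MD5 is computed incrementally, the block size changes only memory use, not the result.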


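Audio to text

Clicking the convert button triggers POST requests. There are several of them because the page keeps re-sending the request until the conversion has finished.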
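
This "ask again until it is done" pattern can be sketched separately from the HTTP details. The helper below is an illustration under assumptions, not the site's own logic: poll_until_done, fetch, interval and max_attempts are hypothetical names, and fetch stands in for whatever call returns the task's message (empty while the task is still running).

```python
import time


def poll_until_done(fetch, interval=1.0, max_attempts=30, sleep=time.sleep):
    """Call fetch() until it returns a non-empty message or attempts run out."""
    for attempt in range(max_attempts):
        message = fetch()
        if message:
            return message
        # Wait between attempts, but not after the last one
        if attempt < max_attempts - 1:
            sleep(interval)
    raise TimeoutError("task did not finish in time")


if __name__ == "__main__":
    # Simulated server: two "not ready" replies, then a result
    replies = iter(["", "", "done"])
    print(poll_until_done(lambda: next(replies), interval=0.1))
```

Pausing between attempts and capping their number keeps the script from hammering the server or looping forever if a task never completes.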

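Audio-to-text conversion failed:

[Screenshot]

Audio-to-text conversion succeeded:

[Screenshot]

We can send this request directly from Python. Its parameters do not change between calls, so nothing needs to be reverse engineered.

Python code: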

import math
import random
import requests
import json
import os
import hashlib
import time
from urllib3 import encode_multipart_formdata

class xunjie():
    def __init__(self,file):
        self.file = file
        self.session = requests.Session()
        self.get_uuid()
        # Persistent login: reuse the saved cookie and token if they exist
        if 'cookie.txt' in os.listdir('.'):
            with open('cookie.txt', 'r') as f:
                cookie_data = f.read()
                if cookie_data:
                    self.session.cookies = requests.utils.cookiejar_from_dict(json.loads(cookie_data))
                else:
                    print('cookie.txt is empty; delete it and run again')
                    return  # __init__ must return None, so no "return True"
            with open('token.txt', 'r') as f:
                token_data = f.read()
                if token_data:
                    self.token = token_data
                else:
                    print('token.txt is empty; delete it and run again')
                    return
        else:
            self.send_message()
            self.login()
        self.start_upload_file()
        self.store_upload_file()
        self.end_upload_file()
        self.md5_to_text()

    # Generate the uuid (port of the site's Uuid.create())
    def get_uuid(self):
        s = ['' for i in range(36)]
        hexDigits = "0123456789abcdef"
        for i in range(36):
            s[i] = hexDigits[math.floor(random.random() * 0x10)]
        s[14] = "4"
        # Mirror the JS `(s[19] & 0x3) | 0x8`: non-digit characters count as 0
        s[19] = hexDigits[((int(s[19]) if s[19].isdecimal() else 0) & 0x3) | 0x8]
        s[8] = s[13] = s[18] = s[23] = ""
        self.uuid = ''.join(s)

    # Send the SMS verification code
    def send_message(self):
        self.phone = int(input('Enter your phone number: '))
        url = "https://user.api.hudunsoft.com/v1/sms"
        headers = {
            'authority': 'user.api.hudunsoft.com',
            'method': 'POST',
            'path': '/v1/sms',
            'scheme': 'https',
            'accept': 'application/json, text/javascript, */*; q=0.01',
            'accept-encoding': 'gzip, deflate, br',
            'accept-language': 'zh-CN,zh;q=0.9',
            'content-type': 'application/x-www-form-urlencoded; charset=UTF-8',
            'origin': 'http://voice.xunjiepdf.com',
            'referer': 'http://voice.xunjiepdf.com/voice2text.html',
            'sec-fetch-dest': 'empty',
            'sec-fetch-mode': 'cors',
            'sec-fetch-site': 'cross-site',
            'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
        }
        data = {
            'client': 'web',
            'source': '335',
            'soft_version': 'v3.0.1.1',
            'device_id': self.uuid,
            'version': 'v1.0.0',
            'phone': self.phone,
            'uuid': self.uuid,
            'code': ''
        }
        while True:
            response = self.session.post(url,headers=headers,data=data)
            message = response.json().get('message')
            if "ok" in message:
                print("SMS sent:", message)
                break
            else:
                print("Failed to send SMS:", message)
                # The server wants an image captcha; fetch one and retry
                data['code'] = self.recognition_image()

    # Download the image captcha and ask the user to read it
    def recognition_image(self):
        # time is a 13-digit millisecond timestamp
        url = 'https://user.api.hudunsoft.com/v1/captcha?uuid={uuid}&time={time}&client=web&source=335'.format(uuid=self.uuid,time=int(time.time() * 1000))
        headers = {
            'authority': 'user.api.hudunsoft.com',
            'method': 'GET',
            'path': '/v1/captcha?uuid=e884673549de432f8487c6078bc38685&time=1597927148527&client=web&source=335',
            'scheme': 'https',
            'accept': 'image/webp,image/apng,image/*,*/*;q=0.8',
            'accept-encoding': 'gzip, deflate, br',
            'accept-language': 'zh-CN,zh;q=0.9',
            'referer': 'http://voice.xunjiepdf.com/voice2text.html',
            'sec-fetch-dest': 'image',
            'sec-fetch-mode': 'no-cors',
            'sec-fetch-site': 'cross-site',
            'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
        }
        response = self.session.get(url,headers=headers)
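        with open('captcha.jpg','wb') as f:
            f.write(response.content)
        # Keep the code as a string so letters and leading zeros survive
        code = input("Open captcha.jpg and enter the code shown: ").strip()
        return code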

    # Log in
    def login(self):
        # Keep the SMS code as a string so a leading zero is not dropped
        self.code = input('Enter the SMS verification code: ').strip()
        url = "https://user.api.hudunsoft.com/v1/user/auto_sign_in"
        headers = {
             'authority': 'user.api.hudunsoft.com',
             'method': 'POST',
             'path': '/v1/user/auto_sign_in',
             'scheme': 'https',
             'accept': 'application/json, text/javascript, */*; q=0.01',
             'accept-encoding': 'gzip, deflate, br',
             'accept-language': 'zh-CN,zh;q=0.9',
             'content-type': 'application/x-www-form-urlencoded; charset=UTF-8',
             'origin': 'http://voice.xunjiepdf.com',
             'referer': 'http://voice.xunjiepdf.com/voice2text.html',
             'sec-fetch-dest': 'empty',
             'sec-fetch-mode': 'cors',
             'sec-fetch-site': 'cross-site',
             'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
        }
        data = {
             'client': 'web',
             'source': '335',
             'soft_version': 'v3.0.1.1',
             'device_id': self.uuid,
             'phone': self.phone,
             'code': self.code
        }
        response = self.session.post(url,headers=headers,data=data)
        json_data = response.json()
        if "ok" in json_data.get('message'):
            print("Login succeeded")
            print(json_data)
            self.token = json_data.get('data').get('token')
            # Save the cookie and token so later runs can skip the SMS login
            with open('cookie.txt','w') as f:
                f.write(json.dumps(requests.utils.dict_from_cookiejar(response.cookies)))
            with open('token.txt','w') as f:
                f.write(self.token)
        else:
            print("Login failed")

    # Get a GUID by running guid.js with Node.js
    def get_guid(self):
        guid = os.popen('node guid.js').read().replace('\n', '')
        return guid

    # Compute the MD5 digest of the file
    def get_md5(self):
        md5 = hashlib.md5()
        with open(self.file, 'rb') as f:
            md5.update(f.read())
            self.md5_file = md5.hexdigest()

    # Start the upload (action "Begin")
    def start_upload_file(self):
        path = '/v1/alivoice/uploadaudiofile?r=' + str(random.random())
        url = 'https://user.api.hudunsoft.com' + path
        headers = {
            'authority': 'user.api.hudunsoft.com',
            'method': 'POST',
            'path': path,
            'scheme': 'https',
            'accept': 'application/json, text/javascript, */*; q=0.01',
            'accept-encoding': 'gzip, deflate, br',
            'accept-language': 'zh-CN,zh;q=0.9',
            'content-type': 'application/x-www-form-urlencoded; charset=UTF-8',
            'origin': 'http://voice.xunjiepdf.com',
            'referer': 'http://voice.xunjiepdf.com/voice2text.html',
            'sec-fetch-dest': 'empty',
            'sec-fetch-mode': 'cors',
            'sec-fetch-site': 'cross-site',
            'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
        }
        self.get_md5()
        self.file_name = self.get_guid() + '_' + self.file
        data = {
            'action': 'Begin',
            'fileName': self.file_name,
            'md5': self.md5_file
        }
        response = self.session.post(url,headers=headers,data=data)
        print(response.text)

    # Upload the file content in chunks (action "Store")
    def store_upload_file(self):
        path = "/v1/alivoice/uploadaudiofile?r=" + str(random.random())
        url = "https://user.api.hudunsoft.com" + path
        headers = {
            'authority': 'user.api.hudunsoft.com',
            'method': 'POST',
            'path': path,
            'scheme': 'https',
            'accept': 'application/json, text/javascript, */*; q=0.01',
            'accept-encoding': 'gzip, deflate, br',
            'accept-language': 'zh-CN,zh;q=0.9',
            # content-type (with the multipart boundary) is set per chunk below; requests computes content-length itself
            'origin': 'http://voice.xunjiepdf.com',
            'referer': 'http://voice.xunjiepdf.com/voice2text.html',
            'sec-fetch-dest': 'empty',
            'sec-fetch-mode': 'cors',
            'sec-fetch-site': 'cross-site',
            'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
        }
        data = {
            'action': 'Store',
            'pos': '0',
            'size': '2097152',
            'md5': self.md5_file
        }
        with open(self.file, 'rb') as f:
            while True:
                pos = f.tell()  # byte offset of this chunk within the file
                chunk = f.read(2 * 1024 * 1024)  # read up to 2 MiB
                if not chunk:
                    print('Upload finished')
                    break
                data['pos'] = str(pos)
                data['size'] = str(len(chunk))
                data['file'] = (self.file, chunk)
                # encode_multipart_formdata returns (body, content_type)
                body, content_type = encode_multipart_formdata(data)
                headers['content-type'] = content_type
                response = self.session.post(url, data=body, headers=headers)
                print(response.text)

    # Finish the upload (action "End")
    def end_upload_file(self):
        path = '/v1/alivoice/uploadaudiofile?r=' + str(random.random())
        url = 'https://user.api.hudunsoft.com' + path
        headers = {
            'authority': 'user.api.hudunsoft.com',
            'method': 'POST',
            'path': path,
            'scheme': 'https',
            'accept': 'application/json, text/javascript, */*; q=0.01',
            'accept-encoding': 'gzip, deflate, br',
            'accept-language': 'zh-CN,zh;q=0.9',
            'content-type': 'application/x-www-form-urlencoded; charset=UTF-8',
            'origin': 'http://voice.xunjiepdf.com',
            'referer': 'http://voice.xunjiepdf.com/voice2text.html',
            'sec-fetch-dest': 'empty',
            'sec-fetch-mode': 'cors',
            'sec-fetch-site': 'cross-site',
            'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
        }
        data = {
            'action': 'End',
            'fileName': self.file_name,
            'md5': self.md5_file
        }
        response = self.session.post(url,headers=headers,data=data)
        print(response.text)

    # Call md5Totext, i.e. ask the server to convert the uploaded audio to text
    def md5_to_text(self):
        url = "https://user.api.hudunsoft.com/v1/alivoice/md5Totext"
        headers = {
            'authority': 'user.api.hudunsoft.com',
            'method': 'POST',
            'path': '/v1/alivoice/md5Totext',
            'scheme': 'https',
            'accept': 'application/json, text/javascript, */*; q=0.01',
            'accept-encoding': 'gzip, deflate, br',
            'accept-language': 'zh-CN,zh;q=0.9',
            'content-type': 'application/x-www-form-urlencoded; charset=UTF-8',
            'origin': 'http://voice.xunjiepdf.com',
            'referer': 'http://voice.xunjiepdf.com/voice2text.html',
            'sec-fetch-dest': 'empty',
            'sec-fetch-mode': 'cors',
            'sec-fetch-site': 'cross-site',
            'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
        }
        data = {
            'client': 'web',
            'source': '335',
            'soft_version': 'v3.0.1.1',
            'device_id': self.uuid,
            'md5': self.md5_file,
            'fileName': self.file,
            'title': self.file,
            'token': self.token
        }
        response = self.session.post(url,headers=headers,data=data)
        json_data = response.json()
        message = json_data['message']
        if message:
            print(message)
            print(json_data)
        else:
            self.task_id = json_data['data']['task_id']
            self.get_task_info()

    # Keep polling until the recognition task returns a message
    def get_task_info(self):
        url = 'https://user.api.hudunsoft.com/v1/alivoice/getTaskInfo'
        headers = {
            'authority': 'user.api.hudunsoft.com',
            'method': 'POST',
            'path': '/v1/alivoice/getTaskInfo',
            'scheme': 'https',
            'accept': 'application/json, text/javascript, */*; q=0.01',
            'accept-encoding': 'gzip, deflate, br',
            'accept-language': 'zh-CN,zh;q=0.9',
            'content-type': 'application/x-www-form-urlencoded; charset=UTF-8',
            'origin': 'http://voice.xunjiepdf.com',
            'referer': 'http://voice.xunjiepdf.com/voice2text.html',
            'sec-fetch-dest': 'empty',
            'sec-fetch-mode': 'cors',
            'sec-fetch-site': 'cross-site',
            'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
        }
        data = {
            'client': 'web',
            'source': '335',
            'soft_version': 'v3.0.1.1',
            'device_id': self.uuid,
            'taskId': self.task_id
        }
        while True:
            response = self.session.post(url,headers=headers,data=data)
            json_data = response.json()
            message = json_data['message']
            if message:
                print(message)
                break
            else:
                # Not finished yet: wait briefly instead of hammering the server
                time.sleep(1)

if __name__ == '__main__':
    xunjie('1.mp3')

That is all of the code.
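
One detail of the listing worth checking on its own is the uuid port, because JavaScript and Python group the expression for s[19] differently. The sketch below is a standalone version written to follow the JS create() function shown earlier; create_uuid and the rng parameter are illustrative names, and the comment describes standard JavaScript coercion rather than anything specific to the site.

```python
import random

HEX_DIGITS = "0123456789abcdef"


def create_uuid(rng=random):
    """Return a 32-character hex id built the way the JS create() does."""
    s = [rng.choice(HEX_DIGITS) for _ in range(36)]
    s[14] = "4"
    # In JS, `s[19] & 0x3` first coerces the character to a number;
    # letters a-f become NaN, which the bitwise operator treats as 0.
    value = int(s[19]) if s[19].isdecimal() else 0
    s[19] = HEX_DIGITS[(value & 0x3) | 0x8]
    # The four separator slots are emptied, leaving 32 characters
    s[8] = s[13] = s[18] = s[23] = ""
    return "".join(s)


if __name__ == "__main__":
    print(create_uuid())
```

With the explicit parentheses, the character that started at index 19 is always one of 8, 9, a or b, as in the JavaScript original.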

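Easter egg

Note: here is a small Easter egg. Take a close look; that is as much of a hint as I can give.

[Image]

Disclaimer: this article is for learning and discussion only. Do not use it for commercial purposes; violators bear the consequences themselves.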
