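Preface
I won't walk through every step in detail here, because my earlier article on scraping Toutiao (今日头条) already covers them thoroughly. This post only explains the key points of how things are implemented. For the step-by-step procedure, including how to locate the encrypting JS code and the API endpoints, please read that article first:
Scraping Toutiao article and video data with Python 3: solving the as, cp and _signature encryption (original title: Python3爬取今日头条文章视频数据,完美解决as、cp、_signature的加密方法)
QQ group chat
855262907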
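Analyzing the Xunjie (迅捷) speech-to-text website
The whole speech-to-text flow:
1. Log in (non-VIP accounts are limited to 2 minutes of audio, so I borrowed a phone number that has VIP; the test screenshots still show my own number)
2. Upload the audio file in chunks (why it is chunked is explained later)
3. Convert the audio to text (and that is the end)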
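Logging in
After we enter a phone number and click send, the page issues a POST request. Its Form Data contains quite a few parameters, so let's reverse them one by one.
We start by searching for the key parameter phone.
The code that sends the SMS turns out to be right there, which makes things simple.
Without further ado, set a breakpoint and see how the request is built.
Among the data parameters, only uuid is generated, by Uuid.get(); the others are self-explanatory, so I won't go over them. data is then passed through basicParams before the POST request is made, so let's take it one step at a time.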
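Solving uuid and basicParams
Debugging shows that uuid is built by Uuid.get(). Jumping into that function, the useful part is the create function. The get function only checks whether a uuid already exists in localStorage: if it does, it is returned as-is; if not, create is called to make a new one.
JS code: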
function create() {
    var s = [];
    var hexDigits = "0123456789abcdef";
    for (var i = 0; i < 36; i++) {
        s[i] = hexDigits.substr(Math.floor(Math.random() * 0x10), 1);
    }
    s[14] = "4";
    s[19] = hexDigits.substr((s[19] & 0x3) | 0x8, 1);
    s[8] = s[13] = s[18] = s[23] = "";
    var uuid = s.join("");
    return uuid;
}
Now let's port it to Python.
Python code:
import math
import random


def get_uuid():
    s = ['' for i in range(36)]
    hexDigits = "0123456789abcdef"
    for i in range(36):
        s[i] = hexDigits[math.floor(random.random() * 0x10)]
    s[14] = "4"
    # In JS, `s[19] & 0x3` coerces a non-numeric string to 0, so the
    # conditional must be parenthesized before `& 0x3` is applied.
    s[19] = hexDigits[((int(s[19]) if s[19].isdecimal() else 0) & 0x3) | 0x8]
    s[8] = s[13] = s[18] = s[23] = ""
    uuid = ''.join(s)
    return uuid


if __name__ == '__main__':
    print(get_uuid())
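The string built by create has the same layout as a version-4 UUID with the dashes removed: 32 hex characters, a "4" at index 12 and one of 8, 9, a, b at index 16. As an alternative sketch, and assuming the server only cares about that layout (which the article does not confirm), the standard library's uuid module can produce such a value directly. The function name get_uuid_stdlib is made up for this example; it is not how the site's JS generates the value.

```python
import uuid


def get_uuid_stdlib() -> str:
    """Return 32 lowercase hex characters laid out like the result of create().

    Assumption: any random version-4 UUID without dashes is acceptable.
    """
    return uuid.uuid4().hex


if __name__ == "__main__":
    print(get_uuid_stdlib())
```

This swaps the hand-written character loop for uuid.uuid4(); the hand-ported version stays closer to the original JS if an exact port is preferred.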
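Solving basicParams
basicParams doesn't transform anything; it only fills in the remaining fields of our data parameters. So there is nothing to reverse here, and we can simply hard-code those fields.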
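A minimal sketch of what "hard-coding" the completed parameters could look like. The fixed values are the ones visible in the captured Form Data of the SMS request; exactly which of them basicParams itself contributes is an assumption, and the helper name basic_params and the sample phone number are hypothetical.

```python
# Fixed fields as they appear in the captured Form Data.
# Assumption: these are the fields that basicParams fills in.
FIXED_PARAMS = {
    "client": "web",
    "source": "335",
    "soft_version": "v3.0.1.1",
    "version": "v1.0.0",
}


def basic_params(data: dict, device_id: str) -> dict:
    """Return a new dict holding the fixed fields, device_id and the caller's data."""
    merged = dict(FIXED_PARAMS)
    merged["device_id"] = device_id
    merged.update(data)  # caller-supplied values win on key collisions
    return merged


if __name__ == "__main__":
    # Placeholder phone number and uuid, for illustration only.
    print(basic_params({"phone": "13800000000", "uuid": "abc", "code": ""}, "abc"))
```

Returning a new dict instead of mutating the argument keeps the caller's data reusable for the next request.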
Sending the SMS
With all the parameters solved, here is the code that sends the SMS.
Python code:
import math
import random
import requests


class xunjie():
    api = 'https://user.api.hudunsoft.com'
    # Headers shared by every request. The HTTP/2 pseudo-headers (authority,
    # method, path, scheme) are not real header fields, and content-type,
    # content-length and accept-encoding are filled in by requests itself.
    headers = {
        'accept': 'application/json, text/javascript, */*; q=0.01',
        'accept-language': 'zh-CN,zh;q=0.9',
        'origin': 'http://voice.xunjiepdf.com',
        'referer': 'http://voice.xunjiepdf.com/voice2text.html',
        'sec-fetch-dest': 'empty',
        'sec-fetch-mode': 'cors',
        'sec-fetch-site': 'cross-site',
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
    }

    def __init__(self):
        self.session = requests.Session()
        self.session.headers.update(self.headers)
        self.get_uuid()
        self.send_message()

    # Generate the uuid
    def get_uuid(self):
        s = ['' for i in range(36)]
        hexDigits = "0123456789abcdef"
        for i in range(36):
            s[i] = hexDigits[math.floor(random.random() * 0x10)]
        s[14] = "4"
        # Parenthesized so that `& 0x3` also applies to digits, as in the JS
        s[19] = hexDigits[((int(s[19]) if s[19].isdecimal() else 0) & 0x3) | 0x8]
        s[8] = s[13] = s[18] = s[23] = ""
        self.uuid = ''.join(s)

    # Send the SMS
    def send_message(self):
        self.phone = input('Enter your phone number: ').strip()
        url = self.api + '/v1/sms'
        data = {
            'client': 'web',
            'source': '335',
            'soft_version': 'v3.0.1.1',
            'device_id': self.uuid,
            'version': 'v1.0.0',
            'phone': self.phone,
            'uuid': self.uuid,
            'code': ''
        }
        # A single attempt: without captcha handling, retrying in a loop
        # would only repeat the same failing request.
        response = self.session.post(url, data=data)
        message = response.json().get('message') or ''
        if "ok" in message:
            print("SMS sent", message)
        else:
            print("Failed to send SMS", message)


if __name__ == '__main__':
    xunjie()
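Handling the image captcha under high risk
When a request is flagged as high risk, you are asked to enter an image captcha. This is also simple to handle, but I haven't reached the high-risk state myself, so I can't see the captcha or take a screenshot, and I'll just give you the code. The idea is to download the image and type the captcha in by hand. You could also recognize it with the pytesseract library, but here I use the simplest approach.
Python code: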
import math
import random
import time
import requests


class xunjie():
    api = 'https://user.api.hudunsoft.com'
    # Headers shared by every request. The HTTP/2 pseudo-headers (authority,
    # method, path, scheme) are not real header fields, and content-type,
    # content-length and accept-encoding are filled in by requests itself.
    headers = {
        'accept': 'application/json, text/javascript, */*; q=0.01',
        'accept-language': 'zh-CN,zh;q=0.9',
        'origin': 'http://voice.xunjiepdf.com',
        'referer': 'http://voice.xunjiepdf.com/voice2text.html',
        'sec-fetch-dest': 'empty',
        'sec-fetch-mode': 'cors',
        'sec-fetch-site': 'cross-site',
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
    }

    def __init__(self):
        self.session = requests.Session()
        self.session.headers.update(self.headers)
        self.get_uuid()
        self.send_message()

    # Generate the uuid
    def get_uuid(self):
        s = ['' for i in range(36)]
        hexDigits = "0123456789abcdef"
        for i in range(36):
            s[i] = hexDigits[math.floor(random.random() * 0x10)]
        s[14] = "4"
        # Parenthesized so that `& 0x3` also applies to digits, as in the JS
        s[19] = hexDigits[((int(s[19]) if s[19].isdecimal() else 0) & 0x3) | 0x8]
        s[8] = s[13] = s[18] = s[23] = ""
        self.uuid = ''.join(s)

    # Send the SMS
    def send_message(self):
        self.phone = input('Enter your phone number: ').strip()
        url = self.api + '/v1/sms'
        data = {
            'client': 'web',
            'source': '335',
            'soft_version': 'v3.0.1.1',
            'device_id': self.uuid,
            'version': 'v1.0.0',
            'phone': self.phone,
            'uuid': self.uuid,
            'code': ''
        }
        while True:
            response = self.session.post(url, data=data)
            message = response.json().get('message') or ''
            if "ok" in message:
                print("SMS sent", message)
                break
            else:
                print("Failed to send SMS", message)
                # Retry with the image captcha filled in
                data['code'] = self.recognition_image()

    # Download the image captcha and let the user read it
    def recognition_image(self):
        url = self.api + '/v1/captcha'
        params = {
            'uuid': self.uuid,
            'time': int(time.time() * 1000),  # 13-digit millisecond timestamp
            'client': 'web',
            'source': '335'
        }
        headers = {
            'accept': 'image/webp,image/apng,image/*,*/*;q=0.8',
            'origin': None,  # the captured image request has no origin header
            'sec-fetch-dest': 'image',
            'sec-fetch-mode': 'no-cors'
        }
        response = self.session.get(url, params=params, headers=headers)
        with open('captcha.jpg', 'wb') as f:
            f.write(response.content)
        # Kept as a string: a captcha may contain letters or leading zeros
        code = input("Open captcha.jpg and enter the captcha: ").strip()
        return code


if __name__ == '__main__':
    xunjie()
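The captcha request carries a 13-digit time parameter, which looks like a millisecond Unix timestamp. As a small sketch of building that URL, the helper below multiplies time.time() by 1000 instead of slicing the string form of the float, since the string form can have fewer than 13 digits. The function name captcha_url and its now parameter (there only to make the result reproducible) are made up for this example, and reading time as milliseconds is an inference from the captured value 1597927148527.

```python
import time
from urllib.parse import urlencode

CAPTCHA_ENDPOINT = "https://user.api.hudunsoft.com/v1/captcha"


def captcha_url(uuid, now=None):
    """Build the captcha URL with a millisecond timestamp.

    `now` is a Unix time in seconds; it defaults to the current time.
    """
    if now is None:
        now = time.time()
    query = urlencode({
        "uuid": uuid,
        "time": int(now * 1000),  # assumption: the server expects milliseconds
        "client": "web",
        "source": "335",
    })
    return CAPTCHA_ENDPOINT + "?" + query


if __name__ == "__main__":
    print(captcha_url("e884673549de432f8487c6078bc38685"))
```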
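Persistent login
The request captured on a successful login shows that its parameters are all known by now: device_id is the uuid value you generated the first time, phone is your phone number, and code is the SMS verification code.
I added persistent login to the code. Xunjie's operations are all token-based, so we only need to record the token returned after login.
Python code: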
import math
import random
import requests
import json
import os
import time


class xunjie():
    api = 'https://user.api.hudunsoft.com'
    # Headers shared by every request. The HTTP/2 pseudo-headers (authority,
    # method, path, scheme) are not real header fields, and content-type,
    # content-length and accept-encoding are filled in by requests itself.
    headers = {
        'accept': 'application/json, text/javascript, */*; q=0.01',
        'accept-language': 'zh-CN,zh;q=0.9',
        'origin': 'http://voice.xunjiepdf.com',
        'referer': 'http://voice.xunjiepdf.com/voice2text.html',
        'sec-fetch-dest': 'empty',
        'sec-fetch-mode': 'cors',
        'sec-fetch-site': 'cross-site',
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
    }

    def __init__(self):
        self.session = requests.Session()
        self.session.headers.update(self.headers)
        self.get_uuid()
        # Persistent login
        if 'cookie.txt' in os.listdir('.'):
            with open('cookie.txt', 'r') as f:
                cookie_data = f.read()
            if cookie_data:
                self.session.cookies = requests.utils.cookiejar_from_dict(json.loads(cookie_data))
            else:
                print('cookie.txt is empty; delete it and run again')
                return  # __init__ must not return a value
            with open('token.txt', 'r') as f:
                token_data = f.read()
            if token_data:
                self.token = token_data
            else:
                print('token.txt is empty; delete it and run again')
                return
        else:
            self.send_message()
            self.login()

    # Generate the uuid
    def get_uuid(self):
        s = ['' for i in range(36)]
        hexDigits = "0123456789abcdef"
        for i in range(36):
            s[i] = hexDigits[math.floor(random.random() * 0x10)]
        s[14] = "4"
        # Parenthesized so that `& 0x3` also applies to digits, as in the JS
        s[19] = hexDigits[((int(s[19]) if s[19].isdecimal() else 0) & 0x3) | 0x8]
        s[8] = s[13] = s[18] = s[23] = ""
        self.uuid = ''.join(s)

    # Send the SMS
    def send_message(self):
        self.phone = input('Enter your phone number: ').strip()
        url = self.api + '/v1/sms'
        data = {
            'client': 'web',
            'source': '335',
            'soft_version': 'v3.0.1.1',
            'device_id': self.uuid,
            'version': 'v1.0.0',
            'phone': self.phone,
            'uuid': self.uuid,
            'code': ''
        }
        while True:
            response = self.session.post(url, data=data)
            message = response.json().get('message') or ''
            if "ok" in message:
                print("SMS sent", message)
                break
            else:
                print("Failed to send SMS", message)
                # Retry with the image captcha filled in
                data['code'] = self.recognition_image()

    # Download the image captcha and let the user read it
    def recognition_image(self):
        url = self.api + '/v1/captcha'
        params = {
            'uuid': self.uuid,
            'time': int(time.time() * 1000),  # 13-digit millisecond timestamp
            'client': 'web',
            'source': '335'
        }
        headers = {
            'accept': 'image/webp,image/apng,image/*,*/*;q=0.8',
            'origin': None,  # the captured image request has no origin header
            'sec-fetch-dest': 'image',
            'sec-fetch-mode': 'no-cors'
        }
        response = self.session.get(url, params=params, headers=headers)
        with open('captcha.jpg', 'wb') as f:
            f.write(response.content)
        # Kept as a string: a captcha may contain letters or leading zeros
        code = input("Open captcha.jpg and enter the captcha: ").strip()
        return code

    # Log in
    def login(self):
        # Kept as a string so that leading zeros are preserved
        self.code = input('Enter the SMS verification code: ').strip()
        url = self.api + '/v1/user/auto_sign_in'
        data = {
            'client': 'web',
            'source': '335',
            'soft_version': 'v3.0.1.1',
            'device_id': self.uuid,
            'phone': self.phone,
            'code': self.code
        }
        response = self.session.post(url, data=data)
        json_data = response.json()
        if "ok" in (json_data.get('message') or ''):
            print("Login succeeded")
            print(json_data)
            self.token = json_data.get('data').get('token')
            with open('cookie.txt', 'w') as f:
                f.write(json.dumps(requests.utils.dict_from_cookiejar(response.cookies)))
            with open('token.txt', 'w') as f:
                f.write(self.token)
        else:
            print("Login failed")


if __name__ == '__main__':
    xunjie()
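The persistence idea above, recording the token after login and reading it back on the next run, can also be sketched with a single JSON file rather than two text files. This is only an illustration under assumptions: the helper names save_session and load_session and the file name session.json are hypothetical, and the sketch stores whatever token string and cookie dict it is given without knowing how the service uses them.

```python
import json
from pathlib import Path


def save_session(path, token, cookies):
    """Write the token and a plain cookie dict to one JSON file."""
    payload = {"token": token, "cookies": cookies}
    Path(path).write_text(json.dumps(payload), encoding="utf-8")


def load_session(path):
    """Return (token, cookies), or None if the file is missing, empty or malformed."""
    p = Path(path)
    if not p.is_file():
        return None
    try:
        saved = json.loads(p.read_text(encoding="utf-8"))
        return saved["token"], saved["cookies"]
    except (ValueError, KeyError, TypeError):
        # ValueError covers invalid or empty JSON; the others cover wrong shapes
        return None


if __name__ == "__main__":
    save_session("session.json", "example-token", {"sid": "1"})
    print(load_session("session.json"))
```

Returning None for every unusable file lets the caller fall back to a fresh login instead of asking the user to delete files by hand.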
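Uploading the audio file in chunks
What does chunked upload mean? A large file is split into several small pieces, every piece is uploaded, and the pieces are then merged back into one large file.
During the upload, 3 new POST requests appear; these are the chunked-upload requests. Their Form Data shows that the first POST marks the start of the upload (it merely tells the server that an upload is coming, so it can make a record), the second POST is the one that actually uploads the file chunks, and the third POST marks the end of the upload (it merely tells the server that the upload is complete).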
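Solving the POST request parameters
The first POST request:
The first and third POST requests differ only in action; nothing else changes. The md5 and fileName parameters are not fixed, so let's work those two out.
Search for the keyword fileName: the md5 and fileName parameters both show up right below it, so start debugging there.
Looking a bit further up brings a pleasant surprise: the chunk size is 2M. In other words, any file larger than 2M is split into 2M pieces; for example, a 3M file is split into a 2M piece and a 1M piece, which are then uploaded.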
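The chunk arithmetic described above can be sketched as a small planning function that lists the requests for a file of a given size: one Begin, one Store per 2M chunk, and one End. The action names and the pos and size fields are taken from the requests captured in this article; the function name plan_upload is made up, and the sketch only computes offsets and sizes without sending anything.

```python
CHUNK_SIZE = 2 * 1024 * 1024  # 2M, the chunk size found in the page's JS


def plan_upload(file_size, chunk_size=CHUNK_SIZE):
    """List the requests for one file: Begin, one Store per chunk, End."""
    plan = [{"action": "Begin"}]
    pos = 0
    while pos < file_size:
        size = min(chunk_size, file_size - pos)  # the last chunk may be shorter
        plan.append({"action": "Store", "pos": pos, "size": size})
        pos += size
    plan.append({"action": "End"})
    return plan


if __name__ == "__main__":
    # A 3M file yields a 2M chunk at offset 0 and a 1M chunk at offset 2097152.
    for step in plan_upload(3 * 1024 * 1024):
        print(step)
```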
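Another important finding: webUploader has an implementation class, so let's step into it. All of the processing turns out to happen in there.
In the debugger, file has the type FileInfo.
Searching for its implementation class shows ID = Guid.NewGuid().ToString("N"); and MD5 is simply the MD5 of the entire file.
Guid JS code: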
function Guid(g) {
    var arr = new Array();
    if (typeof (g) == "string") {
        InitByString(arr, g)
    } else {
        InitByOther(arr)
    }
    this.Equals = function(o) {
        if (o && o.IsGuid) {
            return this.ToString() == o.ToString()
        } else {
            return false
        }
    };
    this.IsGuid = function() {
    };
    this.ToString = function(format) {
        if (typeof (format) == "string") {
            if (format == "N" || format == "D" || format == "B" || format == "P") {
                return ToStringWithFormat(arr, format)
            } else {
                return ToStringWithFormat(arr, "D")
            }
        } else {
            return ToStringWithFormat(arr, "D")
        }
    };
    function InitByString(arr, g) {
        g = g.replace(/\{|\(|\)|\}|-/g, "");
        g = g.toLowerCase();
        if (g.length != 32 || g.search(/[^0-9,a-f]/i) != -1) {
            InitByOther(arr)
        } else {
            for (var i = 0; i < g.length; i++) {
                arr.push(g[i])
            }
        }
    }
    function InitByOther(arr) {
        var i = 32;
        while (i--) {
            arr.push("0")
        }
    }
    function ToStringWithFormat(arr, format) {
        switch (format) {
        case "N":
            return arr.toString().replace(/,/g, "");
        case "D":
            var str = arr.slice(0, 8) + "-" + arr.slice(8, 12) + "-" + arr.slice(12, 16) + "-" + arr.slice(16, 20) + "-" + arr.slice(20, 32);
            str = str.replace(/,/g, "");
            return str;
        case "B":
            var str = ToStringWithFormat(arr, "D");
            str = "{" + str + "}";
            return str;
        case "P":
            var str = ToStringWithFormat(arr, "D");
            str = "(" + str + ")";
            return str;
        default:
            return new Guid()
        }
    }
}
Guid.Empty = new Guid();
Guid.NewGuid = function() {
    var g = "";
    var i = 32;
    while (i--) {
        g += Math.floor(Math.random() * 16.0).toString(16)
    }
    return new Guid(g)
};
// These two lines were added by me
var id = Guid.NewGuid().ToString("N");
console.log(id);
Save the JS code above to a file named guid.js.
Python code:
import os


def get_guid():
    # Run guid.js with Node.js and read the id it prints
    guid = os.popen('node guid.js').read().replace('\n', '')
    return guid


if __name__ == '__main__':
    print(get_guid())
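Since Guid.NewGuid().ToString("N") boils down to 32 random hex digits, the same kind of value can be produced without calling Node.js. The sketch below is an alternative, not the author's method: it uses the standard library's secrets module where the JS uses Math.random, and the function name new_guid_n is made up for this example.

```python
import secrets


def new_guid_n() -> str:
    """Return 32 random lowercase hex digits, like Guid.NewGuid().ToString("N")."""
    return secrets.token_hex(16)  # 16 random bytes rendered as 32 hex characters


if __name__ == "__main__":
    print(new_guid_n())
```

This removes the dependency on a local Node.js installation and on guid.js sitting in the working directory.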
Computing the file's MD5:
Python code:
import hashlib


def get_md5():
    md5 = hashlib.md5()
    with open('1.mp3', 'rb') as f:
        md5.update(f.read())
    md5_file = md5.hexdigest()
    print(md5_file)


if __name__ == '__main__':
    get_md5()
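For large audio files, the same whole-file MD5 can be computed without loading the entire file into memory by feeding the hash in blocks. This is a sketch of that variation; the function name file_md5 and the 1 MiB block size are choices made for this example, and the resulting digest is identical to hashing the file in one call.

```python
import hashlib


def file_md5(path, block_size=1024 * 1024):
    """Return the hex MD5 of a file, reading it block by block."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b""
        for block in iter(lambda: f.read(block_size), b""):
            md5.update(block)
    return md5.hexdigest()


if __name__ == "__main__":
    print(file_md5("1.mp3"))
```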
This code (inside webUploader) shows that this is our first POST request, with the parameters data: { action: 'Begin', fileName: currentFile.ID + "_" + currentFile.Name, md5: currentFile.MD5 }, where currentFile is the FileInfo we saw earlier.
That's enough analysis; time for the code.
Python code:
import math
import random
import requests
import json
import os
import hashlib
import time


class xunjie():
    api = 'https://user.api.hudunsoft.com'
    # Headers shared by every request. The HTTP/2 pseudo-headers (authority,
    # method, path, scheme) are not real header fields, and content-type,
    # content-length and accept-encoding are filled in by requests itself.
    headers = {
        'accept': 'application/json, text/javascript, */*; q=0.01',
        'accept-language': 'zh-CN,zh;q=0.9',
        'origin': 'http://voice.xunjiepdf.com',
        'referer': 'http://voice.xunjiepdf.com/voice2text.html',
        'sec-fetch-dest': 'empty',
        'sec-fetch-mode': 'cors',
        'sec-fetch-site': 'cross-site',
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
    }

    def __init__(self):
        self.session = requests.Session()
        self.session.headers.update(self.headers)
        self.get_uuid()
        # Persistent login
        if 'cookie.txt' in os.listdir('.'):
            with open('cookie.txt', 'r') as f:
                cookie_data = f.read()
            if cookie_data:
                self.session.cookies = requests.utils.cookiejar_from_dict(json.loads(cookie_data))
            else:
                print('cookie.txt is empty; delete it and run again')
                return  # __init__ must not return a value
            with open('token.txt', 'r') as f:
                token_data = f.read()
            if token_data:
                self.token = token_data
            else:
                print('token.txt is empty; delete it and run again')
                return
        else:
            self.send_message()
            self.login()
        self.start_upload_file()

    # Generate the uuid
    def get_uuid(self):
        s = ['' for i in range(36)]
        hexDigits = "0123456789abcdef"
        for i in range(36):
            s[i] = hexDigits[math.floor(random.random() * 0x10)]
        s[14] = "4"
        # Parenthesized so that `& 0x3` also applies to digits, as in the JS
        s[19] = hexDigits[((int(s[19]) if s[19].isdecimal() else 0) & 0x3) | 0x8]
        s[8] = s[13] = s[18] = s[23] = ""
        self.uuid = ''.join(s)

    # Send the SMS
    def send_message(self):
        self.phone = input('Enter your phone number: ').strip()
        url = self.api + '/v1/sms'
        data = {
            'client': 'web',
            'source': '335',
            'soft_version': 'v3.0.1.1',
            'device_id': self.uuid,
            'version': 'v1.0.0',
            'phone': self.phone,
            'uuid': self.uuid,
            'code': ''
        }
        while True:
            response = self.session.post(url, data=data)
            message = response.json().get('message') or ''
            if "ok" in message:
                print("SMS sent", message)
                break
            else:
                print("Failed to send SMS", message)
                # Retry with the image captcha filled in
                data['code'] = self.recognition_image()

    # Download the image captcha and let the user read it
    def recognition_image(self):
        url = self.api + '/v1/captcha'
        params = {
            'uuid': self.uuid,
            'time': int(time.time() * 1000),  # 13-digit millisecond timestamp
            'client': 'web',
            'source': '335'
        }
        headers = {
            'accept': 'image/webp,image/apng,image/*,*/*;q=0.8',
            'origin': None,  # the captured image request has no origin header
            'sec-fetch-dest': 'image',
            'sec-fetch-mode': 'no-cors'
        }
        response = self.session.get(url, params=params, headers=headers)
        with open('captcha.jpg', 'wb') as f:
            f.write(response.content)
        # Kept as a string: a captcha may contain letters or leading zeros
        code = input("Open captcha.jpg and enter the captcha: ").strip()
        return code

    # Log in
    def login(self):
        # Kept as a string so that leading zeros are preserved
        self.code = input('Enter the SMS verification code: ').strip()
        url = self.api + '/v1/user/auto_sign_in'
        data = {
            'client': 'web',
            'source': '335',
            'soft_version': 'v3.0.1.1',
            'device_id': self.uuid,
            'phone': self.phone,
            'code': self.code
        }
        response = self.session.post(url, data=data)
        json_data = response.json()
        if "ok" in (json_data.get('message') or ''):
            print("Login succeeded")
            print(json_data)
            self.token = json_data.get('data').get('token')
            with open('cookie.txt', 'w') as f:
                f.write(json.dumps(requests.utils.dict_from_cookiejar(response.cookies)))
            with open('token.txt', 'w') as f:
                f.write(self.token)
        else:
            print("Login failed")

    # Get a GUID by running guid.js with Node.js
    def get_guid(self):
        guid = os.popen('node guid.js').read().replace('\n', '')
        return guid

    # Compute the MD5 of the file
    def get_md5(self):
        md5 = hashlib.md5()
        with open(self.file, 'rb') as f:
            md5.update(f.read())
        self.md5_file = md5.hexdigest()

    # Begin the upload
    def start_upload_file(self):
        url = self.api + '/v1/alivoice/uploadaudiofile?r=' + str(random.random())
        self.file = '1.mp3'
        self.get_md5()
        self.file_name = self.get_guid() + '_' + self.file
        data = {
            'action': 'Begin',
            'fileName': self.file_name,
            'md5': self.md5_file
        }
        response = self.session.post(url, data=data)
        print(response.text)


if __name__ == '__main__':
    xunjie()
Uploading a file that has already been uploaded returns {"pos":"-1"}; a new upload returns {"pos":"0"}.
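A small sketch of reading that Begin response. Only the two observed values are known from the captures; treating "-1" as "nothing more to send" is an assumption of this example, as is the function name needs_upload. Whether the End request is still required for a duplicate file is not shown in the article.

```python
import json


def needs_upload(response_text):
    """Return False for the duplicate marker {"pos":"-1"}, True otherwise.

    Assumption: "-1" means the server already holds this file.
    """
    pos = json.loads(response_text).get("pos")
    return str(pos) != "-1"


if __name__ == "__main__":
    print(needs_upload('{"pos":"-1"}'))
    print(needs_upload('{"pos":"0"}'))
```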
The second POST request:
I won't walk you through this request; the earlier parts already explain everything clearly, so here is the code.
Python code:
import math
import random
import requests
import json
import os
import hashlib
import time
from urllib3 import encode_multipart_formdata


class xunjie():
    api = 'https://user.api.hudunsoft.com'
    # Headers shared by every request. The HTTP/2 pseudo-headers (authority,
    # method, path, scheme) are not real header fields, and content-type,
    # content-length and accept-encoding are filled in by requests itself.
    headers = {
        'accept': 'application/json, text/javascript, */*; q=0.01',
        'accept-language': 'zh-CN,zh;q=0.9',
        'origin': 'http://voice.xunjiepdf.com',
        'referer': 'http://voice.xunjiepdf.com/voice2text.html',
        'sec-fetch-dest': 'empty',
        'sec-fetch-mode': 'cors',
        'sec-fetch-site': 'cross-site',
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
    }

    def __init__(self):
        self.session = requests.Session()
        self.session.headers.update(self.headers)
        self.get_uuid()
        # Persistent login
        if 'cookie.txt' in os.listdir('.'):
            with open('cookie.txt', 'r') as f:
                cookie_data = f.read()
            if cookie_data:
                self.session.cookies = requests.utils.cookiejar_from_dict(json.loads(cookie_data))
            else:
                print('cookie.txt is empty; delete it and run again')
                return  # __init__ must not return a value
            with open('token.txt', 'r') as f:
                token_data = f.read()
            if token_data:
                self.token = token_data
            else:
                print('token.txt is empty; delete it and run again')
                return
        else:
            self.send_message()
            self.login()
        self.start_upload_file()
        self.store_upload_file()

    # Generate the uuid
    def get_uuid(self):
        s = ['' for i in range(36)]
        hexDigits = "0123456789abcdef"
        for i in range(36):
            s[i] = hexDigits[math.floor(random.random() * 0x10)]
        s[14] = "4"
        # Parenthesized so that `& 0x3` also applies to digits, as in the JS
        s[19] = hexDigits[((int(s[19]) if s[19].isdecimal() else 0) & 0x3) | 0x8]
        s[8] = s[13] = s[18] = s[23] = ""
        self.uuid = ''.join(s)

    # Send the SMS
    def send_message(self):
        self.phone = input('Enter your phone number: ').strip()
        url = self.api + '/v1/sms'
        data = {
            'client': 'web',
            'source': '335',
            'soft_version': 'v3.0.1.1',
            'device_id': self.uuid,
            'version': 'v1.0.0',
            'phone': self.phone,
            'uuid': self.uuid,
            'code': ''
        }
        while True:
            response = self.session.post(url, data=data)
            message = response.json().get('message') or ''
            if "ok" in message:
                print("SMS sent", message)
                break
            else:
                print("Failed to send SMS", message)
                # Retry with the image captcha filled in
                data['code'] = self.recognition_image()

    # Download the image captcha and let the user read it
    def recognition_image(self):
        url = self.api + '/v1/captcha'
        params = {
            'uuid': self.uuid,
            'time': int(time.time() * 1000),  # 13-digit millisecond timestamp
            'client': 'web',
            'source': '335'
        }
        headers = {
            'accept': 'image/webp,image/apng,image/*,*/*;q=0.8',
            'origin': None,  # the captured image request has no origin header
            'sec-fetch-dest': 'image',
            'sec-fetch-mode': 'no-cors'
        }
        response = self.session.get(url, params=params, headers=headers)
        with open('captcha.jpg', 'wb') as f:
            f.write(response.content)
        # Kept as a string: a captcha may contain letters or leading zeros
        code = input("Open captcha.jpg and enter the captcha: ").strip()
        return code

    # Log in
    def login(self):
        # Kept as a string so that leading zeros are preserved
        self.code = input('Enter the SMS verification code: ').strip()
        url = self.api + '/v1/user/auto_sign_in'
        data = {
            'client': 'web',
            'source': '335',
            'soft_version': 'v3.0.1.1',
            'device_id': self.uuid,
            'phone': self.phone,
            'code': self.code
        }
        response = self.session.post(url, data=data)
        json_data = response.json()
        if "ok" in (json_data.get('message') or ''):
            print("Login succeeded")
            print(json_data)
            self.token = json_data.get('data').get('token')
            with open('cookie.txt', 'w') as f:
                f.write(json.dumps(requests.utils.dict_from_cookiejar(response.cookies)))
            with open('token.txt', 'w') as f:
                f.write(self.token)
        else:
            print("Login failed")

    # Get a GUID by running guid.js with Node.js
    def get_guid(self):
        guid = os.popen('node guid.js').read().replace('\n', '')
        return guid

    # Compute the MD5 of the file
    def get_md5(self):
        md5 = hashlib.md5()
        with open(self.file, 'rb') as f:
            md5.update(f.read())
        self.md5_file = md5.hexdigest()

    # Begin the upload
    def start_upload_file(self):
        url = self.api + '/v1/alivoice/uploadaudiofile?r=' + str(random.random())
        self.file = '1.mp3'
        self.get_md5()
        self.file_name = self.get_guid() + '_' + self.file
        data = {
            'action': 'Begin',
            'fileName': self.file_name,
            'md5': self.md5_file
        }
        response = self.session.post(url, data=data)
        print(response.text)

    # Upload the file content chunk by chunk
    def store_upload_file(self):
        url = self.api + '/v1/alivoice/uploadaudiofile?r=' + str(random.random())
        pos = 0  # byte offset of the chunk being sent
        with open(self.file, 'rb') as f:
            while True:
                chunk = f.read(2 * 1024 * 1024)  # 2M per chunk
                if not chunk:
                    print('Upload finished')
                    break
                fields = {
                    'action': 'Store',
                    'pos': str(pos),
                    'size': str(len(chunk)),
                    'md5': self.md5_file,
                    'file': (self.file, chunk)
                }
                body, content_type = encode_multipart_formdata(fields)
                # content_type carries the multipart boundary;
                # requests computes content-length from the encoded body
                response = self.session.post(url, data=body, headers={'content-type': content_type})
                print(response.text)
                pos = f.tell()


if __name__ == '__main__':
    xunjie()
The third POST request:
I won't walk you through this request either; the earlier parts already explain everything clearly, so here is the code.
Python code:
import math
import random
import requests
import json
import os
import hashlib
import time
from urllib3 import encode_multipart_formdata
class xunjie():
def __init__(self):
self.session = requests.Session()
self.get_uuid()
# 持久化登陆代码
if 'cookie.txt' in os.listdir('.'):
with open('cookie.txt', 'r') as f:
cookie_data = f.read()
if cookie_data:
self.session.cookies = requests.utils.cookiejar_from_dict(json.loads(cookie_data))
else:
print('cookie.txt文件内容为空,请删除后在运行')
return True
with open('token.txt', 'r') as f:
token_data = f.read()
if token_data:
self.token = token_data
else:
print('token.txt文件内容为空,请删除后在运行')
return True
else:
self.send_message()
self.login()
self.start_upload_file()
self.store_upload_file()
self.end_upload_file()
# 获取uuid
def get_uuid(self):
s = ['' for i in range(36)]
hexDigits = "0123456789abcdef"
for i in range(36):
s[i] = hexDigits[math.floor(random.random() * 0x10)]
s[14] = "4"
s[19] = hexDigits[(int(s[19]) if s[19].isdecimal() else 0 & 0x3) | 0x8]
s[8] = s[13] = s[18] = s[23] = ""
self.uuid = ''.join(s)
# 发送短信
def send_message(self):
self.phone = int(input('输入你的手机号码:'))
url = "https://user.api.hudunsoft.com/v1/sms"
headers = {
'authority': 'user.api.hudunsoft.com',
'method': 'POST',
'path': '/v1/sms',
'scheme': 'https',
'accept': 'application/json, text/javascript, */*; q=0.01',
'accept-encoding': 'gzip, deflate, br',
'accept-language': 'zh-CN,zh;q=0.9',
'content-length': '163',
'content-type': 'application/x-www-form-urlencoded; charset=UTF-8',
'origin': 'http://voice.xunjiepdf.com',
'referer': 'http://voice.xunjiepdf.com/voice2text.html',
'sec-fetch-dest': 'empty',
'sec-fetch-mode': 'cors',
'sec-fetch-site': 'cross-site',
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
}
data = {
'client': 'web',
'source': '335',
'soft_version': 'v3.0.1.1',
'device_id': self.uuid,
'version': 'v1.0.0',
'phone': self.phone,
'uuid': self.uuid,
'code': ''
}
while True:
response = self.session.post(url,headers=headers,data=data)
message = response.json().get('message')
if "ok" in message:
print("短信发送成功",message)
break
else:
print("短信发送失败",u'%s' % message)
data['code'] = self.recognition_image()
# 识别图片验证码
def recognition_image(self):
url = 'https://user.api.hudunsoft.com/v1/captcha?uuid={uuid}&time={time}&client=web&source=335'.format(uuid=self.uuid,time=str(time.time()).replace('.','')[:13])
headers = {
'authority': 'user.api.hudunsoft.com',
'method': 'GET',
'path': '/v1/captcha?uuid=e884673549de432f8487c6078bc38685&time=1597927148527&client=web&source=335',
'scheme': 'https',
'accept': 'image/webp,image/apng,image/*,*/*;q=0.8',
'accept-encoding': 'gzip, deflate, br',
'accept-language': 'zh-CN,zh;q=0.9',
'referer': 'http://voice.xunjiepdf.com/voice2text.html',
'sec-fetch-dest': 'image',
'sec-fetch-mode': 'no-cors',
'sec-fetch-site': 'cross-site',
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
}
response = self.session.get(url,headers=headers)
with open('验证码.jpg','wb') as f:
f.write(response.content)
code = int(input("请查看文件内的验证码并输入:"))
return code
# 登陆
def login(self):
self.code = int(input('输入你的短信验证码:'))
url = "https://user.api.hudunsoft.com/v1/user/auto_sign_in"
headers = {
'authority': 'user.api.hudunsoft.com',
'method': 'POST',
'path': '/v1/user/auto_sign_in',
'scheme': 'https',
'accept': 'application/json, text/javascript, */*; q=0.01',
'accept-encoding': 'gzip, deflate, br',
'accept-language': 'zh-CN,zh;q=0.9',
'content-type': 'application/x-www-form-urlencoded; charset=UTF-8',
'origin': 'http://voice.xunjiepdf.com',
'referer': 'http://voice.xunjiepdf.com/voice2text.html',
'sec-fetch-dest': 'empty',
'sec-fetch-mode': 'cors',
'sec-fetch-site': 'cross-site',
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
}
data = {
'client': 'web',
'source': '335',
'soft_version': 'v3.0.1.1',
'device_id': self.uuid,
'phone': self.phone,
'code': self.code
}
response = self.session.post(url,headers=headers,data=data)
json_data = response.json()
if "ok" in json_data.get('message'):
print("登陆成功")
print(json_data)
self.token = json_data.get('data').get('token')
with open('cookie.txt','w') as f:
f.write(json.dumps(requests.utils.dict_from_cookiejar(response.cookies)))
with open('token.txt','w') as f:
f.write(self.token)
else:
print("登陆失败")
# 获取GUID
def get_guid(self):
guid = os.popen('node guid.js').read().replace('\n', '')
return guid
# 获取文件md5值
def get_md5(self):
md5 = hashlib.md5()
with open(self.file, 'rb') as f:
md5.update(f.read())
self.md5_file = md5.hexdigest()
# 开始上传
def start_upload_file(self):
path = '/v1/alivoice/uploadaudiofile?r=' + str(random.random())
url = 'https://user.api.hudunsoft.com' + path
headers = {
'authority': 'user.api.hudunsoft.com',
'method': 'POST',
'path': path,
'scheme': 'https',
'accept': 'application/json, text/javascript, */*; q=0.01',
'accept-encoding': 'gzip, deflate, br',
'accept-language': 'zh-CN,zh;q=0.9',
'content-type': 'application/x-www-form-urlencoded; charset=UTF-8',
'origin': 'http://voice.xunjiepdf.com',
'referer': 'http://voice.xunjiepdf.com/voice2text.html',
'sec-fetch-dest': 'empty',
'sec-fetch-mode': 'cors',
'sec-fetch-site': 'cross-site',
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
}
self.file = '1.mp3'
self.get_md5()
self.file_name = self.get_guid() + '_' + self.file
data = {
'action': 'Begin',
'fileName': self.file_name,
'md5': self.md5_file
}
response = self.session.post(url,headers=headers,data=data)
print(response.text)

    # Upload the file contents in chunks
    def store_upload_file(self):
        path = "/v1/alivoice/uploadaudiofile?r=" + str(random.random())
        url = "https://user.api.hudunsoft.com" + path
        # No hard-coded content-length: requests computes it from the encoded body
        headers = {
            'authority': 'user.api.hudunsoft.com',
            'method': 'POST',
            'path': path,
            'scheme': 'https',
            'accept': 'application/json, text/javascript, */*; q=0.01',
            'accept-encoding': 'gzip, deflate, br',
            'accept-language': 'zh-CN,zh;q=0.9',
            'content-type': 'multipart/form-data;',
            'origin': 'http://voice.xunjiepdf.com',
            'referer': 'http://voice.xunjiepdf.com/voice2text.html',
            'sec-fetch-dest': 'empty',
            'sec-fetch-mode': 'cors',
            'sec-fetch-site': 'cross-site',
            'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
        }
        data = {
            'action': 'Store',
            'pos': '0',
            'size': '2097152',
            'md5': self.md5_file
        }
        with open(self.file, 'rb') as f:
            while True:
                # Read the next 2 MiB chunk; an empty read means end of file
                files = f.read(2 * 1024 * 1024)
                if files:
                    data['size'] = str(len(files))
                    data['file'] = (self.file, files)
                    encode_data = encode_multipart_formdata(data)
                    data1 = encode_data[0]
                    # Use the multipart content type (with boundary) for this chunk
                    headers['content-type'] = encode_data[1]
                    response = self.session.post(url, data=data1, headers=headers)
                    print(response.text)
                    # The next chunk starts at the current file offset
                    data['pos'] = str(f.tell())
                else:
                    print('Upload finished')
                    break

    # Finish the upload
    def end_upload_file(self):
        path = '/v1/alivoice/uploadaudiofile?r=' + str(random.random())
        url = 'https://user.api.hudunsoft.com' + path
        headers = {
            'authority': 'user.api.hudunsoft.com',
            'method': 'POST',
            'path': path,
            'scheme': 'https',
            'accept': 'application/json, text/javascript, */*; q=0.01',
            'accept-encoding': 'gzip, deflate, br',
            'accept-language': 'zh-CN,zh;q=0.9',
            'content-type': 'application/x-www-form-urlencoded; charset=UTF-8',
            'origin': 'http://voice.xunjiepdf.com',
            'referer': 'http://voice.xunjiepdf.com/voice2text.html',
            'sec-fetch-dest': 'empty',
            'sec-fetch-mode': 'cors',
            'sec-fetch-site': 'cross-site',
            'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
        }
        data = {
            'action': 'End',
            'fileName': self.file_name,
            'md5': self.md5_file
        }
        response = self.session.post(url, headers=headers, data=data)
        print(response.text)


if __name__ == '__main__':
    xunjie()
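
The `Store` step above walks the file in 2 MiB slices and reports each slice's offset (`pos`) and length (`size`). Here is a minimal sketch of just that slicing logic, separated from the HTTP calls. The helper names `iter_chunks` and `demo` are hypothetical and are not part of the site's API.

```python
import os
import tempfile

CHUNK_SIZE = 2 * 1024 * 1024  # 2 MiB, the slice size used in the upload loop


def iter_chunks(path, chunk_size=CHUNK_SIZE):
    """Yield (pos, chunk) pairs, where pos is the byte offset of the chunk."""
    with open(path, 'rb') as f:
        while True:
            pos = f.tell()           # offset where this chunk starts
            chunk = f.read(chunk_size)
            if not chunk:            # empty read: end of file
                break
            yield pos, chunk


def demo():
    # Build a 5-byte temporary file and slice it 2 bytes at a time.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b'abcde')
    try:
        return [(pos, len(chunk)) for pos, chunk in iter_chunks(tmp.name, 2)]
    finally:
        os.remove(tmp.name)


if __name__ == '__main__':
    print(demo())
```

Each `(pos, chunk)` pair corresponds to one `Store` request: `pos` goes into the form as the offset and `len(chunk)` as the size.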
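Audio to text
Clicking "convert to text" triggers a POST request. Several of these requests show up because the page keeps re-sending the request until the conversion has succeeded.
Audio-to-text failed:
Audio-to-text succeeded:
The request parameters do not change between calls, so we can send the request directly from Python.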
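
Because the page keeps re-sending the request until the conversion finishes, a script should pause between attempts and give up at some point. Below is a minimal sketch of such a polling loop. `poll_until`, its parameters, and the fake replies in `demo` are illustrative assumptions, not the site's API or its real response format.

```python
import time


def poll_until(fetch, is_done, interval=1.0, max_attempts=30, sleep=time.sleep):
    """Call fetch() until is_done(result) is true or the attempts run out."""
    for attempt in range(1, max_attempts + 1):
        result = fetch()
        if is_done(result):
            return result
        if attempt < max_attempts:
            sleep(interval)  # wait before asking again
    raise TimeoutError('no final result after %d attempts' % max_attempts)


def demo():
    # Fake replies (assumed shape): two unfinished ones, then a finished one.
    replies = iter([{'message': ''}, {'message': ''}, {'message': 'done'}])
    return poll_until(lambda: next(replies),
                      lambda r: bool(r['message']),
                      sleep=lambda seconds: None)  # skip real waiting in the demo


if __name__ == '__main__':
    print(demo())
```

In a real script, `fetch` would wrap the `session.post(...)` call and `is_done` would inspect the returned JSON.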
Full Python code:
import math
import random
import requests
import json
import os
import hashlib
import time
from urllib3 import encode_multipart_formdata


class xunjie():
    def __init__(self, file):
        self.file = file
        self.session = requests.Session()
        self.get_uuid()
        # Reuse a saved login if cookie.txt exists
        if 'cookie.txt' in os.listdir('.'):
            with open('cookie.txt', 'r') as f:
                cookie_data = f.read()
            if cookie_data:
                self.session.cookies = requests.utils.cookiejar_from_dict(json.loads(cookie_data))
            else:
                print('cookie.txt is empty; delete it and run again')
                return  # __init__ must return None, not True
            with open('token.txt', 'r') as f:
                token_data = f.read()
            if token_data:
                self.token = token_data
            else:
                print('token.txt is empty; delete it and run again')
                return
        else:
            self.send_message()
            self.login()
        self.start_upload_file()
        self.store_upload_file()
        self.end_upload_file()
        self.md5_to_text()
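
    # Generate the uuid
    def get_uuid(self):
        s = ['' for i in range(36)]
        hexDigits = "0123456789abcdef"
        for i in range(36):
            s[i] = hexDigits[math.floor(random.random() * 0x10)]
        s[14] = "4"
        # Mirror the JS `(s[19] & 0x3) | 0x8`: a non-numeric hex letter counts as 0.
        # The parentheses matter, because `x if c else 0 & 0x3` would skip the mask for digits.
        s[19] = hexDigits[((int(s[19]) if s[19].isdecimal() else 0) & 0x3) | 0x8]
        s[8] = s[13] = s[18] = s[23] = ""
        self.uuid = ''.join(s)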

    # Send the SMS
    def send_message(self):
        self.phone = input('Enter your phone number: ').strip()
        url = "https://user.api.hudunsoft.com/v1/sms"
        # No hard-coded content-length: requests computes it from the form body
        headers = {
            'authority': 'user.api.hudunsoft.com',
            'method': 'POST',
            'path': '/v1/sms',
            'scheme': 'https',
            'accept': 'application/json, text/javascript, */*; q=0.01',
            'accept-encoding': 'gzip, deflate, br',
            'accept-language': 'zh-CN,zh;q=0.9',
            'content-type': 'application/x-www-form-urlencoded; charset=UTF-8',
            'origin': 'http://voice.xunjiepdf.com',
            'referer': 'http://voice.xunjiepdf.com/voice2text.html',
            'sec-fetch-dest': 'empty',
            'sec-fetch-mode': 'cors',
            'sec-fetch-site': 'cross-site',
            'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
        }
        data = {
            'client': 'web',
            'source': '335',
            'soft_version': 'v3.0.1.1',
            'device_id': self.uuid,
            'version': 'v1.0.0',
            'phone': self.phone,
            'uuid': self.uuid,
            'code': ''
        }
        while True:
            response = self.session.post(url, headers=headers, data=data)
            message = response.json().get('message')
            if "ok" in message:
                print("SMS sent", message)
                break
            else:
                print("Failed to send SMS", message)
                # Fetch the image captcha, ask the user to read it, then retry
                data['code'] = self.recognition_image()

    # Fetch the image captcha and ask the user to read it
    def recognition_image(self):
        # 13-digit millisecond timestamp, as in the browser request
        path = '/v1/captcha?uuid={uuid}&time={time}&client=web&source=335'.format(uuid=self.uuid, time=int(time.time() * 1000))
        url = 'https://user.api.hudunsoft.com' + path
        headers = {
            'authority': 'user.api.hudunsoft.com',
            'method': 'GET',
            'path': path,
            'scheme': 'https',
            'accept': 'image/webp,image/apng,image/*,*/*;q=0.8',
            'accept-encoding': 'gzip, deflate, br',
            'accept-language': 'zh-CN,zh;q=0.9',
            'referer': 'http://voice.xunjiepdf.com/voice2text.html',
            'sec-fetch-dest': 'image',
            'sec-fetch-mode': 'no-cors',
            'sec-fetch-site': 'cross-site',
            'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
        }
        response = self.session.get(url, headers=headers)
        with open('captcha.jpg', 'wb') as f:
            f.write(response.content)
        # Keep the captcha as a string so letters and leading zeros survive
        code = input("Open captcha.jpg and enter the captcha: ").strip()
        return code

    # Log in
    def login(self):
        # Keep the SMS code as a string so leading zeros are not lost
        self.code = input('Enter the SMS verification code: ').strip()
        url = "https://user.api.hudunsoft.com/v1/user/auto_sign_in"
        headers = {
            'authority': 'user.api.hudunsoft.com',
            'method': 'POST',
            'path': '/v1/user/auto_sign_in',
            'scheme': 'https',
            'accept': 'application/json, text/javascript, */*; q=0.01',
            'accept-encoding': 'gzip, deflate, br',
            'accept-language': 'zh-CN,zh;q=0.9',
            'content-type': 'application/x-www-form-urlencoded; charset=UTF-8',
            'origin': 'http://voice.xunjiepdf.com',
            'referer': 'http://voice.xunjiepdf.com/voice2text.html',
            'sec-fetch-dest': 'empty',
            'sec-fetch-mode': 'cors',
            'sec-fetch-site': 'cross-site',
            'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
        }
        data = {
            'client': 'web',
            'source': '335',
            'soft_version': 'v3.0.1.1',
            'device_id': self.uuid,
            'phone': self.phone,
            'code': self.code
        }
        response = self.session.post(url, headers=headers, data=data)
        json_data = response.json()
        if "ok" in json_data.get('message'):
            print("Login succeeded")
            print(json_data)
            self.token = json_data.get('data').get('token')
            # Save the cookies and token so later runs can reuse the login
            with open('cookie.txt', 'w') as f:
                f.write(json.dumps(requests.utils.dict_from_cookiejar(response.cookies)))
            with open('token.txt', 'w') as f:
                f.write(self.token)
        else:
            print("Login failed")

    # Get the GUID by running guid.js with Node.js
    def get_guid(self):
        guid = os.popen('node guid.js').read().replace('\n', '')
        return guid

    # Compute the MD5 of the file
    def get_md5(self):
        md5 = hashlib.md5()
        with open(self.file, 'rb') as f:
            md5.update(f.read())
        self.md5_file = md5.hexdigest()

    # Start the upload
    def start_upload_file(self):
        path = '/v1/alivoice/uploadaudiofile?r=' + str(random.random())
        url = 'https://user.api.hudunsoft.com' + path
        headers = {
            'authority': 'user.api.hudunsoft.com',
            'method': 'POST',
            'path': path,
            'scheme': 'https',
            'accept': 'application/json, text/javascript, */*; q=0.01',
            'accept-encoding': 'gzip, deflate, br',
            'accept-language': 'zh-CN,zh;q=0.9',
            'content-type': 'application/x-www-form-urlencoded; charset=UTF-8',
            'origin': 'http://voice.xunjiepdf.com',
            'referer': 'http://voice.xunjiepdf.com/voice2text.html',
            'sec-fetch-dest': 'empty',
            'sec-fetch-mode': 'cors',
            'sec-fetch-site': 'cross-site',
            'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
        }
        self.get_md5()
        self.file_name = self.get_guid() + '_' + self.file
        data = {
            'action': 'Begin',
            'fileName': self.file_name,
            'md5': self.md5_file
        }
        response = self.session.post(url, headers=headers, data=data)
        print(response.text)

    # Upload the file contents in chunks
    def store_upload_file(self):
        path = "/v1/alivoice/uploadaudiofile?r=" + str(random.random())
        url = "https://user.api.hudunsoft.com" + path
        # No hard-coded content-length: requests computes it from the encoded body
        headers = {
            'authority': 'user.api.hudunsoft.com',
            'method': 'POST',
            'path': path,
            'scheme': 'https',
            'accept': 'application/json, text/javascript, */*; q=0.01',
            'accept-encoding': 'gzip, deflate, br',
            'accept-language': 'zh-CN,zh;q=0.9',
            'content-type': 'multipart/form-data;',
            'origin': 'http://voice.xunjiepdf.com',
            'referer': 'http://voice.xunjiepdf.com/voice2text.html',
            'sec-fetch-dest': 'empty',
            'sec-fetch-mode': 'cors',
            'sec-fetch-site': 'cross-site',
            'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
        }
        data = {
            'action': 'Store',
            'pos': '0',
            'size': '2097152',
            'md5': self.md5_file
        }
        with open(self.file, 'rb') as f:
            while True:
                # Read the next 2 MiB chunk; an empty read means end of file
                files = f.read(2 * 1024 * 1024)
                if files:
                    data['size'] = str(len(files))
                    data['file'] = (self.file, files)
                    encode_data = encode_multipart_formdata(data)
                    data1 = encode_data[0]
                    # Use the multipart content type (with boundary) for this chunk
                    headers['content-type'] = encode_data[1]
                    response = self.session.post(url, data=data1, headers=headers)
                    print(response.text)
                    # The next chunk starts at the current file offset
                    data['pos'] = str(f.tell())
                else:
                    print('Upload finished')
                    break

    # Finish the upload
    def end_upload_file(self):
        path = '/v1/alivoice/uploadaudiofile?r=' + str(random.random())
        url = 'https://user.api.hudunsoft.com' + path
        headers = {
            'authority': 'user.api.hudunsoft.com',
            'method': 'POST',
            'path': path,
            'scheme': 'https',
            'accept': 'application/json, text/javascript, */*; q=0.01',
            'accept-encoding': 'gzip, deflate, br',
            'accept-language': 'zh-CN,zh;q=0.9',
            'content-type': 'application/x-www-form-urlencoded; charset=UTF-8',
            'origin': 'http://voice.xunjiepdf.com',
            'referer': 'http://voice.xunjiepdf.com/voice2text.html',
            'sec-fetch-dest': 'empty',
            'sec-fetch-mode': 'cors',
            'sec-fetch-site': 'cross-site',
            'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
        }
        data = {
            'action': 'End',
            'fileName': self.file_name,
            'md5': self.md5_file
        }
        response = self.session.post(url, headers=headers, data=data)
        print(response.text)

    # Call md5Totext, i.e. convert the uploaded audio to text
    def md5_to_text(self):
        url = "https://user.api.hudunsoft.com/v1/alivoice/md5Totext"
        headers = {
            'authority': 'user.api.hudunsoft.com',
            'method': 'POST',
            'path': '/v1/alivoice/md5Totext',
            'scheme': 'https',
            'accept': 'application/json, text/javascript, */*; q=0.01',
            'accept-encoding': 'gzip, deflate, br',
            'accept-language': 'zh-CN,zh;q=0.9',
            'content-type': 'application/x-www-form-urlencoded; charset=UTF-8',
            'origin': 'http://voice.xunjiepdf.com',
            'referer': 'http://voice.xunjiepdf.com/voice2text.html',
            'sec-fetch-dest': 'empty',
            'sec-fetch-mode': 'cors',
            'sec-fetch-site': 'cross-site',
            'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
        }
        data = {
            'client': 'web',
            'source': '335',
            'soft_version': 'v3.0.1.1',
            'device_id': self.uuid,
            'md5': self.md5_file,
            'fileName': self.file,
            'title': self.file,
            'token': self.token
        }
        response = self.session.post(url, headers=headers, data=data)
        json_data = response.json()
        message = json_data['message']
        if message:
            # A non-empty message is printed as-is together with the full reply
            print(message)
            print(json_data)
        else:
            # Otherwise keep the task id and poll for the result
            self.task_id = json_data['data']['task_id']
            self.get_task_info()

    # Keep polling the recognition task
    def get_task_info(self):
        url = 'https://user.api.hudunsoft.com/v1/alivoice/getTaskInfo'
        headers = {
            'authority': 'user.api.hudunsoft.com',
            'method': 'POST',
            'path': '/v1/alivoice/getTaskInfo',
            'scheme': 'https',
            'accept': 'application/json, text/javascript, */*; q=0.01',
            'accept-encoding': 'gzip, deflate, br',
            'accept-language': 'zh-CN,zh;q=0.9',
            'content-type': 'application/x-www-form-urlencoded; charset=UTF-8',
            'origin': 'http://voice.xunjiepdf.com',
            'referer': 'http://voice.xunjiepdf.com/voice2text.html',
            'sec-fetch-dest': 'empty',
            'sec-fetch-mode': 'cors',
            'sec-fetch-site': 'cross-site',
            'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
        }
        data = {
            'client': 'web',
            'source': '335',
            'soft_version': 'v3.0.1.1',
            'device_id': self.uuid,
            'taskId': self.task_id
        }
        while True:
            response = self.session.post(url, headers=headers, data=data)
            json_data = response.json()
            message = json_data['message']
            if message:
                print(message)
                break
            # No message yet: wait briefly instead of hammering the server
            time.sleep(1)


if __name__ == '__main__':
    xunjie('1.mp3')
That is all of the code.
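
One optional refinement: `get_md5` reads the whole audio file into memory before hashing it, which can be heavy for long recordings. Here is a minimal sketch of hashing the file block by block instead. The names `file_md5` and `demo` and the 1 MiB block size are assumptions made for illustration; the resulting digest is the same as hashing the file in one go.

```python
import hashlib
import os
import tempfile


def file_md5(path, block_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it block by block."""
    md5 = hashlib.md5()
    with open(path, 'rb') as f:
        # iter() with a sentinel stops at the first empty read (end of file)
        for block in iter(lambda: f.read(block_size), b''):
            md5.update(block)
    return md5.hexdigest()


def demo():
    # Hash a small temporary file using a deliberately tiny block size.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b'hello world')
    try:
        return file_md5(tmp.name, block_size=4)
    finally:
        os.remove(tmp.name)


if __name__ == '__main__':
    print(demo())
```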
Easter egg
Note: this is a little Easter egg. Take a close look; that is as far as I can help.