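This post uses the iFlytek (科大讯飞) API for speech recognition and Java SWT to wrap it in a desktop interface.
iFlytek API
The speech recognition API can be tried free for 5 hours. Many vendors, such as Baidu and Alibaba, have opened up speech recognition APIs; this post uses iFlytek's. You could also train a model on your own data to do speech recognition, although the recognition rate would probably not be very high. The links below describe how such a system works, which I may look into later when I have time.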
https://blog.ailemon.me/2018/08/29/asrt-a-chinese-speech-recognition-system/
https://github.com/nl8590687/ASRT_SpeechRecognition
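The acoustic model uses a convolutional neural network (CNN) with connectionist temporal classification (CTC), trained on a large Chinese speech dataset, to transcribe audio into Chinese pinyin; a language model then converts the pinyin sequence into Chinese text.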
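To make the CTC step a little more concrete, here is a minimal sketch of the standard greedy (best-path) collapse rule: merge consecutive repeated labels, then drop the blank symbol. It is not ASRT's actual decoder; the class name, the `_` blank symbol, and the pinyin tokens are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class CtcGreedyCollapseSketch {
    // Assumed blank symbol; real models usually reserve a dedicated class index for it.
    static final String BLANK = "_";

    /** Collapses per-frame labels: merge consecutive repeats, then drop blanks. */
    static List<String> collapse(List<String> frameLabels) {
        List<String> out = new ArrayList<>();
        String prev = null;
        for (String label : frameLabels) {
            // Emit a label only when it differs from the previous frame and is not blank.
            if (!label.equals(prev) && !label.equals(BLANK)) {
                out.add(label);
            }
            prev = label;
        }
        return out;
    }

    public static void main(String[] args) {
        // Illustrative per-frame argmax labels (pinyin with tone numbers).
        List<String> frames = Arrays.asList("_", "ni3", "ni3", "_", "hao3", "hao3", "_", "_");
        System.out.println(collapse(frames)); // prints [ni3, hao3]
    }
}
```

A blank between two identical labels keeps them distinct, which is how CTC can represent a genuinely repeated syllable.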
Log in to the iFlytek site: https://www.xfyun.cn/services/lfasr
Download the Java SDK
Create a new application and get its appId and secret
Configure the appId and secret in the SDK
# APP ID
app_id=
# secret key
secret_key=
# both the http and https protocols are supported
lfasr_host=http://raasr.xfyun.cn/api
# file piece size
file_piece_size=10485760
# store path: this is not the store path for the result json file, but the path for the file piece during upload
store_path=F://
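Since app_id and secret_key ship empty, it can help to check the configuration before starting a transcription. The sketch below uses only java.util.Properties and the key names from the file above; the class and method names are hypothetical, and how the SDK itself locates and reads this file is not shown here.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

final class LfasrConfigCheckSketch {
    // Key names copied from the configuration file shown above.
    static final String[] REQUIRED = {
        "app_id", "secret_key", "lfasr_host", "file_piece_size", "store_path"
    };

    /** Returns the required keys that are missing or blank in the given properties text. */
    static List<String> missingKeys(Reader source) {
        Properties props = new Properties();
        try {
            props.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> missing = new ArrayList<>();
        for (String key : REQUIRED) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Same content as the file above, with the credentials still unset.
        String sample = "app_id=\nsecret_key=\nlfasr_host=http://raasr.xfyun.cn/api\n"
                + "file_piece_size=10485760\nstore_path=F://\n";
        System.out.println("Missing: " + missingKeys(new StringReader(sample)));
    }
}
```

With the unfilled file this reports app_id and secret_key, which is a clearer failure than an error from the first SDK call.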
The demo provides a test case:
It is used as follows:
Initialize the LFASRClient instance
// Initialize the LFASRClient instance
LfasrClientImp lc = null;
try {
lc = LfasrClientImp.initLfasrClient();
} catch (LfasrException e) {
// Initialization failed; parse the error description
Message initMsg = JSON.parseObject(e.getMessage(), Message.class);
System.out.println("ecode=" + initMsg.getErr_no());
System.out.println("failed=" + initMsg.getFailed());
}
Upload the audio file
// Upload the audio file
Message uploadMsg = lc.lfasrUpload(local_file, type, params);
// Check the return value
int ok = uploadMsg.getOk();
if (ok == 0) {
    // Task created successfully
    task_id = uploadMsg.getData();
}
Poll for the task result:
// Poll until the audio has been processed
while (true) {
try {
// Wait 20 s before querying the task progress
Thread.sleep(sleepSecond * 1000);
System.out.println("waiting ...");
} catch (InterruptedException e) {
e.printStackTrace();
}
try {
// Query the processing progress
Message progressMsg = lc.lfasrGetProgress(task_id);
// A non-zero status means the task failed
if (progressMsg.getOk() != 0) {
System.out.println("task was fail. task_id:" + task_id);
System.out.println("ecode=" + progressMsg.getErr_no());
System.out.println("failed=" + progressMsg.getFailed());
return;
} else {
ProgressStatus progressStatus = JSON.parseObject(progressMsg.getData(), ProgressStatus.class);
if (progressStatus.getStatus() == 9) {
// Processing finished
System.out.println("task was completed. task_id:" + task_id);
break;
} else {
// Processing not finished yet
System.out.println("task is incomplete. task_id:" + task_id + ", status:" + progressStatus.getDesc());
continue;
}
}
} catch (LfasrException e) {
// The progress query threw an exception; diagnose it from the returned information, then query again
Message progressMsg = JSON.parseObject(e.getMessage(), Message.class);
System.out.println("ecode=" + progressMsg.getErr_no());
System.out.println("failed=" + progressMsg.getFailed());
}
}
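The loop above polls indefinitely until the status reaches 9. As a hedged variation, a cap on the number of polls prevents a stuck task from blocking the caller forever. The sketch below is independent of the SDK: the status source is a plain IntSupplier standing in for lfasrGetProgress, and the class and method names are hypothetical.

```java
import java.util.function.IntSupplier;

final class PollWithLimitSketch {
    static final int DONE = 9; // status value the demo above treats as "finished"

    /**
     * Polls statusSource until it reports DONE or maxAttempts is reached.
     * Returns the number of polls used, or -1 if the limit was hit
     * or the thread was interrupted while waiting.
     */
    static int pollUntilDone(IntSupplier statusSource, int maxAttempts, long sleepMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return -1;
            }
            if (statusSource.getAsInt() == DONE) {
                return attempt;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Fake status source: reports 9 on the third poll.
        int used = pollUntilDone(() -> ++calls[0] >= 3 ? DONE : 0, 10, 10);
        System.out.println("polls used: " + used); // prints polls used: 3
    }
}
```

In the real client the supplier would wrap the progress call and its exception handling; the 20 s interval from the demo would be passed as sleepMillis.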
Fetch the final result:
// Fetch the task result
try {
Message resultMsg = lc.lfasrGetResult(task_id);
// A status of 0 means the result was fetched successfully
if (resultMsg.getOk() == 0) {
// Print the transcription result
System.out.println(resultMsg.getData());
System.out.println(Test.getFinalResult(resultMsg.getData()));
} else {
// Failed to fetch the task result
System.out.println("ecode=" + resultMsg.getErr_no());
System.out.println("failed=" + resultMsg.getFailed());
}
} catch (LfasrException e) {
// The result fetch threw an exception; parse the error description
Message resultMsg = JSON.parseObject(e.getMessage(), Message.class);
System.out.println("ecode=" + resultMsg.getErr_no());
System.out.println("failed=" + resultMsg.getFailed());
}
resultMsg.getData() returns a JSON array with several elements; the "onebest" field of each element is extracted and concatenated to form the final output text.
String str = "[{\"bg\":\"0\",\"ed\":\"2180\",\"onebest\":\"科大讯飞是中国最大!\",\"si\":\"0\",\"speaker\":\"0\","
+ "\"wordsResultList\":[{\"alternativeList\":[],\"wc\":\"1.0000\",\"wordBg\":\"6\",\"wordEd\":\"114\",\"wordsName\":"
+ "\"科大讯飞\",\"wp\":\"n\"},{\"alternativeList\":[],\"wc\":\"1.0000\",\"wordBg\":\"118\",\"wordEd\":\"147\",\"wordsName\""
+ ":\"是\",\"wp\":\"n\"},{\"alternativeList\":[],\"wc\":\"1.0000\",\"wordBg\":\"148\",\"wordEd\":\"193\",\"wordsName\":\"中国\","
+ "\"wp\":\"n\"},{\"alternativeList\":[],\"wc\":\"1.0000\",\"wordBg\":\"194\",\"wordEd\":\"213\",\"wordsName\":\"最\","
+ "\"wp\":\"n\"},{\"alternativeList\":[],\"wc\":\"1.0000\",\"wordBg\":\"214\",\"wordEd\":\"218\",\"wordsName\":\"大\","
+ "\"wp\":\"n\"},{\"alternativeList\":[],\"wc\":\"0.0000\",\"wordBg\":\"218\",\"wordEd\":\"218\",\"wordsName\":\"!\","
+ "\"wp\":\"p\"},{\"alternativeList\":[],\"wc\":\"0.0000\",\"wordBg\":\"218\",\"wordEd\":\"218\",\"wordsName\":\"\","
+ "\"wp\":\"g\"}]},{\"bg\":\"2190\",\"ed\":\"3080\",\"onebest\":\"的智能。\",\"si\":\"1\",\"speaker\":\"0\","
+ "\"wordsResultList\":[{\"alternativeList\":[],\"wc\":\"1.0000\",\"wordBg\":\"15\",\"wordEd\":\"42\","
+ "\"wordsName\":\"的\",\"wp\":\"n\"},{\"alternativeList\":[],\"wc\":\"1.0000\",\"wordBg\":\"47\",\"wordEd\":\"89\","
+ "\"wordsName\":\"智能\",\"wp\":\"n\"},{\"alternativeList\":[],\"wc\":\"0.0000\",\"wordBg\":\"89\",\"wordEd\":\"89\","
+ "\"wordsName\":\"。\",\"wp\":\"p\"},{\"alternativeList\":[],\"wc\":\"0.0000\",\"wordBg\":\"89\",\"wordEd\":\"89\","
+ "\"wordsName\":\"\",\"wp\":\"g\"}]},{\"bg\":\"3090\",\"ed\":\"4950\",\"onebest\":\"语音技术提供商,\",\"si\":\"2\","
+ "\"speaker\":\"0\",\"wordsResultList\":[{\"alternativeList\":[],\"wc\":\"1.0000\",\"wordBg\":\"4\",\"wordEd\":\"46\","
+ "\"wordsName\":\"语音\",\"wp\":\"n\"},{\"alternativeList\":[],\"wc\":\"1.0000\",\"wordBg\":\"47\",\"wordEd\":\"92\","
+ "\"wordsName\":\"技术\",\"wp\":\"n\"},{\"alternativeList\":[],\"wc\":\"1.0000\",\"wordBg\":\"93\",\"wordEd\":\"164\","
+ "\"wordsName\":\"提供商\",\"wp\":\"n\"},{\"alternativeList\":[],\"wc\":\"0.0000\",\"wordBg\":\"164\",\"wordEd\":\"164\","
+ "\"wordsName\":\",\",\"wp\":\"p\"}]}]";
public static String getFinalResult(String data) {
    // Parse the JSON array returned by lfasrGetResult
    JSONArray ja = JSONArray.parseArray(data);
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < ja.size(); i++) {
        // Append the best transcription of each segment
        sb.append(ja.getJSONObject(i).getString("onebest"));
    }
    return sb.toString();
}
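For a quick check without the fastjson dependency, the same concatenation can be approximated with a regular expression. This swaps real JSON parsing for pattern matching, so treat it as a rough sketch: it does not decode escape sequences, and it would also match an "onebest" key nested elsewhere. The class name and the sample strings are made up for illustration; only the field names come from the response above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class OnebestRegexSketch {
    // Matches "onebest":"..." and captures the string body, allowing backslash escapes.
    private static final Pattern ONEBEST =
            Pattern.compile("\"onebest\"\\s*:\\s*\"((?:[^\"\\\\]|\\\\.)*)\"");

    /** Concatenates every captured "onebest" value in document order. */
    static String joinOnebest(String json) {
        StringBuilder sb = new StringBuilder();
        Matcher m = ONEBEST.matcher(json);
        while (m.find()) {
            sb.append(m.group(1)); // escapes such as \" are NOT decoded here
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up sample using the same field names as the response above.
        String sample = "[{\"bg\":\"0\",\"ed\":\"2180\",\"onebest\":\"hello \",\"si\":\"0\"},"
                + "{\"bg\":\"2190\",\"ed\":\"3080\",\"onebest\":\"world.\",\"si\":\"1\"}]";
        System.out.println(joinOnebest(sample)); // prints hello world.
    }
}
```

For anything beyond a smoke test, a proper JSON parser such as the fastjson calls above remains the safer choice.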
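SWT interface
Using the SDK directly is a bit cumbersome, so I wanted to wrap it in an SWT client. Development is done in Eclipse; first set up the SWT environment.
Reference: https://www.cnblogs.com/xinyan123/p/6225194.html
Download the SWT plugin (WindowBuilder): https://www.eclipse.org/windowbuilder/download.php
Copy the features and plugins folders from the package into the corresponding folders of the Eclipse installation directory, then restart Eclipse
Create a new SWT project
Create a new ApplicationWindow
The UI can then be laid out with the graphical designer
Core SWT logic: the handler for the Start Conversion button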
// Start Conversion button
Button startTransfer = new Button(container, SWT.NONE);
startTransfer.addSelectionListener(new SelectionAdapter() {
    @Override
    public void widgetSelected(SelectionEvent e) {
        logDetailText.append(datePrefix + "Starting conversion........" + "\n");
        startTransfer.setEnabled(false);
        voicePath = voicePathText.getText();
        textPath = textPathText.getText();
        int status = 0;
        Callable<Integer> f = new TransferThread(logDetailText, countDownLatch, datePrefix, voicePath, textPath);
        //Callable<Integer> f = new TransferThreadAsyc(parent, logDetailText, countDownLatch, datePrefix, voicePath, textPath);
        try {
            // call() is invoked directly, so it runs synchronously on the UI thread
            status = f.call();
        } catch (Exception e2) {
            e2.printStackTrace();
        }
        try {
            // Wait for the worker to signal completion
            countDownLatch.await();
        } catch (InterruptedException e1) {
            e1.printStackTrace();
        }
        if (status == 1) {
            logDetailText.append(datePrefix + "Conversion finished" + "\n");
        } else {
            logDetailText.append(datePrefix + "Conversion failed" + "\n");
        }
        startTransfer.setEnabled(true);
    }
});
Worker class that performs the conversion:
package com.voice.text;
import java.io.FileOutputStream;
import java.util.HashMap;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import org.eclipse.swt.widgets.Text;
import com.alibaba.fastjson.JSON;
import com.iflytek.msp.cpdb.lfasr.client.LfasrClientImp;
import com.iflytek.msp.cpdb.lfasr.exception.LfasrException;
import com.iflytek.msp.cpdb.lfasr.model.LfasrType;
import com.iflytek.msp.cpdb.lfasr.model.Message;
import com.iflytek.msp.cpdb.lfasr.model.ProgressStatus;
import com.iflytek.voicecloud.lfasr.demo.Test;
public class TransferThread implements Callable<Integer> {
private Text logDetailText;
private CountDownLatch countDownLatch;
private LfasrType type = LfasrType.LFASR_STANDARD_RECORDED_AUDIO;
private int sleepSecond = 20;
private String datePrefix;
private String voicePath;
private String textPath;
public TransferThread(Text logDetailText, CountDownLatch countDownLatch, String datePrefix, String voicePath, String textPath) {
this.logDetailText = logDetailText;
this.countDownLatch = countDownLatch;
this.datePrefix = datePrefix;
this.voicePath = voicePath;
this.textPath = textPath;
}
@Override
public Integer call() throws Exception {
// Initialize the LFASRClient instance
LfasrClientImp lc = null;
try {
lc = LfasrClientImp.initLfasrClient();
} catch (LfasrException e) {
// Initialization failed; parse the error description
Message initMsg = JSON.parseObject(e.getMessage(), Message.class);
logDetailText.append(datePrefix + "ecode=" + initMsg.getErr_no() + "\n");
////System.out.println("ecode=" + initMsg.getErr_no());
logDetailText.append(datePrefix + "failed=" + initMsg.getFailed() + "\n");
//System.out.println(datePrefix + "failed=" + initMsg.getFailed());
countDownLatch.countDown();
return -1;
}
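// Task ID returned by the upload
String task_id = "";
HashMap<String, String> params = new HashMap<String, String>();
params.put("has_participle", "true");
// Since the editions were merged, this enables the telephone-edition feature in the standard edition
//params.put("has_seperate", "true");
try {
// Upload the audio file
Message uploadMsg = lc.lfasrUpload(voicePath, type, params);
// Check the return value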
int ok = uploadMsg.getOk();
if (ok == 0) {
// Task created successfully
task_id = uploadMsg.getData();
//System.out.println("Task created, task_id=" + task_id);
logDetailText.append(datePrefix + "Task created, task_id=" + task_id + "\n");
} else {
// Task creation failed: server-side error
//System.out.println(datePrefix + "ecode=" + uploadMsg.getErr_no());
logDetailText.append(datePrefix + "ecode=" + uploadMsg.getErr_no() + "\n");
//System.out.println(datePrefix + "failed=" + uploadMsg.getFailed());
logDetailText.append(datePrefix + "failed=" + uploadMsg.getFailed() + "\n");
countDownLatch.countDown();
return -1;
}
} catch (LfasrException e) {
// The upload threw an exception; parse the error description
Message uploadMsg = JSON.parseObject(e.getMessage(), Message.class);
//System.out.println(datePrefix + "ecode=" + uploadMsg.getErr_no());
logDetailText.append(datePrefix + "ecode=" + uploadMsg.getErr_no() + "\n");
//System.out.println(datePrefix + "failed=" + uploadMsg.getFailed());
logDetailText.append(datePrefix + "failed=" + uploadMsg.getFailed() + "\n");
countDownLatch.countDown();
return -1;
}
// Poll until the audio has been processed
while (true) {
try {
// Wait 20 s before querying the task progress
Thread.sleep(sleepSecond * 1000);
//System.out.println("waiting ...");
logDetailText.append(datePrefix + "waiting ..." + "\n");
} catch (InterruptedException e) {
e.printStackTrace();
}
try {
// Query the processing progress
Message progressMsg = lc.lfasrGetProgress(task_id);
// A non-zero status means the task failed
if (progressMsg.getOk() != 0) {
//System.out.println("task was fail. task_id:" + task_id);
//System.out.println("ecode=" + progressMsg.getErr_no());
//System.out.println("failed=" + progressMsg.getFailed());
logDetailText.append(datePrefix + "task was fail. task_id:" + task_id + "\n");
logDetailText.append(datePrefix + "ecode=" + progressMsg.getErr_no() + "\n");
logDetailText.append(datePrefix + "failed=" + progressMsg.getFailed() + "\n");
countDownLatch.countDown();
return -1;
} else {
ProgressStatus progressStatus = JSON.parseObject(progressMsg.getData(), ProgressStatus.class);
if (progressStatus.getStatus() == 9) {
// Processing finished
//System.out.println(datePrefix + "task was completed. task_id:" + task_id + "\n");
logDetailText.append(datePrefix + "task was completed. task_id:" + task_id + "\n");
break;
} else {
// Processing not finished yet
//System.out.println(datePrefix + "task is incomplete. task_id:" + task_id + ", status:" + progressStatus.getDesc() + "\n");
logDetailText.append(datePrefix + "task is incomplete. task_id:" + task_id + ", status:" + progressStatus.getDesc() + "\n");
continue;
}
}
} catch (LfasrException e) {
// The progress query threw an exception; diagnose it from the returned information, then query again
Message progressMsg = JSON.parseObject(e.getMessage(), Message.class);
//System.out.println(datePrefix + "ecode=" + progressMsg.getErr_no() + "\n");
//System.out.println(datePrefix + "failed=" + progressMsg.getFailed() + "\n");
logDetailText.append(datePrefix + "ecode=" + progressMsg.getErr_no() + "\n");
logDetailText.append(datePrefix + "failed=" + progressMsg.getFailed() + "\n");
}
}
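// Fetch the task result
try {
Message resultMsg = lc.lfasrGetResult(task_id);
// A status of 0 means the result was fetched successfully
if (resultMsg.getOk() == 0) {
// Save the transcription result to a file
String result = Test.getFinalResult(resultMsg.getData());
String output = textPath + java.io.File.separator + System.currentTimeMillis() + ".txt";
// try-with-resources closes the stream; an explicit UTF-8 charset avoids depending on the platform default
try (FileOutputStream f = new FileOutputStream(output)) {
f.write(result.getBytes(java.nio.charset.StandardCharsets.UTF_8));
}
//System.out.println(result);
logDetailText.append(datePrefix + "Result saved to: " + output + "\n");
logDetailText.append(datePrefix + "Final transcription: " + "\n");
logDetailText.append(datePrefix + result + "\n");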
} else {
// Failed to fetch the task result
//System.out.println(datePrefix + "ecode=" + resultMsg.getErr_no() + "\n");
//System.out.println(datePrefix + "failed=" + resultMsg.getFailed() + "\n");
logDetailText.append(datePrefix + "ecode=" + resultMsg.getErr_no() + "\n");
logDetailText.append(datePrefix + "failed=" + resultMsg.getFailed() + "\n");
countDownLatch.countDown();
return -1;
}
} catch (LfasrException e) {
// The result fetch threw an exception; parse the error description
Message resultMsg = JSON.parseObject(e.getMessage(), Message.class);
//System.out.println(datePrefix + "ecode=" + resultMsg.getErr_no() + "\n");
//System.out.println(datePrefix + "failed=" + resultMsg.getFailed() + "\n");
logDetailText.append(datePrefix + "ecode=" + resultMsg.getErr_no() + "\n");
logDetailText.append(datePrefix + "failed=" + resultMsg.getFailed() + "\n");
countDownLatch.countDown();
return -1;
}
countDownLatch.countDown();
return 1;
}
}
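Putting the code together, the final result looks like this:
Code: https://github.com/ChenWenKaiVN/VoiceToText
Next steps for optimization
1. The main thread appears to freeze. The way SWT UI and non-UI threads interact needs a closer look.
https://blog.csdn.net/dollyn/article/details/38582743/
2. Look into a progress bar that shows conversion progress.
3. The configuration screen should be tied more closely to the SDK configuration file; many values are still hard-coded in that file.
4. Look into packaging an executable jar, bundling the JRE with it.
5. Study the technical principles behind speech recognition.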
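On the first item, one common way to avoid the freeze is to run the slow work on a background thread and post only the widget updates back to the UI thread. The sketch below illustrates that hand-off with plain JDK executors so it runs without SWT on the classpath: a single-thread executor stands in for the UI thread, and a list stands in for the log text widget. All names are hypothetical. In a real SWT application the inner hand-off would go through Display.asyncExec, and the button handler would return without waiting.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class UiThreadHandoffSketch {
    /** Stand-in for the slow SDK calls (upload, polling, result download). */
    static String slowTranscription() {
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "transcribed text";
    }

    static List<String> runDemo() {
        List<String> log = new CopyOnWriteArrayList<>();              // stand-in for the log widget
        ExecutorService ui = Executors.newSingleThreadExecutor();     // stand-in for the SWT UI thread
        ExecutorService worker = Executors.newSingleThreadExecutor(); // background thread
        CountDownLatch finished = new CountDownLatch(1);

        // "Button handler": logs, hands the work off, and returns immediately.
        ui.execute(() -> {
            log.add("start");
            worker.execute(() -> {
                String result = slowTranscription();
                // Post the widget update back to the UI thread
                // (in SWT this would be display.asyncExec(...)).
                ui.execute(() -> {
                    log.add("done: " + result);
                    finished.countDown();
                });
            });
        });

        try {
            finished.await(); // only the demo's caller waits here, never the UI thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        ui.shutdown();
        worker.shutdown();
        return log;
    }

    public static void main(String[] args) {
        System.out.println(runDemo()); // prints [start, done: transcribed text]
    }
}
```

The key design point is that the UI thread never sleeps or blocks on a latch, so the window keeps repainting while the transcription runs.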