Spring AI Text-To-Speech: A Getting-Started Example

Enterprise Development 2025-04-11 17:53:56

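The Spring AI framework provides a Text-To-Speech (TTS) API and supports OpenAI's Speech API. As further speech providers are implemented, common SpeechModel and StreamingSpeechModel interfaces will be extracted.

This article uses the Text-To-Speech service of Alibaba's Bailian AI platform (Model Studio) together with the open-source Spring AI Alibaba framework to build a small program that verifies text-to-speech conversion.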

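1. Prerequisites for using Spring AI

JDK 17 or later; I use JDK 21.
Spring Boot 3.x or later; this project uses Spring Boot 3.3.3.
An activated Alibaba Cloud model service (free for 6 months at the time of writing) and an API key, which the code below needs. For the exact steps, see the article on how to obtain an API Key in the Alibaba Cloud Help Center for Model Studio (Bailian).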
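A quick way to confirm the first prerequisite is to ask the running JVM for its feature version. The following is a minimal sketch; the class name JdkCheck and its helper method are made up for illustration and are not part of Spring AI.

```java
final class JdkCheck {

    // Minimum JDK feature version required by Spring Boot 3.x
    static final int MIN_FEATURE_VERSION = 17;

    // Returns true when the given feature version satisfies the minimum
    static boolean meetsMinimum(int featureVersion) {
        return featureVersion >= MIN_FEATURE_VERSION;
    }

    public static void main(String[] args) {
        // Runtime.version().feature() is available since Java 10
        int feature = Runtime.version().feature();
        String verdict = meetsMinimum(feature)
                ? " (supported)"
                : " (too old; JDK 17 or later is required)";
        System.out.println("Running on JDK " + feature + verdict);
    }
}
```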

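2. Create the Spring Boot project

Create an ordinary Java project in IntelliJ IDEA. The JDK must be version 17 or later; I use JDK 21 here.

3. Configure pom.xml

Once the project is created, add the dependencies to pom.xml.

First, declare spring-boot-starter-parent as the project's parent to pin the Spring Boot version: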

<parent>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-parent</artifactId>
    <version>3.3.3</version>
    <relativePath/>
</parent>

Then add the spring-boot-starter-web and spring-ai-alibaba-starter dependencies inside the dependencies element:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
</dependency>
<dependency>
    <groupId>com.alibaba.cloud.ai</groupId>
    <artifactId>spring-ai-alibaba-starter</artifactId>
    <version>1.0.0-M3.3</version>
</dependency>

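Because the spring-ai artifacts have not yet been published to Maven Central, resolution may fail for spring-ai-core and related dependencies. If that happens, add the following repository configuration to your project's pom.xml.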

<repositories>
    <repository>
        <id>spring-milestones</id>
        <name>Spring Milestones</name>
        <url>https://repo.spring.io/milestone</url>
        <snapshots>
            <enabled>false</enabled>
        </snapshots>
    </repository>
</repositories>

The complete pom.xml then looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.yuncheng</groupId>
    <artifactId>spring-ai-helloworld</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <maven.compiler.source>21</maven.compiler.source>
        <maven.compiler.target>21</maven.compiler.target>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>

    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>3.3.3</version>
        <relativePath/>
    </parent>

    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>
        <dependency>
            <groupId>com.alibaba.cloud.ai</groupId>
            <artifactId>spring-ai-alibaba-starter</artifactId>
            <version>1.0.0-M3.3</version>
        </dependency>
        <dependency>
            <groupId>commons-io</groupId>
            <artifactId>commons-io</artifactId>
            <version>2.18.0</version>
        </dependency>
    </dependencies>

    <repositories>
        <repository>
            <id>spring-milestones</id>
            <name>Spring Milestones</name>
            <url>https://repo.spring.io/milestone</url>
            <snapshots>
                <enabled>false</enabled>
            </snapshots>
        </repository>
    </repositories>

</project>

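4. Configure the yml file

Add the following configuration to application-dev.yml in the project's resources directory. Spring Boot loads this file only when the dev profile is active (for example, with spring.profiles.active=dev); otherwise put the same settings in application.yml.

Note: replace the api-key value with the key you obtained from the Alibaba Bailian platform.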

server:
  port: 8080

spring:
  application:
    name: spring-ai-helloworld
  ai:
    dashscope:
      api-key: your-api-key
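A key written directly into a configuration file is easy to leak through version control. One common alternative is to keep it in an environment variable and fail fast with a clear message when it is missing. The sketch below illustrates that check in plain Java; the variable name AI_DASHSCOPE_API_KEY and the class ApiKeyResolver are assumptions made for this example, not names taken from the article.

```java
import java.util.Map;

final class ApiKeyResolver {

    // Assumed environment variable name, chosen for this sketch
    static final String ENV_NAME = "AI_DASHSCOPE_API_KEY";

    // Returns the trimmed key, or throws if the variable is missing or blank
    static String requireApiKey(Map<String, String> env) {
        String key = env.get(ENV_NAME);
        if (key == null || key.isBlank()) {
            throw new IllegalStateException(
                    "Environment variable " + ENV_NAME + " is not set");
        }
        return key.trim();
    }

    public static void main(String[] args) {
        try {
            String key = requireApiKey(System.getenv());
            // Print only the length so the key itself never reaches the logs
            System.out.println("API key found, length " + key.length());
        }
        catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Spring Boot can also resolve a placeholder of the form ${VARIABLE_NAME} in a yml value from the environment, which keeps the key out of the file without any extra code.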

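5. Create TTSController

SpeechSynthesisModel is one of the core components of the Spring AI Alibaba framework; it represents the text-to-speech model and is the object the controller calls to synthesize audio.

DashScopeSpeechSynthesisOptions configures a TTS request. It lets you set parameters such as speed, pitch, and volume to tailor the synthesized speech to different scenarios.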

package com.yuncheng;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.concurrent.CountDownLatch;

import com.alibaba.cloud.ai.dashscope.audio.DashScopeSpeechSynthesisOptions;
import com.alibaba.cloud.ai.dashscope.audio.synthesis.SpeechSynthesisModel;
import com.alibaba.cloud.ai.dashscope.audio.synthesis.SpeechSynthesisPrompt;
import com.alibaba.cloud.ai.dashscope.audio.synthesis.SpeechSynthesisResponse;
import jakarta.annotation.PreDestroy;
import org.apache.commons.io.FileUtils;
import reactor.core.publisher.Flux;

import org.springframework.boot.ApplicationArguments;
import org.springframework.boot.ApplicationRunner;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;


@RestController
@RequestMapping("/ai/tts")
public class TTSController implements ApplicationRunner {

    private final SpeechSynthesisModel speechSynthesisModel;

    private static final String TEXT = "床前明月光， 疑是地上霜。 举头望明月， 低头思故乡。";

    private static final String FILE_PATH = "src/main/resources/tts";

    public TTSController(SpeechSynthesisModel speechSynthesisModel) {
        this.speechSynthesisModel = speechSynthesisModel;
    }

    @GetMapping("/simple")
    public void tts() throws IOException {

        // Build a DashScopeSpeechSynthesisOptions instance with the builder and set its parameters
        DashScopeSpeechSynthesisOptions options = DashScopeSpeechSynthesisOptions.builder()
                .withSpeed(1.0)         // speech rate
                .withPitch(0.9)         // pitch
                .withVolume(60)         // volume
                .build();

        SpeechSynthesisResponse response = speechSynthesisModel.call(
                new SpeechSynthesisPrompt(TEXT,options)
        );

        File file = new File(FILE_PATH + "/output.mp3");
        try (FileOutputStream fos = new FileOutputStream(file)) {
            ByteBuffer byteBuffer = response.getResult().getOutput().getAudio();
            // Copy only the readable bytes; array() can expose the whole backing array
            byte[] bytes = new byte[byteBuffer.remaining()];
            byteBuffer.get(bytes);
            fos.write(bytes);
        }
    }

    @GetMapping("/stream")
    public void streamTTS() {

        Flux<SpeechSynthesisResponse> response = speechSynthesisModel.stream(
                new SpeechSynthesisPrompt(TEXT)
        );

        CountDownLatch latch = new CountDownLatch(1);
        File file = new File(FILE_PATH + "/output-stream.mp3");
        try (FileOutputStream fos = new FileOutputStream(file)) {

            response.doFinally(
                    signal -> latch.countDown()
            ).subscribe(synthesisResponse -> {
                ByteBuffer byteBuffer = synthesisResponse.getResult().getOutput().getAudio();
                byte[] bytes = new byte[byteBuffer.remaining()];
                byteBuffer.get(bytes);
                try {
                    fos.write(bytes);
                }
                catch (IOException e) {
                    throw new RuntimeException(e);
                }
            });

            latch.await();
        }
        catch (IOException | InterruptedException e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    public void run(ApplicationArguments args) {

        File file = new File(FILE_PATH);
        if (!file.exists()) {
            file.mkdirs();
        }
    }

    @PreDestroy
    public void destroy() throws IOException {

        FileUtils.deleteDirectory(new File(FILE_PATH));
    }

}
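The streaming endpoint above copies each chunk with remaining() and get() instead of calling array(). The difference matters: array() returns the entire backing array regardless of position and limit, and it throws for read-only buffers. The following is a small self-contained illustration using only java.nio; the class ByteBufferCopy and its helper are hypothetical names for this sketch.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

final class ByteBufferCopy {

    // Copies only the readable bytes (position..limit) and leaves the
    // caller's position untouched by reading through a duplicate view.
    static byte[] readableBytes(ByteBuffer buffer) {
        ByteBuffer view = buffer.duplicate();
        byte[] bytes = new byte[view.remaining()];
        view.get(bytes);
        return bytes;
    }

    public static void main(String[] args) {
        byte[] backing = {0, 1, 2, 3, 4, 5};
        // A buffer that exposes only three bytes of a six-byte backing array
        ByteBuffer slice = ByteBuffer.wrap(backing, 2, 3);

        // prints [0, 1, 2, 3, 4, 5]: the whole backing array
        System.out.println(Arrays.toString(slice.array()));
        // prints [2, 3, 4]: only the readable region
        System.out.println(Arrays.toString(readableBytes(slice)));
    }
}
```

Reading through duplicate() is a design choice for the sketch: it allows the same buffer to be read again later, which the controller does not need because it consumes each chunk once.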

6. Create the Application startup class

package com.yuncheng;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.ConfigurableApplicationContext;
import org.springframework.core.env.Environment;

@SpringBootApplication
public class AiApplication {

    public static void main(String[] args) {
        ConfigurableApplicationContext application = SpringApplication.run(AiApplication.class, args);
        Environment env = application.getEnvironment();
        String port = env.getProperty("server.port");
        System.out.println("AiApplication started successfully, server port: " + port);

    }
}
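Once the application is running, the two endpoints can be exercised with any HTTP client. As one possible sketch, the following uses the JDK's built-in java.net.http client (Java 11 or later). The class name TtsSmokeTest, the base URL http://localhost:8080, and the 60-second timeout are assumptions for this example; the paths come from the controller's request mappings.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

final class TtsSmokeTest {

    // Builds a GET request for one of the controller's endpoints ("simple" or "stream")
    static HttpRequest buildRequest(String baseUrl, String endpoint) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/ai/tts/" + endpoint))
                .timeout(Duration.ofSeconds(60)) // assumed upper bound for one synthesis call
                .GET()
                .build();
    }

    public static void main(String[] args) throws InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        for (String endpoint : new String[] {"simple", "stream"}) {
            try {
                // Both endpoints return no body, so the response body is discarded
                HttpResponse<Void> response = client.send(
                        buildRequest("http://localhost:8080", endpoint),
                        HttpResponse.BodyHandlers.discarding());
                System.out.println(endpoint + " -> HTTP " + response.statusCode());
            }
            catch (IOException e) {
                // Covers connection failures and timeouts, e.g. when the service is not running
                System.out.println(endpoint + " -> request failed: " + e);
            }
        }
    }
}
```

After successful calls, the audio should appear as output.mp3 and output-stream.mp3 under src/main/resources/tts. Note that the controller's @PreDestroy method deletes that directory on shutdown, so copy the files elsewhere before stopping the application.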