[Information Technology] [2006] Entropy and Speech

This is a doctoral thesis from KTH Royal Institute of Technology, Sweden (author: Mattias Nilsson), 54 pages in total.

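The thesis studies the representation of speech signals and the estimation of information-theoretic measures from observations that contain features of the speech signal. Its main body consists of four research papers.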

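Paper A proposes a compact representation of the speech signal that allows perfect reconstruction. The representation consists of models, model parameters, and signal coefficients. Unlike existing speech representations, compactness is achieved by adapting the models so that the energy of the signal coefficients is maximally concentrated under a chosen energy concentration criterion. The individual parts of the representation correspond closely to properties of the speech signal, such as the spectral envelope, the pitch, and the voiced/unvoiced signal coefficients, which is useful for both speech coding and speech modification.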

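Performance limits for coding and classification can be derived from the information-theoretic measure of entropy. Papers B and C address the estimation of differential entropy. Paper B describes an estimation method for the case where the set of vector observations (taken from the representation) lies on a lower-dimensional surface (a manifold) in the embedding space. Paper C takes a different approach: it destroys the manifold structure by constraining the resolution of the observation space. This makes it possible to estimate bounds on the classification error rate even when the manifolds have varying dimensionality within the embedding space.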
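To make "estimating differential entropy from observations" concrete, here is a minimal sketch of a generic nearest-neighbour estimator (the Kozachenko-Leonenko form). It is a textbook estimator chosen for illustration, not the method of Paper B or Paper C, and it ignores the manifold issue those papers address. The function names `digamma_int` and `knn_entropy`, the sample size, and the choice `k=3` are all assumptions made for this example.

```python
import math
import random


def digamma_int(n: int) -> float:
    """Digamma function psi(n) for a positive integer n."""
    euler_gamma = 0.5772156649015329
    return -euler_gamma + sum(1.0 / i for i in range(1, n))


def knn_entropy(points, k=3):
    """Kozachenko-Leonenko estimate of differential entropy, in nats.

    points: list of equal-length tuples, assumed to be distinct.
    """
    n = len(points)
    d = len(points[0])
    # Log-volume of the d-dimensional unit-radius ball.
    log_unit_ball = (d / 2) * math.log(math.pi) - math.lgamma(d / 2 + 1)
    log_radius_sum = 0.0
    for i, p in enumerate(points):
        # Brute-force distances to all other points (fine for small n).
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        log_radius_sum += math.log(dists[k - 1])  # distance to k-th neighbour
    return digamma_int(n) - digamma_int(k) + log_unit_ball + d * log_radius_sum / n


random.seed(0)
# Synthetic 1-D standard Gaussian observations (not speech data).
samples = [(random.gauss(0.0, 1.0),) for _ in range(600)]
estimate = knn_entropy(samples, k=3)
exact = 0.5 * math.log(2 * math.pi * math.e)  # entropy of N(0, 1) in nats
print(f"kNN estimate: {estimate:.3f} nats, closed form: {exact:.3f} nats")
```

The estimator depends on the dimension `d` of the space in which distances are measured. When the data actually lie on a lower-dimensional manifold, using the embedding dimension here is no longer appropriate, which is the kind of difficulty that motivates Papers B and C.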

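Finally, Paper D investigates how much information is shared between the spectral features of narrow-band (0.3 to 3.4 kHz) and high-band (3.4 to 8 kHz) speech. Its results indicate that the information shared between the two bands is not sufficient for high-quality wideband speech coding (0.3 to 8 kHz) unless extra information describing the high band is transmitted.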
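As a rough illustration of what "shared information" between two feature streams means, the sketch below estimates the mutual information between two scalar variables under a single joint-Gaussian assumption, where it has the closed form I = -0.5 log2(1 - rho^2) bits. Paper D works with Gaussian mixture models on spectral features; the single-Gaussian case, the synthetic data, the correlation value 0.8, and the function name `gaussian_mutual_information_bits` are all assumptions made for this example.

```python
import math
import random


def gaussian_mutual_information_bits(xs, ys):
    """Mutual information in bits between two scalars, assuming joint Gaussianity."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    rho = cov / math.sqrt(var_x * var_y)  # sample correlation coefficient
    return -0.5 * math.log2(1.0 - rho ** 2)


random.seed(1)
rho_true = 0.8  # hypothetical correlation between the two synthetic features
# Stand-ins for a narrow-band and a high-band feature (synthetic, not speech).
narrow = [random.gauss(0.0, 1.0) for _ in range(5000)]
high = [rho_true * x + math.sqrt(1.0 - rho_true ** 2) * random.gauss(0.0, 1.0)
        for x in narrow]

mi_estimate = gaussian_mutual_information_bits(narrow, high)
mi_exact = -0.5 * math.log2(1.0 - rho_true ** 2)
print(f"estimated MI: {mi_estimate:.3f} bits, closed form: {mi_exact:.3f} bits")
```

A small mutual information means that one variable tells a decoder little about the other, which is the intuition behind the conclusion that the high band cannot be reconstructed well from the narrow band alone.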

In this thesis, we study the representation of speech signals and the estimation of information-theoretical measures from observations containing features of the speech signal. The main body of the thesis consists of four research papers. Paper A presents a compact representation of the speech signal that facilitates perfect reconstruction. The representation is constituted of models, model parameters, and signal coefficients. A difference compared to existing speech representations is that we seek a compact representation by adapting the models to maximally concentrate the energy of the signal coefficients according to a selected energy concentration criterion. The individual parts of the representation are closely related to speech signal properties such as spectral envelope, pitch, and voiced/unvoiced signal coefficients, beneficial for both speech coding and modification. From the information-theoretical measure of entropy, performance limits in coding and classification can be derived. Papers B and C discuss the estimation of differential entropy. Paper B describes a method for estimation of the differential entropies in the case when the set of vector observations (from the representation) lie on a lower-dimensional surface (manifold) in the embedding space. In contrast to the method presented in Paper B, Paper C introduces a method where the manifold structures are destroyed by constraining the resolution of the observation space. This facilitates the estimation of bounds on classification error rates even when the manifolds are of varying dimensionality within the embedding space. Finally, Paper D investigates the amount of shared information between spectral features of narrow-band (0.3-3.4 kHz) and high-band (3.4-8 kHz) speech. The results in Paper D indicate that the information shared between the high-band and the narrow-band is insufficient for high-quality wideband speech coding (0.3-8 kHz) without transmission of extra information describing the high-band.

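1 Introduction
2 Canonical representation of speech
3 On the estimation of differential entropy from data located on embedded manifolds
4 Intrinsic dimensionality and its implication for performance prediction in pattern classification
5 Gaussian mixture model based mutual information estimation between frequency bands in speech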

Download link for the original English text:

http://page5.dfpan.com/fs/dlcaj28211293169e77/
