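I have recently been reading about deep learning for speech recognition and was unsure what exactly the spectrogram of a speech signal is, so I looked it up. These notes are for reference only:
1 Definition:
The graphical display of the Fourier analysis of a speech signal is called a spectrogram (also sonogram). A spectrogram is a three-dimensional spectrum: it shows how the speech spectrum changes over time, with frequency on the vertical axis and time on the horizontal axis. The strength of a given frequency component at a given moment is represented by the gray level or color intensity of the corresponding point. Analyzing speech with spectrograms is called spectrographic analysis. A spectrogram carries a great deal of information about the characteristics of an utterance. It combines the features of a spectrum plot and a time-domain waveform and clearly shows how the speech spectrum evolves over time; in other words, it is a dynamic spectrum. Such plots can be recorded with a sound spectrograph.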
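2 Computation:
For a speech signal x(t), first split it into frames to get x(m,n), where m is the frame index (up to the number of frames) and n is the sample index within a frame (up to the frame length). Apply the FFT to each frame to get X(m,n), where n now indexes frequency bins. Then form the periodogram Y(m,n) = X(m,n) * X(m,n)', that is, X multiplied by its complex conjugate, which equals |X(m,n)|^2, and take 10*log10(Y(m,n)). Map m to a time scale M and n to a frequency scale N, then plot (M, N, 10*log10(Y(m,n))) as a two-dimensional image: that is the spectrogram (it can also be drawn as a three-dimensional plot).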
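The steps above can be sketched in a few lines of MATLAB. This is only an illustration under assumed values: the 440 Hz test tone stands in for speech, and the names frameLen, hop, win, tAxis and fAxis, the 50% overlap and the explicit Hann window are my own choices, not part of the original description.

```matlab
% Minimal sketch of framing -> FFT -> periodogram -> dB (all values assumed).
fs = 8000;                          % sampling rate in Hz (assumed)
t  = (0:fs-1)'/fs;                  % one second of sample times
x  = sin(2*pi*440*t);               % test tone standing in for speech

frameLen  = 256;                    % samples per frame
hop       = 128;                    % frame shift (50% overlap, assumed)
numFrames = fix((length(x) - frameLen)/hop) + 1;   % number of frames

% Hann window written out explicitly so no toolbox is needed
win = 0.5 - 0.5*cos(2*pi*(0:frameLen-1)'/frameLen);

% Framing: column k of idx holds the sample indices of frame k
idx    = bsxfun(@plus, (1:frameLen)', (0:numFrames-1)*hop);
frames = x(idx) .* repmat(win, 1, numFrames);

X = fft(frames);                    % FFT of every column (one per frame)
X = X(1:frameLen/2+1, :);           % keep bins 0..fs/2 for a real signal
Y = real(X .* conj(X));             % periodogram |X|^2
S = 10*log10(Y + eps);              % power in dB; eps avoids log10(0)

tAxis = ((0:numFrames-1)*hop + frameLen/2)/fs;   % frame centers in seconds
fAxis = (0:frameLen/2)'*fs/frameLen;             % bin frequencies in Hz
```

To display the result, call imagesc(tAxis, fAxis, S) followed by axis xy, then label the axes. For the test tone the image should show a single horizontal stripe near 440 Hz.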
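3 How to read the plot (show and tell, haha):
We can observe how the signal strength in different frequency bands changes over time. Because a general signal is rich in frequency content, patterns are hard to spot, so it helps to look at the spectrogram of clean speech (see the figure above). The plot shows clear horizontal stripes, which we call the "voiceprint" (I am not sure this is the accurate term); they have many applications. A stripe is where dark points cluster: as time goes on, the points extend into a stripe. This means the frequency given by that point's vertical coordinate carries strong energy, accounts for a large share of the speech, and therefore has a much stronger effect on what the listener perceives. Speech is generally periodic (more precisely, voiced speech is quasi-periodic), so the strong-energy frequencies repeat at regular intervals: if there is a strong point at 300 Hz, strong points generally also appear at n*300 Hz, the harmonics of the fundamental. That is why the spectrograms we see are striped.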
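The claim about strong points at n*300 Hz can be checked on a synthetic periodic signal. The sketch below is not real speech: the 300 Hz fundamental comes from the text, while the five harmonics, their 1/k amplitudes, the 16 kHz sampling rate and the variable names are assumptions made for illustration.

```matlab
% Sketch: a periodic waveform has spectral peaks at multiples of its fundamental.
fs = 16000;                 % sampling rate in Hz (assumed)
N  = 1600;                  % 0.1 s of signal, so bins are exactly 10 Hz apart
f0 = 300;                   % fundamental frequency in Hz
t  = (0:N-1)'/fs;

% Build a periodic signal from its first five harmonics (amplitude 1/k, assumed)
x = zeros(N, 1);
for k = 1:5
    x = x + (1/k)*sin(2*pi*k*f0*t);
end

P = abs(fft(x)).^2;         % power spectrum
P = P(1:N/2+1);             % keep 0..fs/2
f = (0:N/2)'*fs/N;          % bin frequencies in Hz

% Frequencies of the five strongest bins, in ascending order
[~, order] = sort(P, 'descend');
peaks = sort(f(order(1:5)))';   % peaks is [300 600 900 1200 1500]
```

In a spectrogram, each of these peaks persists from frame to frame while the pitch stays constant, which is what produces the evenly spaced horizontal stripes.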
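The range of the human vocal apparatus is limited: most of the energy in speech lies below roughly 4000 Hz. Instruments cover a much wider range, and percussion can reach up to 20 kHz. In digital analysis, however, the spectrum is computed by an algorithm, usually the FFT, so the frequency range of the result is set by the sampling rate: it always spans 0 to half the sampling rate. Even for speech limited to 4000 Hz, an analysis at a 16 kHz sampling rate covers 0 to 8000 Hz and can still show values above 4000 Hz. For such a band-limited signal, these values can be regarded as analysis artifacts (for example spectral leakage from windowing, or noise) rather than real speech content.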
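A small experiment makes this concrete. The sketch below assumes a single 1010 Hz tone as a stand-in for band-limited speech and a 512-point FFT at 16 kHz. The variable names and the comparison of a rectangular window against a Hann window are my own additions, not something the original text describes.

```matlab
% Sketch: the FFT axis spans 0..fs/2, and a band-limited tone still leaks
% small nonzero values into the bins above 4000 Hz.
fs   = 16000;               % sampling rate in Hz
nfft = 512;                 % FFT length (assumed)
n    = (0:nfft-1)';
x    = sin(2*pi*1010*n/fs); % tone far below 4000 Hz, not centered on a bin

f    = (0:nfft/2)'*fs/nfft; % bin frequencies in Hz
fmax = f(end);              % highest analyzed frequency: fs/2 = 8000 Hz

hannWin = 0.5 - 0.5*cos(2*pi*n/nfft);   % Hann window, written out explicitly

Prect = abs(fft(x)).^2;           Prect = Prect(1:nfft/2+1);  % no window
Phann = abs(fft(x.*hannWin)).^2;  Phann = Phann(1:nfft/2+1);  % Hann window

hi = f > 4000;              % bins above the signal's actual band
% Strongest leakage above 4000 Hz, in dB relative to the spectral peak
leakRect = 10*log10(max(Prect(hi))/max(Prect));
leakHann = 10*log10(max(Phann(hi))/max(Phann));
```

Both leakage values are finite, meaning something does show up above 4000 Hz even though the signal has no component there. The Hann window pushes that level far lower than the rectangular one, which is one reason spectrogram routines window each frame.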
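4 MATLAB program (debugged and working):
Main:
[x,fs,nbits]=wavread('keshi.wav'); % read the speech file (wavread was removed in newer MATLAB releases; audioread replaces it)
specgram(x,512,fs,100) % spectrogram function
xlabel('Time (s)')
ylabel('Frequency (Hz)')
title('Spectrogram of "概率" (probability)')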
function [yo,fo,to] = specgram(varargin)
%SPECGRAM Spectrogram using a Short-Time Fourier Transform (STFT).
%   SPECGRAM has been replaced by SPECTROGRAM. SPECGRAM still works but
%   may be removed in the future. Use SPECTROGRAM instead. Type help
%   SPECTROGRAM for details.
%
%   See also PERIODOGRAM, SPECTRUM/PERIODOGRAM, PWELCH, SPECTRUM/WELCH, GOERTZEL.
%   Author(s): L. Shure, 1-1-91
%              T. Krauss, 4-2-93, updated
%   Copyright 1988-2010 The MathWorks, Inc.
%   $Revision: 1.8.4.6 $ $Date: 2010/02/17 19:00:23 $
error(nargchk(1,5,nargin,'struct'))
[msg,x,nfft,Fs,window,noverlap]=specgramchk(varargin);
if ~isempty(msg), error(generatemsgid('SigErr'),msg); end
nx = length(x);
nwind = length(window);
if nx < nwind % zero-pad x if it has length less than the window length
    x(nwind)=0; nx=nwind;
end
x = x(:); % make a column vector for ease later
window = window(:); % be consistent with data set
ncol = fix((nx-noverlap)/(nwind-noverlap));
colindex = 1 + (0:(ncol-1))*(nwind-noverlap);
rowindex = (1:nwind)';
if length(x)<(nwind+colindex(ncol)-1)
    x(nwind+colindex(ncol)-1) = 0; % zero-pad x
end
if length(nfft)>1
    df = diff(nfft);
    evenly_spaced = all(abs(df-df(1))/Fs<1e-12); % evenly spaced flag (boolean)
    use_chirp = evenly_spaced & (length(nfft)>20);
else
    use_chirp = 0;
end
if (length(nfft)==1) || use_chirp
    y = zeros(nwind,ncol);
    % put x into columns of y with the proper offset
    % should be able to do this with fancy indexing!
    y(:) = x(rowindex(:,ones(1,ncol))+colindex(ones(nwind,1),:)-1);
    % Apply the window to the array of offset signal segments.
    y = window(:,ones(1,ncol)).*y;
    if ~use_chirp % USE FFT
        % now fft y which does the columns
        y = fft(y,nfft);
        if ~any(any(imag(x))) % x purely real
            if rem(nfft,2), % nfft odd
                select = 1:(nfft+1)/2;
            else
                select = 1:nfft/2+1;
            end
            y = y(select,:);
        else
            select = 1:nfft;
        end
        f = (select - 1)'*Fs/nfft;
    else % USE CHIRP Z TRANSFORM
        f = nfft(:);
        f1 = f(1);
        f2 = f(end);
        m = length(f);
        w = exp(-1i*2*pi*(f2-f1)/(m*Fs));
        a = exp(1i*2*pi*f1/Fs);
        y = czt(y,m,w,a);
    end
else % evaluate DFT on given set of frequencies
    f = nfft(:);
    q = nwind - noverlap;
    extras = floor(nwind/q);
    x = [zeros(q-rem(nwind,q)+1,1); x];
    % create windowed DTFT matrix (filter bank)
    D = window(:,ones(1,length(f))).*exp((-1i*2*pi/Fs*((nwind-1):-1:0)).'*f');
    y = upfirdn(x,D,1,q).';
    y(:,[1:extras+1 end-extras+1:end]) = [];
end
t = (colindex-1)'/Fs;
% take abs, and use image to display results
if nargout == 0
    newplot;
    if length(t)==1
        imagesc([0 1/f(2)],f,20*log10(abs(y)+eps));axis xy; colormap(jet)
    else
        % Shift time vector by half window length; the overlap factor has
        % already been accounted for in the colindex variable.
        t = ((colindex-1)+((nwind)/2)')/Fs;
        imagesc(t,f,20*log10(abs(y)+eps));axis xy; colormap(jet)
    end
    xlabel('Time')
    ylabel('Frequency')
elseif nargout == 1,
    yo = y;
elseif nargout == 2,
    yo = y;
    fo = f;
elseif nargout == 3,
    yo = y;
    fo = f;
    to = t;
end
function [msg,x,nfft,Fs,window,noverlap] = specgramchk(P)
%SPECGRAMCHK Helper function for SPECGRAM.
%   SPECGRAMCHK(P) takes the cell array P and uses each cell as
%   an input argument. Assumes P has between 1 and 5 elements.
msg = [];
x = P{1};
if (length(P) > 1) && ~isempty(P{2})
    nfft = P{2};
else
    nfft = min(length(x),256);
end
if (length(P) > 2) && ~isempty(P{3})
    Fs = P{3};
else
    Fs = 2;
end
if length(P) > 3 && ~isempty(P{4})
    window = P{4};
else
    if length(nfft) == 1
        window = hanning(nfft);
    else
        msg = 'You must specify a window function.';
    end
end
if length(window) == 1, window = hanning(window); end
if (length(P) > 4) && ~isempty(P{5})
    noverlap = P{5};
else
    noverlap = ceil(length(window)/2);
end
% NOW do error checking
if (length(nfft)==1) && (nfft<length(window)),
    msg = 'Requires window''s length to be no greater than the FFT length.';
end
if (noverlap >= length(window)),
    msg = 'Requires NOVERLAP to be strictly less than the window length.';
end
if (length(nfft)==1) && (nfft ~= abs(round(nfft)))
    msg = 'Requires positive integer values for NFFT.';
end
if (noverlap ~= abs(round(noverlap))),
    msg = 'Requires positive integer value for NOVERLAP.';
end
if min(size(x))~=1,
    msg = 'Requires vector (either row or column) input.';
end
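To see how the listing above splits the signal into frames, it helps to run its indexing formulas on small numbers. The expressions for ncol, colindex and t are copied from the listing; the signal length, window length, overlap and sampling rate are made-up values, and lastSample is a helper name introduced only for this illustration.

```matlab
% Sketch: frame bookkeeping as used in specgram, with assumed sizes.
nx       = 1000;            % signal length in samples (assumed)
nwind    = 100;             % window length in samples (assumed)
noverlap = 50;              % samples shared by neighboring frames (assumed)
Fs       = 8000;            % sampling rate in Hz (assumed)

% Number of complete frames that fit into the signal
ncol = fix((nx - noverlap)/(nwind - noverlap));        % 19

% First sample index of each frame; frames advance by nwind - noverlap
colindex = 1 + (0:(ncol-1))*(nwind - noverlap);        % 1, 51, 101, ..., 901

% Last sample touched by the final frame (hypothetical helper variable)
lastSample = colindex(end) + nwind - 1;                % 1000

% Start time of each frame in seconds, as returned when outputs are requested
t = (colindex - 1)'/Fs;
```

With these numbers the 19 frames cover the signal exactly. When they do not, the listing zero-pads x so that the last frame is complete.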
Acknowledgements:
1 http://blog.csdn.net/jiangyangbo/article/details/5899264
————————————————
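Copyright notice: This is an original article by CSDN blogger "qinghu7987", released under the CC 4.0 BY-SA license. When reposting, please include the original source link and this notice.
Original link: https://blog.csdn.net/u014332048/article/details/44569285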