【Python 100-Day Advanced - Web Development - Audio】Day711 - Spectral Representations: librosa.stft Short-Time Fourier Transform

Table of Contents

1. Spectral representations
- 1.1 librosa.stft
  - 1.1.1 Syntax and parameters
  - 1.1.2 Examples

1. Spectral representations

1.1 librosa.stft

https://librosa.org/doc/latest/generated/librosa.stft.html

1.1.1 Syntax and parameters

librosa.stft(y, *, n_fft=2048, hop_length=None, win_length=None, window='hann', center=True, dtype=None, pad_mode='constant')

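Short-time Fourier transform (STFT).
The STFT represents a signal in the time-frequency domain by computing discrete Fourier transforms (DFT) over short overlapping windows.
This function returns a complex-valued matrix D such that
np.abs(D[..., f, t]) is the magnitude of frequency bin f at frame t, and
np.angle(D[..., f, t]) is the phase of frequency bin f at frame t.
The integers t and f can be converted to physical units by means of the utility functions frames_to_samples and fft_frequencies.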
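As a rough illustration of that last sentence, the NumPy-only sketch below computes physical units for the rows and columns of D. It assumes librosa's defaults (sr=22050, n_fft=2048, hop_length=512, center=True) and assumes that librosa.fft_frequencies and librosa.frames_to_samples reduce to these formulas under those defaults; it is not librosa's own code.

```python
import numpy as np

# Assumed analysis settings (librosa defaults)
sr = 22050
n_fft = 2048
hop_length = n_fft // 4  # 512

# Frequency in Hz of each row f of D
# (assumed equivalent to librosa.fft_frequencies(sr=sr, n_fft=n_fft))
freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)

# Center sample and time in seconds of each column t of D when center=True
# (assumed equivalent to librosa.frames_to_samples(t, hop_length=hop_length))
frames = np.arange(230)  # 230 columns, as in the trumpet example below
samples = frames * hop_length
times = samples / sr

print(freqs.shape)               # (1025,)
print(freqs[1], freqs[-1])       # bin spacing sr / n_fft and Nyquist frequency sr / 2
print(samples[100], times[100])  # sample 51200, about 2.32 s

# Magnitude and phase of a single complex coefficient
d = 3 + 4j
print(np.abs(d), np.angle(d))    # 5.0 and about 0.927 rad
```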

Parameters
	y: np.ndarray [shape=(…, n)], real-valued
		Input signal. Multi-channel input is supported.
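To make the shape conventions concrete, here is a minimal NumPy sketch of a centered STFT. The helper stft_sketch is hypothetical and simplified: it assumes win_length == n_fft, a periodic Hann window, center=True and pad_mode='constant', and it works along the last axis, so multi-channel input of shape (..., n) gives output of shape (..., 1 + n_fft // 2, n_frames). librosa's own implementation differs internally.

```python
import numpy as np


def stft_sketch(y, n_fft=2048, hop_length=None):
    """Simplified centered STFT (hypothetical helper, not librosa's code)."""
    if hop_length is None:
        hop_length = n_fft // 4
    # Zero-pad n_fft // 2 samples at both ends of the last (time) axis
    pad = [(0, 0)] * (y.ndim - 1) + [(n_fft // 2, n_fft // 2)]
    y_pad = np.pad(y, pad, mode="constant")
    # Periodic Hann window of length n_fft
    n = np.arange(n_fft)
    window = 0.5 - 0.5 * np.cos(2 * np.pi * n / n_fft)
    # Gather overlapping frames: shape (..., n_frames, n_fft)
    n_frames = 1 + (y_pad.shape[-1] - n_fft) // hop_length
    idx = (np.arange(n_frames) * hop_length)[:, None] + n[None, :]
    frames = y_pad[..., idx] * window
    # Real FFT per frame, then swap to (..., 1 + n_fft // 2, n_frames)
    return np.swapaxes(np.fft.rfft(frames, axis=-1), -1, -2)


rng = np.random.default_rng(0)
mono = rng.standard_normal(117601)          # same length as the trumpet clip
stereo = rng.standard_normal((2, 117601))   # two channels
print(stft_sketch(mono).shape)    # (1025, 230)
print(stft_sketch(stereo).shape)  # (2, 1025, 230)
```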

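	n_fft: int > 0 [scalar]
		Length of the windowed signal after padding with zeros. The number of rows in the STFT matrix D is (1 + n_fft/2). The default value, n_fft=2048 samples, corresponds to a physical duration of 93 milliseconds at a sample rate of 22050 Hz, i.e. the default sample rate in librosa. This value is well suited for music signals. In speech processing, however, the recommended value is 512, corresponding to 23 milliseconds at a sample rate of 22050 Hz. In any case, we recommend setting n_fft to a power of two to optimize the speed of the fast Fourier transform (FFT) algorithm.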
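The figures quoted for n_fft can be checked with a little arithmetic. This sketch (the helper name describe_n_fft is made up for illustration) computes the number of rows of D, the frame duration and the spacing between frequency bins at 22050 Hz, and tests whether n_fft is a power of two.

```python
def describe_n_fft(n_fft, sr=22050):
    """Rows of D, frame duration (ms), bin spacing (Hz) and power-of-two check."""
    rows = 1 + n_fft // 2
    duration_ms = 1000 * n_fft / sr
    bin_hz = sr / n_fft
    is_pow2 = n_fft > 0 and (n_fft & (n_fft - 1)) == 0
    return rows, duration_ms, bin_hz, is_pow2


for n_fft in (2048, 512):
    rows, ms, hz, pow2 = describe_n_fft(n_fft)
    print(f"n_fft={n_fft}: {rows} rows, {ms:.1f} ms, {hz:.2f} Hz per bin, power of 2: {pow2}")
# n_fft=2048: 1025 rows, 92.9 ms, 10.77 Hz per bin, power of 2: True
# n_fft=512: 257 rows, 23.2 ms, 43.07 Hz per bin, power of 2: True
```

The trade-off is visible in the numbers: the shorter frame is about four times finer in time but about four times coarser in frequency.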

	hop_length: int > 0 [scalar]
		Number of audio samples between adjacent STFT columns.
		Smaller values increase the number of columns in D without affecting the frequency resolution of the STFT.
		If unspecified, defaults to win_length // 4 (see below).
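The number of columns in D follows from the framing arithmetic. The hypothetical helper below assumes that center=True pads n_fft // 2 samples on each side; with the 117601-sample trumpet clip used in the example it reproduces the shapes printed there: (1025, 230), (1025, 226) and (1025, 1838).

```python
def n_frames(n_samples, n_fft=2048, hop_length=512, center=True):
    """Number of STFT columns (sketch of the framing arithmetic)."""
    if center:
        # center=True pads n_fft // 2 samples on each side of the signal
        n_samples = n_samples + 2 * (n_fft // 2)
    return 1 + (n_samples - n_fft) // hop_length


print(n_frames(117601))                  # 230  (default, centered)
print(n_frames(117601, center=False))    # 226  (left-aligned frames)
print(n_frames(117601, hop_length=64))   # 1838 (shorter hop, same 1025 rows)
```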

	win_length: int <= n_fft [scalar]
		Each frame of audio is windowed by a window of length win_length and then padded with zeros to match n_fft.
		Smaller values improve the temporal resolution of the STFT (i.e. the ability to discriminate impulses that are closely spaced in time) at the expense of frequency resolution (i.e. the ability to discriminate pure tones that are closely spaced in frequency). This effect is known as the time-frequency localization trade-off and needs to be adjusted according to the properties of the input signal y.
		If unspecified, defaults to win_length = n_fft.
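A sketch of the zero-padding described for win_length, assuming the shorter window is centered inside the n_fft-sample frame (the idea behind librosa.util.pad_center). padded_window is a hypothetical helper, not part of librosa.

```python
import numpy as np


def padded_window(win_length, n_fft):
    """Periodic Hann window of length win_length, zero-padded (centered) to n_fft."""
    if win_length > n_fft:
        raise ValueError("win_length must be <= n_fft")
    n = np.arange(win_length)
    win = 0.5 - 0.5 * np.cos(2 * np.pi * n / win_length)
    left = (n_fft - win_length) // 2
    return np.pad(win, (left, n_fft - win_length - left))


w = padded_window(1024, 2048)
print(w.shape)              # (2048,)
print(np.count_nonzero(w))  # 1023: only the 1024-sample window (whose first value is 0) is non-zero
```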

	window: string, tuple, number, function, or np.ndarray [shape=(n_fft,)]
		Either:
		- a window specification (string, tuple, or number); see scipy.signal.get_window
		- a window function, such as scipy.signal.windows.hann
		- a vector or array of length n_fft
		Defaults to a raised cosine window ('hann'), which is adequate for most applications in audio signal processing.
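For reference, the Hann window used for spectral analysis is normally the periodic form. This NumPy sketch builds it directly and checks it against np.hanning, which returns the symmetric form; an array like this is an instance of the third option above (a vector of length n_fft).

```python
import numpy as np

N = 2048
n = np.arange(N)

# Periodic ("DFT-even") Hann window
hann_periodic = 0.5 - 0.5 * np.cos(2 * np.pi * n / N)

# np.hanning gives the symmetric window; dropping the last sample of a
# length-(N + 1) symmetric window yields the periodic one
assert np.allclose(hann_periodic, np.hanning(N + 1)[:-1])

# Assumption: with SciPy installed, scipy.signal.get_window("hann", N)
# returns this same periodic window (fftbins=True is its default)
print(hann_periodic[0], hann_periodic[N // 2])  # 0.0 at the edge, 1.0 at the center
```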

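	center: boolean
		If True, the signal y is padded so that frame D[:, t] is centered at y[t * hop_length].
		If False, then D[:, t] begins at y[t * hop_length].
		Defaults to True, which simplifies the alignment of D onto a time grid by means of librosa.frames_to_samples. Note, however, that center must be set to False when analyzing signals with librosa.stream.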
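The effect of center can be expressed as the range of original samples each frame covers. In this sketch (frame_span is a hypothetical helper; n_fft=2048 and hop_length=512 are assumed), centered frame t + 2 covers exactly the same samples as left-aligned frame t, because n_fft // 2 = 1024 is two hops. That is consistent with the printed example output further down, where the first value of S_left[:, 0] (1.0328183e-03) appears as the third column of S_center.

```python
def frame_span(t, n_fft=2048, hop_length=512, center=True):
    """Half-open range [start, stop) of original samples covered by frame t.

    With center=True, indices below 0 or past the end of y fall into the
    padding added by pad_mode.
    """
    start = t * hop_length - (n_fft // 2 if center else 0)
    return start, start + n_fft


print(frame_span(0))                # (-1024, 1024): centered on sample 0
print(frame_span(2))                # (0, 2048)
print(frame_span(0, center=False))  # (0, 2048): same samples as centered frame 2
```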

	dtype: np.dtype, optional
		Complex numeric type for D. The default is inferred to match the precision of the input signal.
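The dtype inference can be pictured as mapping each real precision to the matching complex one. The helper below is hypothetical and uses np.result_type; librosa ships a similar utility, librosa.util.dtype_r2c. Since librosa.load returns float32 audio by default, D in the example is complex64.

```python
import numpy as np


def complex_dtype_for(real_dtype):
    """Hypothetical helper: smallest complex dtype that keeps the input precision."""
    return np.result_type(real_dtype, np.complex64)


print(complex_dtype_for(np.float32))  # complex64
print(complex_dtype_for(np.float64))  # complex128
```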

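	pad_mode: string or function
		If center=True, this argument is passed to np.pad for padding the edges of the signal y. By default (pad_mode="constant"), y is padded on both sides with zeros. If center=False, this argument is ignored.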
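What two common pad modes do to the signal edges, shown with np.pad on a toy array (pad = 2 stands in for n_fft // 2). Passing pad_mode="reflect" to librosa.stft would mirror the signal at its edges instead of inserting zeros; this sketch shows np.pad behavior only, not librosa internals.

```python
import numpy as np

y = np.array([1.0, 2.0, 3.0, 4.0])
pad = 2  # stands in for n_fft // 2

print(np.pad(y, pad, mode="constant"))  # [0. 0. 1. 2. 3. 4. 0. 0.]
print(np.pad(y, pad, mode="reflect"))   # [3. 2. 1. 2. 3. 4. 3. 2.]
```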

Returns
	D: np.ndarray [shape=(…, 1 + n_fft/2, n_frames), dtype=dtype]
		Complex-valued matrix of short-time Fourier transform coefficients.
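Why only 1 + n_fft/2 rows? For a real-valued frame, the DFT bins above n_fft // 2 are complex conjugates of the lower ones, so only the non-negative frequencies need to be stored. A small NumPy check with an 8-sample frame:

```python
import numpy as np

n_fft = 8
x = np.random.default_rng(0).standard_normal(n_fft)  # one real-valued frame

full = np.fft.fft(x)   # n_fft complex bins
half = np.fft.rfft(x)  # 1 + n_fft // 2 bins, like one column of D

print(full.shape, half.shape)  # (8,) (5,)
# The kept bins agree with the full DFT ...
assert np.allclose(full[:n_fft // 2 + 1], half)
# ... and the remaining bins are conjugate mirrors of bins 1..n_fft//2 - 1
assert np.allclose(full[n_fft // 2 + 1:], np.conj(half[1:-1][::-1]))
```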

Notes
This function caches at level 20.

1.1.2 Examples

import librosa
import librosa.display
import numpy as np
import matplotlib.pyplot as plt

y, sr = librosa.load(librosa.ex('trumpet'))
print(sr)  # 22050
print(y)
"""
[-1.4068224e-03 -4.4607223e-04 -4.1098078e-04 ...  7.9623060e-06
 -3.0417003e-05  1.2765067e-05]
"""
print(y.shape)  # (117601,)

S_center = librosa.stft(y)
print(S_center)
"""
[[ 1.9942708e-03+0.00000000e+00j  1.6542288e-03+0.00000000e+00j
   1.0328183e-03+0.00000000e+00j ... -2.2926326e-08+0.00000000e+00j
  -1.3249499e-07+0.00000000e+00j -1.3283957e-06+0.00000000e+00j]
 [-9.1869739e-04-3.66205000e-04j -1.4027286e-03+1.79366514e-04j
   4.1004395e-04-9.79877426e-04j ...  2.2314891e-08-1.08651239e-08j
  -9.0820201e-08-4.22441389e-08j  6.5005867e-07-1.15266187e-06j]
 [ 4.9593160e-04+2.89754127e-03j  2.1955024e-03-1.88621401e-03j
  -1.5042156e-03+4.51300322e-04j ... -1.6460707e-08+1.98847889e-08j
  -4.0321687e-08-1.12811755e-07j  6.7993113e-07+1.12290854e-06j]
 ...
 [-7.4050279e-04+9.60371017e-06j  3.6689549e-04-4.40829444e-06j
  -3.1608189e-07-1.19399476e-06j ... -1.4079569e-04+3.04138841e-04j
  -3.0000968e-04-6.39684324e-04j  8.6307281e-04+6.27736968e-04j]
 [ 7.4102602e-04-3.95785128e-06j -1.2137341e-06-3.67751083e-04j
   1.3524847e-06-2.02049563e-07j ... -1.5341908e-04-1.93013460e-04j
  -3.2897147e-05-1.13135975e-04j -8.3123025e-04-3.35008168e-04j]
 [-7.4161147e-04+0.00000000e+00j -3.6996530e-04+0.00000000e+00j
  -1.9221090e-06+0.00000000e+00j ...  1.9005347e-04+0.00000000e+00j
   5.1085511e-04+0.00000000e+00j  9.5983199e-04+0.00000000e+00j]]
"""
print(S_center.shape)  # (1025, 230)

S = np.abs(S_center)
print(S)
"""
[[1.9942708e-03 1.6542288e-03 1.0328183e-03 ... 2.2926326e-08
  1.3249499e-07 1.3283957e-06]
 [9.8899496e-04 1.4141499e-03 1.0622127e-03 ... 2.4819453e-08
  1.0016424e-07 1.3233313e-06]
 [2.9396757e-03 2.8944833e-03 1.5704575e-03 ... 2.5813943e-08
  1.1980121e-07 1.3127185e-06]
 ...
 [7.4056507e-04 3.6692197e-04 1.2351239e-06 ... 3.3514752e-04
  7.0654217e-04 1.0672152e-03]
 [7.4103661e-04 3.6775309e-04 1.3674936e-06 ... 2.4655956e-04
  1.1782178e-04 8.9619990e-04]
 [7.4161147e-04 3.6996530e-04 1.9221090e-06 ... 1.9005347e-04
  5.1085511e-04 9.5983199e-04]]
"""

# Use left-aligned frames instead of centered frames
S_left = librosa.stft(y, center=False)
print(S_left)
"""
[[ 1.0328183e-03+0.00000000e+00j  5.3383765e-04+0.00000000e+00j
   4.0781524e-04+0.00000000e+00j ... -7.4833135e-09+0.00000000e+00j
  -1.3176041e-08+0.00000000e+00j -2.2926326e-08+0.00000000e+00j]
 [ 4.1004395e-04-9.79877426e-04j -7.3341822e-04+4.68304846e-04j
   2.9052317e-04-5.92359866e-04j ...  9.7249346e-09-5.20855403e-09j
  -1.3737805e-09+5.24641885e-10j  2.2314891e-08-1.08651239e-08j]
 [-1.5042156e-03+4.51300322e-04j  8.8845036e-04-9.47999768e-04j
  -8.5169246e-04+1.14679744e-03j ... -8.4719254e-09+9.02283581e-09j
   1.1467454e-08-1.22313732e-08j -1.6460707e-08+1.98847889e-08j]
 ...
 [-3.1608189e-07-1.19399476e-06j  1.2029817e-06+3.07988870e-07j
  -1.4841049e-07-1.11454945e-07j ... -4.5909670e-05+6.52485323e-06j
  -5.4250672e-06-6.11315627e-05j -1.4079569e-04+3.04138841e-04j]
 [ 1.3524847e-06-2.02049563e-07j  3.1585279e-07-7.99633824e-07j
   2.3395089e-07+9.38624680e-08j ... -4.4754495e-05-3.46280540e-05j
   1.0162088e-05+1.18790449e-05j -1.5341908e-04-1.93013460e-04j]
 [-1.9221090e-06+0.00000000e+00j -1.0158723e-06+0.00000000e+00j
  -4.8456741e-07+0.00000000e+00j ... -3.5932208e-05+0.00000000e+00j
   1.8500381e-05+0.00000000e+00j  1.9005347e-04+0.00000000e+00j]]
"""
print(S_left.shape)  # (1025, 226)

# Use a shorter hop length
D_short = librosa.stft(y, hop_length=64)
print(D_short)
"""
[[ 1.9942708e-03+0.0000000e+00j  2.0245595e-03+0.0000000e+00j
   1.9468555e-03+0.0000000e+00j ... -1.6863929e-06+0.0000000e+00j
  -1.7487063e-06+0.0000000e+00j -1.7789796e-06+0.0000000e+00j]
 [-9.1869739e-04-3.6620500e-04j -7.4865157e-04-3.3484653e-04j
  -7.1702205e-04-2.1911871e-04j ...  1.4982434e-06-7.6076549e-07j
   1.6798145e-06-4.7158815e-07j  1.7701935e-06-1.3795959e-07j]
 [ 4.9593160e-04+2.8975413e-03j -1.0332313e-03+2.7302452e-03j
  -2.3815844e-03+1.8284258e-03j ... -9.8341661e-07+1.3487496e-06j
  -1.4770260e-06+9.0465647e-07j -1.7408884e-06+2.7701415e-07j]
 ...
 [-7.4050279e-04+9.6037102e-06j -6.7337602e-04+2.8923893e-04j
  -4.9616216e-04+5.0932058e-04j ...  7.8239013e-04-5.4693053e-04j
   4.7231969e-04-7.4939081e-04j  1.3690814e-04-7.9165393e-04j]
 [ 7.4102602e-04-3.9578513e-06j  7.1854488e-04-1.4679512e-04j
   6.5641786e-04-2.7571729e-04j ... -9.3884434e-04+1.8708868e-04j
  -8.5937575e-04+3.5740886e-04j -7.3320372e-04+4.9383269e-04j]
 [-7.4161147e-04+0.0000000e+00j -7.3425204e-04+0.0000000e+00j
  -7.1275508e-04+0.0000000e+00j ...  9.5271977e-04+0.0000000e+00j
   9.2443411e-04+0.0000000e+00j  8.8211324e-04+0.0000000e+00j]]
"""
print(D_short.shape)  # (1025, 1838)

# Display the spectrogram
fig, ax = plt.subplots()
img = librosa.display.specshow(librosa.amplitude_to_db(S, ref=np.max),
                               y_axis='log', x_axis='time', ax=ax)
ax.set_title('Power spectrogram')
fig.colorbar(img, ax=ax, format='%+2.0f dB')
plt.show()
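The spectrogram above is plotted in decibels relative to its peak. The sketch below approximates what librosa.amplitude_to_db(S, ref=np.max) computes, assuming its defaults amin=1e-5 and top_db=80.0; the function name amplitude_to_db_sketch is hypothetical. It also shows why the plotted values lie between -80 dB and 0 dB.

```python
import numpy as np


def amplitude_to_db_sketch(S, amin=1e-5, top_db=80.0):
    """Approximation of librosa.amplitude_to_db(S, ref=np.max) (assumed defaults)."""
    S = np.abs(S)
    ref = np.max(S)
    # 20 * log10 for amplitudes, with a floor of amin to avoid log(0)
    db = 20.0 * np.log10(np.maximum(amin, S)) - 20.0 * np.log10(np.maximum(amin, ref))
    # Clip everything more than top_db below the peak
    return np.maximum(db, db.max() - top_db)


mag = np.array([[1.0, 0.1], [0.01, 1e-6]])
print(amplitude_to_db_sketch(mag))  # approximately [[0, -20], [-40, -80]]
```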