Adapted from a Bilibili video, with the code lightly modified to make it easier to follow.
Bilibili URL: https://www.bilibili.com/video/BV1vT411m7yc/?spm_id_from=333.788&vd_source=c105ef445d9ba79ff025b5ba5869ce2b
import math
import librosa
import librosa.display
import numpy as np
import matplotlib.pyplot as plt
"""
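Amplitude Envelope (AE)
Take the maximum amplitude in each frame; joining these per-frame maxima gives the amplitude envelope.
AE of frame t: AE_t = max s(k) for k in [t*K, (t+1)*K - 1], where k is the sample index, t is the frame index, and K is the frame length.
Framing: split the signal along the time axis into segments. Each segment has length frame_length and there are frame_num segments; without overlap, the total number of samples is frame_length * frame_num.
hop_length: the frame shift, i.e. the distance in samples between the start points of two adjacent frames.
Frame overlap: adjacent frames share some samples so that the result is smoother.
Framing without zero-padding: keep only complete frames. For N total samples, frame_num = math.floor((N - frame_length) / hop_length) + 1
Framing with zero-padding: if N % hop_length != 0, the number of zeros to append is frame_length - N % hop_length,
and the total number of frames is math.floor(N / hop_length) + 1
If N % hop_length == 0,
the total number of frames is N / hop_length
(With padding, think of each frame as advancing by hop_length samples without overlapping the next one; the +1 accounts for the extra frame that the padded zeros complete.)
Plot
Red is the amplitude envelope, blue is the audio signal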
"""
# 1 Load the signal (sr=None keeps the file's native sample rate)
wave_path = "audio01-happy-birthday-to-you.wav"
waveform, sample_rate = librosa.load(wave_path, sr=None)
# print(len(waveform))
# 2 Define the AE function: take the maximum amplitude in each frame as that frame's envelope value
def Calc_Amplitude_Envelope(waveform, frame_length, hop_length):
    if len(waveform) % hop_length != 0:
        # Zero-pad the tail so the last, partial hop still forms a full frame
        frame_num = math.floor(len(waveform) / hop_length) + 1
        pad_num = frame_length - len(waveform) % hop_length
        print("frame_num", frame_num)
        print("pad_num", pad_num)
        waveform = np.pad(waveform, (0, pad_num), mode="constant")
    else:
        # No padding: trailing frames are simply cut short by slicing
        frame_num = len(waveform) // hop_length
        print("frame_num", frame_num)
    waveform_ae = []
    for t in range(frame_num):
        # Frame t starts at t * hop_length and spans frame_length samples
        start = t * hop_length
        current_frame = waveform[start:start + frame_length]
        current_ae = max(current_frame)
        waveform_ae.append(current_ae)
    return np.array(waveform_ae)
# 3 Set parameters
frame_length = 1024
hop_size = frame_length // 2
# Amplitude envelope values
wa_ae = Calc_Amplitude_Envelope(waveform, frame_length, hop_size)
# 4 Plot
frame_scale = np.arange(0,len(wa_ae))
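# Pass sr explicitly: the audio was loaded at its native rate, not librosa's default of 22050 Hz
time_scale = librosa.frames_to_time(frame_scale, sr=sample_rate, hop_length=hop_size)
plt.figure(figsize=(20, 10))
librosa.display.waveshow(waveform, sr=sample_rate)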
plt.plot(time_scale, wa_ae,color="r")
plt.title("Amplitude Envelope")
plt.show()
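
To check the frame-count formulas from the notes without needing the audio file, here is a minimal sketch on a small synthetic signal. The helper names `count_frames` and `amplitude_envelope` are hypothetical and not part of the original script, and instead of appending zeros this version lets slicing cut the trailing frames short, which gives the same frame count.

```python
import math
import numpy as np

def count_frames(num_samples, frame_length, hop_length, pad=True):
    """Frame count following the formulas in the notes (hypothetical helper)."""
    if not pad:
        # Only complete frames are kept
        return math.floor((num_samples - frame_length) / hop_length) + 1
    if num_samples % hop_length != 0:
        # The leftover samples form one extra frame
        return math.floor(num_samples / hop_length) + 1
    return num_samples // hop_length

def amplitude_envelope(signal, frame_length, hop_length):
    """Maximum sample value of each frame, stepping by hop_length."""
    frame_num = count_frames(len(signal), frame_length, hop_length, pad=True)
    return np.array([
        np.max(signal[t * hop_length : t * hop_length + frame_length])
        for t in range(frame_num)
    ])

# Assumed toy signal: 10 samples, frame_length=4, hop_length=2
signal = np.array([0.1, 0.5, -0.2, 0.3, 0.9, -0.4, 0.2, 0.0, -0.1, 0.6])
print(count_frames(len(signal), 4, 2, pad=False))  # 4
print(count_frames(len(signal), 4, 2, pad=True))   # 5
print(amplitude_envelope(signal, 4, 2))            # maxima 0.5, 0.9, 0.9, 0.6, 0.6
```

With 10 samples and a hop of 2, the unpadded count is 4 and the padded-style count is 5; the fifth frame contains only the last two samples.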