This project shows how to implement a simple FIR filter in Verilog using pre-generated coefficients.
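The humble FIR filter is one of the most basic building blocks in digital signal processing on an FPGA, so it is important to understand how to assemble a basic module from a given number of taps and their corresponding coefficient values. In this mini-series on practical ways to get started with DSP fundamentals on an FPGA, I'll start with a simple 15-tap low-pass FIR filter: first generating the initial coefficient values in Matlab, then converting them into values for use in a Verilog module.
A finite impulse response (FIR) filter is defined by the fact that its impulse response settles to zero within a finite amount of time. How long that takes is directly tied to the number of taps: an N-tap filter has order N - 1 (the degree of the polynomial in its transfer function), and its impulse response is zero after N samples. The transfer function of an FIR contains no feedback, so if you feed in an impulse (a single sample of value 1 followed by a run of zeros), the output is simply the filter's coefficient values, one per sample.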
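The impulse-response statement above can be written out with the standard direct-form FIR equation. The symbols are notation introduced here for illustration (they do not appear in the original text): N is the number of taps, h[k] are the coefficients, x[n] is the input, and y[n] is the output.

```latex
y[n] = \sum_{k=0}^{N-1} h[k]\, x[n-k],
\qquad
x[n] = \delta[n] \;\Rightarrow\;
y[n] =
\begin{cases}
h[n], & 0 \le n \le N-1 \\
0,    & \text{otherwise}
\end{cases}
```

For the 15-tap filter in this project, N = 15, so the filter order is 14 and the response to a single impulse is exactly the 15 coefficients followed by zeros.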
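The job of any filter is to condition a signal, mainly by selecting which frequencies are rejected and which are allowed through. One of the simplest examples is the low-pass filter, which passes frequencies below a certain threshold (the cutoff frequency) while strongly attenuating frequencies above it.
The main focus of this project is implementing the FIR in HDL (Verilog specifically, though the concepts translate easily to VHDL). The design breaks down into three main logic components: a circular buffer that clocks in one sample per sample clock and so accounts for the delay of the serial input, a multiplier for each tap's coefficient value, and an accumulator register that holds the sum of the tap outputs.
Since my focus is on the mechanics of the FIR in FPGA logic, I simply used the FDA tool in Simulink and Matlab to plug in some simple parameters for a low-pass filter, then used the resulting coefficient values to work out the appropriate register values for the Verilog module (done in a later step).
I chose to implement a simple 15-tap low-pass FIR with a sampling rate of 1 MS/s, a passband frequency of 200 kHz, and a stopband frequency of 355 kHz, resulting in the following coefficients (the same values that appear in the tap comments of the Verilog module below): -0.0265, 0, 0.0441, 0, -0.0934, 0, 0.3139, 0.5000, 0.3139, 0, -0.0934, 0, 0.0441, 0, -0.0265.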
Create Design File for FIR Module
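Starting from scratch in a new Vivado project, use the Add Sources option in the Flow Navigator window to create a new design source for the FIR module.
With the number of taps and the coefficient values settled, the next set of parameters to define are the bit widths of the input samples, the output samples, and the coefficients themselves.
For this FIR, I chose 16-bit registers for the input samples and coefficients and a 32-bit register for the output samples, since the product of two 16-bit values is a 32-bit value (the widths of the two operands add to give the width of the product, so had I chosen 16-bit input samples with 8-bit taps, the output samples would be 24 bits wide).
These values are all signed as well, so the MSB serves as the sign bit and only the remaining lower bits carry the magnitude (keep this in mind when choosing the initial width of the input sample register). To give these values a signed data type in Verilog, use the keyword signed: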
reg signed [15:0] register_name;
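To illustrate the width rule and the effect of the signed keyword together, here is a minimal sketch. The module name signed_mult_width_demo, its parameters, and its port names are made up for this example and are not part of the FIR module.

```verilog
// Sketch only: names are illustrative, not taken from the FIR module.
module signed_mult_width_demo #(
    parameter IN_W   = 16,   // input sample width
    parameter COEF_W = 16    // coefficient width
) (
    input  wire signed [IN_W-1:0]        sample,
    input  wire signed [COEF_W-1:0]      coeff,
    output wire signed [IN_W+COEF_W-1:0] product
);
    // Both operands are declared signed, so they are sign-extended to the
    // product width and the multiplication is carried out as signed.
    assign product = sample * coeff;
endmodule
```

With IN_W = 16 and COEF_W = 8 the product port becomes 24 bits wide, matching the example in the text. Note that summing 15 products can in general need up to 4 more bits than a single product (since 15 < 2^4). For the tap values used in this project the tap magnitudes sum to 47700, so the worst-case sum is about 47700 × 32768 ≈ 1.56 × 10^9, which still fits in a signed 32-bit register.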
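The next thing to address is how to handle the coefficient values in Verilog, since the decimal values need to be converted to fixed point. Because every coefficient is less than 1 in magnitude, all 15 remaining bits of the register (the MSB of the 16 is the sign bit) can be used for the fractional part. In general, you have to decide how many bits of the register go to the integer part and how many to the fractional part. The math for converting a fractional tap is therefore (fractional coefficient value) * 2^15, with any fractional part of that product dropped (the tap values used here are truncated toward zero rather than rounded to nearest, so 10285.8752 becomes 10285) and the two's complement taken if the coefficient is negative.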
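The conversion can be checked in simulation with a few lines of Verilog. This is a simulation-only sketch: the module q15_convert_demo and the helper function to_q15 are hypothetical names introduced here, and the sketch relies on the standard $rtoi system function, which truncates toward zero.

```verilog
// Simulation-only sketch; to_q15 is a hypothetical helper, not part of the FIR module.
module q15_convert_demo;
    // Scale a real coefficient by 2^15, drop the fractional part, and keep
    // the low 16 bits, which is the two's complement form for negative values.
    function signed [15:0] to_q15;
        input real coeff;
        begin
            to_q15 = $rtoi(coeff * 32768.0);
        end
    endfunction

    initial begin
        $display("-0.0265 -> 16'h%h", to_q15(-0.0265)); // fc9c
        $display(" 0.0441 -> 16'h%h", to_q15(0.0441));  // 05a5
        $display("-0.0934 -> 16'h%h", to_q15(-0.0934)); // f40c
        $display(" 0.3139 -> 16'h%h", to_q15(0.3139));  // 282d
        $display(" 0.5000 -> 16'h%h", to_q15(0.5));     // 4000
    end
endmodule
```

These are the same hex values assigned to the taps in the FIR module. The helper only works for coefficients strictly between -1 and 1; a value of exactly 1.0 would scale to 32768, which does not fit in a signed 16-bit register.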
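Now we can finally focus on the logic of the FIR module, starting with the circular buffer, which brings in the serial input sample stream and builds an array of 15 input samples for the filter's 15 taps.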
always @ (posedge clk)
begin
if(enable_buff == 1'b1)
begin
buff0 <= in_sample;
buff1 <= buff0;
buff2 <= buff1;
buff3 <= buff2;
buff4 <= buff3;
buff5 <= buff4;
buff6 <= buff5;
buff7 <= buff6;
buff8 <= buff7;
buff9 <= buff8;
buff10 <= buff9;
buff11 <= buff10;
buff12 <= buff11;
buff13 <= buff12;
buff14 <= buff13;
end
end
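The same buffer can be written more compactly with a register array and a loop. The following is a sketch under assumptions, not the code used in this project: the module name delay_line_sketch, its parameters, and its ports are invented for the example. Functionally it is a shift register (delay line), which is what the explicit buff0 to buff14 chain above implements.

```verilog
// Sketch only: an array-based version of the 15-deep sample buffer.
module delay_line_sketch #(
    parameter TAPS = 15,
    parameter IN_W = 16
) (
    input  wire                   clk,
    input  wire                   enable,
    input  wire signed [IN_W-1:0] in_sample,
    output wire signed [IN_W-1:0] oldest_sample
);
    reg signed [IN_W-1:0] buff [0:TAPS-1];
    integer i;

    always @ (posedge clk) begin
        if (enable) begin
            // Non-blocking assignments: every stage picks up the value its
            // neighbour held before this clock edge.
            buff[0] <= in_sample;
            for (i = 1; i < TAPS; i = i + 1)
                buff[i] <= buff[i-1];
        end
    end

    // Exposed only so the sketch has an observable output.
    assign oldest_sample = buff[TAPS-1];
endmodule
```

When enable is low nothing is assigned, so the registers simply hold their values; an explicit else branch is not needed for that.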
Next, the multiply stage multiplies each sample by its corresponding coefficient value:
always @ (posedge clk)
begin
if (enable_fir == 1'b1)
begin
acc0 <= tap0 * buff0;
acc1 <= tap1 * buff1;
acc2 <= tap2 * buff2;
acc3 <= tap3 * buff3;
acc4 <= tap4 * buff4;
acc5 <= tap5 * buff5;
acc6 <= tap6 * buff6;
acc7 <= tap7 * buff7;
acc8 <= tap8 * buff8;
acc9 <= tap9 * buff9;
acc10 <= tap10 * buff10;
acc11 <= tap11 * buff11;
acc12 <= tap12 * buff12;
acc13 <= tap13 * buff13;
acc14 <= tap14 * buff14;
end
end
The results of the multiply stage are summed into a register, which ultimately becomes the filter's output data stream.
/* Accumulate stage of FIR */
always @ (posedge clk)
begin
if (enable_fir == 1'b1)
begin
m_axis_fir_tdata <= acc0 + acc1 + acc2 + acc3 + acc4 + acc5 + acc6 + acc7 + acc8 + acc9 + acc10 + acc11 + acc12 + acc13 + acc14;
end
end
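The multiply and accumulate stages above can also be expressed with loops. The sketch below assumes the 15 samples and 15 taps are packed into flat buses so that the module is self-contained; mac_sketch, samples_flat, and taps_flat are hypothetical names, not signals from the FIR module.

```verilog
// Sketch only: loop-based multiply and accumulate stages for 16-bit data.
module mac_sketch #(
    parameter TAPS = 15
) (
    input  wire               clk,
    input  wire               enable,
    input  wire [TAPS*16-1:0] samples_flat, // sample k sits in bits [16*k +: 16]
    input  wire [TAPS*16-1:0] taps_flat,    // tap k sits in bits [16*k +: 16]
    output reg  signed [31:0] sum
);
    reg signed [31:0] prod [0:TAPS-1];
    reg signed [31:0] total;
    integer i, j;

    // Multiply stage: one registered signed product per tap.
    always @ (posedge clk) begin
        if (enable) begin
            for (i = 0; i < TAPS; i = i + 1)
                prod[i] <= $signed(samples_flat[16*i +: 16]) *
                           $signed(taps_flat[16*i +: 16]);
        end
    end

    // Accumulate stage: add up the products registered on the previous edge.
    always @ (posedge clk) begin
        if (enable) begin
            total = 32'sd0;
            for (j = 0; j < TAPS; j = j + 1)
                total = total + prod[j];
            sum <= total;
        end
    end
endmodule
```

This keeps the same structure as the listing above (registered products followed by one registered sum of all 15 products), so it is a more compact way of writing the same logic rather than a different architecture.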
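Finally, the last piece of logic is the data stream interface of the FIR module. AXI Stream is one of the most common interfaces, so that is what I chose to implement. Its key elements are the valid and ready signals, which control the flow of data between upstream and downstream devices. This means the FIR module needs to provide a valid signal to its downstream device to indicate that its output is valid data, and it must be able to pause (but still hold) its output when the downstream device deasserts its ready signal. The FIR module must also be able to communicate with its upstream device in the same way, on its slave-side interface.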
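As a generic illustration of the valid/ready handshake (a word is transferred on a clock edge where both tvalid and tready are high, and a sender keeps tvalid and tdata stable until that happens), here is a minimal output register sketch. It is not the handshake logic used in the FIR module of this project; the module name axis_out_reg_sketch and the in_valid, in_data, and in_ready ports are assumptions made for the example.

```verilog
// Sketch only: a single output register that holds its data while the
// downstream device is not ready.
module axis_out_reg_sketch (
    input  wire               clk,
    input  wire               resetn,        // active-low, synchronous in this sketch
    input  wire               in_valid,      // a new result is available
    input  wire signed [31:0] in_data,
    output wire               in_ready,      // this stage can accept a result
    output reg                m_axis_tvalid,
    output reg  signed [31:0] m_axis_tdata,
    input  wire               m_axis_tready
);
    // New data can be taken when the register is empty or when the
    // downstream device accepts the current word on this clock edge.
    assign in_ready = !m_axis_tvalid || m_axis_tready;

    always @ (posedge clk) begin
        if (!resetn) begin
            m_axis_tvalid <= 1'b0;
            m_axis_tdata  <= 32'sd0;
        end else if (in_ready) begin
            m_axis_tvalid <= in_valid;
            if (in_valid)
                m_axis_tdata <= in_data;
        end
        // Otherwise tvalid is high and tready is low: hold tvalid and tdata.
    end
endmodule
```

The in_ready output is what an upstream stage would use as its own ready signal, which is how back-pressure propagates from the downstream device toward the input.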
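Here is an overview of the logic design of the FIR module:
Note how the valid and ready signals set the enable values for the FIR's input circular buffer and multiply stage, and how every register that data or coefficients pass through is declared as signed.
Verilog code for the FIR module: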
`timescale 1ns / 1ps
module FIR(
input clk,
input reset,
input signed [15:0] s_axis_fir_tdata,
input [3:0] s_axis_fir_tkeep,
input s_axis_fir_tlast,
input s_axis_fir_tvalid,
input m_axis_fir_tready,
output reg m_axis_fir_tvalid,
output reg s_axis_fir_tready,
output reg m_axis_fir_tlast,
output reg [3:0] m_axis_fir_tkeep,
output reg signed [31:0] m_axis_fir_tdata
);
always @ (posedge clk)
begin
m_axis_fir_tkeep <= 4'hf;
end
always @ (posedge clk)
begin
if (s_axis_fir_tlast == 1'b1)
begin
m_axis_fir_tlast <= 1'b1;
end
else
begin
m_axis_fir_tlast <= 1'b0;
end
end
// 15-tap FIR
reg enable_fir, enable_buff;
reg [3:0] buff_cnt;
reg signed [15:0] in_sample;
reg signed [15:0] buff0, buff1, buff2, buff3, buff4, buff5, buff6, buff7, buff8, buff9, buff10, buff11, buff12, buff13, buff14;
wire signed [15:0] tap0, tap1, tap2, tap3, tap4, tap5, tap6, tap7, tap8, tap9, tap10, tap11, tap12, tap13, tap14;
reg signed [31:0] acc0, acc1, acc2, acc3, acc4, acc5, acc6, acc7, acc8, acc9, acc10, acc11, acc12, acc13, acc14;
/* Taps for LPF running @ 1MSps with a 200kHz passband edge and a 355kHz stopband edge */
assign tap0 = 16'hFC9C; // twos(-0.0265 * 32768) = 0xFC9C
assign tap1 = 16'h0000; // 0
assign tap2 = 16'h05A5; // 0.0441 * 32768 = 1445.0688 = 1445 = 0x05A5
assign tap3 = 16'h0000; // 0
assign tap4 = 16'hF40C; // twos(-0.0934 * 32768) = 0xF40C
assign tap5 = 16'h0000; // 0
assign tap6 = 16'h282D; // 0.3139 * 32768 = 10285.8752 = 10285 = 0x282D
assign tap7 = 16'h4000; // 0.5000 * 32768 = 16384 = 0x4000
assign tap8 = 16'h282D; // 0.3139 * 32768 = 10285.8752 = 10285 = 0x282D
assign tap9 = 16'h0000; // 0
assign tap10 = 16'hF40C; // twos(-0.0934 * 32768) = 0xF40C
assign tap11 = 16'h0000; // 0
assign tap12 = 16'h05A5; // 0.0441 * 32768 = 1445.0688 = 1445 = 0x05A5
assign tap13 = 16'h0000; // 0
assign tap14 = 16'hFC9C; // twos(-0.0265 * 32768) = 0xFC9C
/* This loop sets the tvalid flag on the output of the FIR high once
* the circular buffer has been filled with input samples for the
* first time after a reset condition. */
always @ (posedge clk or negedge reset)
begin
if (reset == 1'b0)
begin
buff_cnt <= 4'd0;
enable_fir <= 1'b0;
in_sample <= 16'd0;
end
else if (m_axis_fir_tready == 1'b0 || s_axis_fir_tvalid == 1'b0)
begin
enable_fir <= 1'b0;
buff_cnt <= 4'd15;
in_sample <= in_sample;
end
else if (buff_cnt == 4'd15)
begin
buff_cnt <= 4'd0;
enable_fir <= 1'b1;
in_sample <= s_axis_fir_tdata;
end
else
begin
buff_cnt <= buff_cnt + 1;
in_sample <= s_axis_fir_tdata;
end
end
always @ (posedge clk)
begin
if(reset == 1'b0 || m_axis_fir_tready == 1'b0 || s_axis_fir_tvalid == 1'b0)
begin
s_axis_fir_tready <= 1'b0;
m_axis_fir_tvalid <= 1'b0;
enable_buff <= 1'b0;
end
else
begin
s_axis_fir_tready <= 1'b1;
m_axis_fir_tvalid <= 1'b1;
enable_buff <= 1'b1;
end
end
/* Circular buffer brings in a serial input sample stream and
* creates an array of 15 input samples for the 15 taps of the filter. */
always @ (posedge clk)
begin
if(enable_buff == 1'b1)
begin
buff0 <= in_sample;
buff1 <= buff0;
buff2 <= buff1;
buff3 <= buff2;
buff4 <= buff3;
buff5 <= buff4;
buff6 <= buff5;
buff7 <= buff6;
buff8 <= buff7;
buff9 <= buff8;
buff10 <= buff9;
buff11 <= buff10;
buff12 <= buff11;
buff13 <= buff12;
buff14 <= buff13;
end
else
begin
buff0 <= buff0;
buff1 <= buff1;
buff2 <= buff2;
buff3 <= buff3;
buff4 <= buff4;
buff5 <= buff5;
buff6 <= buff6;
buff7 <= buff7;
buff8 <= buff8;
buff9 <= buff9;
buff10 <= buff10;
buff11 <= buff11;
buff12 <= buff12;
buff13 <= buff13;
buff14 <= buff14;
end
end
/* Multiply stage of FIR */
always @ (posedge clk)
begin
if (enable_fir == 1'b1)
begin
acc0 <= tap0 * buff0;
acc1 <= tap1 * buff1;
acc2 <= tap2 * buff2;
acc3 <= tap3 * buff3;
acc4 <= tap4 * buff4;
acc5 <= tap5 * buff5;
acc6 <= tap6 * buff6;
acc7 <= tap7 * buff7;
acc8 <= tap8 * buff8;
acc9 <= tap9 * buff9;
acc10 <= tap10 * buff10;
acc11 <= tap11 * buff11;
acc12 <= tap12 * buff12;
acc13 <= tap13 * buff13;
acc14 <= tap14 * buff14;
end
end
/* Accumulate stage of FIR */
always @ (posedge clk)
begin
if (enable_fir == 1'b1)
begin
m_axis_fir_tdata <= acc0 + acc1 + acc2 + acc3 + acc4 + acc5 + acc6 + acc7 + acc8 + acc9 + acc10 + acc11 + acc12 + acc13 + acc14;
end
end
endmodule
Create a Simulation Source for its Testbench
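To test the FIR module, create a testbench as a new simulation source:
There are two main things to test in the FIR module: the filter math and the AXI Stream interface. To do this, I created a state machine in the testbench that generates a simple 200 kHz sine wave while also toggling the valid signal on the slave side of the FIR's interface and the ready signal on the master side.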
Testbench for FIR module:
`timescale 1ns / 1ps
module tb_FIR;
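reg clk, reset, s_axis_fir_tvalid, m_axis_fir_tready;
reg signed [15:0] s_axis_fir_tdata;
// Not used by the filter math, but tied off so the FIR inputs are not left floating.
reg [3:0] s_axis_fir_tkeep = 4'hf;
reg s_axis_fir_tlast = 1'b0;
wire m_axis_fir_tvalid, s_axis_fir_tready, m_axis_fir_tlast;
wire [3:0] m_axis_fir_tkeep;
wire signed [31:0] m_axis_fir_tdata;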
/*
* 100Mhz (10ns) clock
*/
always begin
clk = 1; #5;
clk = 0; #5;
end
always begin
reset = 1; #20;
reset = 0; #50;
reset = 1; #1000000;
end
always begin
s_axis_fir_tvalid = 0; #100;
s_axis_fir_tvalid = 1; #1000;
s_axis_fir_tvalid = 0; #50;
s_axis_fir_tvalid = 1; #998920;
end
always begin
m_axis_fir_tready = 1; #1500;
m_axis_fir_tready = 0; #100;
m_axis_fir_tready = 1; #998400;
end
/* Instantiate FIR module to test. */
FIR FIR_i(
.clk(clk),
.reset(reset),
.s_axis_fir_tdata(s_axis_fir_tdata),
.s_axis_fir_tkeep(s_axis_fir_tkeep),
.s_axis_fir_tlast(s_axis_fir_tlast),
.s_axis_fir_tvalid(s_axis_fir_tvalid),
.m_axis_fir_tready(m_axis_fir_tready),
.m_axis_fir_tvalid(m_axis_fir_tvalid),
.s_axis_fir_tready(s_axis_fir_tready),
.m_axis_fir_tlast(m_axis_fir_tlast),
.m_axis_fir_tkeep(m_axis_fir_tkeep),
.m_axis_fir_tdata(m_axis_fir_tdata));
reg [4:0] state_reg;
reg [3:0] cntr;
parameter wvfm_period = 4'd4;
parameter init = 5'd0;
parameter sendSample0 = 5'd1;
parameter sendSample1 = 5'd2;
parameter sendSample2 = 5'd3;
parameter sendSample3 = 5'd4;
parameter sendSample4 = 5'd5;
parameter sendSample5 = 5'd6;
parameter sendSample6 = 5'd7;
parameter sendSample7 = 5'd8;
/* This state machine generates a 200kHz sinusoid. */
always @ (posedge clk or negedge reset) // reset is active-low, so trigger on its falling edge
begin
if (reset == 1'b0)
begin
cntr <= 4'd0;
s_axis_fir_tdata <= 16'd0;
state_reg <= init;
end
else
begin
case (state_reg)
init : //0
begin
cntr <= 4'd0;
s_axis_fir_tdata <= 16'h0000;
state_reg <= sendSample0;
end
sendSample0 : //1
begin
s_axis_fir_tdata <= 16'h0000;
if (cntr == wvfm_period)
begin
cntr <= 4'd0;
state_reg <= sendSample1;
end
else
begin
cntr <= cntr + 1;
state_reg <= sendSample0;
end
end
sendSample1 : //2
begin
s_axis_fir_tdata <= 16'h5A7E;
if (cntr == wvfm_period)
begin
cntr <= 4'd0;
state_reg <= sendSample2;
end
else
begin
cntr <= cntr + 1;
state_reg <= sendSample1;
end
end
sendSample2 : //3
begin
s_axis_fir_tdata <= 16'h7FFF;
if (cntr == wvfm_period)
begin
cntr <= 4'd0;
state_reg <= sendSample3;
end
else
begin
cntr <= cntr + 1;
state_reg <= sendSample2;
end
end
sendSample3 : //4
begin
s_axis_fir_tdata <= 16'h5A7E;
if (cntr == wvfm_period)
begin
cntr <= 4'd0;
state_reg <= sendSample4;
end
else
begin
cntr <= cntr + 1;
state_reg <= sendSample3;
end
end
sendSample4 : //5
begin
s_axis_fir_tdata <= 16'h0000;
if (cntr == wvfm_period)
begin
cntr <= 4'd0;
state_reg <= sendSample5;
end
else
begin
cntr <= cntr + 1;
state_reg <= sendSample4;
end
end
sendSample5 : //6
begin
s_axis_fir_tdata <= 16'hA582;
if (cntr == wvfm_period)
begin
cntr <= 4'd0;
state_reg <= sendSample6;
end
else
begin
cntr <= cntr + 1;
state_reg <= sendSample5;
end
end
sendSample6 : //7
begin
s_axis_fir_tdata <= 16'h8000;
if (cntr == wvfm_period)
begin
cntr <= 4'd0;
state_reg <= sendSample7;
end
else
begin
cntr <= cntr + 1;
state_reg <= sendSample6;
end
end
sendSample7 : //8
begin
s_axis_fir_tdata <= 16'hA582;
if (cntr == wvfm_period)
begin
cntr <= 4'd0;
state_reg <= sendSample0;
end
else
begin
cntr <= cntr + 1;
state_reg <= sendSample7;
end
end
endcase
end
end
endmodule
In the Sources window, under Simulation Sources, right-click the testbench module and select "Set as Top" to make it the top-level file.
Run a Behavioral Simulation
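With the FIR module and its testbench in place, launch the simulator in Vivado from the Flow Navigator window by selecting the Run Behavioral Simulation option (if there are no synthesis or implementation results, this is the only option available).
As the behavioral simulation shows, the FIR filters the signal correctly and responds correctly to the AXI Stream signals.
Many will notice that running synthesis and implementation on a design that uses this particular FIR module causes the design to fail timing (I'm sure experienced FPGA engineers reading this could tell as much just from looking at the Verilog for the FIR module). This will be covered in the next post of the FPGA DSP series, since it offers good insight into how to rethink a design when it cannot meet setup timing requirements.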