为什么延迟太高,即使它只是 RGB 到灰色转换 (Vivado HLS)?

Why latency is too high even though it's just RGB to gray conversion (Vivado HLS)?

我正在 Vivado HLS 2015.4 上处理图像。

我的延迟非常高,约为 311774 个时钟周期。即使程序只需要两个输入图像并将其从 RGB 转换为灰色。总延迟为 311774,因为我得到所有三个 Axi2MatRGB2GRAYMat2AXI.

的 77-78k 延迟

有什么方法可以减少它,以便我可以将其流水线化,使最终延迟达到 78k 左右?

我附上我的代码和综合报告:

#include <hls_video.h>
#include <hls/hls_video_types.h>
#include "top.h"


void toGray(AXI_IN_STREAM &IN_STREAM_1, AXI_IN_STREAM &IN_STREAM_2, AXI_OUT_STREAM &OUT_STREAM_1, AXI_OUT_STREAM &OUT_STREAM_2, unsigned int cols, unsigned int rows){
    #pragma HLS INTERFACE axis port=IN_STREAM_1
    #pragma HLS INTERFACE axis port=OUT_STREAM_1

    #pragma HLS INTERFACE axis port=IN_STREAM_2
    #pragma HLS INTERFACE axis port=OUT_STREAM_2


    #pragma HLS RESOURCE core=AXI_SLAVE variable=rows metadata="-bus_bundle CONTROL"
    #pragma HLS RESOURCE core=AXI_SLAVE variable=cols metadata="-bus_bundle CONTROL"
    #pragma HLS RESOURCE core=AXI_SLAVE variable=return metadata="-bus_bundle CONTROL"

    #pragma HLS INTERFACE ap_stable port=rows
    #pragma HLS INTERFACE ap_stable port=cols

    hls::Mat<MAX_HEIGHT, MAX_WIDTH, HLS_8UC3> inMat_1(rows, cols);
    hls::Mat<MAX_HEIGHT, MAX_WIDTH, HLS_8UC3> inMat_2(rows, cols);

    hls::Mat<MAX_HEIGHT, MAX_WIDTH, HLS_8UC1> grayMat_1(rows, cols);
    hls::Mat<MAX_HEIGHT, MAX_WIDTH, HLS_8UC1> grayMat_2(rows, cols);


 // hls::Mat<MAX_HEIGHT, MAX_WIDTH, HLS_8UC1> outMat(rows, cols);

    hls::AXIvideo2Mat(IN_STREAM_1, inMat_1);
    hls::AXIvideo2Mat(IN_STREAM_2, inMat_2);

    hls::CvtColor<HLS_BGR2GRAY, HLS_8UC3, HLS_8UC1>(inMat_1, grayMat_1);
    hls::CvtColor<HLS_BGR2GRAY, HLS_8UC3, HLS_8UC1>(inMat_2, grayMat_2);
 // hls::EqualizeHist(grayMat, outMat );




    hls::Mat2AXIvideo(grayMat_1, OUT_STREAM_1);
    hls::Mat2AXIvideo(grayMat_2, OUT_STREAM_2);

}

UG902: Vivado Design Suite User Guide P. 293: Since the functions are already pipelined, adding the DATAFLOW optimization ensures the pipelined functions will execute in parallel.

因此,只需将 #pragma HLS dataflow 指令添加到您的代码中,即可确保您每个时钟处理一个样本,并在函数之间进行数据流处理。因此,延迟应该减少到 77-78k(我假设是 cols*rows)。