Matplotlib is slow when plotting pre-cached data into many subplots

Question

Although there are plenty of matplotlib optimization posts around, I haven't found the exact tips I'm looking for here, for example: Matplotlib slow with large data sets, how to enable decimation?

Matplotlib - Fast way to create many subplots?

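My problem is that I have cached CSV files of time-series data (40 of them). I want to plot them in a single figure with 40 subplots stacked vertically and output the result as a single rasterized image.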

My matplotlib code is as follows:

import numpy as np
import matplotlib.pyplot as plt

def _Draw(self):
    """Output a graph of subplots."""
    BigFont = 10
    # Prepare subplots: one row per input file.
    nFiles = len(self.inFiles)
    fig = plt.figure()
    for i, f in enumerate(self.inFiles[0:3]):
        pltTitle = '{}:{}'.format(i, f)
        # _GenerateOutpath and Separator are defined elsewhere in the script.
        colorFile = self._GenerateOutpath(f, '_rgb.csv')
        data = np.loadtxt(colorFile, delimiter=Separator)  # (nRows, 3) RGB values
        nRows = data.shape[0]
        ind = np.arange(nRows)
        vals = np.ones(nRows)  # every bar has height 1; only the color varies
        ax = fig.add_subplot(nFiles, 1, i + 1)
        ax.set_title(pltTitle, fontsize=BigFont, loc='left')
        ax.axis('off')
        # One unit-width bar per row of data, colored by that row's RGB value.
        ax.bar(ind, vals, width=1.0, edgecolor='none', color=data)
    fig.savefig(self.args.outFile, dpi=300, bbox_inches='tight')
    plt.close(fig)

The script hung all night. On average, my data are matrices of ~10,000 x 3 to ~30,000 x 3.

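In my case, I don't think I can use memmapfile to reduce memory usage, because the subplots seem to be the problem here, not the data loaded in each loop iteration.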

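I don't know where to start optimizing this workflow. I could drop the subplots, generate one plot image per dataset, and then stitch the 40 images together, but that is not ideal.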
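For reference, the stitching route does not have to go through 40 intermediate image files. Below is a minimal sketch, assuming each CSV has already been loaded into an (n, 3) float array of RGB values in [0, 1]: pad the shorter series with white, repeat each row to give it some height, stack everything into one array, and write it with `plt.imsave`. The helper name `stitch_strips` and its `row_height`/`fill` parameters are hypothetical, and this approach loses the per-subplot titles.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # file output only, no GUI window
import matplotlib.pyplot as plt
import numpy as np


def stitch_strips(strips, row_height=20, fill=1.0):
    """Hypothetical helper: stack (n_i, 3) RGB arrays into one image array.

    Shorter series are padded on the right with `fill` (1.0 = white), and each
    series is repeated `row_height` times vertically so it is visible as a band.
    Result shape: (len(strips) * row_height, max n_i, 3).
    """
    width = max(s.shape[0] for s in strips)
    rows = []
    for rgb in strips:
        padded = np.full((width, 3), fill, dtype=float)
        padded[: rgb.shape[0]] = rgb
        # (width, 3) -> (row_height, width, 3)
        rows.append(np.repeat(padded[None, ...], row_height, axis=0))
    return np.concatenate(rows, axis=0)


# Usage with random stand-in data of different lengths (not the real CSVs).
strips = [np.random.random((n, 3)) for n in (1000, 2500, 1750)]
image = stitch_strips(strips)
out_path = os.path.join(tempfile.gettempdir(), "stitched_strips.png")
plt.imsave(out_path, image)  # writes the array directly, one pixel per element
```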

Is there a simple way to do this in matplotlib?

Answer 1

Your problem is the way you are plotting the data.

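Using bar to draw tens of thousands of identically sized bars is very inefficient compared with doing the same thing with imshow. bar creates a separate Rectangle artist for every bar, so 40 subplots with 10,000 to 30,000 rows each means hundreds of thousands to over a million artists to draw, while imshow renders each color strip as a single image.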
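One way to see the difference is to count the artists each call adds to its Axes. This is a small sketch with random colors and 5,000 samples (fewer than the question's 10,000 to 30,000 rows, only to keep it quick); `ax.patches` and `ax.images` are the Axes' collections of patch and image artists.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs headless
import matplotlib.pyplot as plt
import numpy as np

n = 5000  # number of color samples
colors = np.random.random((n, 3))  # random RGB rows in [0, 1)

fig, (ax_bar, ax_img) = plt.subplots(2, 1)

# bar: one Rectangle patch per data point
ax_bar.bar(np.arange(n), np.ones(n), width=1.0, edgecolor="none", color=colors)

# imshow: the whole color strip is a single AxesImage
ax_img.imshow(colors[None, ...], interpolation="nearest", aspect="auto")

n_bar_artists = len(ax_bar.patches)
n_img_artists = len(ax_img.images)
print(n_bar_artists, n_img_artists)  # prints: 5000 1
plt.close(fig)
```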

For example, a single color strip drawn with imshow:

import numpy as np
import matplotlib.pyplot as plt

# Random r,g,b data similar to what you seem to be loading in....
data = np.random.random((30000, 3))

# Make data a 1 x size x 3 array
data = data[None, ...]

# Plotting using `imshow` instead of `bar` will be _much_ faster.
fig, ax = plt.subplots()
ax.imshow(data, interpolation='nearest', aspect='auto')
plt.show()

This should be essentially equivalent to what you are currently doing, but it will draw much faster and use less memory.
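The same idea extends to the question's full setup of 40 subplots written to one raster file. A minimal sketch, assuming the CSVs have already been loaded (for example with np.loadtxt, as in the question) into (n_i, 3) float arrays of RGB values in [0, 1]; the helper `draw_color_strips`, its parameters, the figure size, and the output path are illustrative choices, not part of the original answer.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # render straight to a raster file, no GUI window
import matplotlib.pyplot as plt
import numpy as np


def draw_color_strips(strips, titles, out_path, dpi=300):
    """Hypothetical helper: draw each (n_i, 3) RGB array as a strip in its own subplot."""
    n = len(strips)
    # One column of n subplots; the figure gets taller as strips are added.
    fig, axes = plt.subplots(nrows=n, ncols=1, figsize=(8, 0.5 * n), squeeze=False)
    for ax, rgb, title in zip(axes[:, 0], strips, titles):
        # (n_i, 3) -> (1, n_i, 3): a one-row image, a single artist per subplot
        ax.imshow(rgb[None, ...], interpolation="nearest", aspect="auto")
        ax.set_title(title, fontsize=10, loc="left")
        ax.axis("off")
    fig.savefig(out_path, dpi=dpi, bbox_inches="tight")
    plt.close(fig)  # release the figure once it has been written
    return fig


# Usage with random stand-in data: 40 strips of 10,000 to 30,000 rows each.
rng = np.random.default_rng(0)
strips = [rng.random((int(rng.integers(10_000, 30_001)), 3)) for _ in range(40)]
titles = ["{}:strip_{}".format(i, i) for i in range(len(strips))]
out_path = os.path.join(tempfile.gettempdir(), "color_strips.png")
fig = draw_color_strips(strips, titles, out_path, dpi=100)
```

Each subplot now holds one image artist instead of tens of thousands of bars, so a single savefig call writes the whole stack; pass dpi=300 to match the question's output resolution.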
