Reducing the time of dynamic factor model estimation with statsmodels in Python

Question

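I am trying to estimate a dynamic factor model in Python with statsmodels, following the example at https://www.statsmodels.org/dev/examples/notebooks/generated/statespace_dfm_coincident.html. Instead of the example dataset, however, I used my own dataset of 282 variables with 124 observations each (monthly inflation rates for different countries). After running the code for more than six hours, I still had no result. Trying different numbers of variables and different solvers, I got these timings: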

    Number of variables   Initial params (s)   Model estimate (s)
    Powell solver:
    10                    57.3                 4.9
    20                    167.6                19.9
    40                    1498.8               137.8
    BFGS solver:
    10                    9.1                  6.3
    20                    89.2                 18.5
    40                    597.5                138.2

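Judging from these numbers, the time grows roughly as n^2*log(n), which means that estimating the model for all 280 variables with the Powell solver would take about 30 hours, which is far too long. BFGS is faster, but with 20 and 40 variables I got a warning that the likelihood optimization failed to converge.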
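The 30-hour figure can be reproduced with a few lines of arithmetic. The sketch below is only an illustration: it assumes the n^2*log(n) growth rate stated above holds exactly and scales the measured Powell time for 40 variables up to 280. The helper name `extrapolate` is made up for this example.

```python
import math

# Measured Powell "initial params" time for n = 40 variables (from the table).
n_ref, t_ref = 40, 1498.8  # seconds


def cost(n):
    """Assumed growth rate of the run time: n^2 * log(n)."""
    return n**2 * math.log(n)


def extrapolate(n):
    """Scale the reference timing to n variables under the assumed rate."""
    return t_ref * cost(n) / cost(n_ref)


hours_280 = extrapolate(280) / 3600
print(f"Estimated Powell time for 280 variables: {hours_280:.1f} hours")
```

Under that assumption the estimate comes out at about 31 hours, in line with the rough figure quoted in the question. Only three timings per solver are available, so the growth rate itself is uncertain.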

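I am running this on my laptop (Windows 10, 32 GB RAM, i7-4700MQ, 2.40 GHz), and it does not seem to use all the available resources: only about 10 GB of memory and ~25-50% of the CPU. So the question is how to make the estimation of the DFM faster and get it to converge. If I run this code in the cloud (e.g. Amazon or Google with 32-64 CPUs), will multithreading help, or does parallelism in statsmodels offer little improvement? Does it make sense to switch to Matlab or other software for this kind of computation? There are some solvers for large problems in scipy.optimize (such as krylov, broyden2 or anderson), but I am not sure whether they can be used with statsmodels.LikelihoodModel.fit.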

I would appreciate any ideas on how to speed up the estimation! The code I run:

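    import time

    import statsmodels.api as sm

    # data_cpi: the dataset of monthly inflation rates described above (not shown)
    # Create the model
    mod = sm.tsa.DynamicFactor(data_cpi, k_factors=3, factor_order=1, error_order=1)

    # Step 1: get starting parameters with the derivative-free Powell solver
    tic = time.perf_counter()
    initial_res = mod.fit(method='powell', disp=True)
    toc = time.perf_counter()
    print(f"Initial params in {toc - tic:0.4f} seconds")

    # Step 2: refine the estimate, starting from the Powell parameters
    res = mod.fit(initial_res.params, disp=True)
    tic = time.perf_counter()  # reused here as the end time of step 2
    print(f"Model estimate in {tic - toc:0.4f} seconds")
    print(res.summary(separate_params=False))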

Answer 1

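If you do not need standard errors for the parameters, one way to reduce the fitting time is to pass cov_type='none' to the fit method. It will still be slow, though. Numerical optimization of the parameters of a dynamic factor model with a large number of variables is very slow with quasi-Newton methods like BFGS, and even with derivative-free methods like Powell.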

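Large dynamic factor models are usually estimated by optimizing the parameters with the EM algorithm. Statsmodels does not have that option as of v0.11, but it will likely be available for dynamic factor models in the v0.12 release.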
