Automatic procedure to select combination of vectors with best predictive power

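I have a matrix containing time series of 23 elements (Var01-Var23), growth, and year (see the data below). I want to know which group of chemical elements is inhibiting the growth of my trees.

When I correlate each individual element with the growth variable, the correlations are very low. But when I compute the mean of all elements and correlate it with growth, the correlation increases. And when I correlate growth with the mean of selected elements (for example, the mean of Var02, Var04, and Var09 to Var13), the correlation increases even more. I have data for several trees, so I need a script to automate these calculations, because at the moment I am doing them by hand.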

# Example:
# My data
data <- read.delim("clipboard")

# Correlation with Mean of all variables (r= -0.64)
cor.test(data$growth, rowMeans(data[,3:25]))

# Correlation with Mean of seven variables (r= -0.65)
cor.test(data$growth, rowMeans(data[,c(4,6,11:15)]))
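
For reference, the manual steps described above (each element on its own, then the mean of a chosen group) could be scripted roughly as follows. This is only a sketch: the data frame is a simulated stand-in with the same layout as the real data, so the numbers it prints mean nothing, and `mean_cor` is a hypothetical helper name, not a function from any package.

```r
# Simulated stand-in with the same layout as the real data
# (year, growth, var01..var23); replace with read.delim("clipboard").
set.seed(1)
n <- 63
data <- data.frame(year = 2015:1953, growth = rnorm(n))
for (j in 1:23) data[[sprintf("var%02d", j)]] <- rnorm(n)

elements <- data[, 3:25]

# Pearson correlation of each single element with growth
ind_cor <- sapply(elements, function(x) cor(x, data$growth))

# Most negative correlations first
head(sort(ind_cor), 5)

# Hypothetical helper: correlation of growth with the row mean
# of the columns given by position
mean_cor <- function(cols) {
    cor(data$growth, rowMeans(data[, cols, drop = FALSE]))
}

mean_cor(3:25)            # mean of all 23 elements
mean_cor(c(4, 6, 11:15))  # mean of var02, var04, var09..var13
```

With a helper like this, trying another group of elements is a one-line call, which is the part that any automatic search would repeat many times.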

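Based on PCA and hierarchical cluster analysis, I have been computing the correlation between growth and the mean of certain elements, trying all the combinations of element groups I could. At this point, my correlation goes up or down whenever I add or remove one or more elements. I also correlated growth with every variable individually and selected only the first few highly correlated elements, but that is not what I want.

I want to know the exact combination of elements that has the highest correlation with the growth variable.

I have seen a similar calculation in the function strip.rwl() of the dplR package. That function discards every time series that does not help improve a certain correlation statistic (EPS), and it gives you the exact variables with the highest inter-series correlation.

My case is similar, but what I want to know is the exact combination of variables whose mean gives the highest correlation with growth.

Is there already a function that computes this, or can you give me some guidance?

Data: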

year    growth  var01   var02   var03   var04   var05   var06   var07   var08   var09   var10   var11   var12   var13   var14   var15   var16   var17   var18   var19   var20   var21   var22   var23
2015    0.7355  -1.5093 -0.4718 0.4724  -1.2182 1.2762  2.1297  -0.3869 2.5204  0.5939  0.4910  -0.6156 -0.7688 -0.7586 -1.3430 2.0780  -1.5219 -1.2929 -0.6673 -1.2759 -1.3003 -1.1383 -1.0050 -1.0538
2014    1.0499  -1.5731 -0.8291 0.3671  -1.2009 1.2957  1.5425  -0.5630 1.3536  1.9637  0.4473  -0.5620 -0.6266 -0.6508 -1.4364 1.5605  -1.6061 -0.9849 -0.4060 -1.4001 -0.5928 -0.9355 -0.9086 -1.1744
2013    0.3023  -1.3480 -0.9086 1.0562  -1.4676 1.6038  1.6197  -0.9654 0.8675  -0.5802 0.3953  -0.6743 -1.2814 -0.9289 -1.7962 1.6508  -1.6769 -1.2483 -0.4202 -1.7725 -0.5117 -0.8456 -1.1292 -1.1822
2012    0.4691  -1.4137 -1.3467 0.8196  -0.9280 1.5524  1.6472  -0.6955 1.2168  0.3453  0.4625  -0.4587 -0.8154 -0.4288 -1.8088 1.5263  -1.5974 -1.2437 -0.6041 -1.9023 -0.4178 -0.7794 -1.0962 -1.4127
2011    1.5789  -1.3981 -0.9584 1.1080  -1.5778 1.3515  1.4592  -1.2458 1.2315  -0.8985 0.3111  -0.7732 -1.5045 -1.3703 -1.8152 1.4385  -1.5522 -1.3145 -0.5637 -1.7576 -0.8766 -1.0122 -1.5530 -1.3255
2010    1.2607  -1.3424 -0.8443 1.1052  -1.1118 1.3660  1.3892  -1.0938 1.2785  -0.9152 0.3842  -0.4285 -0.5272 -0.5904 -1.7213 1.4067  -1.4321 -0.2067 -0.3469 -1.7433 -0.1480 -0.2845 -0.7672 0.7892
2009    -0.5909 -0.7340 -0.5030 0.1297  -0.9245 1.3174  1.2938  -0.4455 1.0780  -0.5008 0.3727  -0.3743 -0.6238 -0.6003 -1.8261 1.3141  -0.9217 -0.3761 -0.4310 -1.8382 -0.3053 -0.5241 -0.7800 0.2114
2008    -0.6983 -1.1673 -0.1231 0.5825  -1.2651 1.6569  1.3955  -0.7211 0.9760  -0.7845 0.3861  0.0815  -0.3776 -0.6394 -1.7565 1.4102  -1.1523 -0.4078 -0.3300 -1.8786 -0.4062 -0.4016 -0.4600 -0.0071
2007    0.1183  -1.3682 -0.9443 -0.0971 -1.3220 1.3293  1.2750  -0.8371 1.0943  -1.0089 0.3398  -0.4498 -1.2298 -1.2716 -1.6478 1.3486  -1.3283 -0.7784 -0.2968 -1.5428 -0.1280 -0.7084 -0.9944 -0.5956
2006    0.5534  -1.2314 -0.7071 -0.7115 -1.2178 1.0437  1.1603  -0.8700 1.0272  -0.7154 0.3785  -0.7280 -0.8290 -0.7931 -1.4326 1.2680  -1.2664 -1.1744 -0.3001 -1.2794 -0.2198 -0.3347 -0.7982 -0.8576
2005    -0.1290 -0.6186 -0.7554 -0.8953 -0.7179 0.8193  1.2671  0.0563  1.1459  -0.3939 0.5788  -0.4018 -0.4051 -0.0909 -0.3482 1.1628  -0.6649 -0.5135 -0.1499 -0.5293 1.7077  1.3397  -0.5928 -0.3255
2004    -1.3960 -0.7433 3.3826  0.0168  0.6878  2.3632  2.0095  1.4930  1.1984  1.6115  0.9325  6.0082  3.4345  4.0293  -0.7450 1.8678  -1.1456 1.7131  -0.0351 -0.8672 -1.1172 0.6631  1.9708  1.3615
2003    -1.2024 -1.1421 1.4662  0.1289  -0.2702 1.5784  1.8177  0.4645  1.0738  0.3043  0.6369  0.1920  0.5853  0.6188  -0.9899 1.7807  -1.1835 0.8453  -0.3640 -0.8570 -0.2409 -0.7386 0.7062  0.1594
2002    -1.2695 -1.2228 -0.7048 -0.5232 -0.6171 1.9463  1.6148  0.2365  0.7777  1.1024  0.6519  0.3360  0.8234  0.8263  -0.7476 1.6094  -1.3752 1.7203  -0.2869 -0.9334 0.3433  0.2585  0.9249  0.3958
2001    2.2805  -2.0591 -0.8628 -3.9804 -1.5726 0.1381  0.5142  -0.6226 0.3808  -1.2949 0.7876  -0.5202 -0.4475 -0.2446 -1.7901 0.7399  -2.4933 0.4975  0.3481  -1.8721 2.6124  2.2446  -0.6788 0.8185
2000    1.2243  -0.3546 -0.3555 -0.9080 0.6723  0.7487  1.2527  0.2992  1.7156  1.4165  2.3882  -0.1943 -0.5190 -0.0279 -0.9017 1.3534  -0.8618 -0.0725 1.9422  -0.8831 2.9700  2.1966  -0.3275 0.5609
1999    1.1170  -1.0957 -0.5224 -0.9462 -0.8757 0.8819  1.2233  -0.1417 1.4077  0.3444  0.7656  -0.5936 -0.5658 -0.5246 1.7319  1.2449  -0.9524 0.1268  0.5618  1.6841  2.6959  2.1735  -0.6886 -0.0353
1998    1.5080  -0.9427 -0.5292 -1.9068 -0.9694 0.5602  0.9686  -0.5317 0.9646  -0.5927 0.5123  -0.5573 -0.6479 -0.6202 -1.2297 1.1412  -1.0400 -0.3132 -0.1830 -1.1215 2.2558  2.0275  -0.7357 -0.2513
1997    1.2109  -1.2957 -0.2941 -1.7398 -0.4912 0.4444  0.7539  -0.4458 0.7561  -0.3942 0.4130  -0.6011 -0.5307 -0.1443 -1.0119 0.8553  -1.2888 -1.1635 -0.2902 -1.0984 -0.3384 -0.7099 -0.7028 -0.6451
1996    1.9374  -0.9590 -0.5634 -1.1346 -0.8146 0.1582  0.5529  -0.7560 0.3428  -0.7315 0.3842  -0.9918 -1.1590 -0.8275 -1.0101 0.6369  -0.9615 -1.1021 -0.2902 -1.0126 -0.3163 -0.3251 -1.0070 -0.8537
1995    -0.0906 -0.6876 -0.5482 -0.7573 -0.8502 0.0115  0.3705  -0.8278 0.3808  -0.4898 0.4215  -0.7297 -0.9322 -0.9299 -0.4233 0.2834  -0.5797 -0.7969 0.0192  -0.4001 -0.0401 -0.2343 -0.8086 -0.4427
1994    -0.6005 -0.0975 -0.6942 -0.5902 -0.0880 0.3175  0.1632  0.7675  0.5207  1.0282  0.5350  0.0978  0.7515  0.6528  0.2238  0.1230  0.0881  -0.4784 0.2601  0.2429  0.1688  0.1807  0.9592  0.0418
1993    -0.6331 -0.1956 0.0405  -0.8816 -0.7516 -0.2010 -0.1611 -0.4313 0.3369  1.1248  0.4827  -0.8354 -0.7042 -0.9057 0.2060  -0.1867 0.1458  -0.7784 -0.0915 0.3305  -0.0211 -0.3948 -0.2624 -0.1793
1992    -1.4688 0.9073  3.1301  0.8338  2.5185  1.8601  0.5797  2.6687  1.9719  4.6638  1.2484  2.0706  2.6003  2.8108  1.4070  0.3997  0.9439  2.4897  1.1890  1.3518  1.0730  0.8020  3.1918  2.0749
1991    0.1202  -0.0531 -0.0162 0.0803  -0.1292 -0.2617 0.1123  -0.2343 0.1041  0.1202  0.5609  -0.2215 0.2995  0.3069  0.1439  0.1829  0.2389  1.6160  -0.0490 0.4088  0.8250  -0.1432 0.4282  2.4041
1990    0.7739  0.1974  -0.2758 -0.2005 -0.4915 -0.4294 -0.1930 -0.3400 -0.1200 -0.6279 0.4248  -0.5160 -0.3597 -0.2878 0.1752  -0.2011 0.5998  -0.2399 -0.3001 0.3875  0.4752  -0.2927 -0.4844 0.4067
1989    1.4409  -0.1443 -0.4217 -0.7657 -1.0045 -1.0412 -0.4217 -0.8033 -0.1983 -0.6299 0.4042  -0.6739 -0.4189 -0.2231 -0.0075 -0.4847 0.1006  -1.2786 -0.2707 0.0786  -0.0884 -0.3562 -0.5116 -0.9430
1988    0.0627  0.0790  -0.3464 -0.9041 -1.0201 -0.6182 -0.5480 -0.4564 -0.2100 -0.9290 0.4112  -0.8355 -0.7661 -0.8463 0.1953  -0.5956 0.2049  -0.5149 0.0899  0.2429  0.2949  -0.3993 -0.4629 -1.6034
1987    -0.0772 0.3881  -0.1524 -0.5225 0.1709  -0.6857 -0.4846 0.3772  -0.2441 0.0915  -0.7341 -0.1923 0.1421  1.1470  0.5189  -0.6089 0.6509  0.1051  -0.1116 0.5543  -0.3053 0.2307  -0.0151 1.7381
1986    0.9272  0.1445  -0.0866 -0.5388 -0.2163 -0.9456 -0.5572 -0.6548 -0.2948 -1.0206 -0.9377 -0.5081 -0.0798 -0.4386 0.3222  -0.6297 0.4495  -0.5263 -0.2200 0.2290  -1.2341 -0.6906 -0.2357 0.2041
1985    1.1131  0.0253  -0.4906 -0.5509 -0.2975 -0.9225 -0.6997 -0.6370 -0.0884 -0.3245 -1.1308 -0.7209 -0.8256 -0.9482 0.2401  -0.7110 0.2473  -0.7735 -0.4202 0.3121  -1.2059 -0.7069 -0.8015 -0.2975
1984    1.8013  0.2284  -0.5440 -0.1447 -0.4211 -0.8235 -0.6575 -0.7925 -0.1492 -0.9069 -1.1281 -0.7848 -0.6862 -0.7732 0.4834  -0.6695 0.5650  -1.0714 -0.4167 0.4992  -0.8736 -0.3715 -0.8133 -0.7442
1983    0.9502  0.0912  -0.3639 -0.6158 -0.3501 -0.8718 -0.7431 -0.8826 -0.8025 -1.1922 -1.1150 -0.4250 0.2337  -0.0468 0.4108  -0.7577 0.3113  -0.5710 -0.3989 0.3311  -1.8749 -1.9467 0.0756  -0.6873
1982    -0.9436 0.9775  0.7716  0.2479  0.9319  -0.1762 -0.6461 -0.1360 -1.8662 -0.3247 -1.1207 -0.1339 1.3616  0.7641  0.7340  -0.7842 0.7577  1.3445  -0.4060 0.5994  0.1340  0.3584  1.7261  0.3133
1981    -0.9666 0.4902  -0.6478 -0.4053 0.9247  -0.2615 -0.8484 -0.2314 -1.6397 -0.8245 -1.3230 0.5140  1.0220  1.1154  0.6691  -0.8737 0.4194  0.8998  -0.7992 0.5904  0.1163  0.3389  0.8467  -0.1124
1980    -1.2541 1.7357  1.7745  1.7795  1.6191  0.5960  -0.7116 1.6093  -1.5534 0.7996  -0.6887 1.5322  2.6049  2.8926  1.2064  -0.6855 1.1873  1.7319  -0.0972 1.1336  0.1602  0.3776  2.4082  0.2887
1979    0.9234  0.3475  -0.4153 -0.5224 0.1081  -0.8532 -0.9357 -0.0814 -0.5229 -1.0353 -1.0966 -0.1029 0.0437  -0.0171 0.4097  -0.7625 0.3251  1.0325  -0.3744 0.3936  -0.8812 -0.3848 -0.2112 2.3548
1978    0.5477  0.3268  -0.4658 -0.4173 -0.2025 -0.9164 -0.8704 -0.6012 -0.4608 -1.0767 -0.9210 -0.3823 -0.1133 -0.5651 0.3620  -0.7968 0.4289  -0.2014 -0.2075 0.4020  -1.1600 -0.8439 -0.2556 0.4952
1977    0.2065  0.4730  -0.1826 0.0941  -0.1233 -0.9184 -0.7367 -0.0390 -0.6020 -0.5310 -0.8984 0.0119  0.1358  0.2026  0.7328  -0.7439 0.6711  -0.1219 -0.1921 0.7615  -0.4529 -0.1719 -0.1456 0.0670
1976    0.9004  0.3434  -0.3918 0.7682  0.1170  -0.7074 -0.5424 -0.7441 -0.1253 -0.0578 -1.0464 -0.4098 -0.9362 0.0920  1.0475  -0.7216 0.9563  -0.6707 -0.3166 0.8140  -0.5776 -0.0814 -0.7861 -0.4623
1975    0.6972  0.1800  -0.8167 0.1120  -0.1196 -0.8904 -0.8327 -0.1215 -0.4762 -0.4726 0.3593  -0.3703 -0.2568 -0.2583 0.7316  -0.8034 0.5018  -1.0593 -0.5518 0.6697  -1.2341 -0.7069 -0.5976 -0.8553
1974    -1.2120 1.1185  1.5724  0.8716  0.7787  0.1795  -0.8420 0.6933  -1.9117 0.0430  -1.2172 0.1498  -0.1294 -0.4590 1.3863  -0.7974 1.0743  1.1988  -0.5597 1.3714  0.9767  1.0511  0.4705  0.4016
1973    -0.9628 0.4276  -0.0410 0.1087  0.2920  -0.2293 -0.9448 -0.2077 -1.0053 -0.5260 -0.9422 -0.1569 -0.8008 -0.5443 0.4898  -0.8743 0.4793  -0.7277 -0.2231 0.5532  0.1515  0.3713  -0.4173 -0.1950
1972    -0.3417 0.5028  -0.0892 0.1256  0.2009  -0.8769 -0.9136 -0.4308 -1.1777 -0.5572 -1.3502 0.0552  0.2952  -0.0647 0.4340  -0.8732 0.5752  -0.7586 -0.8786 0.4637  0.1074  0.3324  0.1065  -0.7151
1971    -1.0663 1.4514  0.7596  1.0064  1.4265  -0.1857 -0.6966 0.0646  -1.5851 -0.1682 -0.9928 0.8045  1.2418  1.5867  1.0944  -0.7248 1.0222  0.3959  -0.2643 1.1645  0.1427  0.3649  1.0996  -0.4215
1970    -0.1731 0.2830  -0.2867 0.3696  0.1652  -0.7090 -0.8112 -0.3736 -0.3142 -0.5996 0.4910  0.0731  -0.0576 0.2005  0.6857  -0.6739 0.5739  1.4645  0.1127  0.8136  -0.9955 -0.3454 -0.2405 2.7218
1969    0.5496  0.0708  -1.1229 -0.2215 -0.0224 -1.1272 -1.0123 -0.2998 0.0002  0.7823  -1.2852 -0.4108 -0.6424 -0.0993 0.4756  -0.9020 0.2522  0.0435  -0.7023 0.4398  -1.7905 -1.7240 -0.7576 0.4713
1968    0.4557  0.3754  -0.1914 0.0782  -0.0442 -1.0424 -0.8591 -0.5058 -0.0528 -0.6357 -0.8632 -0.2303 -0.0315 -0.4494 0.8054  -0.8174 0.4188  -0.1702 -0.1709 0.8400  0.1515  0.3713  -0.0779 0.1860
1967    -0.0331 0.4082  -0.1135 0.3716  0.3029  -0.8140 -0.8333 -0.3632 -0.3441 -0.6185 -0.8795 -0.0115 0.4341  -0.2016 0.5365  -0.8199 0.6325  0.4662  -0.1799 0.5568  0.1515  0.3713  0.3658  0.0823
1966    -1.3091 2.4265  1.9120  3.5159  1.7204  0.3925  -0.3358 4.0710  -1.1894 1.1428  -0.7170 1.0888  2.2487  1.3671  1.8517  -0.4379 2.4150  2.5052  -0.1059 1.8671  0.1602  0.3776  2.4620  1.4373
1965    -0.3865 0.6265  0.4303  0.2184  0.2952  -0.5675 -0.8397 -0.5385 -0.4155 0.8830  0.1583  0.2678  0.6830  0.5453  0.7177  -0.8522 0.7218  0.2577  -0.0269 0.6639  0.1688  0.3840  0.4992  -0.2566
1964    0.0761  0.2908  -0.2917 0.0262  0.8445  -0.8938 -0.9006 -0.4216 -0.4443 0.3291  -0.7933 -0.2331 0.0098  -0.4144 0.2669  -0.7975 0.4580  -0.4564 -0.1351 0.3079  -1.7992 -4.0357 -0.1313 -0.7129
1963    -0.5091 0.3587  -0.3000 0.2313  0.6458  -0.8484 -0.8979 0.2260  -0.5598 0.7006  -0.9991 -0.1602 -0.2566 -0.6252 0.3226  -0.9419 0.5258  0.0035  -0.2707 0.2862  0.1427  0.3649  -0.1694 -1.1526
1962    -1.1161 0.7858  0.1291  0.9148  0.5731  -0.5193 -0.7648 0.1396  -0.5341 0.1245  -1.0139 1.0248  0.4818  0.6698  0.7706  -0.7147 0.9169  0.3912  -0.2837 0.8386  0.1427  0.3642  0.2324  -1.1177
1961    -0.9015 0.5666  0.1987  0.3679  0.9936  -0.8241 -0.7050 0.9734  -0.6841 0.1911  -1.0430 0.0148  0.1850  -0.2893 0.4734  -0.8045 0.6152  -0.3412 -0.3133 0.3074  0.1427  0.3649  0.1296  -1.4248
1960    -1.0433 0.5102  0.4145  0.5807  0.9967  -0.7398 -0.7065 1.4128  -0.4794 1.3586  0.5110  -0.0877 -0.4572 -0.5883 0.6178  -0.6754 0.7916  0.3395  0.1253  0.5598  -0.2515 0.0197  -0.2512 2.0852
1959    -0.8593 0.4067  -0.3155 0.3043  0.1448  -0.9809 -0.7874 0.0866  -0.6011 -0.3097 -0.8248 -0.2912 -0.7388 -0.0724 0.3642  -0.7552 0.6715  -0.2708 -0.1499 0.3971  0.1515  0.3713  -0.4288 0.4693
1958    -1.3883 1.4330  0.6080  0.7613  2.0997  -0.0061 -0.8815 0.6944  -0.7064 1.4439  0.4042  0.3192  0.0227  -0.1728 1.2644  -0.7873 1.5796  0.5583  0.5341  1.2738  0.2201  0.4340  0.4769  0.8129
1957    -0.1501 0.0788  -0.6378 0.5301  -0.0052 -1.4811 -0.7726 -0.5773 -0.1142 -0.3035 0.3192  -0.3458 -0.2647 -0.3562 0.4415  -0.8422 0.7440  -1.1722 0.0219  0.2437  0.1688  0.3840  -0.4811 -0.7914
1956    -1.1660 0.9024  0.9387  -0.4486 0.7415  -0.7435 -0.9087 0.5561  -1.5628 -0.0826 3.3428  0.8977  0.0357  -0.6959 0.0996  -0.8731 0.1990  1.3175  5.4730  -0.0201 1.4525  1.4838  0.7398  0.2129
1955    -1.2810 2.6427  3.0556  1.7682  2.6251  -0.2087 -0.3884 2.1850  -1.2176 1.1117  2.8520  1.7799  2.3472  1.9516  0.9034  -0.6220 1.4277  1.9119  4.3493  1.1407  1.0612  1.1603  2.8571  0.6604
1954    -0.9513 1.7707  1.0759  -0.4117 1.8763  -0.7741 -0.9430 2.8715  -0.9605 0.0427  0.9580  0.7628  -0.5342 0.0212  -0.2429 -0.9756 0.4455  0.4704  0.5851  -0.1011 0.2286  0.4340  0.1032  -0.4706
1953    -0.7213 0.3247  0.8136  -0.1111 0.2263  -1.6928 -0.7778 0.1082  -0.5078 -0.6086 -0.8420 0.6116  0.8281  -0.0156 0.0789  -0.8039 0.2684  -0.2745 -0.1588 -0.0194 0.1515  0.3713  0.7915  -0.9210

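First of all, with 23 variables you have 8,388,607 possible non-empty combinations (2^23 - 1). You probably don't want to go there.

What you can do is look for an algorithm that moves you to progressively better combinations. I relied on the logic behind backward elimination in stepwise regression. Basically, I start with the row means of all variables, then drop each variable one at a time, compute the row means and the correlation for each n-1 combination, pick the best one, and repeat the whole process: look at all possible n-2 combinations, compute the correlations, and so on. You get the idea.

Now, there is no guarantee that this finds the best combination, but the result (r = -0.76) is a big improvement.

It would be interesting to see what happens if you try forward selection.

Here is the code: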
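
To make the scale concrete, here is a small sketch of the count, followed by a brute-force search restricted to subsets of a single small size. It runs on simulated stand-in data with the same layout as the question's, and the subset size `k = 3` is an arbitrary choice for illustration.

```r
# Number of non-empty subsets of 23 variables
n_subsets <- 2^23 - 1
n_subsets == sum(choose(23, 1:23))  # same count, summed by subset size

# Simulated stand-in with the question's layout (year, growth, var01..var23)
set.seed(1)
n <- 63
data <- data.frame(year = 2015:1953, growth = rnorm(n))
for (j in 1:23) data[[sprintf("var%02d", j)]] <- rnorm(n)
elements <- as.matrix(data[, 3:25])

# Exhaustive search, but only over subsets of size k
k <- 3
idx <- combn(ncol(elements), k)  # each column holds one subset of column indices
cors <- apply(idx, 2, function(s) {
    cor(rowMeans(elements[, s, drop = FALSE]), data$growth)
})

# Subset whose mean has the most negative correlation with growth
best_subset <- colnames(elements)[idx[, which.min(cors)]]
best_subset
min(cors)
```

A single small subset size is cheap to enumerate, but the count grows quickly with `k`, and covering every size means covering all 2^23 - 1 subsets.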

vars <- data[, -(1:2)]  # data frame with only the element variables in it

# Baseline: correlation of growth with the mean of all variables
best <- cor(rowMeans(vars), data$growth)  # best correlation at each step
var_names <- list(names(vars))            # variable set behind each entry of best

# The correlation of interest is negative here, so "best" means most negative
while (ncol(vars) > 2) {
    # correlation with growth after dropping each variable in turn
    temp_cor <- sapply(seq_along(vars), function(i) {
        cor(rowMeans(vars[, -i, drop = FALSE]), data$growth)
    })
    best <- c(best, min(temp_cor))                      # best correlation of this step
    vars <- vars[, -which.min(temp_cor), drop = FALSE]  # keep the best n-1 combination
    var_names <- c(var_names, list(names(vars)))        # record that combination
}
min(best)                     # check the best correlation found
var_names[[which.min(best)]]  # check which combination led to it
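
For comparison, the forward selection mentioned above could be sketched like this. It is an illustration under stated assumptions rather than a tested solution for the real data: it uses simulated stand-in data with the same layout, treats the most negative correlation as the best one (as in the question), and stops as soon as adding a variable no longer improves the correlation, which is a different stopping rule from the backward loop.

```r
# Simulated stand-in with the question's layout (year, growth, var01..var23)
set.seed(1)
n <- 63
data <- data.frame(year = 2015:1953, growth = rnorm(n))
for (j in 1:23) data[[sprintf("var%02d", j)]] <- rnorm(n)
elements <- data[, 3:25]

selected <- character(0)      # variables chosen so far
remaining <- names(elements)  # variables still available
best_cor <- Inf               # most negative correlation found so far

while (length(remaining) > 0) {
    # correlation with growth when each remaining variable is added in turn
    cand_cor <- sapply(remaining, function(v) {
        cor(rowMeans(elements[, c(selected, v), drop = FALSE]), data$growth)
    })
    if (min(cand_cor) >= best_cor) break  # no candidate improves the result
    best_cor <- min(cand_cor)
    pick <- names(which.min(cand_cor))
    selected <- c(selected, pick)
    remaining <- setdiff(remaining, pick)
}

selected  # combination found by forward selection
best_cor  # its correlation with growth
```

Like backward elimination, this is a greedy search, so it can also miss the best combination; running both and comparing the resulting sets is a cheap sanity check.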