Find Covariance between dataframes ignoring NA
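I have a data frame in which each column is a time series. I want to find the covariance between one of the columns and each of the remaining columns. The problem is that the remaining columns contain NAs. Is there a neat way to compute all of these covariances while ignoring the NAs in each column?
The only way I have found to do this is with a for loop, which is not ideal.
Here is some sample data: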
structure(list(Date = structure(c(18628, 18629, 18630, 18631,
18632, 18633, 18634, 18635, 18636, 18637), class = "Date"), X1 = c(NA,
NA, NA, NA, 1.16092168555067, 0.591202293337843, -0.279052669225263,
-0.780435476613128, -0.852870619718068, -0.708611614262357),
X2 = c(NA, NA, -0.222767493777229, 1.50328295132467, 0.934670132217215,
1.37678188537077, 0.343280062984192, 1.23279081824003, -1.08074586121729,
0.208120194894818), X3 = c(NA, NA, NA, NA, NA, 1.72057538716556,
1.37803710718683, 1.24717457500191, -0.00930256437131184,
0.491423553538728), X4 = c(1.15304498847709, -0.154433520961086,
-0.361871232243227, -0.981985961481073, 0.596667113671836,
-0.0960746707238904, -1.53792603627306, 1.00296956396233,
0.128292175597246, -1.12744557711187)), row.names = c(NA,
-10L), class = "data.frame")
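Using this data, I would like to get a vector containing the covariance between X1 and X4 with all NAs in X1 removed, then the covariance between X2 and X4 with all NAs in X2 removed, and so on.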
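I think you can use the argument use = 'complete' in the function cov() to ignore NA values (other options are available too). Note that 'complete' is partially matched to 'complete.obs', which drops every row that has an NA in any column. This may work for you (the example computes the variance-covariance matrix from your data frame):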
aa = structure(list(Date = structure(c(18628, 18629, 18630, 18631,
18632, 18633, 18634, 18635, 18636, 18637), class = "Date"), X1 = c(NA,
NA, NA, NA, 1.16092168555067, 0.591202293337843, -0.279052669225263,
-0.780435476613128, -0.852870619718068, -0.708611614262357),
X2 = c(NA, NA, -0.222767493777229, 1.50328295132467, 0.934670132217215,
1.37678188537077, 0.343280062984192, 1.23279081824003, -1.08074586121729,
0.208120194894818), X3 = c(NA, NA, NA, NA, NA, 1.72057538716556,
1.37803710718683, 1.24717457500191, -0.00930256437131184,
0.491423553538728), X4 = c(1.15304498847709, -0.154433520961086,
-0.361871232243227, -0.981985961481073, 0.596667113671836,
-0.0960746707238904, -1.53792603627306, 1.00296956396233,
0.128292175597246, -1.12744557711187)), row.names = c(NA,
-10L), class = "data.frame")
cov(aa[, -1], use = 'complete')
            X1        X2           X3           X4
X1  0.36049927 0.3436964  0.319734021 -0.095666285
X2  0.34369636 0.9697499  0.620778723  0.220293460
X3  0.31973402 0.6207787  0.498663689 -0.003728815
X4 -0.09566628 0.2202935 -0.003728815  1.034121716
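Edit: to use pairwise-complete observations instead, so that each pair of columns keeps every row in which both values are present, set use = 'pairwise.complete.obs':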
cov(aa[, -1], use = 'pairwise.complete.obs')
          X1         X2           X3           X4
X1 0.6975825 0.41039379  0.319734021  0.164427330
X2 0.4103938 0.80303434  0.620778723  0.091645141
X3 0.3197340 0.62077872  0.498663689 -0.003728815
X4 0.1644273 0.09164514 -0.003728815  0.809167748
# Check: X1 is non-NA only in rows 5 to 10, so this should equal the X1/X4 entry above
cov(aa[5:10, 2], aa[5:10, 5])
[1] 0.1644273
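Since the question asks for a vector of covariances with X4 rather than the full matrix, the same idea can be applied one column at a time. The following is only a sketch under a few assumptions: the data frame is rebuilt here without the Date column so the snippet runs on its own, and the names others, cov_with_x4 and n_used are made up for illustration. sapply() hides the loop instead of removing it, but it keeps the code short.

```r
# Same values as the sample data above, with the Date column left out.
aa <- data.frame(
  X1 = c(NA, NA, NA, NA, 1.16092168555067, 0.591202293337843,
         -0.279052669225263, -0.780435476613128, -0.852870619718068,
         -0.708611614262357),
  X2 = c(NA, NA, -0.222767493777229, 1.50328295132467, 0.934670132217215,
         1.37678188537077, 0.343280062984192, 1.23279081824003,
         -1.08074586121729, 0.208120194894818),
  X3 = c(NA, NA, NA, NA, NA, 1.72057538716556, 1.37803710718683,
         1.24717457500191, -0.00930256437131184, 0.491423553538728),
  X4 = c(1.15304498847709, -0.154433520961086, -0.361871232243227,
         -0.981985961481073, 0.596667113671836, -0.0960746707238904,
         -1.53792603627306, 1.00296956396233, 0.128292175597246,
         -1.12744557711187)
)

# Columns to compare against X4 (hypothetical helper name).
others <- c("X1", "X2", "X3")

# Covariance of each column with X4, dropping only the rows where
# that particular column (or X4) is NA.
cov_with_x4 <- sapply(aa[others], function(col) {
  cov(col, aa$X4, use = "pairwise.complete.obs")
})

# Number of rows that actually went into each covariance.
n_used <- sapply(aa[others], function(col) sum(!is.na(col) & !is.na(aa$X4)))

print(cov_with_x4)
print(n_used)
```

The values in cov_with_x4 should match the X4 column of the pairwise matrix above. Each one is based on a different number of rows (see n_used), which is worth keeping in mind when comparing them.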