Distance matrix to reshaped vector of distances sorted by positions
I have a data frame of 5 items, as follows:
df = structure(list(item1 = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 4), item2 = c(0,
2, 3, 4, 0, 3, 4, 0, 4, 0)), row.names = c(NA, -10L), class = c("tbl_df",
"tbl", "data.frame"))
In addition, I have a distance matrix between the items:
Dist1 = structure(c(0, 1.0919530596119, 1.09195161858136, 1.0919463791331,
1.09194754111203, 1.0919530596119, 0, 1.7831197560388, 1.78314749640301,
1.78315668532962, 1.09195161858136, 1.7831197560388, 0, 1.78315765983813,
1.78314839437957, 1.0919463791331, 1.78314749640301, 1.78315765983813,
0, 1.78314787222978, 1.09194754111203, 1.78315668532962, 1.78314839437957,
1.78314787222978, 0), .Dim = c(5L, 5L), .Dimnames = list(c("1",
"2", "3", "4", "5"), c("1", "2", "3", "4", "5")))
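I want to add a third column to df that contains the distances extracted from Dist1. They must be in the same order as specified by the indices in df, with no self-distances etc.
Now, this is almost the lower triangle of Dist1, but not quite. (Also note that the items in Dist1 are 1 + the item ids in df.)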
So, the expected output is:
df$Distances = c(1.091953, 1.783120, 1.783147, 1.783157, 1.091952, 1.783158,
1.783148, 1.091946, 1.783148, 1.091948)
How can I extract this efficiently (the actual data structures are much larger)?
I think this is what you are trying to do:
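# Logic
library(dplyr)

# R stores matrices column by column, so element [row, col] of an n x n
# matrix sits at linear position (col - 1) * n + row. Here row = item2 + 1
# and col = item1 + 1, because the ids in df are 0-based.
n <- nrow(Dist1)
df <- df %>%
  mutate(Distance = Dist1[item1 * n + item2 + 1])
# Result
df
# A tibble: 10 x 3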
item1 item2 Distance
<dbl> <dbl> <dbl>
1 1 0 1.09
2 1 2 1.78
3 1 3 1.78
4 1 4 1.78
5 2 0 1.09
6 2 3 1.78
7 2 4 1.78
8 3 0 1.09
9 3 4 1.78
10 4 0 1.09
df$Distance
[1] 1.091953 1.783120 1.783147 1.783157 1.091952 1.783158 1.783148 1.091946
[9] 1.783148 1.091948
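If you would rather not compute the linear position by hand, base R also accepts a two-column numeric matrix as an index, which picks one [row, col] element per row of that matrix. The following is only a sketch under the same assumptions as above (ids in df are 0-based, Dist1 is 1-based); it rebuilds the question's data as plain base R objects so it runs on its own, and the name idx is just an illustrative helper.

```r
# Data from the question, rebuilt as plain base R objects.
df <- data.frame(
  item1 = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 4),
  item2 = c(0, 2, 3, 4, 0, 3, 4, 0, 4, 0)
)
Dist1 <- matrix(
  c(0, 1.0919530596119, 1.09195161858136, 1.0919463791331, 1.09194754111203,
    1.0919530596119, 0, 1.7831197560388, 1.78314749640301, 1.78315668532962,
    1.09195161858136, 1.7831197560388, 0, 1.78315765983813, 1.78314839437957,
    1.0919463791331, 1.78314749640301, 1.78315765983813, 0, 1.78314787222978,
    1.09194754111203, 1.78315668532962, 1.78314839437957, 1.78314787222978, 0),
  nrow = 5,
  dimnames = list(as.character(1:5), as.character(1:5))
)

# A two-column numeric index matrix selects one element per row:
# column 1 holds the row number, column 2 the column number.
# The ids in df are 0-based, so add 1 to reach R's 1-based positions.
idx <- cbind(df$item2 + 1, df$item1 + 1)
df$Distance <- Dist1[idx]

# Sanity check: same values as the linear-index formula,
# without hardcoding the matrix size.
n <- nrow(Dist1)
stopifnot(isTRUE(all.equal(df$Distance, Dist1[df$item1 * n + df$item2 + 1])))
```

Both versions are fully vectorized, so they should scale to much larger inputs. The index-matrix form does not depend on the matrix dimensions at all, which makes it harder to get wrong if Dist1 changes size.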