as.matrix() 和 as.dist() 有不同的结果
as.matrix() and as.dist() have different results
我有一个列表 "simil",其中包含 7 个向量:
> dput(simil)
structure(list(Monday = structure(c(0.889987253484581, 0.882957894295089,
0.882232353177177, 0.874080268021168, 0.851760771472629, 0.811536071048775
), .Names = c("Sunday", "Tuesday", "Friday", "Wednesday", "Thursday",
"Saturday")), Tuesday = structure(c(0.901682757072732, 0.882957894295089,
0.874716806575548, 0.869202937572079, 0.855248496101086, 0.818659253763272
), .Names = c("Sunday", "Monday", "Wednesday", "Friday", "Thursday",
"Saturday")), Wednesday = structure(c(0.88354911311872, 0.874716806575548,
0.874080268021168, 0.853293126413937, 0.851921112754124, 0.841170795359615
), .Names = c("Sunday", "Tuesday", "Monday", "Friday", "Thursday",
"Saturday")), Thursday = structure(c(0.86579834238668, 0.855248496101086,
0.851921112754124, 0.851760771472629, 0.851384896045153, 0.836732564057725
), .Names = c("Sunday", "Tuesday", "Wednesday", "Monday", "Friday",
"Saturday")), Friday = structure(c(0.882232353177177, 0.869202937572079,
0.856441568566172, 0.853293126413937, 0.851384896045153, 0.80098779448239
), .Names = c("Monday", "Tuesday", "Sunday", "Wednesday", "Thursday",
"Saturday")), Saturday = structure(c(0.866654844262859, 0.841170795359615,
0.836732564057725, 0.818659253763272, 0.811536071048775, 0.80098779448239
), .Names = c("Sunday", "Wednesday", "Thursday", "Tuesday", "Monday",
"Friday")), Sunday = structure(c(0.901682757072732, 0.889987253484581,
0.88354911311872, 0.866654844262859, 0.86579834238668, 0.856441568566172
), .Names = c("Tuesday", "Monday", "Wednesday", "Saturday", "Thursday",
"Friday"))), .Names = c("Monday", "Tuesday", "Wednesday", "Thursday",
"Friday", "Saturday", "Sunday"), class = c("similMatrix", "list"
))
我现在想将它转换成一个 dist 对象,然后将它用于 hclust()。所以我使用 as.dist() 并计算:
> as.dist(simil,diag = TRUE, upper = TRUE)
Monday Sunday Tuesday Friday Wednesday Thursday Saturday
Monday 0.0000000 0.8899873 0.8829579 0.8822324 0.8740803 0.8517608 0.8115361
Sunday 0.8899873 0.0000000 1.0000000 0.8692029 0.8747168 0.8552485 0.8186593
Tuesday 0.8829579 1.0000000 0.0000000 0.8532931 1.0000000 0.8519211 0.8411708
Friday 0.8822324 0.8692029 0.8532931 0.0000000 0.8519211 1.0000000 0.8367326
Wednesday 0.8740803 0.8747168 1.0000000 0.8519211 0.0000000 0.8513849 0.8009878
Thursday 0.8517608 0.8552485 0.8519211 1.0000000 0.8513849 0.0000000 1.0000000
Saturday 0.8115361 0.8186593 0.8411708 0.8367326 0.8009878 1.0000000 0.0000000
但这与我使用 as.matrix():
时的结果略有不同
> as.matrix(simil)
Monday Tuesday Wednesday Thursday Friday Saturday Sunday
Monday 1.0000000 0.8829579 0.8740803 0.8517608 0.8822324 0.8115361 0.8899873
Sunday 0.8899873 0.9016828 0.8835491 0.8657983 0.8564416 0.8666548 1.0000000
Tuesday 0.8829579 1.0000000 0.8747168 0.8552485 0.8692029 0.8186593 0.9016828
Friday 0.8822324 0.8692029 0.8532931 0.8513849 1.0000000 0.8009878 0.8564416
Wednesday 0.8740803 0.8747168 1.0000000 0.8519211 0.8532931 0.8411708 0.8835491
Thursday 0.8517608 0.8552485 0.8519211 1.0000000 0.8513849 0.8367326 0.8657983
Saturday 0.8115361 0.8186593 0.8411708 0.8367326 0.8009878 1.0000000 0.8666548
对于 as.dist()
,矩阵不是完全对称的,一些对变得错误,而对于 as.matrix()
则不会发生。这是为什么?我该如何更正它?
所以最后我设法通过先转换为矩阵,然后交换行顺序,最后转换为 dist 对象来修复它:
simil = as.matrix(simil)
simil = simil[ c(1,3,5,6,4,7,2),]
simil = as.dist(1-simil,diag = TRUE, upper = TRUE)
> simil
Monday Tuesday Wednesday Thursday Friday Saturday Sunday
Monday 0.00000000 0.11704211 0.12591973 0.14823923 0.11776765 0.18846393 0.11001275
Tuesday 0.11704211 0.00000000 0.12528319 0.14475150 0.13079706 0.18134075 0.09831724
Wednesday 0.12591973 0.12528319 0.00000000 0.14807889 0.14670687 0.15882920 0.11645089
Thursday 0.14823923 0.14475150 0.14807889 0.00000000 0.14861510 0.16326744 0.13420166
Friday 0.11776765 0.13079706 0.14670687 0.14861510 0.00000000 0.19901221 0.14355843
Saturday 0.18846393 0.18134075 0.15882920 0.16326744 0.19901221 0.00000000 0.13334516
Sunday 0.11001275 0.09831724 0.11645089 0.13420166 0.14355843 0.13334516 0.00000000
可能是因为 "simil" 是从 quanteda 包的 similarity() 函数创建的。
我有一个列表 "simil",其中包含 7 个向量:
> dput(simil)
structure(list(Monday = structure(c(0.889987253484581, 0.882957894295089,
0.882232353177177, 0.874080268021168, 0.851760771472629, 0.811536071048775
), .Names = c("Sunday", "Tuesday", "Friday", "Wednesday", "Thursday",
"Saturday")), Tuesday = structure(c(0.901682757072732, 0.882957894295089,
0.874716806575548, 0.869202937572079, 0.855248496101086, 0.818659253763272
), .Names = c("Sunday", "Monday", "Wednesday", "Friday", "Thursday",
"Saturday")), Wednesday = structure(c(0.88354911311872, 0.874716806575548,
0.874080268021168, 0.853293126413937, 0.851921112754124, 0.841170795359615
), .Names = c("Sunday", "Tuesday", "Monday", "Friday", "Thursday",
"Saturday")), Thursday = structure(c(0.86579834238668, 0.855248496101086,
0.851921112754124, 0.851760771472629, 0.851384896045153, 0.836732564057725
), .Names = c("Sunday", "Tuesday", "Wednesday", "Monday", "Friday",
"Saturday")), Friday = structure(c(0.882232353177177, 0.869202937572079,
0.856441568566172, 0.853293126413937, 0.851384896045153, 0.80098779448239
), .Names = c("Monday", "Tuesday", "Sunday", "Wednesday", "Thursday",
"Saturday")), Saturday = structure(c(0.866654844262859, 0.841170795359615,
0.836732564057725, 0.818659253763272, 0.811536071048775, 0.80098779448239
), .Names = c("Sunday", "Wednesday", "Thursday", "Tuesday", "Monday",
"Friday")), Sunday = structure(c(0.901682757072732, 0.889987253484581,
0.88354911311872, 0.866654844262859, 0.86579834238668, 0.856441568566172
), .Names = c("Tuesday", "Monday", "Wednesday", "Saturday", "Thursday",
"Friday"))), .Names = c("Monday", "Tuesday", "Wednesday", "Thursday",
"Friday", "Saturday", "Sunday"), class = c("similMatrix", "list"
))
我现在想将它转换成一个 dist 对象,然后将它用于 hclust()。所以我使用 as.dist() 并计算:
> as.dist(simil,diag = TRUE, upper = TRUE)
Monday Sunday Tuesday Friday Wednesday Thursday Saturday
Monday 0.0000000 0.8899873 0.8829579 0.8822324 0.8740803 0.8517608 0.8115361
Sunday 0.8899873 0.0000000 1.0000000 0.8692029 0.8747168 0.8552485 0.8186593
Tuesday 0.8829579 1.0000000 0.0000000 0.8532931 1.0000000 0.8519211 0.8411708
Friday 0.8822324 0.8692029 0.8532931 0.0000000 0.8519211 1.0000000 0.8367326
Wednesday 0.8740803 0.8747168 1.0000000 0.8519211 0.0000000 0.8513849 0.8009878
Thursday 0.8517608 0.8552485 0.8519211 1.0000000 0.8513849 0.0000000 1.0000000
Saturday 0.8115361 0.8186593 0.8411708 0.8367326 0.8009878 1.0000000 0.0000000
但这与我使用 as.matrix():
时的结果略有不同> as.matrix(simil)
Monday Tuesday Wednesday Thursday Friday Saturday Sunday
Monday 1.0000000 0.8829579 0.8740803 0.8517608 0.8822324 0.8115361 0.8899873
Sunday 0.8899873 0.9016828 0.8835491 0.8657983 0.8564416 0.8666548 1.0000000
Tuesday 0.8829579 1.0000000 0.8747168 0.8552485 0.8692029 0.8186593 0.9016828
Friday 0.8822324 0.8692029 0.8532931 0.8513849 1.0000000 0.8009878 0.8564416
Wednesday 0.8740803 0.8747168 1.0000000 0.8519211 0.8532931 0.8411708 0.8835491
Thursday 0.8517608 0.8552485 0.8519211 1.0000000 0.8513849 0.8367326 0.8657983
Saturday 0.8115361 0.8186593 0.8411708 0.8367326 0.8009878 1.0000000 0.8666548
对于 as.dist()
,矩阵不是完全对称的,一些对变得错误,而对于 as.matrix()
则不会发生。这是为什么?我该如何更正它?
所以最后我设法通过先转换为矩阵,然后交换行顺序,最后转换为 dist 对象来修复它:
simil = as.matrix(simil)
simil = simil[ c(1,3,5,6,4,7,2),]
simil = as.dist(1-simil,diag = TRUE, upper = TRUE)
> simil
Monday Tuesday Wednesday Thursday Friday Saturday Sunday
Monday 0.00000000 0.11704211 0.12591973 0.14823923 0.11776765 0.18846393 0.11001275
Tuesday 0.11704211 0.00000000 0.12528319 0.14475150 0.13079706 0.18134075 0.09831724
Wednesday 0.12591973 0.12528319 0.00000000 0.14807889 0.14670687 0.15882920 0.11645089
Thursday 0.14823923 0.14475150 0.14807889 0.00000000 0.14861510 0.16326744 0.13420166
Friday 0.11776765 0.13079706 0.14670687 0.14861510 0.00000000 0.19901221 0.14355843
Saturday 0.18846393 0.18134075 0.15882920 0.16326744 0.19901221 0.00000000 0.13334516
Sunday 0.11001275 0.09831724 0.11645089 0.13420166 0.14355843 0.13334516 0.00000000
可能是因为 "simil" 是从 quanteda 包的 similarity() 函数创建的。