Manipulating a matrix with conditions and concatenating the results
I have an 8x8 matrix that contains cities and the distances between them, as shown below:
+--------------+------+--------+------+--------------+---------+------+------+----------+
|              | NYC  | BOSTON | DC   | PHILADELPHIA | CHICAGO | SF   | LA   | SAN JOSE |
+--------------+------+--------+------+--------------+---------+------+------+----------+
| NYC          | 0    | 200    | 300  | 500          | 600     | 1500 | 1800 | 2000     |
| BOSTON       | 200  | 0      | 300  | 200          | 700     | 1600 | 1900 | 2100     |
| DC           | 300  | 300    | 0    | 250          | 550     | 1400 | 1850 | 2200     |
| PHILADELPHIA | 500  | 200    | 250  | 0            | 650     | 1300 | 1700 | 1900     |
| CHICAGO      | 600  | 700    | 550  | 650          | 0       | 1250 | 1600 | 1500     |
| SF           | 1500 | 1600   | 1400 | 1300         | 1250    | 0    | 300  | 400      |
| LA           | 1800 | 1900   | 1850 | 1700         | 1600    | 300  | 0    | 250      |
| SAN JOSE     | 2000 | 2100   | 2200 | 1900         | 1500    | 400  | 250  | 0        |
+--------------+------+--------+------+--------------+---------+------+------+----------+
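I am trying to filter out the combinations whose distance is greater than 500 (that is, keep only the pairs at most 500 apart) and then concatenate the results as follows: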
+--------------+---------------------------+---------------+
| FROM         | TO                        | DISTANCE      |
+--------------+---------------------------+---------------+
| NYC          | BOSTON, DC, PHILADELPHIA  | 200, 300, 500 |
| BOSTON       | NYC, DC, PHILADELPHIA     | 200, 300, 200 |
| DC           | NYC, BOSTON, PHILADELPHIA | 300, 300, 250 |
| PHILADELPHIA | NYC, BOSTON, DC           | 500, 200, 250 |
| CHICAGO      |                           |               |
| SF           | LA, SAN JOSE              | 300, 400      |
| LA           | SF, SAN JOSE              | 300, 250      |
| SAN JOSE     | SF, LA                    | 400, 250      |
+--------------+---------------------------+---------------+
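I found a similar example here:
I also know that I can use the aggregate function to do the concatenation.
I came up with a working solution, but I would like to know whether there is a simpler way to achieve this.
Here is my solution: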
result <- t(sapply(seq(nrow(X)), function(i) {
  j <- which.min(X[i, ])
  c(paste(rownames(X)[i], colnames(X)[j], sep = '/////'), X[i, j])
}))
a <- data.frame(do.call('rbind', strsplit(as.character(result$col1), '/////', fixed = TRUE)), result$col2)
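Using dplyr and tidyr, we can reshape the data to long format, keep the rows where the distance is at most 500, and summarise the values for each city.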
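library(dplyr)

df %>%
  # rownames_to_column() is a tibble function, so it is called with its namespace
  tibble::rownames_to_column('from') %>%
  tidyr::pivot_longer(cols = -from) %>%
  # keep pairs at most 500 apart, excluding each city paired with itself
  filter(value <= 500 & from != name) %>%
  group_by(from) %>%
  summarise(to = toString(name),
            distance = toString(value))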
# A tibble: 7 x 3
# from to distance
# <chr> <chr> <chr>
#1 BOSTON NYC, DC, PHILADELPHIA 200, 300, 200
#2 DC NYC, BOSTON, PHILADELPHIA 300, 300, 250
#3 LA SF, SANJOSE 300, 250
#4 NYC BOSTON, DC, PHILADELPHIA 200, 300, 500
#5 PHILADELPHIA NYC, BOSTON, DC 500, 200, 250
#6 SANJOSE SF, LA 400, 250
#7 SF LA, SANJOSE 300, 400
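Note that the result above has 7 rows: CHICAGO has no city within 500, so filter() removes it entirely, whereas the desired output keeps it with empty cells. Below is a minimal sketch of one way to keep such cities, assuming the same df as in the Data section (rebuilt here so the block runs on its own). The column names to_list and dist_list, and the use of a factor to preserve the original row order, are my own choices rather than part of the answer above.

```r
library(dplyr)

# Rebuild the example data so this block is self-contained (same values as df).
cities <- c("NYC", "BOSTON", "DC", "PHILADELPHIA", "CHICAGO", "SF", "LA", "SANJOSE")
m <- matrix(c(
     0,  200,  300,  500,  600, 1500, 1800, 2000,
   200,    0,  300,  200,  700, 1600, 1900, 2100,
   300,  300,    0,  250,  550, 1400, 1850, 2200,
   500,  200,  250,    0,  650, 1300, 1700, 1900,
   600,  700,  550,  650,    0, 1250, 1600, 1500,
  1500, 1600, 1400, 1300, 1250,    0,  300,  400,
  1800, 1900, 1850, 1700, 1600,  300,    0,  250,
  2000, 2100, 2200, 1900, 1500,  400,  250,    0
), nrow = 8, byrow = TRUE, dimnames = list(cities, cities))
df <- as.data.frame(m)

out <- df %>%
  tibble::rownames_to_column("from") %>%
  tidyr::pivot_longer(cols = -from, names_to = "to", values_to = "distance") %>%
  # drop only the self-pairs here; the distance condition is applied per group below
  filter(from != to) %>%
  # a factor with explicit levels keeps the cities in their original order
  group_by(from = factor(from, levels = cities)) %>%
  summarise(
    # subsetting inside summarise() yields "" when no city qualifies
    to_list   = toString(to[distance <= 500]),
    dist_list = toString(distance[distance <= 500])
  )

out
```

Because the distance condition moves from filter() into summarise(), every city still forms a group, and toString() on an empty vector gives an empty string for CHICAGO.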
Data
df <- structure(list(NYC = c(0L, 200L, 300L, 500L, 600L, 1500L, 1800L,
2000L), BOSTON = c(200L, 0L, 300L, 200L, 700L, 1600L, 1900L,
2100L), DC = c(300L, 300L, 0L, 250L, 550L, 1400L, 1850L, 2200L
), PHILADELPHIA = c(500L, 200L, 250L, 0L, 650L, 1300L, 1700L,
1900L), CHICAGO = c(600L, 700L, 550L, 650L, 0L, 1250L, 1600L,
1500L), SF = c(1500L, 1600L, 1400L, 1300L, 1250L, 0L, 300L, 400L
), LA = c(1800L, 1900L, 1850L, 1700L, 1600L, 300L, 0L, 250L),
SANJOSE = c(2000L, 2100L, 2200L, 1900L, 1500L, 400L, 250L,
0L)), row.names = c("NYC", "BOSTON", "DC", "PHILADELPHIA",
"CHICAGO", "SF", "LA", "SANJOSE"), class = "data.frame")
Here is another solution that does the same in base R:
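res <- apply(df, 1, function(x) {
  data.frame(
    # the zero entry of a row marks the city itself
    from = names(df)[x == 0],
    # cities at most 500 away, excluding the city itself
    to = paste0(names(df)[x <= 500 & x > 0], collapse = ", "),
    dist = paste0(x[x <= 500 & x > 0], collapse = ", ")
  )
})
# stack the one-row data frames into a single result
do.call(rbind, res)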
This results in
# from to dist
# NYC NYC BOSTON, DC, PHILADELPHIA 200, 300, 500
# BOSTON BOSTON NYC, DC, PHILADELPHIA 200, 300, 200
# DC DC NYC, BOSTON, PHILADELPHIA 300, 300, 250
# PHILADELPHIA PHILADELPHIA NYC, BOSTON, DC 500, 200, 250
# CHICAGO CHICAGO
# SF SF LA, SANJOSE 300, 400
# LA LA SF, SANJOSE 300, 250
# SANJOSE SANJOSE SF, LA 400, 250
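Since the question mentions aggregate, here is a rough sketch of that route too. It is not taken from either answer above. It assumes the same distances as in the Data section (rebuilt here as a matrix so the block runs on its own), and the names m, idx, long and res are only illustrative.

```r
# Rebuild the example data as a matrix so this block is self-contained.
cities <- c("NYC", "BOSTON", "DC", "PHILADELPHIA", "CHICAGO", "SF", "LA", "SANJOSE")
m <- matrix(c(
     0,  200,  300,  500,  600, 1500, 1800, 2000,
   200,    0,  300,  200,  700, 1600, 1900, 2100,
   300,  300,    0,  250,  550, 1400, 1850, 2200,
   500,  200,  250,    0,  650, 1300, 1700, 1900,
   600,  700,  550,  650,    0, 1250, 1600, 1500,
  1500, 1600, 1400, 1300, 1250,    0,  300,  400,
  1800, 1900, 1850, 1700, 1600,  300,    0,  250,
  2000, 2100, 2200, 1900, 1500,  400,  250,    0
), nrow = 8, byrow = TRUE, dimnames = list(cities, cities))

# Row/column positions of all off-diagonal cells with distance at most 500.
idx <- which(m <= 500 & row(m) != col(m), arr.ind = TRUE)

# Long format: one row per qualifying (from, to) pair.
long <- data.frame(
  from     = rownames(m)[idx[, "row"]],
  to       = colnames(m)[idx[, "col"]],
  distance = m[idx]
)

# Collapse the 'to' and 'distance' columns into comma-separated strings per city.
res <- aggregate(long[c("to", "distance")], by = list(from = long$from), FUN = toString)
res
```

Like the dplyr version, this returns a row only for cities that have at least one match, so CHICAGO does not appear in res.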