用 R 计算起点-终点关系(没有直接关系)
Count origin-destination relationships (without direct) with R
我有这样的起点-终点table。
library(dplyr)
set.seed(1983)
namevec <- c('Portugal', 'Romania', 'Nigeria', 'Peru', 'Texas', 'New Jersey', 'Colorado', 'Minnesota')
## Create OD pairs
df <- data_frame(origins = sample(namevec, size = 100, replace = TRUE),
destinations = sample(namevec, size = 100, replace = TRUE))
问题
我一直在计算每个起点-终点的关系(没有方向性)。
如何获得将 Colorado-Minnesota 和 Minnesota-Colorado 视为一组的输出?
到目前为止我尝试过的:
## Counts for each OD-pairs
df %>%
group_by(origins, destinations) %>%
summarize(counts = n()) %>%
ungroup() %>%
arrange(desc(counts))
Source: local data frame [48 x 3]
origins destinations counts
(chr) (chr) (int)
1 Nigeria Colorado 5
2 Colorado Portugal 4
3 New Jersey Minnesota 4
4 New Jersey New Jersey 4
5 Peru Nigeria 4
6 Peru Peru 4
7 Romania Texas 4
8 Texas Nigeria 4
9 Minnesota Minnesota 3
10 Nigeria Portugal 3
.. ... ... ...
一种方法是将两个位置的 sorted 组合合并到一个字段中。对此进行总结将删除您的两个原始列,因此您需要将它们重新加入。
paired <- df %>%
mutate(
orderedpair = paste(pmin(origins, destinations), pmax(origins, destinations), sep = "::")
)
paired
# # A tibble: 100 × 3
# origins destinations orderedpair
# <chr> <chr> <chr>
# 1 Peru Colorado Colorado::Peru
# 2 Romania Portugal Portugal::Romania
# 3 Romania Colorado Colorado::Romania
# 4 New Jersey Minnesota Minnesota::New Jersey
# 5 Minnesota Texas Minnesota::Texas
# 6 Romania Texas Romania::Texas
# 7 Peru Peru Peru::Peru
# 8 Romania Nigeria Nigeria::Romania
# 9 Portugal Minnesota Minnesota::Portugal
# 10 Nigeria Colorado Colorado::Nigeria
# # ... with 90 more rows
left_join(
paired,
group_by(paired, orderedpair) %>% count(),
by = "orderedpair"
) %>%
select(-orderedpair) %>%
distinct() %>%
arrange(desc(n))
# # A tibble: 48 × 3
# origins destinations n
# <chr> <chr> <int>
# 1 Romania Portugal 6
# 2 New Jersey Minnesota 6
# 3 Portugal Romania 6
# 4 Minnesota New Jersey 6
# 5 Romania Texas 5
# 6 Nigeria Colorado 5
# 7 Texas Nigeria 5
# 8 Texas Romania 5
# 9 Nigeria Texas 5
# 10 Peru Peru 4
# # ... with 38 more rows
(我使用 "::"
作为分隔符的唯一原因是在不太可能的情况下您需要解析 orderedpair
;使用默认的 " "
(space)不能与(例如)"New Jersey"
一起工作。)
我有这样的起点-终点table。
library(dplyr)
set.seed(1983)
namevec <- c('Portugal', 'Romania', 'Nigeria', 'Peru', 'Texas', 'New Jersey', 'Colorado', 'Minnesota')
## Create OD pairs
df <- data_frame(origins = sample(namevec, size = 100, replace = TRUE),
destinations = sample(namevec, size = 100, replace = TRUE))
问题
我一直在计算每个起点-终点的关系(没有方向性)。
如何获得将 Colorado-Minnesota 和 Minnesota-Colorado 视为一组的输出?
到目前为止我尝试过的:
## Counts for each OD-pairs
df %>%
group_by(origins, destinations) %>%
summarize(counts = n()) %>%
ungroup() %>%
arrange(desc(counts))
Source: local data frame [48 x 3]
origins destinations counts
(chr) (chr) (int)
1 Nigeria Colorado 5
2 Colorado Portugal 4
3 New Jersey Minnesota 4
4 New Jersey New Jersey 4
5 Peru Nigeria 4
6 Peru Peru 4
7 Romania Texas 4
8 Texas Nigeria 4
9 Minnesota Minnesota 3
10 Nigeria Portugal 3
.. ... ... ...
一种方法是将两个位置的 sorted 组合合并到一个字段中。对此进行总结将删除您的两个原始列,因此您需要将它们重新加入。
paired <- df %>%
mutate(
orderedpair = paste(pmin(origins, destinations), pmax(origins, destinations), sep = "::")
)
paired
# # A tibble: 100 × 3
# origins destinations orderedpair
# <chr> <chr> <chr>
# 1 Peru Colorado Colorado::Peru
# 2 Romania Portugal Portugal::Romania
# 3 Romania Colorado Colorado::Romania
# 4 New Jersey Minnesota Minnesota::New Jersey
# 5 Minnesota Texas Minnesota::Texas
# 6 Romania Texas Romania::Texas
# 7 Peru Peru Peru::Peru
# 8 Romania Nigeria Nigeria::Romania
# 9 Portugal Minnesota Minnesota::Portugal
# 10 Nigeria Colorado Colorado::Nigeria
# # ... with 90 more rows
left_join(
paired,
group_by(paired, orderedpair) %>% count(),
by = "orderedpair"
) %>%
select(-orderedpair) %>%
distinct() %>%
arrange(desc(n))
# # A tibble: 48 × 3
# origins destinations n
# <chr> <chr> <int>
# 1 Romania Portugal 6
# 2 New Jersey Minnesota 6
# 3 Portugal Romania 6
# 4 Minnesota New Jersey 6
# 5 Romania Texas 5
# 6 Nigeria Colorado 5
# 7 Texas Nigeria 5
# 8 Texas Romania 5
# 9 Nigeria Texas 5
# 10 Peru Peru 4
# # ... with 38 more rows
(我使用 "::"
作为分隔符的唯一原因是在不太可能的情况下您需要解析 orderedpair
;使用默认的 " "
(space)不能与(例如)"New Jersey"
一起工作。)