How do I use combn in general when pairs are not always possible?
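I am looking for a general way to handle situations where combinations are needed but the data does not always satisfy the assumptions of the combn function.
Specifically, I have a data frame of members of Congress and their committee assignments. To examine this network of politicians, I want to link (i.e., create an edge between) any members who sit on the same committee.
The data looks like this: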
name_id assignment
A000374 Agriculture
A000370 Agriculture
A000055 Appropriations
A000371 Appropriations
A000372 Agriculture
A000376 Foreign
The resulting network data should therefore look like this:
from to committee
A000374 A000370 Agriculture
A000055 A000371 Appropriations
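The problem is that my code (below) throws an error because a committee does not always have enough members to form a pair (combn stops with "n < m"). I gather that the combn call needs to be wrapped or replaced by something that recognizes this case. Is this the correct approach? If so, how would one write such a function to handle the problem in general?
Here is my code so far: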
library(RCurl)
library(dplyr)   # %>%, group_by, do, summarise
library(tidyr)   # separate
x <- getURL("https://raw.githubusercontent.com/bac3917/Cauldron/master/cstack.csv")
cstack <- read.csv(text = x)
# split the string into two columns that represent name_id and committee assignment
cstack <- cstack %>% separate(namePaste, c("name_id", "assignment"))
# use combn and dplyr to create pairs (results in error)
edges <- cstack %>%
  group_by(assignment) %>%
  do(as.data.frame(t(combn(.[["name_id"]], 2)))) %>%
  group_by(V1, V2) %>%
  summarise(n())
As Ben mentioned, combn(x, 2) does not work when x has fewer than two elements.
You can define a function that calls combn only when length(x) > 1.
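To make the guard concrete, here is a minimal base R sketch on the six sample rows from the question. The helper name pair_members and the output column names are assumptions chosen for illustration; they are not part of combn or of the original code.

```r
# Sample rows copied from the question
cstack <- data.frame(
  name_id = c("A000374", "A000370", "A000055", "A000371", "A000372", "A000376"),
  assignment = c("Agriculture", "Agriculture", "Appropriations",
                 "Appropriations", "Agriculture", "Foreign"),
  stringsAsFactors = FALSE
)

# combn() stops with an error when asked for pairs from a single element;
# tryCatch() captures the message instead of aborting the script
single <- tryCatch(combn("A000376", 2), error = function(e) conditionMessage(e))

# pair_members() is a hypothetical helper: every pair of ids as a data frame,
# or NULL when fewer than two ids are supplied
pair_members <- function(ids) {
  if (length(ids) < 2) return(NULL)
  pairs <- t(combn(ids, 2))
  data.frame(from = pairs[, 1], to = pairs[, 2], stringsAsFactors = FALSE)
}

# Apply the helper committee by committee and tag each pair with its committee
by_committee <- split(cstack$name_id, cstack$assignment)
edge_list <- lapply(names(by_committee), function(committee) {
  pairs <- pair_members(by_committee[[committee]])
  if (!is.null(pairs)) pairs$committee <- committee
  pairs
})
edges <- do.call(rbind, edge_list)  # rbind() drops the NULL entries
edges
```

On the sample rows this yields four edges: three for Agriculture, one for Appropriations, and none for Foreign, which has a single member.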
Below is a data.table version.
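library(data.table)
cstack <- fread("https://raw.githubusercontent.com/bac3917/Cauldron/master/cstack.csv",
                header = TRUE)
# Replace only the first space with a tab (assumed not to occur in the data) and split on it,
# so multi-word assignments such as "Ranking Member" stay in one column
cstack <- cstack[, tstrsplit(sub(" ", "\t", namePaste), "\t", fixed = TRUE)]
setnames(cstack, c("name_id", "assignment"))
# Return all pairs, or NULL (the group is dropped) when a committee has fewer than two members
mycomb <- function(x) if (length(x) > 1) data.table(t(combn(x, 2)))
cstack <- cstack[, mycomb(name_id), by = "assignment"]
setcolorder(cstack, c(2, 3, 1))
setnames(cstack, c("V1", "V2"), c("from", "to"))
cstack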
#> from to assignment
#> 1: A000374 A000370 Agriculture
#> 2: A000374 A000372 Agriculture
#> 3: A000374 A000378 Agriculture
#> 4: A000374 B001298 Agriculture
#> 5: A000374 B001307 Agriculture
#> ---
#> 12957: C001053 L000491 Ranking Member
#> 12958: C001053 R000582 Ranking Member
#> 12959: D000619 L000491 Ranking Member
#> 12960: D000619 R000582 Ranking Member
#> 12961: L000491 R000582 Ranking Member
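Since the question uses dplyr, the same idea can also be sketched there by dropping single-member committees before combn is ever called. This is only an illustration on the sample rows from the question, assuming dplyr is installed; it keeps the question's do() call, which still works although dplyr now marks it as superseded.

```r
library(dplyr)

# Sample rows copied from the question
cstack <- data.frame(
  name_id = c("A000374", "A000370", "A000055", "A000371", "A000372", "A000376"),
  assignment = c("Agriculture", "Agriculture", "Appropriations",
                 "Appropriations", "Agriculture", "Foreign"),
  stringsAsFactors = FALSE
)

edges <- cstack %>%
  group_by(assignment) %>%
  filter(n() > 1) %>%                              # keep committees with at least two members
  do(as.data.frame(t(combn(.$name_id, 2)))) %>%    # all pairs within each remaining committee
  ungroup() %>%
  rename(from = V1, to = V2)
edges
```

Filtering first means combn only ever sees groups it can handle, so no wrapper function is needed. The trade-off is that the guard lives in the pipeline rather than in a reusable helper.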