How do I match a group of observations with a dyad?
Suppose I have a data frame containing a list of names and the companies they are clients of:
name <- c("Anne", "Anne", "Mary", "Mary", "Mary", "Joe", "Joe", "Joe", "David", "David", "David", "David", "David")
company <- c("A", "B", "C", "D", "E", "A", "B", "C", "D", "E", "F", "G", "H")
df1 <- data.frame(name, company)
Then I have a second data frame listing the companies that are working together on projects:
company1 <- c("A", "B", "C", "D", "E", "F", "G", "H")
company2 <- c("B", "C", "E", "E", "G", "A", "B", "C")
df2 <- data.frame(company1, company2)
The result I want looks like this:
   name A B C D E F G No of sets
1  Anne 1 1 0 0 0 0 0          1
2 David 0 0 0 1 1 1 1          1
3   Joe 1 1 1 0 0 0 0          2
4  Mary 0 0 1 1 1 0 0          1
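So this counts the number of "sets" that match a set in df2. For example, Anne has a 1 for both A and B, which matches row 1 of df2. Joe has A, B, and C, and both A and B and B and C are rows in df2, so Joe's row has two matches.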
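I think this might work for you. Let me know. It doesn't match your expected result because you didn't include H, which I assume is a typo? Likewise, shouldn't Mary's No_of_sets also equal 2?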
# Tabulate the frequency of name x company combinations
r <- as.data.frame.matrix(table(df1$name, df1$company))
r
#>       A B C D E F G H
#> Anne  1 1 0 0 0 0 0 0
#> David 0 0 0 1 1 1 1 1
#> Joe   1 1 1 0 0 0 0 0
#> Mary  0 0 1 1 1 0 0 0
# Get "sets" of companies working together
s <- paste(df2$company1, df2$company2)
s
#> [1] "A B" "B C" "C E" "D E" "E G" "F A" "G B" "H C"
# Get all potential company sets associated with each name
m <- apply(r, MARGIN = 1, FUN = function(x) combn(names(which(x==1)), 2))
# Intersect sets of companies potentially working together (m) with
# companies actually working together (df2)
# (You could use a nested apply here, but I thought that it
# would be too opaque. Looping is a little more clear.)
for (name in rownames(r)) {
  pairs <- m[[name]]
  ppairs <- apply(pairs, 2, paste0, collapse = " ")
  r[which(rownames(r) == name), "No_of_sets"] <- length(intersect(ppairs, s))
}
r
#>       A B C D E F G H No_of_sets
#> Anne  1 1 0 0 0 0 0 0          1
#> David 0 0 0 1 1 1 1 1          2
#> Joe   1 1 1 0 0 0 0 0          2
#> Mary  0 0 1 1 1 0 0 0          2
Created on 2021-10-19 by the reprex package (v2.0.1)
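One thing to watch for: the match above depends on the order within each pair. combn() emits pairs in the column order of r (alphabetical here), so a df2 row written as "F A" would not match a candidate pair "A F". That makes no difference for this data, because no name has both members of such a pair. If the pairs in df2 are meant to be unordered, a minimal sketch is to sort the two members before pasting. pair_key is a hypothetical helper introduced only for illustration; it is not part of the answer above.

```r
# Hypothetical helper: build an order-independent key for each pair
# by putting the alphabetically smaller company first
pair_key <- function(a, b) paste(pmin(a, b), pmax(a, b))

company1 <- c("A", "B", "C", "D", "E", "F", "G", "H")
company2 <- c("B", "C", "E", "E", "G", "A", "B", "C")

# Keys for the pairs in df2, each pair sorted
s_sorted <- pair_key(company1, company2)

# Candidate pairs for a made-up client who works with A and F
cand <- combn(c("A", "F"), 2)
cand_keys <- pair_key(cand[1, ], cand[2, ])

# Number of candidate pairs that appear in df2 in either order
n_match <- length(intersect(cand_keys, s_sorted))
n_match
```

With the original paste(df2$company1, df2$company2) keys, the same client would get zero matches, since df2 stores that pair as "F A".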
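Edit: Suppose it's possible for a name to work with no more than one company. In that case, you need to add a condition in both steps to account for it. First, the new data... note that the name "Solo" works with only one company.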
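The code that adds Solo is not shown. One way to reproduce the table printed next, assuming Solo is simply one extra row in df1 with company A, might be:

```r
name <- c("Anne", "Anne", "Mary", "Mary", "Mary", "Joe", "Joe", "Joe",
          "David", "David", "David", "David", "David")
company <- c("A", "B", "C", "D", "E", "A", "B", "C",
             "D", "E", "F", "G", "H")
df1 <- data.frame(name, company)

# Assumed step: append a client who works with a single company
df1 <- rbind(df1, data.frame(name = "Solo", company = "A"))

# Re-tabulate name x company, exactly as in the first step of the answer
r <- as.data.frame.matrix(table(df1$name, df1$company))
```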
r
#>       A B C D E F G H
#> Anne  1 1 0 0 0 0 0 0
#> David 0 0 0 1 1 1 1 1
#> Joe   1 1 1 0 0 0 0 0
#> Mary  0 0 1 1 1 0 0 0
#> Solo  1 0 0 0 0 0 0 0
m <- apply(r, MARGIN = 1, FUN = function(x) {
  if (length(names(which(x == 1))) > 1) {
    combn(names(which(x == 1)), 2)
  } else {
    names(which(x == 1))
  }
})
m
#> $Anne
#>      [,1]
#> [1,] "A"
#> [2,] "B"
#>
#> $David
#>      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#> [1,] "D"  "D"  "D"  "D"  "E"  "E"  "E"  "F"  "F"  "G"
#> [2,] "E"  "F"  "G"  "H"  "F"  "G"  "H"  "G"  "H"  "H"
#>
#> $Joe
#>      [,1] [,2] [,3]
#> [1,] "A"  "A"  "B"
#> [2,] "B"  "C"  "C"
#>
#> $Mary
#>      [,1] [,2] [,3]
#> [1,] "C"  "C"  "D"
#> [2,] "D"  "E"  "E"
#>
#> $Solo
#> [1] "A"
for (name in rownames(r)) {
  pairs <- m[[name]]
  if (length(pairs) > 1) {
    ppairs <- apply(pairs, 2, paste0, collapse = " ")
  } else {
    ppairs <- pairs
  }
  r[which(rownames(r) == name), "No_of_sets"] <- length(intersect(ppairs, s))
}
r
#>       A B C D E F G H No_of_sets
#> Anne  1 1 0 0 0 0 0 0          1
#> David 0 0 0 1 1 1 1 1          2
#> Joe   1 1 1 0 0 0 0 0          2
#> Mary  0 0 1 1 1 0 0 0          2
#> Solo  1 0 0 0 0 0 0 0          0
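As a possible alternative, the single-company check can live inside one small function applied to the long-format df1. This also sidesteps a quirk of apply(): it returns a matrix instead of a list when every row produces a result of the same length, in which case m[[name]] would no longer work. The sketch below is not the method above, just a variation on it; count_sets is a hypothetical helper, and the Solo row is assumed to be an extra row in df1 with company A.

```r
name <- c("Anne", "Anne", "Mary", "Mary", "Mary", "Joe", "Joe", "Joe",
          "David", "David", "David", "David", "David", "Solo")
company <- c("A", "B", "C", "D", "E", "A", "B", "C",
             "D", "E", "F", "G", "H", "A")
df1 <- data.frame(name, company)

company1 <- c("A", "B", "C", "D", "E", "F", "G", "H")
company2 <- c("B", "C", "E", "E", "G", "A", "B", "C")
df2 <- data.frame(company1, company2)

# Pairs of companies working together, as in the answer above
s <- paste(df2$company1, df2$company2)

# Hypothetical helper: count the df2 pairs found among one client's companies
count_sets <- function(companies, sets) {
  companies <- sort(unique(as.character(companies)))
  if (length(companies) < 2) return(0L)  # no pair can be formed
  pairs <- combn(companies, 2)           # 2 x k character matrix
  length(intersect(paste(pairs[1, ], pairs[2, ]), sets))
}

# split() always returns a list with one element per name
by_name <- split(df1$company, df1$name)
no_of_sets <- vapply(by_name, count_sets, integer(1), sets = s)
no_of_sets
```

The result is a named integer vector, so it could be attached to the table with r$No_of_sets <- no_of_sets[rownames(r)]. Like the loop above, this version is still sensitive to the order within each df2 pair.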