Generate differing length vectors of sequentially paired values from end to start
Apologies if this has been asked before. I struggled to work out how to word my search (hence the awkward title)!
I have a data frame of single-character values, like this:
---------------------
| Parent | Daughter |
---------------------
| A      | B        |
| B      | C        |
| B      | D        |
| A      | E        |
---------------------
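where each parent always has two daughters (as in a full binary tree). I am trying to write code that generates one vector per path, from the top parent down to each final daughter: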
A B C
A B D
A E
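but with varying numbers of parents and vectors of differing lengths.
I considered using for loops, but without success: I think I would need one loop for each 'level' of the tree, and I don't know the number of levels in advance.
I'm not necessarily after code, just advice on how to approach this kind of problem! Any help is much appreciated, though. Thanks!
Edit: I should point out that 'from end to start' is only there because I thought it would be easier that way. It is certainly not a requirement!
Data: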
df <- data.frame(Parent = c("A", "B", "B", "A"), Daughter = c("B", "C", "D", "E"))
EDIT2: Here is an example of the desired result. If I make the table a bit bigger:
---------------------
| Parent | Daughter |
---------------------
| A      | B        |
| B      | C        |
| B      | D        |
| A      | E        |
| C      | F        |
| C      | G        |
| E      | H        |
| E      | I        |
---------------------
Data 2:
df <- data.frame(Parent = c("A", "B", "B", "A", "C", "C", "E", "E"), Daughter = c("B", "C", "D", "E", "F", "G", "H", "I"))
then the vectors I want are:
A B C F
A B C G
A B D
A E H
A E I
The following might help:
parent <- "A"
lev <- df$Daughter[which(df$Parent == parent)]
output <- cbind(parent, lev)
while(length(lev) > 0){
  lev <- df$Daughter[which(is.element(df$Parent, lev))]
  output <- cbind(output, lev)
}
# which returns
> output
parent lev lev
[1,] "A" "B" "C"
[2,] "A" "E" "D"
This can easily be turned into a function of parent:
myfct <- function(parent){
  # daughters of the starting parent (reads the global data frame df)
  lev <- df$Daughter[which(df$Parent == parent)]
  output <- data.frame(parent, lev, stringsAsFactors = FALSE)
  while(length(lev) > 0){
    # rows of df whose Parent sits on the current level
    dat <- df[which(is.element(df$Parent, lev)),]
    # attach the next level; all = TRUE keeps paths that have already ended
    newdat <- merge(x = output, y = dat, by.x = "lev", by.y = "Parent", all = TRUE)
    # reorder columns: start node, earlier levels, current level, new level
    col.first <- which(names(newdat) == "parent")
    col.last <- which(names(newdat) == "Daughter")
    col.sec.last <- which(names(newdat) == "lev")
    col.rest <- setdiff(seq_len(ncol(newdat)), c(col.first, col.sec.last, col.last))
    newdat <- newdat[, c(col.first, col.rest, col.sec.last, col.last)]
    # rename so that the newest level is called "lev" for the next merge
    names(newdat)[2:(length(names(newdat))-1)] <- paste0("x.",2:(length(names(newdat))-1))
    names(newdat)[length(names(newdat))] <- "lev"
    output <- newdat
    lev <- df$Daughter[which(is.element(df$Parent, lev))]
  }
  # drop columns that are entirely NA (left over from the final iteration)
  cols <- as.numeric(which(!sapply(output, function(x)all(is.na(x)))))
  output <- output[,cols]
  return(output)
}
The function can then be applied to every parent:
parents.list <- unique(df$Parent)
sapply(parents.list, myfct)
# which returns
$A
parent x.2 x.3 x.4
1 A B C F
2 A B C G
3 A B D <NA>
4 A E H <NA>
5 A E I <NA>
$B
parent x.2 x.3
1 B C F
2 B C G
3 B D <NA>
$C
parent x.2
1 C F
2 C G
$E
parent x.2
1 E H
2 E I
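You can now adapt it to change the structure of the output as you like.
EDIT
The key is to add a while loop. I have edited my code, and it should now work without the number of levels being specified.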
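As a rough alternative to the level-by-level loop, the same paths can also be collected with a recursive depth-first walk in base R. This is only a sketch: the name leaf_paths and its edges/node arguments are made up for illustration and are not part of the answer above, and it assumes the data contain no cycles (otherwise the recursion would not terminate).

```r
# Hypothetical helper, not from the original answer: returns a list of
# character vectors, one per path from `node` down to each leaf.
# Assumes `edges` has columns Parent and Daughter and contains no cycles.
leaf_paths <- function(edges, node) {
  # as.character() so the function also works when the columns are factors
  kids <- as.character(edges$Daughter[edges$Parent == node])
  # a node without daughters is a leaf: the path is just the node itself
  if (length(kids) == 0) return(list(node))
  out <- list()
  for (k in kids) {
    # prepend the current node to every path found below this daughter
    for (p in leaf_paths(edges, k)) {
      out[[length(out) + 1]] <- c(node, p)
    }
  }
  out
}

df <- data.frame(Parent   = c("A", "B", "B", "A", "C", "C", "E", "E"),
                 Daughter = c("B", "C", "D", "E", "F", "G", "H", "I"))
paths <- leaf_paths(df, "A")
```

Because each call only looks one level down and hands the rest to itself, the depth of the tree never has to be known in advance, and the result is a list of vectors of differing lengths rather than an NA-padded data frame.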
Using the igraph package: convert the data frame to a graph object, get all the paths, then drop every path that is a subset of another path.
library(igraph)
# example data
df <- data.frame(Parent = c("A", "B", "B", "A", "C", "C", "E", "E"),
Daughter = c("B", "C", "D", "E", "F", "G", "H", "I"))
# convert to graph object
g <- graph_from_data_frame(df)
# get all the paths, extract node ids from paths
res <- all_simple_paths(g, from = "A")
res <- lapply(res, as_ids)
# get index where vector is not subset of other vector
ix <- sapply(res, function(i) {
  x <- sapply(res, function(j) length(intersect(i, j)))
  sum(length(i) == x) == 1
})
# result
res <- res[ix]
# res
# [[1]]
# [1] "A" "B" "C" "F"
#
# [[2]]
# [1] "A" "B" "C" "G"
#
# [[3]]
# [1] "A" "B" "D"
#
# [[4]]
# [1] "A" "E" "H"
#
# [[5]]
# [1] "A" "E" "I"
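A possible variation, sketched here under the assumption that only root-to-leaf paths are wanted: instead of computing every simple path and filtering out the subsets afterwards, the leaves (vertices with no outgoing edges) can be identified first and passed as the to argument of all_simple_paths. The variable names leaves and res_leaf are illustrative only, and the order of the returned paths is not guaranteed to match the listing above.

```r
library(igraph)

df <- data.frame(Parent   = c("A", "B", "B", "A", "C", "C", "E", "E"),
                 Daughter = c("B", "C", "D", "E", "F", "G", "H", "I"))
g <- graph_from_data_frame(df)

# leaves are the vertices with out-degree 0, i.e. nodes that are never a Parent
leaves <- V(g)[degree(g, mode = "out") == 0]

# only paths that end at a leaf are returned, so no subset filtering is needed
res_leaf <- lapply(all_simple_paths(g, from = "A", to = leaves), as_ids)
```

This avoids the pairwise comparison of all paths, which may matter on larger trees.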