Build a nested family tree (parent/children relationships) in R
I am working on family trees.
I adapted Bob Horton's sqldf example from https://www.r-bloggers.com/exploring-recursive-ctes-with-sqldf/
My data:
person father
Guillou Arthur NA
Cleach Marc NA
Guillou Eric Guillou Arthur
Guillou Jacques Guillou Arthur
Cleach Franck Cleach Marc
Cleach Leo Cleach Marc
Cleach Herbet Cleach Leo
Cleach Adele Cleach Herbet
Guillou Jean Guillou Eric
Guillou Alan Guillou Eric
My result: the descendants of "Guillou Arthur" (the top-level person, who has no father), sorted by level:
name parent_name level
Guillou Arthur NA 1
Guillou Eric Guillou Arthur 2
Guillou Jacques Guillou Arthur 2
Guillou Alan Guillou Eric 3
Guillou Jean Guillou Eric 3
You can build this table with a recursive query in sqldf:
Data:
person <- c("Guillou Arthur",
"Cleach Marc",
"Guillou Eric",
"Guillou Jacques",
"Cleach Franck",
"Cleach Leo",
"Cleach Herbet",
"Cleach Adele",
"Guillou Jean",
"Guillou Alan" )
father <- c(NA, NA, "Guillou Arthur" , "Guillou Arthur", "Cleach Marc", "Cleach Marc", "Cleach Leo", "Cleach Herbet", "Guillou Eric", "Guillou Eric")
family <- data.frame(person, father)
Convert from wide to long format:
library(tidyr)
long_family <- gather(family, parent, parent_name, -person)
long_family
Recursive query to find the descendants of "Guillou Arthur" (the top-level person, who has no father):
library(sqldf)
descendants_sql <- "
WITH RECURSIVE descendants (name, parent_name, level) AS (
SELECT person, parent_name, 1 FROM long_family
WHERE person = '%s'
AND parent = '%s'
UNION ALL
SELECT F.person, F.parent_name, D.level + 1
FROM descendants D
JOIN long_family F
ON F.parent_name = D.name)
SELECT * FROM descendants ORDER BY level, name
"
fam <- sqldf(sprintf(descendants_sql, 'Guillou Arthur', 'father'))
fam
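My question:
How can I create, directly in R (rather than in SQL), a data.frame object that contains all the family trees?
Each tree starts with a patriarch (a person with no father) such as "Cleach Marc". (Either an R approach or an sqldf approach is fine.)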
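For the sqldf route mentioned in the question, one possible sketch is to seed the recursion with every person whose father is NULL, rather than with a single name, and to carry the patriarch down through the recursive step. The extra patriarch column and the names all_trees_sql and all_trees are assumptions added for illustration; the query reads the wide family table directly, so the gather step is skipped.

```r
library(sqldf)

# Same sample data as in the question, kept as character columns.
person <- c("Guillou Arthur", "Cleach Marc", "Guillou Eric", "Guillou Jacques",
            "Cleach Franck", "Cleach Leo", "Cleach Herbet", "Cleach Adele",
            "Guillou Jean", "Guillou Alan")
father <- c(NA, NA, "Guillou Arthur", "Guillou Arthur", "Cleach Marc",
            "Cleach Marc", "Cleach Leo", "Cleach Herbet", "Guillou Eric",
            "Guillou Eric")
family <- data.frame(person, father, stringsAsFactors = FALSE)

# Hypothetical variant of the query above: start from all patriarchs at once
# (R's NA is stored as NULL in SQLite) and propagate the patriarch's name.
all_trees_sql <- "
WITH RECURSIVE descendants (name, parent_name, level, patriarch) AS (
  SELECT person, father, 1, person FROM family
  WHERE father IS NULL
  UNION ALL
  SELECT F.person, F.father, D.level + 1, D.patriarch
  FROM descendants D
  JOIN family F
  ON F.father = D.name)
SELECT * FROM descendants ORDER BY patriarch, level, name
"
all_trees <- sqldf(all_trees_sql)
all_trees
```

Each row then carries its own tree identifier, so a single tree can be pulled out with subset(all_trees, patriarch == "Cleach Marc").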
You can probably do this with graph tools. With igraph, you can get the neighbours using the ego function.
A quick sketch (needs checking!):
library(igraph)
family[] = lapply(family, factor, levels=unique(unlist(family)))
g = graph_from_adjacency_matrix(table(family))
cg = connect.neighborhood(g, order=length(V(g)), mode="out")
cbind( V(cg)$name,
sapply(ego(g, mode="out", mindist=1), function(x) replace(names(x), length(names(x))==0, NA)),
ego_size(cg, mode="out") )[grep("Guillou", V(cg)$name),]
[,1] [,2] [,3]
[1,] "Guillou Arthur" NA "1"
[2,] "Guillou Eric" "Guillou Arthur" "2"
[3,] "Guillou Jacques" "Guillou Arthur" "2"
[4,] "Guillou Jean" "Guillou Eric" "3"
[5,] "Guillou Alan" "Guillou Eric" "3"
In fact, you may not need to create the neighbourhood graph at all, and can use:
cbind( V(g)$name,
sapply(ego(g, mode="out", mindist=1), function(x) replace(names(x), length(names(x))==0, NA)),
ego_size(g, mode="out", order=length(V(g))) )[grep("Cleach", V(g)$name),]
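To cover every tree at once with igraph, rather than filtering by surname with grep, a possible sketch is to build the graph from an edge list and measure each person's distance to the vertices that have no outgoing edge (the patriarchs). The names edges, roots and all_trees are assumptions for illustration, and the sketch assumes each person has at most one father listed and that there are no cycles.

```r
library(igraph)

# Same sample data as in the question.
person <- c("Guillou Arthur", "Cleach Marc", "Guillou Eric", "Guillou Jacques",
            "Cleach Franck", "Cleach Leo", "Cleach Herbet", "Cleach Adele",
            "Guillou Jean", "Guillou Alan")
father <- c(NA, NA, "Guillou Arthur", "Guillou Arthur", "Cleach Marc",
            "Cleach Marc", "Cleach Leo", "Cleach Herbet", "Guillou Eric",
            "Guillou Eric")

# One directed edge child -> father; people without a father get no edge.
has_father <- !is.na(father)
edges <- data.frame(from = person[has_father], to = father[has_father])
g <- graph_from_data_frame(edges, directed = TRUE,
                           vertices = data.frame(name = person))

# Patriarchs are the vertices with no outgoing edge.
roots <- V(g)$name[degree(g, mode = "out") == 0]

# Distance from every person to every patriarch along child -> father edges;
# patriarchs of other trees are unreachable and show up as Inf.
d <- distances(g, v = V(g), to = roots, mode = "out")

all_trees <- data.frame(
  person    = person,
  father    = father,
  level     = apply(d, 1, min) + 1,            # patriarch is level 1
  patriarch = roots[apply(d, 1, which.min)],   # the only reachable root
  stringsAsFactors = FALSE
)
all_trees[order(all_trees$patriarch, all_trees$level), ]
```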
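We build a recursive function that returns a person's paternal line; from there everything is straightforward.
First we define the data with stringsAsFactors = FALSE, so that reformatting goes more smoothly.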
family <- data.frame(person, father,stringsAsFactors = FALSE)
The function:
father_line <- function(x){
dad <- subset(family,person==x)$father
if(is.na(dad)) return(x)
c(x,father_line(dad))
}
father_line ("Guillou Alan")
# [1] "Guillou Alan" "Guillou Eric" "Guillou Arthur"
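If recursion depth or speed ever becomes a concern, the same climb can be done for everyone at once with match() and a bounded loop. This is only a sketch: climb_to_patriarch is a hypothetical helper name, and it assumes that person values are unique and that the data contain no cycles (the loop is capped at one step per row so it cannot run forever).

```r
# Same sample data as in the question.
person <- c("Guillou Arthur", "Cleach Marc", "Guillou Eric", "Guillou Jacques",
            "Cleach Franck", "Cleach Leo", "Cleach Herbet", "Cleach Adele",
            "Guillou Jean", "Guillou Alan")
father <- c(NA, NA, "Guillou Arthur", "Guillou Arthur", "Cleach Marc",
            "Cleach Marc", "Cleach Leo", "Cleach Herbet", "Guillou Eric",
            "Guillou Eric")

# Hypothetical helper: move every person one generation up per iteration
# until nobody has a known father left.
climb_to_patriarch <- function(person, father) {
  level    <- rep(1L, length(person))
  ancestor <- person                         # topmost ancestor found so far
  for (step in seq_along(person)) {          # a line cannot be longer than n
    up    <- father[match(ancestor, person)] # father of the current ancestor
    moved <- !is.na(up)
    if (!any(moved)) break
    ancestor[moved] <- up[moved]
    level[moved]    <- level[moved] + 1L
  }
  data.frame(person, father, level, patriarch = ancestor,
             stringsAsFactors = FALSE)
}

all_lines <- climb_to_patriarch(person, father)
all_lines
```

The result has one row per person with the same level and patriarch values that the recursive approach produces, without storing the full line for each person.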
Use it to get the level and other information:
family$father_line <- lapply(family$person,father_line)
family$level <- lengths(family$father_line)
family$patriarch <- sapply(family$father_line,tail,1)
# person father father_line level patriarch
# 1 Guillou Arthur <NA> Guillou Arthur 1 Guillou Arthur
# 2 Cleach Marc <NA> Cleach Marc 1 Cleach Marc
# 3 Guillou Eric Guillou Arthur Guillou Eric, Guillou Arthur 2 Guillou Arthur
# 4 Guillou Jacques Guillou Arthur Guillou Jacques, Guillou Arthur 2 Guillou Arthur
# 5 Cleach Franck Cleach Marc Cleach Franck, Cleach Marc 2 Cleach Marc
# 6 Cleach Leo Cleach Marc Cleach Leo, Cleach Marc 2 Cleach Marc
# 7 Cleach Herbet Cleach Leo Cleach Herbet, Cleach Leo, Cleach Marc 3 Cleach Marc
# 8 Cleach Adele Cleach Herbet Cleach Adele, Cleach Herbet, Cleach Leo, Cleach Marc 4 Cleach Marc
# 9 Guillou Jean Guillou Eric Guillou Jean, Guillou Eric, Guillou Arthur 3 Guillou Arthur
# 10 Guillou Alan Guillou Eric Guillou Alan, Guillou Eric, Guillou Arthur 3 Guillou Arthur
For example, to get the expected output stated in the question:
subset(family,patriarch == "Guillou Arthur",select=c(person,father,level))
# person father level
# 1 Guillou Arthur <NA> 1
# 3 Guillou Eric Guillou Arthur 2
# 4 Guillou Jacques Guillou Arthur 2
# 9 Guillou Jean Guillou Eric 3
# 10 Guillou Alan Guillou Eric 3
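Since the question asks for all the family trees rather than one, the same columns can also be used without filtering. The sketch below repeats the data and father_line so that it runs on its own; the names all_trees and trees_by_patriarch are assumptions for illustration.

```r
# Same sample data as in the question.
person <- c("Guillou Arthur", "Cleach Marc", "Guillou Eric", "Guillou Jacques",
            "Cleach Franck", "Cleach Leo", "Cleach Herbet", "Cleach Adele",
            "Guillou Jean", "Guillou Alan")
father <- c(NA, NA, "Guillou Arthur", "Guillou Arthur", "Cleach Marc",
            "Cleach Marc", "Cleach Leo", "Cleach Herbet", "Guillou Eric",
            "Guillou Eric")
family <- data.frame(person, father, stringsAsFactors = FALSE)

# Recursive paternal line, as in the answer (assumes unique person names).
father_line <- function(x) {
  dad <- family$father[family$person == x]
  if (is.na(dad)) return(x)
  c(x, father_line(dad))
}

lines <- lapply(family$person, father_line)
family$level     <- lengths(lines)
family$patriarch <- vapply(lines, function(l) l[length(l)], character(1))

# One data.frame holding every tree: grouped by patriarch, then by generation.
all_trees <- family[order(family$patriarch, family$level, family$person), ]
rownames(all_trees) <- NULL

# Alternatively, one data.frame per patriarch in a named list.
trees_by_patriarch <- split(all_trees, all_trees$patriarch)
names(trees_by_patriarch)
```

Keeping everything in one data.frame with a patriarch column tends to be easier to filter and join later; the split list is convenient when each tree is processed separately.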
With the tidyverse, it would look like this:
library(tidyverse)
family %>%
mutate(family_line = map(person,father_line),
level = lengths(family_line),
patriarch = map_chr(family_line,last)) %>%
filter(patriarch == "Guillou Arthur") %>%
select(person,father,level)
# person father level
# 1 Guillou Arthur <NA> 1
# 2 Guillou Eric Guillou Arthur 2
# 3 Guillou Jacques Guillou Arthur 2
# 4 Guillou Jean Guillou Eric 3
# 5 Guillou Alan Guillou Eric 3