r - hierarchical data frame from child/parent relations
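I have a child-parent data.frame that I want to transform into a complete hierarchical list containing all levels plus a level number. The example data below has three levels, but there could be more. What function can I use to transform the data?
Source: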
data.frame(name = c("land", "water", "air", "car", "bicycle", "boat", "balloon",
"airplane", "helicopter", "Ford", "BMW", "Airbus"), parent = c(NA, NA, NA,
"land", "land", "water", "air", "air", "air", "car", "car", "airplane"))
name parent
1 land <NA>
2 water <NA>
3 air <NA>
4 car land
5 bicycle land
6 boat water
7 balloon air
8 airplane air
9 helicopter air
10 Ford car
11 BMW car
12 Airbus airplane
Target:
data.frame(level1 = c("land", "water", "air", "land", "land", "water", "air",
"air", "air", "land", "land", "air"), level2 = c(NA, NA, NA, "car", "bicycle",
"boat", "balloon", "airplane", "helicopter", "car", "car", "airplane"),
level3 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, "Ford", "BMW", "Airbus"),
level_number = c(1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3))
level1 level2 level3 level_number
1 land <NA> <NA> 1
2 water <NA> <NA> 1
3 air <NA> <NA> 1
4 land car <NA> 2
5 land bicycle <NA> 2
6 water boat <NA> 2
7 air balloon <NA> 2
8 air airplane <NA> 2
9 air helicopter <NA> 2
10 land car Ford 3
11 land car BMW 3
12 air airplane Airbus 3
Using data.table, you can do the following:
library(data.table)
# dat is assumed to hold the source data.frame from the question
l <- list() # initialize empty list
setDT(dat)
setkey(dat, parent) # set up the data as a keyed data.table
current_lvl <- dat[is.na(parent), .(level_number = 1), keyby=.(level1 = name)]
Now current_lvl looks as follows (keyed by level1):
level1 level_number
1: air 1
2: land 1
3: water 1
The trick now is to join dat and current_lvl and modify the result appropriately:
ind <- 1 # illustrative single step; ind is 1 on the first pass of the while loop further down
current_lvl <- current_lvl[dat][ # join the data.tables
!is.na(level_number)][ # exclude non-child rows
,level_number := level_number + 1] # increment level_number
setnames(current_lvl, "name", paste0("level",ind+1)) # rename column
setkeyv(current_lvl, paste0("level",ind+1)) # set key
Which gives you (keyed by level2):
level1 level_number level2
1: air 2 airplane
2: air 2 balloon
3: land 2 bicycle
4: water 2 boat
5: land 2 car
6: air 2 helicopter
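To see what that single join step does without the keyed data.table syntax, here is a rough base R equivalent using merge(). The names top and lvl2 are made up for this sketch, and the row order may differ from the keyed output above.

```r
# Source data from the question, kept as character columns.
dat <- data.frame(
  name = c("land", "water", "air", "car", "bicycle", "boat", "balloon",
           "airplane", "helicopter", "Ford", "BMW", "Airbus"),
  parent = c(NA, NA, NA, "land", "land", "water", "air", "air", "air",
             "car", "car", "airplane"),
  stringsAsFactors = FALSE
)

# Level 1: every row without a parent (hypothetical helper table).
top <- data.frame(level1 = dat$name[is.na(dat$parent)], level_number = 1,
                  stringsAsFactors = FALSE)

# Inner join: keep only the rows of dat whose parent is a level-1 name.
lvl2 <- merge(top, dat, by.x = "level1", by.y = "parent")
lvl2$level_number <- lvl2$level_number + 1      # children sit one level deeper
names(lvl2)[names(lvl2) == "name"] <- "level2"  # the child name becomes level2
lvl2
```

Repeating the same merge with lvl2 in place of top (joining on level2) would produce the third level, which is what the loop in the answer automates.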
Run this in a while loop as follows:
while(nrow(current_lvl) > 0){
  ind <- length(l) + 1
  l[[ind]] <- current_lvl
  current_lvl <- current_lvl[dat][!is.na(level_number)][,level_number := level_number + 1]
  if(nrow(current_lvl) == 0L){
    break
  }
  setnames(current_lvl, "name", paste0("level",ind+1))
  setkeyv(current_lvl, paste0("level",ind+1))
}
You can inspect l to see the intermediate results. Combining them with rbindlist gives you what you want:
res <- rbindlist(l, fill=TRUE)
setcolorder(res, sort(names(res)))
res
which results in:
> res
level_number level1 level2 level3
1: 1 air NA NA
2: 1 land NA NA
3: 1 water NA NA
4: 2 air airplane NA
5: 2 air balloon NA
6: 2 land bicycle NA
7: 2 water boat NA
8: 2 land car NA
9: 2 air helicopter NA
10: 3 air airplane Airbus
11: 3 land car BMW
12: 3 land car Ford
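For comparison, the same table can be sketched in base R by walking from each node up to its top-level ancestor. This is only an illustration, not part of the answer above: path_to_root and res_base are hypothetical names, and the sketch assumes the data contains no cycles (a cycle would make the loop run forever).

```r
# Source data from the question, kept as character columns.
dat <- data.frame(
  name = c("land", "water", "air", "car", "bicycle", "boat", "balloon",
           "airplane", "helicopter", "Ford", "BMW", "Airbus"),
  parent = c(NA, NA, NA, "land", "land", "water", "air", "air", "air",
             "car", "car", "airplane"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: path from the top-level ancestor down to node x.
path_to_root <- function(x, name, parent) {
  path <- character(0)
  while (!is.na(x)) {
    path <- c(x, path)           # prepend, so the top-level node ends up first
    x <- parent[match(x, name)]  # step up to the parent (NA at the top)
  }
  path
}

paths <- lapply(dat$name, path_to_root, name = dat$name, parent = dat$parent)
depth <- lengths(paths)          # number of nodes on each path = level number
max_depth <- max(depth)

# Pad every path with NA up to the deepest level; one row per node.
padded <- do.call(rbind, lapply(paths, function(p) p[seq_len(max_depth)]))
res_base <- data.frame(padded, level_number = depth, stringsAsFactors = FALSE)
names(res_base)[seq_len(max_depth)] <- paste0("level", seq_len(max_depth))
res_base
```

Unlike the keyed data.table result, the rows here stay in the order of the source data, so they line up with the target table in the question.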
Using the data.tree package, you can do the following:
library(data.tree)
df <- data.frame(name = c("land", "water", "air", "car", "bicycle", "boat", "balloon", "airplane", "helicopter", "Ford", "BMW", "Airbus"),
parent = c("root", "root", "root", "land", "land", "water", "air", "air", "air", "car", "car", "airplane"))
Note that I replaced NA with "root", which makes conversion to a data.tree much easier. Namely:
tree <- FromDataFrameNetwork(df)
Getting the desired format then becomes simple, because we can use the hierarchy infrastructure in data.tree:
ToDataFrameTree(tree,
level1 = function(x) x$path[2],
level2 = function(x) x$path[3],
level3 = function(x) x$path[4],
level_number = function(x) x$level - 1)[-1,-1]
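Do not use "root" as the parent value for top-level records. The solution using the data.tree package is great; however, in newer versions "root" is a reserved name for nodes. Although it is automatically replaced by "root2", the call to FromDataFrameNetwork(df) does not return the desired tree.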
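One way to sidestep this, as a sketch only, is to give the artificial top node a label that is neither reserved by data.tree nor used in the data. The label "vehicles" below is an arbitrary choice for illustration, and this has not been checked against every data.tree version.

```r
library(data.tree)

# "vehicles" is a made-up label for the artificial top node; any name that is
# not reserved by data.tree and not already present in the data should do.
df <- data.frame(
  name = c("land", "water", "air", "car", "bicycle", "boat", "balloon",
           "airplane", "helicopter", "Ford", "BMW", "Airbus"),
  parent = c("vehicles", "vehicles", "vehicles", "land", "land", "water",
             "air", "air", "air", "car", "car", "airplane"),
  stringsAsFactors = FALSE
)

tree <- FromDataFrameNetwork(df)

# Same extraction as in the answer above; [-1, -1] drops the artificial top
# node (first row) and the levelName column (first column).
out <- ToDataFrameTree(tree,
                       level1 = function(x) x$path[2],
                       level2 = function(x) x$path[3],
                       level3 = function(x) x$path[4],
                       level_number = function(x) x$level - 1)[-1, -1]
out
```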