Convert a data frame to a treeNetwork compatible list
Consider the following data frame:
   Country     Provinces          City Zone
1   Canada   Newfondland      St Johns    A
2   Canada           PEI Charlottetown    B
3   Canada   Nova Scotia       Halifax    C
4   Canada New Brunswick   Fredericton    D
5   Canada        Quebec            NA   NA
6   Canada        Quebec   Quebec City   NA
7   Canada       Ontario       Toronto    A
8   Canada       Ontario        Ottawa    B
9   Canada      Manitoba      Winnipeg    C
10  Canada  Saskatchewan        Regina    D
Is there a neat way to convert it to a treeNetwork compatible list (from the networkD3 package), in the following format:
CanadaPC <- list(name = "Canada",
                 children = list(
                   list(name = "Newfoundland",
                        children = list(list(name = "St. John's",
                                             children = list(list(name = "A"))))),
                   list(name = "PEI",
                        children = list(list(name = "Charlottetown",
                                             children = list(list(name = "B"))))),
                   list(name = "Nova Scotia",
                        children = list(list(name = "Halifax",
                                             children = list(list(name = "C"))))),
                   list(name = "New Brunswick",
                        children = list(list(name = "Fredericton",
                                             children = list(list(name = "D"))))),
                   list(name = "Quebec",
                        children = list(list(name = "Quebec City"))),
                   list(name = "Ontario",
                        children = list(list(name = "Toronto",
                                             children = list(list(name = "A"))),
                                        list(name = "Ottawa",
                                             children = list(list(name = "B"))))),
                   list(name = "Manitoba",
                        children = list(list(name = "Winnipeg",
                                             children = list(list(name = "C"))))),
                   list(name = "Saskatchewan",
                        children = list(list(name = "Regina",
                                             children = list(list(name = "D")))))))
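The aim is to plot Reingold-Tilford trees that have an arbitrary set of levels.
I have tried several sub-optimal routines, including a messy combination of for loops, but I can't get the data into the desired format.
Ideally, the function would scale so that the first column is treated as the root (starting point) and the other columns become the successive levels of children.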
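Edit
A similar question was asked on the same topic, and @MrFlick provided an interesting recursive function. The original data frame there had a fixed set of levels. I introduced NAs to add another level of complexity (an arbitrary set of levels) that @MrFlick's initial solution does not handle.
Data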
structure(list(Country = c("Canada", "Canada", "Canada", "Canada",
"Canada", "Canada", "Canada", "Canada", "Canada", "Canada"),
Provinces = c("Newfondland", "PEI", "Nova Scotia", "New Brunswick",
"Quebec", "Quebec", "Ontario", "Ontario", "Manitoba", "Saskatchewan"
), City = c("St Johns", "Charlottetown", "Halifax", "Fredericton",
NA, "Quebec City", "Toronto", "Ottawa", "Winnipeg", "Regina"
), Zone = c("A", "B", "C", "D", NA, NA, "A", "B", "C",
"D")), class = "data.frame", row.names = c(NA, -10L), .Names = c("Country",
"Provinces", "City", "Zone"))
A better strategy for this scenario may be a recursive split().
Here is one such implementation. First, here is the sample data
dd<-structure(list(Country = c("Canada", "Canada", "Canada", "Canada",
"Canada", "Canada", "Canada", "Canada", "Canada", "Canada"),
Provinces = c("Newfondland", "PEI", "Nova Scotia", "New Brunswick",
"Quebec", "Quebec", "Ontario", "Ontario", "Manitoba", "Saskatchewan"
), City = c("St Johns", "Charlottetown", "Halifax", "Fredericton",
NA, "Quebec City", "Toronto", "Ottawa", "Winnipeg", "Regina"
), Zone = c("A", "B", "C", "D", NA, NA, "A", "B", "C",
"D")), class = "data.frame", row.names = c(NA, -10L), .Names = c("Country",
"Provinces", "City", "Zone"))
Note that I have replaced the "NA" strings with real NA values. Now, the function
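rsplit <- function(x) {
  # Drop rows whose current level (first column) is NA
  x <- x[!is.na(x[, 1]), , drop = FALSE]
  if (nrow(x) == 0) return(NULL)
  # Last column: every remaining value is a leaf node
  if (ncol(x) == 1) return(lapply(x[, 1], function(v) list(name = v)))
  # Group the remaining columns by the current level, then recurse into each group
  s <- split(x[, -1, drop = FALSE], x[, 1])
  unname(mapply(function(v, n) {
    if (!is.null(v)) list(name = n, children = v) else list(name = n)
  }, lapply(s, rsplit), names(s), SIMPLIFY = FALSE))
}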
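To make a single level of the recursion easier to follow, here is a minimal sketch of what split() does to a data frame once its first column has been set aside. The toy data frame and its column names are hypothetical and are not part of the question's data.

```r
# Hypothetical toy data, not the Canadian data set from the question
toy <- data.frame(
  parent = c("a", "a", "b", NA),
  child  = c("x", "y", "z", "w"),
  stringsAsFactors = FALSE
)

# Drop rows whose first column is NA, as the first line of rsplit() does
toy <- toy[!is.na(toy[, 1]), , drop = FALSE]

# Group the remaining columns by the values of the first column
groups <- split(toy[, -1, drop = FALSE], toy[, 1])

names(groups)   # one group per distinct parent value, in sorted order
groups[["a"]]   # a one-column data frame holding the children "x" and "y"
```

Each element of groups is itself a data frame with one column fewer, which is what lets the function call itself on every group.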
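Then we can run
rsplit(dd)
It seems to work with the test data. The only difference is the order of the child nodes: split() returns its groups in sorted order rather than in order of appearance.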
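If the original row order matters, one possible workaround is to split on a factor whose levels follow the order of first appearance. The sketch below is a variant under that assumption, not the answer's own code: rsplit_ordered and the cut-down data frame small are hypothetical names introduced only for illustration. It also shows that the result is a list of root nodes, so the single root has to be extracted with [[1]] to match the shape of CanadaPC.

```r
# Variant of rsplit() that keeps children in their order of first appearance.
# rsplit_ordered is a hypothetical name, not part of the original answer.
rsplit_ordered <- function(x) {
  x <- x[!is.na(x[, 1]), , drop = FALSE]
  if (nrow(x) == 0) return(NULL)
  if (ncol(x) == 1) return(lapply(x[, 1], function(v) list(name = v)))
  # A factor with levels in order of appearance stops split() from sorting
  f <- factor(x[, 1], levels = unique(x[, 1]))
  s <- split(x[, -1, drop = FALSE], f)
  unname(mapply(function(v, n) {
    if (!is.null(v)) list(name = n, children = v) else list(name = n)
  }, lapply(s, rsplit_ordered), names(s), SIMPLIFY = FALSE))
}

# Cut-down, hypothetical version of the question's data
small <- data.frame(
  Country   = c("Canada", "Canada", "Canada", "Canada"),
  Provinces = c("Quebec", "Quebec", "Ontario", "Ontario"),
  City      = c(NA, "Quebec City", "Toronto", "Ottawa"),
  Zone      = c(NA, NA, "A", "B"),
  stringsAsFactors = FALSE
)

# The result is a list of root nodes; take the first to get the single root
tree <- rsplit_ordered(small)[[1]]
tree$name
sapply(tree$children, function(node) node$name)
```

With networkD3 installed, the tree object should be usable as the List argument of its tree plotting functions, for example networkD3::diagonalNetwork(List = tree); that call is not exercised here.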