Trying to write data into Newick format in R
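I have a dataset with several levels of branches, starting from the top level: stock -> mbranch -> sbranch -> lsbranch. I would like to visualize the data at these levels in Newick format. Each stock contains different language groups, and I want to make a separate tree for each of these top-level groups.
For example, my data looks like this: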
sample= data.frame("stock" = c("A", "A", "B", "B", "B"), "mbranch" = c("C", "D", "E", "F", "G"), "sbranch" = c("H", "O", NA, "K", "L"), "lsbranch" = c("I", "J", NA, "M", "N"), "name" = c("Andrea", "Kevin", "Charlie", "Naomi", "Sam"))
I am trying to output a Newick tree similar to:
tree = "(A(C(H(I(Andrew))),D(O(J(Kevin)))),B(E(Charlie),F(K(M(Naomi))),G(L(N(Sam)))));"
plot(read.dendrogram(tree))
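I am doing this so that I can later compute a distance matrix between the nodes of the resulting tree.
Can the write.tree function parse data like this and generate a tree from it (keeping in mind that my actual dataset is much larger)? Or, more generally, is there a function that outputs a tree format from such data? Thanks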
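You can use the ape::read.tree() function to read your Newick tree. In Newick, an internal node's label goes after its closing parenthesis, as in (child)label, and the tip names should match your data (Andrea, not Andrew), so the string becomes:
library(ape)
tree = "(((((Andrea)I)H)C,(((Kevin)J)O)D)A,((Charlie)E,(((Naomi)M)K)F,(((Sam)N)L)G)B);"
my_tree <- read.tree(text = tree)
plot(my_tree)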
You can then use ape::write.tree to save the tree to a Newick file:
write.tree(my_tree, file = "my_file_name.tre")
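Note that write.tree only writes an existing "phylo" object; it does not build a tree from a table by itself. To convert the table into an ape "phylo" object, you can use this function. As written, it only links a root to the stocks and each stock to its names, skipping mbranch, sbranch and lsbranch, so it may need some tweaking: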
## The function
data.frame.to.phylo <- function(sample){
    ## Using characters, in case the columns were read as factors
    stock <- as.character(sample$stock)
    name <- as.character(sample$name)
    ## Making an edge table
    edge_table <- rbind(
        ## The root connecting to every stock (here A and B)
        cbind("root", unique(stock)),
        ## All the stock nodes connecting to the tips
        cbind(stock, name)
    )
    ## Translating the values in the edge table into node IDs
    ## The order must be tips, root, nodes
    element_names <- c(unique(name), "root", unique(stock))
    ## match() gives the integer ID of every element in one go
    edge_table <- matrix(match(edge_table, element_names), ncol = 2)
    ## Build the phylo object
    phylo_object <- list()
    phylo_object$edge <- edge_table
    phylo_object$tip.label <- unique(name)
    phylo_object$node.label <- c("root", unique(stock))
    phylo_object$Nnode <- length(phylo_object$node.label)
    ## Forcing the class to be "phylo"
    class(phylo_object) <- "phylo"
    return(phylo_object)
}
## The data
sample = data.frame("stock" = c("A", "A", "B", "B", "B"), "mbranch" = c("C", "D", "E", "F", "G"), "sbranch" = c("H", "O", NA, "K", "L"), "lsbranch" = c("I", "J", NA, "M", "N"), "name" = c("Andrea", "Kevin", "Charlie", "Naomi", "Sam"))
## Plotting the data.frame for testing the function (plot() needs ape loaded)
library(ape)
plot(data.frame.to.phylo(sample))
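The function above only uses the stock level. As an alternative, here is a minimal base-R sketch that writes the Newick string directly from the table, using every level and skipping NA levels (so Charlie hangs directly off E). The helper `df_to_newick()` and its `level_cols`, `tip_col` and `root_label` arguments are hypothetical names, and the sketch assumes every name is unique. It also builds one tree per stock, since you wanted a separate tree for each top-level group:

```r
# Hypothetical helper (not part of ape): builds a Newick string from a table
# whose level columns go from the top level down, plus a tip-label column.
# NA levels are skipped, so a tip attaches to its deepest non-NA level.
df_to_newick <- function(df, level_cols, tip_col, root_label = "") {
  # One path per row: the non-NA level labels followed by the tip label
  paths <- lapply(seq_len(nrow(df)), function(i) {
    path <- as.character(unlist(df[i, level_cols]))
    c(path[!is.na(path)], as.character(df[[tip_col]][i]))
  })
  # Recursively group paths by their first element (in first-seen order)
  build <- function(paths) {
    heads <- vapply(paths, `[`, character(1), 1)
    nodes <- vapply(unique(heads), function(h) {
      rest <- lapply(paths[heads == h], `[`, -1)
      rest <- rest[lengths(rest) > 0]
      if (length(rest) == 0) {
        h                                  # nothing below h: it is a tip
      } else {
        paste0("(", build(rest), ")", h)   # internal node: label after ")"
      }
    }, character(1))
    paste(nodes, collapse = ",")
  }
  paste0("(", build(paths), ")", root_label, ";")
}

sample <- data.frame(
  stock    = c("A", "A", "B", "B", "B"),
  mbranch  = c("C", "D", "E", "F", "G"),
  sbranch  = c("H", "O", NA, "K", "L"),
  lsbranch = c("I", "J", NA, "M", "N"),
  name     = c("Andrea", "Kevin", "Charlie", "Naomi", "Sam"),
  stringsAsFactors = FALSE
)

# One tree containing both stocks:
# "(((((Andrea)I)H)C,(((Kevin)J)O)D)A,((Charlie)E,(((Naomi)M)K)F,(((Sam)N)L)G)B);"
full_tree <- df_to_newick(sample, c("stock", "mbranch", "sbranch", "lsbranch"), "name")

# One tree per stock, with the stock as the root label
by_stock <- split(sample, sample$stock)
stock_trees <- vapply(names(by_stock), function(s)
  df_to_newick(by_stock[[s]], c("mbranch", "sbranch", "lsbranch"), "name",
               root_label = s),
  character(1))
```

Each string can then be passed to ape::read.tree(text = ...) and saved with ape::write.tree(). Because most levels have a single child, the resulting trees contain singleton nodes; ape::collapse.singles() removes them if a downstream function does not accept them.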
Cheers,
Thomas
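Since the end goal is a distance matrix, the tip-to-tip distance (number of edges through the hierarchy) can also be computed straight from the table. The sketch below is a hypothetical base-R helper, `tip_distances()`, which assumes unit-length edges, unique names, and an implicit root above the stocks; NA levels are skipped:

```r
# Hypothetical helper: number of edges between every pair of tips, treating
# each row as a path from an implicit root down to its name (NA levels skipped)
tip_distances <- function(df, level_cols, tip_col) {
  tips <- as.character(df[[tip_col]])
  paths <- lapply(seq_len(nrow(df)), function(i) {
    path <- as.character(unlist(df[i, level_cols]))
    c(path[!is.na(path)], tips[i])
  })
  n <- length(paths)
  d <- matrix(0L, n, n, dimnames = list(tips, tips))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      a <- paths[[i]]
      b <- paths[[j]]
      m <- min(length(a), length(b))
      # k = length of the shared prefix = depth of the last common ancestor
      mismatch <- which(a[seq_len(m)] != b[seq_len(m)])
      k <- if (length(mismatch) == 0) m else mismatch[1] - 1L
      # Edges from each tip up to the common ancestor
      d[i, j] <- (length(a) - k) + (length(b) - k)
    }
  }
  d
}

sample <- data.frame(
  stock    = c("A", "A", "B", "B", "B"),
  mbranch  = c("C", "D", "E", "F", "G"),
  sbranch  = c("H", "O", NA, "K", "L"),
  lsbranch = c("I", "J", NA, "M", "N"),
  name     = c("Andrea", "Kevin", "Charlie", "Naomi", "Sam"),
  stringsAsFactors = FALSE
)

d <- tip_distances(sample, c("stock", "mbranch", "sbranch", "lsbranch"), "name")
d["Andrea", "Kevin"]   # 8: Andrea-I-H-C-A-D-O-J-Kevin
d["Charlie", "Naomi"]  # 6: Charlie-E-B-F-K-M-Naomi
```

With ape, cophenetic() on a "phylo" object is the usual route, but it needs branch lengths, which compute.brlen() can set (for example all to 1).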