Labelling internal nodes in a phylogeny based on a string present in tip labels with R

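I am working with a set of phylogenies in which some tips, corresponding to species of interest, are labelled {Foreground}. I produced those labelled phylogenies using this code, in case it is useful. For example: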

((dqua_filteredTranscripts_dqua_XP_014485378_1_p1{Foreground}:0.16707,(hsal_filteredTranscripts_hsal_XP_019699576_1_p1{Foreground}:0.09303,obru_filteredTranscripts_obru_XP_032675222_1{Foreground}:0.14764)n2:0.0264)n1:0.081515,(lhum_filteredTranscripts_lhum_XP_012225643_1:0.15063,((((lnig_filteredTranscripts_lnig_lcl|LBMM01007291_1_cds_KMQ89872_1_9717_p1:0.03303,nful_filteredTranscripts_nful_XP_029159864_1:0.06193)n7:0.0208,(fexs_filteredTranscripts_fexs_XP_029666948_1_p1:0.05165,cflo_filteredTranscripts_cflo_XP_011252390_1:0.14872)n8:0.01846)n6:0.09014,((ebur_filteredTranscripts_ebur_evm_model_scaffold_4_137_p1:0.16113,obir_filteredTranscripts_obir_XP_011352658_1:0.18685)n10:0.16551,pgra_filteredTranscripts_pgra_XP_020292968_1:0.26079)n9:0.02221)n5:0.01985,(pbar_filteredTranscripts_pbar_XP_011642439_1:0.14069,((mpha_filteredTranscripts_mpha_XP_028047735_2:0.22823,((tcur_filteredTranscripts_tcur_XP_024882433_1:0.00706,tcur_filteredTranscripts_tcur_XP_024887856_1:0.00054)n15:0.08977,(cobs_filteredTranscripts_cobs_Cobs_17518mRNA1_p1:0.18228,veme_filteredTranscripts_veme_XP_011868525_1_p1:0.18631)n16:0.02472)n14:0.04118)n13:0.01637,(((waur_filteredTranscripts_waur_XP_011691747_1:0.00055,waur_filteredTranscripts_waur_XP_011706608_1:0.00054)n19:0.14732,(ccos_filteredTranscripts_ccos_XP_018403444_1:0.05989,(tsep_filteredTranscripts_tsep_XP_018353851_1{Foreground}:0.03875,(acol_filteredTranscripts_acol_XP_018049575_1_p1{Foreground}:0.03578,aech_filteredTranscripts_aech_XP_011050323_1{Foreground}:0.03598)n22:0.00783)n21:0.03036)n20:0.05015)n18:0.0229,(cvar_filteredTranscripts_cvar_CVAR_10443RA_p1:0.09577,sinv_filteredTranscripts_sinv_XP_011164490_1:0.09784)n23:0.01294)n17:0.01659)n12:0.04417)n11:0.07617)n4:0.03295)n3:0.081515)n0;

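I also need to add the label {Foreground} to the internal nodes that are parents of labelled tips only, and not to any node that is also a parent of unlabelled tips. For example, my desired result is: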

((dqua_filteredTranscripts_dqua_XP_014485378_1_p1{Foreground}:0.16707,(hsal_filteredTranscripts_hsal_XP_019699576_1_p1{Foreground}:0.09303,obru_filteredTranscripts_obru_XP_032675222_1{Foreground}:0.14764)n2{Foreground}:0.0264)n1{Foreground}:0.081515,(lhum_filteredTranscripts_lhum_XP_012225643_1:0.15063,((((lnig_filteredTranscripts_lnig_lcl|LBMM01007291_1_cds_KMQ89872_1_9717_p1:0.03303,nful_filteredTranscripts_nful_XP_029159864_1:0.06193)n7:0.0208,(fexs_filteredTranscripts_fexs_XP_029666948_1_p1:0.05165,cflo_filteredTranscripts_cflo_XP_011252390_1:0.14872)n8:0.01846)n6:0.09014,((ebur_filteredTranscripts_ebur_evm_model_scaffold_4_137_p1:0.16113,obir_filteredTranscripts_obir_XP_011352658_1:0.18685)n10:0.16551,pgra_filteredTranscripts_pgra_XP_020292968_1:0.26079)n9:0.02221)n5:0.01985,(pbar_filteredTranscripts_pbar_XP_011642439_1:0.14069,((mpha_filteredTranscripts_mpha_XP_028047735_2:0.22823,((tcur_filteredTranscripts_tcur_XP_024882433_1:0.00706,tcur_filteredTranscripts_tcur_XP_024887856_1:0.00054)n15:0.08977,(cobs_filteredTranscripts_cobs_Cobs_17518mRNA1_p1:0.18228,veme_filteredTranscripts_veme_XP_011868525_1_p1:0.18631)n16:0.02472)n14:0.04118)n13:0.01637,(((waur_filteredTranscripts_waur_XP_011691747_1:0.00055,waur_filteredTranscripts_waur_XP_011706608_1:0.00054)n19:0.14732,(ccos_filteredTranscripts_ccos_XP_018403444_1:0.05989,(tsep_filteredTranscripts_tsep_XP_018353851_1{Foreground}:0.03875,(acol_filteredTranscripts_acol_XP_018049575_1_p1{Foreground}:0.03578,aech_filteredTranscripts_aech_XP_011050323_1{Foreground}:0.03598)n22{Foreground}:0.00783)n21{Foreground}:0.03036)n20:0.05015)n18:0.0229,(cvar_filteredTranscripts_cvar_CVAR_10443RA_p1:0.09577,sinv_filteredTranscripts_sinv_XP_011164490_1:0.09784)n23:0.01294)n17:0.01659)n12:0.04417)n11:0.07617)n4:0.03295)n3:0.081515):0;

I tried using ape::makeNodeLabel, supplying a list of species abbreviations for my labelled species:

ape::makeNodeLabel(test, method = "user", nodeList = list(`{Foreground}` = c("aech", "acol", "tsep", "obru", "hsal", "dqua")))

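However, that approach labels the single node that is the parent of all of my labelled tips, which is also the parent of unlabelled tips. For example: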

((dqua_filteredTranscripts_dqua_XP_014485378_1_p1{Foreground}:0.16707,(hsal_filteredTranscripts_hsal_XP_019699576_1_p1{Foreground}:0.09303,obru_filteredTranscripts_obru_XP_032675222_1{Foreground}:0.14764)n2:0.0264)n1:0.081515,(lhum_filteredTranscripts_lhum_XP_012225643_1:0.15063,((((lnig_filteredTranscripts_lnig_lcl|LBMM01007291_1_cds_KMQ89872_1_9717_p1:0.03303,nful_filteredTranscripts_nful_XP_029159864_1:0.06193)n7:0.0208,(fexs_filteredTranscripts_fexs_XP_029666948_1_p1:0.05165,cflo_filteredTranscripts_cflo_XP_011252390_1:0.14872)n8:0.01846)n6:0.09014,((ebur_filteredTranscripts_ebur_evm_model_scaffold_4_137_p1:0.16113,obir_filteredTranscripts_obir_XP_011352658_1:0.18685)n10:0.16551,pgra_filteredTranscripts_pgra_XP_020292968_1:0.26079)n9:0.02221)n5:0.01985,(pbar_filteredTranscripts_pbar_XP_011642439_1:0.14069,((mpha_filteredTranscripts_mpha_XP_028047735_2:0.22823,((tcur_filteredTranscripts_tcur_XP_024882433_1:0.00706,tcur_filteredTranscripts_tcur_XP_024887856_1:0.00054)n15:0.08977,(cobs_filteredTranscripts_cobs_Cobs_17518mRNA1_p1:0.18228,veme_filteredTranscripts_veme_XP_011868525_1_p1:0.18631)n16:0.02472)n14:0.04118)n13:0.01637,(((waur_filteredTranscripts_waur_XP_011691747_1:0.00055,waur_filteredTranscripts_waur_XP_011706608_1:0.00054)n19:0.14732,(ccos_filteredTranscripts_ccos_XP_018403444_1:0.05989,(tsep_filteredTranscripts_tsep_XP_018353851_1{Foreground}:0.03875,(acol_filteredTranscripts_acol_XP_018049575_1_p1{Foreground}:0.03578,aech_filteredTranscripts_aech_XP_011050323_1{Foreground}:0.03598)n22:0.00783)n21:0.03036)n20:0.05015)n18:0.0229,(cvar_filteredTranscripts_cvar_CVAR_10443RA_p1:0.09577,sinv_filteredTranscripts_sinv_XP_011164490_1:0.09784)n23:0.01294)n17:0.01659)n12:0.04417)n11:0.07617)n4:0.03295)n3:0.081515){Foreground};

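Crucially, the phylogenies I need to manipulate may contain only a subset of the list of species of interest, and may contain multiple tips for each species of interest (i.e. two tips containing the abbreviation aech in a single tree). I need to carry out this labelling procedure on several thousand phylogenies containing different sets of species. I would like to be able to supply a vector of species abbreviations (c("aech", "acol", "tsep", "obru", "hsal", "dqua")) and output phylogenies as described above.

Thanks in advance for your help!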

I can't reproduce your data, so I'll show a solution using a random tree:

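## Load ape, which provides rtree(), makeNodeLabel(), Ntip() and nodelabels()
library(ape)

## A random tree with node labels
tree <- rtree(50)
tree <- makeNodeLabel(tree)

## Attributing the term "{Foreground}" to 10 random tips
## (paste0() avoids inserting a space before the tag)
selected_tips <- sample(1:50, 10)
tree$tip.label[selected_tips] <- paste0(tree$tip.label[selected_tips], "{Foreground}")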

The first step is to find the tips that contain the term {Foreground} (and/or "aech", "acol", etc.; the logic is the same).

## Finding which tips have the term "{Foreground}" (assuming we did not generate it randomly before)
## The braces are escaped with double backslashes, as required inside an R string
selected_tips <- grep("\\{Foreground\\}", tree$tip.label)
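If the tips are to be selected by species abbreviation rather than by the {Foreground} tag, the same grep() call can be given a pattern built from the vector of abbreviations. This is a minimal sketch, assuming (as in the example tree in the question) that every tip label starts with the abbreviation followed by an underscore. The labels below are shortened, made-up examples, and matched_tips is a name chosen for illustration:

```r
## Species abbreviations of interest (the vector from the question)
species <- c("aech", "acol", "tsep", "obru", "hsal", "dqua")

## Hypothetical, shortened tip labels, for illustration only
labels <- c("aech_XP_1", "lhum_XP_2", "aech_XP_3", "tsep_XP_4", "cflo_XP_5")

## One anchored pattern of the form "^(aech|acol|...)_"
pattern <- paste0("^(", paste(species, collapse = "|"), ")_")

## Indices of the tips that belong to a species of interest;
## several tips of the same species are all returned
matched_tips <- grep(pattern, labels)
matched_tips
```

Anchoring the pattern at the start of the label is a design choice: it keeps an abbreviation from matching by accident somewhere inside a longer identifier.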

We can then find which nodes are the direct ancestors (parents) of these tips (more on how to read the $edge table here):

## Finding the direct ancestor for each of these tips
## (despite its name, "descendants" holds the parent node numbers)
descendants <- tree$edge[tree$edge[, 2] %in% selected_tips, 1]
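To make the lookup in the $edge table concrete, here is a small sketch on a three-tip tree. The tree and the names toy, toy_tips and toy_parents are made up for illustration. In ape, each row of $edge is one branch, with the parent node in column 1 and the child in column 2; tips are numbered 1 to Ntip(tree) and internal nodes from Ntip(tree) + 1 upwards.

```r
library(ape)

## A three-tip toy tree: tips are numbered 1..3, internal nodes 4..5
toy <- read.tree(text = "((A,B),C);")

## Each row is one branch: column 1 = parent node, column 2 = child node
toy$edge

## Tip numbers of A and C, then the parent of each of them
toy_tips <- which(toy$tip.label %in% c("A", "C"))
toy_parents <- toy$edge[toy$edge[, 2] %in% toy_tips, 1]
toy_parents
```

A and C hang off different internal nodes here, so the lookup returns both internal node numbers.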

Then we build a vector of node labels in which only these selected parent nodes are labelled:

## Adding the term "{Foreground}" to the selected nodes. Internal nodes are numbered
## from Ntip(tree) + 1 in the $edge table, so subtracting Ntip(tree) converts a node
## number into an index into $node.label.
tree$node.label[descendants-Ntip(tree)] <- paste0(tree$node.label[descendants-Ntip(tree)], "{Foreground}")
## Replacing all the non selected node labels by nothing ("")
tree$node.label[-c(descendants-Ntip(tree))] <- ""

## Plotting the results
plot(tree)
nodelabels(tree$node.label)
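Note that the code above tags every direct parent of a tagged tip, including parents that also have untagged tips below them, and it blanks all other node labels. The question asks instead for the nodes whose descendant tips are all tagged, with the existing labels (n1, n2, ...) kept. The following is one possible sketch of that variant, not a tested solution for the real data. The function name label_foreground_nodes and the small example tree are hypothetical. It walks the edges in postorder, so each child is processed before its parent:

```r
library(ape)

## Hypothetical helper: append `tag` to every internal node whose descendant
## tips ALL carry `tag`; existing node labels are kept.
label_foreground_nodes <- function(tree, tag = "{Foreground}") {
  n_tip <- Ntip(tree)
  ## TRUE for tagged tips; internal nodes start as TRUE and are
  ## switched to FALSE as soon as one child is not all-foreground
  all_fg <- rep(TRUE, n_tip + tree$Nnode)
  all_fg[seq_len(n_tip)] <- grepl(tag, tree$tip.label, fixed = TRUE)
  ## Postorder: the edges below a node come before the edge above it
  edges <- reorder(tree, order = "postorder")$edge
  for (i in seq_len(nrow(edges))) {
    parent <- edges[i, 1]
    child <- edges[i, 2]
    all_fg[parent] <- all_fg[parent] && all_fg[child]
  }
  if (is.null(tree$node.label)) tree$node.label <- rep("", tree$Nnode)
  ## Positions in $node.label of the all-foreground internal nodes
  fg_nodes <- which(all_fg[n_tip + seq_len(tree$Nnode)])
  tree$node.label[fg_nodes] <- paste0(tree$node.label[fg_nodes], tag)
  tree
}

## Made-up example: n1 and n2 only have tagged tips below them; n4 does not
example_tree <- read.tree(text =
  "((a{Foreground}:1,(b{Foreground}:1,c{Foreground}:1)n2:1)n1:1,(d:1,(e:1,f{Foreground}:1)n4:1)n3:1)n0;")
labelled <- label_foreground_nodes(example_tree)
labelled$node.label
write.tree(labelled)
```

To select tips by species abbreviation instead of by tag, the grepl() call could be swapped for a regular expression built from the vector of abbreviations. For several thousand trees, the same function could be applied in a loop over the input files with read.tree() and write.tree().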