How to convert a parse tree generated by coreNLP into the data.tree R package format
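I want to convert a parse tree generated by the R package coreNLP into the format of the data.tree R package. The parse tree is generated with the following code: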
options( java.parameters = "-Xmx2g" )
library(NLP)
library(coreNLP)
#initCoreNLP() # change this if downloaded to non-standard location
initCoreNLP(annotators = "tokenize,ssplit,pos,lemma,parse")
## Some text.
s <- c("A rare black squirrel has become a regular visitor to a suburban garden.")
s <- as.String(s)
anno<-annotateString(s)
parse_tree <- getParse(anno)
parse_tree
The output parse tree is as follows:
> parse_tree
[1] "(ROOT\r\n (S\r\n (NP (DT A) (JJ rare) (JJ black) (NN squirrel))\r\n (VP (VBZ has)\r\n (VP (VBN become)\r\n (NP (DT a) (JJ regular) (NN visitor))\r\n (PP (TO to)\r\n (NP (DT a) (JJ suburban) (NN garden)))))\r\n (. .)))\r\n\r\n"
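I found the post "Visualize Parse Tree Structure". It converts a parse tree generated by the openNLP package into a tree format. However, that parse tree differs from the one generated by coreNLP, and the solution does not convert to the data.tree format that I want.
Edit
By adding the two lines below, we can use the function provided in the post "Visualize Parse Tree Structure":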
# this step modifies coreNLP parse tree to mimic openNLP parse tree
parse_tree <- gsub("[\r\n]", "", parse_tree)
parse_tree <- gsub("ROOT", "TOP", parse_tree)
library(igraph)
library(NLP)
parse2graph(parse_tree, # plus optional graphing parameters
title = sprintf("'%s'", s), margin=-0.05, # s is the sentence defined above
vertex.color=NA, vertex.frame.color=NA,
vertex.label.font=2, vertex.label.cex=1.5, asp=0.5,
edge.width=1.5, edge.color='black', edge.arrow.size=0)
But what I want is to convert the parse tree into the data.tree format provided by the data.tree package.
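Once you have the edge list, converting to data.tree is straightforward. Just replace the last part of the parse2graph function and move the styling out of the function: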
parse2tree <- function(ptext) {
  stopifnot(require(NLP) && require(data.tree))
  ## Replace words with unique versions
  ms <- gregexpr("[^() ]+", ptext)                                     # match every run of characters other than spaces and brackets
  words <- regmatches(ptext, ms)[[1]]                                  # the tags and words
  regmatches(ptext, ms) <- list(paste0(words, seq.int(length(words)))) # append a running id to each
  ## Construct an edgelist and pass it to data.tree
  ## Allocate here since we know the size: (number of nodes - 1) edges, and 1 fewer to exclude 'TOP'
  edgelist <- matrix('', nrow=length(words)-2, ncol=2)
  ## Function to fill in edgelist in place
  edgemaker <- (function() {
    i <- 0                                   # row counter
    g <- function(node) {                    # the recursive function
      if (inherits(node, "Tree")) {          # only recurse into subtrees
        if ((val <- node$value) != 'TOP1') { # skip the 'TOP' node (the id '1' was appended above)
          for (child in node$children) {
            childval <- if (inherits(child, "Tree")) child$value else child
            i <<- i+1
            edgelist[i,1:2] <<- c(val, childval)
          }
        }
        invisible(lapply(node$children, g))
      }
    }
  })()
  ## Create the edgelist from the parse tree
  edgemaker(Tree_parse(ptext))
  tree <- FromDataFrameNetwork(as.data.frame(edgelist))
  return (tree)
}
parse_tree <- "(ROOT\r\n (S\r\n (NP (DT A) (JJ rare) (JJ black) (NN squirrel))\r\n (VP (VBZ has)\r\n (VP (VBN become)\r\n (NP (DT a) (JJ regular) (NN visitor))\r\n (PP (TO to)\r\n (NP (DT a) (JJ suburban) (NN garden)))))\r\n (. .)))\r\n\r\n"
parse_tree <- gsub("[\r\n]", "", parse_tree)
parse_tree <- gsub("ROOT", "TOP", parse_tree)
library(data.tree)
tree <- parse2tree(parse_tree)
tree
SetNodeStyle(tree, style = "filled,rounded", shape = "box", fillcolor = "GreenYellow")
plot(tree)
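The edge-list idea used above can also be sketched in base R alone, without NLP::Tree_parse. This is only an illustrative sketch, not the answer's method: the helper name parse_to_edges is hypothetical, it assumes a well-formed bracketed parse string, and unlike parse2tree it keeps the edge from the root label to its child. Node names get a running id appended so they stay unique, as in the answer.

```r
# Hypothetical helper: turn a bracketed parse string into a parent/child edge list.
parse_to_edges <- function(ptext) {
  # Split into "(", ")" and label/word tokens; whitespace (including \r\n) is skipped
  tokens <- regmatches(ptext, gregexpr("\\(|\\)|[^()[:space:]]+", ptext))[[1]]
  from <- character(0)
  to <- character(0)
  stack <- character(0)    # names of the constituents that are currently open
  id <- 0                  # running counter used to make node names unique
  expect_label <- FALSE    # TRUE right after "(", when the next token is a label
  for (tok in tokens) {
    if (tok == "(") {
      expect_label <- TRUE
    } else if (tok == ")") {
      stack <- stack[-length(stack)]        # close the innermost constituent
    } else {
      id <- id + 1
      name <- paste0(tok, id)
      if (length(stack) > 0) {              # every node except the root has a parent
        from <- c(from, stack[length(stack)])
        to <- c(to, name)
      }
      if (expect_label) {                   # a label opens a new constituent
        stack <- c(stack, name)
        expect_label <- FALSE
      }
    }
  }
  data.frame(from = from, to = to, stringsAsFactors = FALSE)
}

edges <- parse_to_edges("(ROOT (S (NP (DT A) (NN dog)) (VP (VBZ barks))))")
print(edges)
```

The resulting data frame has the parent in the first column and the child in the second, the same order as the edgelist matrix in parse2tree, so with data.tree loaded it could presumably be passed to FromDataFrameNetwork in the same way.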