How to convert a parse tree generated by coreNLP into the data.tree R package format

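I want to convert the parse tree generated by the R package coreNLP into the format used by the data.tree R package. The parse tree is generated with the following code: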

options(java.parameters = "-Xmx2g")  # must be set before the JVM is started
library(NLP)
library(coreNLP)
#initCoreNLP() # change this if downloaded to non-standard location
initCoreNLP(annotators = "tokenize,ssplit,pos,lemma,parse")
## Some text.
s <- c("A rare black squirrel has become a regular visitor to a suburban garden.")
s <- as.String(s)


anno<-annotateString(s)
parse_tree <- getParse(anno)
parse_tree

The output parse tree is as follows:
> parse_tree
[1] "(ROOT\r\n  (S\r\n    (NP (DT A) (JJ rare) (JJ black) (NN squirrel))\r\n    (VP (VBZ has)\r\n      (VP (VBN become)\r\n        (NP (DT a) (JJ regular) (NN visitor))\r\n        (PP (TO to)\r\n          (NP (DT a) (JJ suburban) (NN garden)))))\r\n    (. .)))\r\n\r\n"

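After posting, I found the post "Visualize Parse Tree Structure". It converts a parse tree generated by the openNLP package into a tree format. However, that parse tree differs from the one generated by coreNLP, and the solution does not convert to the data.tree format I want.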

Edit: by adding the two gsub() lines below, we can use the function provided in the post "Visualize Parse Tree Structure":

# this step modifies coreNLP parse tree to mimic openNLP parse tree
parse_tree <- gsub("[\r\n]", "", parse_tree)
parse_tree <- gsub("ROOT", "TOP", parse_tree)

library(igraph)
library(NLP)

# parse2graph() is the function defined in the linked post; s is the sentence from above
parse2graph(parse_tree,  # plus optional graphing parameters
            title = sprintf("'%s'", s), margin=-0.05,
            vertex.color=NA, vertex.frame.color=NA,
            vertex.label.font=2, vertex.label.cex=1.5, asp=0.5,
            edge.width=1.5, edge.color='black', edge.arrow.size=0)

But what I want is to convert the parse tree into the data.tree format provided by the data.tree package.

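Once you have the edge list, converting to data.tree is straightforward. Just replace the last bit of the parse2graph function, and move the styling out of the function: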

parse2tree <- function(ptext) {
  stopifnot(require(NLP) && require(data.tree))  # Tree_parse() and FromDataFrameNetwork()

  ## Replace words with unique versions
  ms <- gregexpr("[^() ]+", ptext)                                      # just ignoring spaces and brackets?
  words <- regmatches(ptext, ms)[[1]]                                   # just words
  regmatches(ptext, ms) <- list(paste0(words, seq.int(length(words))))  # add id to words

  ## Going to construct an edgelist and pass that to data.tree
  ## allocate here since we know the size (number of nodes - 1) and -1 more to exclude 'TOP'
  edgelist <- matrix('', nrow=length(words)-2, ncol=2)

  ## Function to fill in edgelist in place
  edgemaker <- (function() {
    i <- 0                                       # row counter
    g <- function(node) {                        # the recursive function
      if (inherits(node, "Tree")) {            # only recurse subtrees
        if ((val <- node$value) != 'TOP1') { # skip 'TOP' node (added '1' above)
          for (child in node$children) {
            childval <- if(inherits(child, "Tree")) child$value else child
            i <<- i+1
            edgelist[i,1:2] <<- c(val, childval)
          }
        }
        invisible(lapply(node$children, g))
      }
    }
  })()

  ## Create the edgelist from the parse tree
  edgemaker(Tree_parse(ptext))
  tree <- FromDataFrameNetwork(as.data.frame(edgelist))
  return (tree)
}
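
The last step of parse2tree can be tried in isolation. The snippet below is a minimal sketch, assuming only that data.tree is installed; the edge list and the name toy are made up for illustration. FromDataFrameNetwork reads the first two columns of the data frame as the two ends of each edge and identifies nodes by name, which is presumably why the function above appends a running number to every label: without it, the three NP nodes of the example sentence would collapse into one.

```r
library(data.tree)

# Hand-written edge list: one row per parent/child pair (illustrative data).
edges <- data.frame(
  from = c("S", "S", "NP", "NP"),
  to   = c("NP", "VP", "DT", "NN"),
  stringsAsFactors = FALSE
)

# Build the tree; the node that never appears as a child becomes the root.
toy <- FromDataFrameNetwork(edges)
print(toy)

toy$name        # name of the root node
toy$totalCount  # number of nodes in the whole tree
toy$leafCount   # number of leaves
```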


parse_tree <- "(ROOT\r\n  (S\r\n    (NP (DT A) (JJ rare) (JJ black) (NN squirrel))\r\n    (VP (VBZ has)\r\n      (VP (VBN become)\r\n        (NP (DT a) (JJ regular) (NN visitor))\r\n        (PP (TO to)\r\n          (NP (DT a) (JJ suburban) (NN garden)))))\r\n    (. .)))\r\n\r\n"
parse_tree <- gsub("[\r\n]", "", parse_tree)
parse_tree <- gsub("ROOT", "TOP", parse_tree)

library(data.tree)

tree <- parse2tree(parse_tree)
tree
SetNodeStyle(tree, style = "filled,rounded", shape = "box", fillcolor = "GreenYellow")
plot(tree)
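
The node names in the resulting tree carry the numeric suffixes added inside parse2tree (for example NP3 rather than NP). If the plain labels are wanted for display, one option is to keep the unique names and store the cleaned label in a separate attribute. This is only a sketch on a small hand-made tree; the attribute name tag and the variable small are hypothetical, not part of data.tree.

```r
library(data.tree)

# Small stand-in for the output of parse2tree(): names end in a running id.
edges <- data.frame(
  from = c("S2", "NP3", "DT4"),
  to   = c("NP3", "DT4", "A5"),
  stringsAsFactors = FALSE
)
small <- FromDataFrameNetwork(edges)

# Store the label without its trailing digits in a custom attribute.
# Caveat: a token that is itself a number would lose all of its digits.
small$Do(function(node) node$tag <- sub("[0-9]+$", "", node$name))

# Show the tree with the cleaned label next to each unique node name.
print(small, "tag")
```

The same Do() call should work on the tree returned by parse2tree, since it only reads each node's name.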