Stanford NLP: Tokenize output on a single line?

Can we get the tokenizer's output on a single line from the command-line tool, the way Apache OpenNLP does? http://nlp.stanford.edu/software/tokenizer.shtml https://opennlp.apache.org/documentation/1.5.3/manual/opennlp.html#tools.tokenizer

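You can use DocumentPreprocessor, either programmatically or from the command line. It prints each sentence on its own line, with the tokens separated by spaces.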

From the CLI:

$ echo "This is a test. And some more." | java edu.stanford.nlp.process.DocumentPreprocessor 2>/dev/null
This is a test .
And some more .
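If you want every token on one line, rather than one sentence per line, a simple option is to post-process the output with the standard `paste` utility. This does not use any DocumentPreprocessor option. The sketch below assumes a POSIX `paste`. The helper name `join_lines` is hypothetical, and the `printf` line stands in for the java command by feeding in the sample output shown above.

```shell
# Hypothetical helper: join all input lines into a single line.
# `paste -s` merges every line of the input, `-d ' '` separates them
# with a space, and `-` reads from standard input.
join_lines() {
  paste -s -d ' ' -
}

# Demo on the sample DocumentPreprocessor output shown above.
printf 'This is a test .\nAnd some more .\n' | join_lines
# prints: This is a test . And some more .
```

With the real tokenizer, append the same step to the pipeline: `echo "This is a test. And some more." | java edu.stanford.nlp.process.DocumentPreprocessor 2>/dev/null | paste -s -d ' ' -`.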

You can do the same thing programmatically; see this SO answer.