SparkNLP Sentiment Analysis in Java
I want to run sentiment analysis with SparkNLP, using the default trained model, on a Spark dataset over the column column1. Here is my code:
DocumentAssembler docAssembler = (DocumentAssembler) new DocumentAssembler().setInputCol("column1")
.setOutputCol("document");
Tokenizer tokenizer = (Tokenizer) ((Tokenizer) new Tokenizer().setInputCols(new String[] { "document" }))
.setOutputCol("token");
String[] inputCols = new String[] { "token", "document" };
SentimentDetector sentiment = ((SentimentDetector) ((SentimentDetector) new SentimentDetector().setInputCols(inputCols)).setOutputCol("sentiment"));
Pipeline pipeline = new Pipeline().setStages(new PipelineStage[] { docAssembler, tokenizer, sentiment });
// Fit the pipeline to training documents.
PipelineModel pipelineFit = pipeline.fit(ds);
ds = pipelineFit.transform(ds);
ds.show();
Here ds is a Dataset<Row> whose columns include column1. I get the following error:
java.util.NoSuchElementException: Failed to find a default value for dictionary
at org.apache.spark.ml.param.Params$$anonfun$getOrDefault.apply(params.scala:780)
at org.apache.spark.ml.param.Params$$anonfun$getOrDefault.apply(params.scala:780)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.ml.param.Params$class.getOrDefault(params.scala:779)
at org.apache.spark.ml.PipelineStage.getOrDefault(Pipeline.scala:42)
at org.apache.spark.ml.param.Params$class.$(params.scala:786)
at org.apache.spark.ml.PipelineStage.$(Pipeline.scala:42)
at com.johnsnowlabs.nlp.annotators.sda.pragmatic.SentimentDetector.train(SentimentDetector.scala:62)
at com.johnsnowlabs.nlp.annotators.sda.pragmatic.SentimentDetector.train(SentimentDetector.scala:12)
at com.johnsnowlabs.nlp.AnnotatorApproach.fit(AnnotatorApproach.scala:45)
at org.apache.spark.ml.Pipeline$$anonfun$fit.apply(Pipeline.scala:153)
at org.apache.spark.ml.Pipeline$$anonfun$fit.apply(Pipeline.scala:149)
at scala.collection.Iterator$class.foreach(Iterator.scala:891)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
at scala.collection.IterableViewLike$Transformed$class.foreach(IterableViewLike.scala:44)
at scala.collection.SeqViewLike$AbstractTransformed.foreach(SeqViewLike.scala:37)
at org.apache.spark.ml.Pipeline.fit(Pipeline.scala:149)
I have looked through the examples, but I could not find any clear example/documentation for running sentiment analysis with a default model in Java.
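For reference, a ds with a single string column named column1, as used above, can be reproduced with a minimal sketch along the following lines (the SparkSession setup and the sample rows are illustrative assumptions, not the actual data):

import java.util.Arrays;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.Metadata;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

// Local SparkSession for the sketch; any existing session works the same way.
SparkSession spark = SparkSession.builder()
        .appName("sparknlp-sentiment")
        .master("local[*]")
        .getOrCreate();

// A single string column named "column1", matching the DocumentAssembler input column.
StructType schema = new StructType(new StructField[] {
        new StructField("column1", DataTypes.StringType, false, Metadata.empty()) });

Dataset<Row> ds = spark.createDataFrame(Arrays.asList(
        RowFactory.create("I really enjoyed this movie"),
        RowFactory.create("The service was terrible")), schema);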
So I finally figured it out. The pragmatic SentimentDetector is an AnnotatorApproach whose dictionary parameter has no default value (which is exactly what the "Failed to find a default value for dictionary" error is complaining about), so it cannot be fit without supplying a sentiment dictionary. Loading a pretrained ViveknSentimentModel instead avoids having to train anything. Final code:
DocumentAssembler docAssembler = (DocumentAssembler) new DocumentAssembler().setInputCol("column1")
.setOutputCol("document");
Tokenizer tokenizer = (Tokenizer) ((Tokenizer) new Tokenizer().setInputCols(new String[] { "document" }))
.setOutputCol("token");
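// Note: this array is not passed to any stage below; the loaded pretrained model
// already carries its saved input/output column settings.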
String[] inputCols = new String[] { "token", "document" };
ViveknSentimentModel sentiment = (ViveknSentimentModel) ViveknSentimentModel
.load("/path/to/pretrained model folder");
Pipeline pipeline = new Pipeline().setStages(new PipelineStage[] { docAssembler, tokenizer, sentiment });
// Fit the pipeline; the loaded sentiment model is already trained.
PipelineModel pipelineFit = pipeline.fit(ds);
ds = pipelineFit.transform(ds);
The model can be downloaded from here.
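For completeness, the predicted labels can then be pulled out of the annotation structs after the transform. A short sketch, assuming the loaded model writes its annotations to an output column named sentiment (the actual column name depends on how the downloaded model was saved):

// Each Spark NLP annotation is a struct whose "result" field holds the predicted label,
// so selecting sentiment.result yields an array of labels per row.
ds.selectExpr("column1", "sentiment.result AS sentiment_labels").show(false);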