DL4J: How to calculate cosine similarity between INDArrays obtained from getWordVectorsMean
I computed the mean word vector of two sentences like this:
String demoString1 = "Enter first label";
String demoString2 = "Enter first name";
Collection<String> label1 = Splitter.on(' ').splitToList(demoString1);
Collection<String> label2 = Splitter.on(' ').splitToList(demoString2);
System.out.println("label1:==>"+label1);
System.out.println("getWordVectorMatrix->INDArray------------------"+vectors.getWordVectorsMean(label1));
System.out.println("label2:==>"+label2);
System.out.println("getWordVectorMatrix->INDArray------------------"+vectors.getWordVectorsMean(label2));
Output:
label1:==>[Enter, first, label]
getWordVectorMatrix->INDArray------------------[0.02, -0.14, 0.07, -0.10,.............100 dimension vector]
label2:==>[Enter, first, name]
getWordVectorMatrix->INDArray------------------[-0.00, -0.15, 0.07, -0.13,............100 dimension vector]
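How can I now use these means to compute the similarity (cosine similarity) between the two sentences?
I searched, but I could not find a DL4J API for this.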
Method:
import java.util.Collection;
import com.google.common.base.Splitter;
import org.deeplearning4j.models.word2vec.Word2Vec;
import org.nd4j.linalg.ops.transforms.Transforms;

public static double cosineSimForSentence(Word2Vec vector, String sentence1, String sentence2) {
    // Split each sentence into tokens on single spaces.
    Collection<String> label1 = Splitter.on(' ').splitToList(sentence1);
    Collection<String> label2 = Splitter.on(' ').splitToList(sentence2);
    // Cosine similarity between the two mean word vectors.
    // Any exception propagates to the caller; retrying the same call in a catch block would only fail again.
    return Transforms.cosineSim(vector.getWordVectorsMean(label1), vector.getWordVectorsMean(label2));
}
Method call:
System.out.println("Similarity Score between: "+demoString1+" --vs-- "+ demoString2 +":==>"+ cosineSimForSentence(vectors, demoString1, demoString2));
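For reference, here is a minimal plain-Java sketch of the arithmetic behind the approach above: average the word vectors of each sentence, then take the cosine of the two means. It does not use DL4J. The class name CosineSketch, the helpers mean and cosine, and the 3-dimensional vectors are all made up for illustration; real word vectors would come from the trained model.

```java
import java.util.Arrays;
import java.util.List;

public class CosineSketch {

    // Element-wise mean of a non-empty list of equally sized vectors.
    static double[] mean(List<double[]> vectors) {
        double[] out = new double[vectors.get(0).length];
        for (double[] v : vectors) {
            for (int i = 0; i < out.length; i++) {
                out[i] += v[i] / vectors.size();
            }
        }
        return out;
    }

    // Cosine similarity: dot(a, b) / (|a| * |b|).
    // Yields NaN if either vector has zero length (norm 0).
    static double cosine(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Hypothetical 3-dimensional word vectors, not taken from any model.
        double[] enter = {0.1, -0.2, 0.3};
        double[] first = {0.0, -0.1, 0.2};
        double[] label = {0.2, -0.3, 0.1};
        double[] name  = {0.1, -0.4, 0.0};

        double[] sentence1 = mean(Arrays.asList(enter, first, label));
        double[] sentence2 = mean(Arrays.asList(enter, first, name));

        System.out.println("Similarity: " + cosine(sentence1, sentence2));
    }
}
```

The result lies in [-1, 1]. Because two of the three words are shared between the example sentences, their mean vectors point in similar directions, so a score close to 1 is expected for such pairs.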