Create a GATE document from a CSV file

I need to convert a CSV document structured like this:

i love iphone \t positive
i hate iphone \t negative

into GATE documents that carry the corresponding class:

What is the best way to do this? JAPE, Groovy?

It may not be the simplest answer, but this Perl script does the job:

use strict;
use locale;
use HTML::Entities;

open (IN, '<', $ARGV[0])
    or die "Cannot open input file: $!\n";

my $i = 0;

while (my $form = <IN>) {

    if ($form =~ /^((.+)\t(.+))$/)

    {   
        my $file = "tweet_".$i.".xml";
        # Use the open() function to create the file.
        unless (open FILE, '>', $file) {
            # Die with an error message
            # if we can't open it.
            die "Unable to create $file: $!\n";
        }

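        # $2 holds the sentence and $3 the class ($1 is the whole line).
        my ($sentence, $class) = ($2, $3);
        # Escape only XML-special characters; HTML named entities are not valid in XML.
        my $encoded_sent = encode_entities($sentence, '<>&"');
        $class = encode_entities($class, '<>&"');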
        my $length_sent = length($sentence);

        ##head xml
        print FILE "<?xml version='1.0' encoding='UTF-8'?>"."\n";
        print FILE '<GateDocument version="3">'."\n";
        print FILE '<GateDocumentFeatures>'."\n";
        print FILE '<Feature>'."\n";
        print FILE '<Name className="java.lang.String">gate.SourceURL</Name>'."\n";
        print FILE '<Value className="java.lang.String">created from String</Value>'."\n";
        print FILE '</Feature>'."\n";
        print FILE '</GateDocumentFeatures>'."\n";

        ##create xml for each line  -- here is the content
        print FILE '<TextWithNodes><Node id="0"/>'.$encoded_sent.'<Node id="'.$length_sent.'"/></TextWithNodes>'."\n";

        print FILE '<AnnotationSet Name="Key">'."\n";
        print FILE '<Annotation Id="1" Type="Tweet" StartNode="0" EndNode="'.$length_sent.'">'."\n";

        print FILE '<Feature>'."\n";
        print FILE '<Name className="java.lang.String">class</Name>'."\n";
        print FILE '<Value className="java.lang.String">'.$class.'</Value>'."\n";
        print FILE '</Feature>'."\n";
        print FILE '</Annotation>'."\n";
        print FILE '</AnnotationSet>'."\n";

        ##end of the document
        print FILE '</GateDocument>'."\n";
        $i++;
        close FILE;
    }
}
close IN;
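
As a variation on the script above, the XML can be built by a function that returns a string, which keeps the file handling separate and makes the output easy to check. This is only a sketch using core Perl: `xml_escape`, `gate_document` and `parse_line` are hypothetical helper names, and the XML layout is copied from the script rather than taken from the GATE documentation.

```perl
use strict;
use warnings;

# Escape only the characters that are special in XML text and attributes.
sub xml_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Build one GATE XML document for a sentence and its class label.
# gate_document() is a hypothetical helper name, not part of GATE.
sub gate_document {
    my ($sentence, $class) = @_;
    my $end   = length $sentence;      # end offset, counted on the unescaped text
    my $text  = xml_escape($sentence);
    my $label = xml_escape($class);
    return <<"XML";
<?xml version='1.0' encoding='UTF-8'?>
<GateDocument version="3">
<GateDocumentFeatures>
<Feature>
<Name className="java.lang.String">gate.SourceURL</Name>
<Value className="java.lang.String">created from String</Value>
</Feature>
</GateDocumentFeatures>
<TextWithNodes><Node id="0"/>$text<Node id="$end"/></TextWithNodes>
<AnnotationSet Name="Key">
<Annotation Id="1" Type="Tweet" StartNode="0" EndNode="$end">
<Feature>
<Name className="java.lang.String">class</Name>
<Value className="java.lang.String">$label</Value>
</Feature>
</Annotation>
</AnnotationSet>
</GateDocument>
XML
}

# Split one "sentence<TAB>class" line; returns an empty list if it does not match.
sub parse_line {
    my ($line) = @_;
    return $line =~ /^(.+)\t(.+?)\s*$/ ? ($1, $2) : ();
}

print gate_document(parse_line("i love iphone\tpositive\n"));
```

Each returned string can then be written to its own `tweet_N.xml` file, as the original loop does.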

Basically you have to handle both CSV and GATE documents. If you search CPAN, you will find modules that handle these kinds of documents easily.

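So you can use Text::CSV to read the text from the CSV file, and the setText and setAnnotationSet methods of the NLP::GATE::Document module to create the GATE document, set its text, and annotate it.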
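
For the CSV half, a minimal sketch with Text::CSV might look like the following. It assumes Text::CSV is installed from CPAN and that every row is `sentence<TAB>class`; `read_rows` is a hypothetical helper name. The rows it returns would then be handed to the NLP::GATE::Document methods mentioned above, which are not shown here.

```perl
use strict;
use warnings;
use Text::CSV;

# Read tab-separated "sentence<TAB>class" rows from an open filehandle.
# read_rows() is a hypothetical helper name used only for this sketch.
sub read_rows {
    my ($fh) = @_;
    my $csv = Text::CSV->new({
        sep_char    => "\t",     # fields are separated by tabs, not commas
        quote_char  => undef,    # treat quote characters in tweets as plain text
        escape_char => undef,
        binary      => 1,
        auto_diag   => 1,
    });
    my @rows;
    while (my $fields = $csv->getline($fh)) {
        next unless @$fields == 2;    # skip rows that are not text plus class
        push @rows, { text => $fields->[0], class => $fields->[1] };
    }
    return @rows;
}

# Demonstrate on an in-memory sample instead of a real file.
my $sample = "i love iphone\tpositive\ni hate iphone\tnegative\n";
open my $fh, '<', \$sample or die "Cannot open sample: $!\n";
printf "%s => %s\n", $_->{text}, $_->{class} for read_rows($fh);
close $fh;
```

Letting the module do the splitting means the tab handling lives in one place instead of in a hand-written regex.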

Give it a try, and if you run into any problems, ask again and include the code you have tried so far.