Create a GATE document from a CSV file
I need to convert a CSV document structured as follows:
i love iphone \t positive
i hate iphone \t negative
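into GATE documents that carry the associated class.
What is the best way to do this? JAPE, Groovy?
It may not be the simplest answer, but this Perl script does the job: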
use strict;
use warnings;
use locale;
use HTML::Entities;

# Input: a tab-separated file with one "sentence<TAB>class" pair per line.
# The encoding layers assume the input is UTF-8, matching the XML header below.
open(my $in, '<:encoding(UTF-8)', $ARGV[0])
    or die "Cannot open input file: $!\n";
my $i = 0;
while (my $form = <$in>) {
    # $2 captures the sentence, $3 captures the class.
    if ($form =~ /^((.+)\t(.+))$/) {
        my ($sentence, $class) = ($2, $3);
        # Drop the spaces (and any stray \r) around the tab separator.
        s/^\s+|\s+$//g for ($sentence, $class);
        my $file = "tweet_".$i.".xml";
        # Create the output file, or die with an error message if we can't.
        open(my $out, '>:encoding(UTF-8)', $file)
            or die "\nUnable to create $file: $!\n";
        # Escape only the XML special characters: the default set would also
        # emit HTML-only named entities, which are not valid in XML.
        my $encoded_sent  = encode_entities($sentence, '<>&"');
        my $encoded_class = encode_entities($class, '<>&"');
        # Node ids are offsets into the unescaped text.
        my $length_sent = length($sentence);
        ## head xml
        print $out "<?xml version='1.0' encoding='UTF-8'?>"."\n";
        print $out '<GateDocument version="3">'."\n";
        print $out '<GateDocumentFeatures>'."\n";
        print $out '<Feature>'."\n";
        print $out '<Name className="java.lang.String">gate.SourceURL</Name>'."\n";
        print $out '<Value className="java.lang.String">created from String</Value>'."\n";
        print $out '</Feature>'."\n";
        print $out '</GateDocumentFeatures>'."\n";
        ## create xml for each line -- here is the content
        print $out '<TextWithNodes><Node id="0"/>'.$encoded_sent.'<Node id="'.$length_sent.'"/></TextWithNodes>'."\n";
        print $out '<AnnotationSet Name="Key">'."\n";
        print $out '<Annotation Id="1" Type="Tweet" StartNode="0" EndNode="'.$length_sent.'">'."\n";
        print $out '<Feature>'."\n";
        print $out '<Name className="java.lang.String">class</Name>'."\n";
        print $out '<Value className="java.lang.String">'.$encoded_class.'</Value>'."\n";
        print $out '</Feature>'."\n";
        print $out '</Annotation>'."\n";
        print $out '</AnnotationSet>'."\n";
        ## end of the document
        print $out '</GateDocument>'."\n";
        close $out;
        $i++;
    }
}
close $in;
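The script above relies on encode_entities from HTML::Entities for escaping. If that module is not available, escaping the five XML special characters by hand should be enough for this use. The sketch below is only an illustration: xml_escape is a hypothetical helper, not something from the script or from GATE. It also shows why the end node is computed from the raw sentence, as the script does, rather than from the escaped string.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the script above): escape only the
# characters that are special in XML text and attribute values.
sub xml_escape {
    my ($text) = @_;
    my %entity = (
        '&' => '&amp;',
        '<' => '&lt;',
        '>' => '&gt;',
        '"' => '&quot;',
        "'" => '&apos;',
    );
    $text =~ s/([&<>"'])/$entity{$1}/g;
    return $text;
}

my $sentence = 'i <3 iphone & ipad';
my $escaped  = xml_escape($sentence);

# The end node is the length of the raw sentence, not of $escaped,
# because escaping makes the string longer without changing the text.
my $end_node = length($sentence);

print "$escaped\n";
print "end node: $end_node\n";
```

For this sample sentence the escaped string is longer than the raw one, so taking the length of $escaped would push the end node past the end of the text.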
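Basically, you have to handle two kinds of document: CSV and GATE. If you search CPAN, you will find modules that handle both easily.
So you can use Text::CSV to read the text from the CSV file, and the setText and setAnnotationSet methods of the NLP::GATE::Document module to create the GATE document, set its text, and annotate it.
Give it a try, and if you run into any problems, ask again with the code you have tried so far.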
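As a starting point, the reading half of that suggestion might look like the sketch below. It is a rough illustration under stated assumptions, not a definitive implementation: read_rows is a hypothetical helper, the input is assumed to be tab-separated with no quoting, and because Text::CSV is a CPAN module rather than core Perl, the sketch falls back to a plain split when it is not installed. The GATE half is left as a comment, since the exact arguments of setText and setAnnotationSet should be checked against the NLP::GATE::Document documentation.

```perl
use strict;
use warnings;

# Text::CSV comes from CPAN; detect it so the sketch also runs without it.
my $have_text_csv = eval { require Text::CSV; 1 };

# Hypothetical helper: return a list of [sentence, class] pairs read
# from a tab-separated filehandle.
sub read_rows {
    my ($fh) = @_;
    my @rows;
    if ($have_text_csv) {
        # Quoting is disabled so that a stray double quote in a tweet
        # is treated as ordinary text.
        my $csv = Text::CSV->new({
            binary      => 1,
            sep_char    => "\t",
            quote_char  => undef,
            escape_char => undef,
        }) or die "Cannot create Text::CSV parser\n";
        while (my $fields = $csv->getline($fh)) {
            push @rows, [ @{$fields}[0, 1] ];
        }
    }
    else {
        while (my $line = <$fh>) {
            chomp $line;
            push @rows, [ split /\t/, $line, 2 ];
        }
    }
    # Trim the spaces that surround the tab in the sample data.
    for my $row (@rows) {
        for (@$row) {
            next unless defined;
            s/^\s+//;
            s/\s+$//;
        }
    }
    return @rows;
}

# The two sample lines from the question, read from an in-memory file.
my $data = "i love iphone \t positive\ni hate iphone \t negative\n";
open(my $in, '<', \$data) or die "Cannot open in-memory file: $!\n";
for my $row (read_rows($in)) {
    my ($sentence, $class) = @$row;
    # This is where $sentence would go to setText and $class into an
    # annotation set passed to setAnnotationSet (see the module docs).
    print "$class: $sentence\n";
}
close $in;
```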