Error when extracting word offsets in Perl
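I have a program that reads two files. The first file contains terms (one or more) separated by semicolons (;), and the second file contains a text. The goal is to determine the offsets, within the text, of the terms from the first file.
My program starts out fine: for "fluctuating vacuum" it extracts the correct offsets 2 20, and for "quantum fields" the correct 45 59. But for the term "nuclear physics" (correct offsets 396 411) my code produces 399 414, and for "Fermionic fields" it produces 138 154 where the correct offsets are 135 151.
The code used is: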
#!/usr/bin/perl
use strict;
use warnings;
my @a = ();
my @b = ();
my @aa = ();
my $l=0;
my $v=1;
my $g=0;
my $kh;
my $ligne2;
my $texte;
open(FICC, $ARGV[0]);
print "choose the name of the file\n";
my $fic = <STDIN>;
open(FIC1, ">$fic");
while (<FICC>) {
    my $ligne2=$_;
    $a[$l]=$ligne2;
    $l++;
}
my $aa;
my $ligne;
my $rep = "C:\scienceie2017_train\train2";
opendir(REP,$rep) or die "E/S : $!\n";
foreach my $kh (@a) {
    chomp($kh);
    if ($kh=~/.*\.txt/) {
        $texte=$kh;
        #print "$kh";
        print FIC1 "$texte";
    }
    @aa=split(/;/,$kh);
    #$u++;
    while(defined(my $fic=readdir REP)){
        my $f="${rep}\$texte";
        open FIC, "$f" or warn "$f E/S: $!\n";
        while(<FIC>){
            $ligne = $_;
            chomp($ligne);
            #print FIC1 "@aa";
            foreach my $che (@aa) {
                $che =~ s/^\s+//;
                $che =~ s/\s+$//;
                if ($ligne =~/\Q$che\E/) {
                    print FIC1 "T$v\tTask $-[0] $+[0]\t$che\n";
                    $v++;
                }
            }
            $v = 1;
        }
        print FIC1 "\n";
        close FIC;
        goto che
    }
    che:
}
The text is:
A fluctuating vacuum is a general feature of quantum fields, of which the free Maxwell field considered in [1–12] is but one example. Fermionic fields such as that describing the electron, also undergo vacuum fluctuations, consequently one expects to find Casimir effects associated with such fields whenever they are confined in some way. Such effects were first investigated in the context of nuclear physics, within the so-called “MIT bag model” of the nucleon [13]. In the bag-model one envisages the nucleon as a collection of fermionic fields describing confined quarks. These quarks are subject to a boundary condition at the surface of the ‘bag’ that represents the nucleon’s surface. Just as in the electromagnetic case, the bag boundary condition modifies the vacuum fluctuations of the field, which results in the appearance of a Casimir force [14–18]. This force, although very weak at a macroscopic scale, can be significant on the small length scales encountered in nuclear physics. It therefore has important consequences for the physics of the bag-model nucleon [19].
The extracted terms are:
fluctuating vacuum;general feature;quantum fields;free Maxwell;free Maxwell field;Maxwell;Maxwell field;Maxwell field;Maxwell field;field considered in ;considered in ;1–12;Fermionic fields;vacuum fluctuations;Casimir;Casimir effects;Casimir effects;Casimir effects;such fields;Such effects;nuclear physics;so-called “MIT;so-called “MIT bag;“MIT bag;“MIT bag model”;bag model”;fermionic fields;fermionic fields describing;boundary condition;nucleon’s surface;electromagnetic case;bag boundary;bag boundary condition;boundary condition;vacuum fluctuations;Casimir;Casimir force ;force ;14–18;macroscopic scale;small length;small length scales;length scales;nuclear physics;important consequences;bag-model nucleon ;
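Your code is not clear to me, but when I run my own code on the data you provided, I get the results below.
The two variables @- and @+ (used here as $-[0] and $+[0]) are described in perlvar under "Variables related to regular expressions" (@LAST_MATCH_START and @LAST_MATCH_END).
My code: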
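#!/usr/bin/perl
use strict;
use warnings;
my $s = 'A fluctuating vacuum is a general feature ... (rest of line)';
my @terms = split /;/, 'fluctuating vacuum;Fermionic fields;nuclear physics;bag-model nucleon';
for my $term (@terms) {
    # \Q...\E matches the term literally; /g in the loop finds every occurrence
    while ($s =~ /\Q$term\E/g) {
        # $-[0] is the start of the match, $+[0] the offset just past its end
        print "$-[0] - $+[0] $term\n";
    }
}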
Output:
2 - 20 fluctuating vacuum
135 - 151 Fermionic fields
396 - 411 nuclear physics
983 - 998 nuclear physics
1063 - 1080 bag-model nucleon
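#!/usr/bin/perl
use utf8;   # the literal below contains non-ASCII characters; with the source saved as UTF-8, this makes index() count characters rather than bytes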
$string = "A fluctuating vacuum is a general feature of quantum fields, of which the free Maxwell field considered in [1–12] is but one example. Fermionic fields such as that describing the electron, also undergo vacuum fluctuations, consequently one expects to find Casimir effects associated with such fields whenever they are confined in some way. Such effects were first investigated in the context of nuclear physics, within the so-called “MIT bag model” of the nucleon [13]. In the bag-model one envisages the nucleon as a collection of fermionic fields describing confined quarks. These quarks are subject to a boundary condition at the surface of the ‘bag’ that represents the nucleon’s surface. Just as in the electromagnetic case, the bag boundary condition modifies the vacuum fluctuations of the field, which results in the appearance of a Casimir force [14–18]. This force, although very weak at a macroscopic scale, can be significant on the small length scales encountered in nuclear physics. It therefore has important consequences for the physics of the bag-model nucleon [19].";
my @extracted_terms = ( "fluctuating vacuum", "Fermionic fields", "nuclear physics", "bag-model nucleon" );
for my $term ( @extracted_terms )
{
    # index returns the offset of the first occurrence only (-1 if absent)
    my $position = index $string, $term;
    printf( "%d, %d\n", $position, $position + length($term) );
}
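Note that index on its own reports only the first occurrence, while "nuclear physics" appears twice in the text. A minimal sketch of how the index approach could be extended to report every occurrence, using the optional third argument of index (the start position); the helper name all_offsets and the short sample string are made up for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collect [start, end] pairs for every occurrence of $term in $text.
# all_offsets is a hypothetical helper name, not part of the answer above.
sub all_offsets {
    my ($text, $term) = @_;
    my @hits;
    return @hits if $term eq '';    # an empty term would match everywhere
    my $from = 0;
    while ((my $pos = index($text, $term, $from)) != -1) {
        push @hits, [ $pos, $pos + length $term ];
        $from = $pos + 1;           # step by one so overlapping hits are found too
    }
    return @hits;
}

# Shortened sample text, only to keep the example small.
my $text = 'nuclear physics, and again nuclear physics.';
printf "%d, %d\n", @$_ for all_offsets($text, 'nuclear physics');
```

The end offset is exclusive, which is the same convention as $+[0] in the regex version.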
You have to open the file as UTF-8.
Replace
open FIC, "$f" or warn "$f E/S: $!\n";
with
open FIC, "<:encoding(UTF-8)", "$f" or warn "$f E/S: $!\n";
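The reason this matters: without a decoding layer Perl reads raw bytes, so @- and @+ count bytes, and every multi-byte character before a match (such as the "–" in [1–12]) pushes the reported offsets up. Here is a small sketch that reproduces the same kind of shift on a fragment of the text. The helper name first_offsets is hypothetical, and the fragment is built from an escape so the script itself stays pure ASCII:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

# Return [start, end] for the first occurrence of $term in $text,
# or nothing when the term is absent. Hypothetical helper for illustration.
sub first_offsets {
    my ($text, $term) = @_;
    return unless $text =~ /\Q$term\E/;
    return [ $-[0], $+[0] ];
}

# \x{2013} is the en dash from "[1-12]" in the original text.
my $chars = "in [1\x{2013}12] is but one example. Fermionic fields";
my $bytes = encode('UTF-8', $chars);    # what an undecoded read would hand you

my $raw     = first_offsets($bytes, 'Fermionic fields');
my $decoded = first_offsets(decode('UTF-8', $bytes), 'Fermionic fields');

print "raw bytes: @$raw\n";
print "decoded:   @$decoded\n";
```

The raw result comes out two higher than the decoded one, because the en dash takes three bytes in UTF-8 but is a single character. The terms file probably needs the same ":encoding(UTF-8)" layer as well, otherwise terms that contain non-ASCII characters (for example "1–12") will not match the decoded text.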