How to use Perl's Text::Aspell to spellcheck a text?
I want to add spell checking to my Perl program. It looks like Text::Aspell should do what I need, but it only provides a function for checking a single word.
use strict;
use warnings;
use Text::Aspell;
my $input = "This doesn't look too bad. Me&you. with/without. 1..2..3..go!";
my $aspell = Text::Aspell->new();
$aspell->set_option('lang', 'en');
print "$input: ", $aspell->check($input), "\n";
This prints:
This doesn't look too bad. Me&you. with/without. 1..2..3..go!: 0
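So clearly it only takes a single word. How do I split a text into words, then? A simple split on white space:
foreach my $word (split /\s/, $input) {
    next unless($word =~ /\w/);
    print "$word: ", $aspell->check($word), "\n";
}
has problems with punctuation that is not surrounded by white space: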
This: 1
doesn't: 1
look: 1
too: 1
bad.: 0
Me&you.: 0
with/without.: 0
1..2..3..go!: 0
I guess I could list the punctuation characters in the split pattern:
foreach my $word (split qr{[,.;!:\s#"\?&%@\(\)\[\]/\d]}, $input) {
    next unless($word =~ /\w/);
    print "$word: ", $aspell->check($word), "\n";
}
This produces reasonable output:
This: 1
doesn't: 1
look: 1
too: 1
bad: 1
Me: 1
you: 1
with: 1
without: 1
go: 1
But this seems clumsy, and I'm wondering whether there is an easier way (less code for me to write, less brittle).
How do I spellcheck a text?
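The following snippet splits the sentence into words using a regex that matches anything other than letters and '.
You can extend the regex to your heart's desire.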
use strict;
use warnings;
use Text::Aspell;
my $regex = qr/[^'a-z]+/i;
my $input = "This doesn't look too bad. Me&you. with/without. 1..2..3..go!";
my $aspell = Text::Aspell->new();
$aspell->set_option('lang', 'en');
printf "%12s: %d\n", $_, $aspell->check($_) for split($regex, $input);
Output:
This: 1
doesn't: 1
look: 1
too: 1
bad: 1
Me: 1
you: 1
with: 1
without: 1
go: 1
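One thing to watch with the split approach: when the text begins with a separator (an opening quote, for instance), split returns a leading empty field, and a word wrapped in single quotes keeps its quotes. The sketch below is one possible way to guard against both. The helper names split_words and misspelled_words are hypothetical, and the checker is passed in as a code reference so that the same function works with Text::Aspell or with any stand-in; the small %known hash here is only a placeholder dictionary for the demo.

```perl
use strict;
use warnings;

# Split text into candidate words (hypothetical helper).
# Uses the same separator regex as above, then tidies each piece.
sub split_words {
    my ($text) = @_;
    my @words;
    for my $piece (split /[^'a-z]+/i, $text) {
        $piece =~ s/^'+|'+$//g;            # drop quotes wrapped around a word
        push @words, $piece if length $piece;   # skip empty fields
    }
    return @words;
}

# Return the words for which the supplied checker returns false.
# $is_correct is a code ref taking one word and returning true/false.
sub misspelled_words {
    my ($text, $is_correct) = @_;
    return grep { !$is_correct->($_) } split_words($text);
}

# Demo with a placeholder dictionary instead of a real speller.
my %known = map { $_ => 1 } qw(this doesn't look too bad);
my @bad = misspelled_words(
    q{"This" doesn't look to bad.},
    sub { $known{ lc $_[0] } },
);
print "Misspelled: @bad\n";    # prints "Misspelled: to"
```

With the $aspell object from the snippet above, the checker would be sub { $aspell->check($_[0]) }.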
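Text::Aspell has no option to check an entire string; it only checks single words. Rather than splitting the string yourself, I would suggest using a module that already does the splitting for you, such as Text::SpellChecker. For example: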
use strict;
use warnings;
use Text::SpellChecker;
use feature 'say';
my $input = "This doesn't look too bad. Me&you. with/without. 1..2..3..go!";
my $checker = Text::SpellChecker->new(text => $input);
$checker->set_options(aspell => { 'lang' => 'en' });
while (my $word = $checker->next_word) {
    say "Invalid word: $word";
}
Or,
my $checker = Text::SpellChecker->new(text => $input);
$checker->set_options(aspell => { 'lang' => 'en' });
if ($checker->next_word) {
    say "The string is not valid.";
} else {
    say "The string is valid.";
}
The module's documentation shows how to replace misspelled words interactively:
while (my $word = $checker->next_word) {
    print $checker->highlighted_text,
        "\n",
        "$word : ",
        (join "\t", @{$checker->suggestions}),
        "\nChoose a new word : ";
    chomp (my $new_word = <STDIN>);
    $checker->replace(new_word => $new_word) if $new_word;
}
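If you want to check each word of the input string individually yourself, you can look at how Text::SpellChecker splits the string into words (this is done by the next_word function). It uses the following regex:
while ($self->{text} =~ m/\b(\p{L}+(?:'\p{L}+)?)/g) {
    ...
}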
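To reuse that regex outside the module, a list-context match with /g returns every captured word at once. The following is a minimal sketch, not the module's own code; extract_words is a hypothetical helper name. Because \p{L} matches any Unicode letter, accented words should stay in one piece as long as the text is a decoded character string, which the [a-z] class used in the split approach would not manage.

```perl
use strict;
use warnings;

# Extract words with the regex quoted above (hypothetical helper).
# A word is a run of letters, optionally followed by one apostrophe
# and more letters, e.g. "doesn't".
sub extract_words {
    my ($text) = @_;
    my @words = $text =~ /\b(\p{L}+(?:'\p{L}+)?)/g;
    return @words;
}

my $input = "This doesn't look too bad. Me&you. with/without. 1..2..3..go!";
print join(' ', extract_words($input)), "\n";
# prints: This doesn't look too bad Me you with without go
```

Each returned word can then be passed to $aspell->check as in the question.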