Removing special characters from a string using a Perl script

I have a string like this:

stringinput = Sweééééôden@

I want to get output like this:

stringoutput = Sweden

The special characters ééééô@ must be removed.

I am currently using:

$stringoutput = `echo $stringinput | sed 's/[^a-z  A-Z 0-9]//g'`;

I get a result like Sweééééôden: the @ is removed, but ééééô is not.

Can you give me some suggestions?

There is no need to call sed from Perl; Perl can do the substitution itself. It is also faster, because you don't need to start a new process.

#!/usr/bin/perl
use warnings;
use strict;
use utf8;   # this source file contains UTF-8 encoded string literals

my $string = 'Sweééééôden@';
$string =~ s/[^A-Za-z0-9]//g;
print $string;

You need to put LC_ALL=C before the sed command so that the [A-Za-z] character class builds its ranges according to the ASCII table:

stringoutput=$(echo "$stringinput" | LC_ALL=C sed 's/[^A-Za-z0-9]//g')

See the online demo:

stringinput='Sweééééôden@';
stringoutput=$(echo "$stringinput" | LC_ALL=C sed 's/[^A-Za-z0-9]//g');
echo "$stringoutput";
# => Sweden

POSIX regex reference:

In the default C locale, the sorting sequence is the native character order; for example, ‘[a-d]’ is equivalent to ‘[abcd]’. In other locales, the sorting sequence is not specified, and ‘[a-d]’ might be equivalent to ‘[abcd]’ or to ‘[aBbCcDd]’, or it might fail to match any character, or the set of characters that it matches might even be erratic. To obtain the traditional interpretation of bracket expressions, you can use the ‘C’ locale by setting the LC_ALL environment variable to the value ‘C’.
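To make the locale point concrete, here is a minimal sketch that wraps the command in a small function. The name strip_non_alnum is hypothetical and not part of the original answer. It assumes a sed that treats input as single bytes in the C locale (GNU sed does), so every byte of a multi-byte character such as é falls outside [A-Za-z0-9] and is deleted.

```shell
#!/bin/sh
# strip_non_alnum: hypothetical helper, introduced here for illustration only.
# Removes every byte that is not an ASCII letter or digit.
strip_non_alnum() {
    # LC_ALL=C is set for the sed process only, so the bracket ranges
    # follow ASCII byte order no matter what locale the caller uses.
    printf '%s\n' "$1" | LC_ALL=C sed 's/[^A-Za-z0-9]//g'
}

stringinput='Sweééééôden@'
stringoutput=$(strip_non_alnum "$stringinput")
echo "$stringoutput"
```

Setting the variable as a prefix to sed keeps the change local to that one command, so the rest of the script still runs in the user's own locale.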

In Perl, you can simply use:

my $stringinput = 'Sweééééôden@';
my $stringoutput = $stringinput =~ s/[^A-Za-z0-9]+//gr;
print $stringoutput;

See this online demo.