How can I remove characters that are not supported by MySQL's utf8 character set?
How can I remove characters from a string that are not supported by MySQL's utf8 character set? In other words, characters that take four bytes, such as "𝜀", which are only supported by MySQL's utf8mb4 character set.
For example,
𝜀C = -2.4‰ ± 0.3‰; 𝜀H = -57‰
should become
C = -2.4‰ ± 0.3‰; H = -57‰
I want to load a data file into a MySQL table that has CHARSET=utf8.
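MySQL's utf8mb4 encoding is what the rest of the world calls UTF-8. MySQL's utf8 encoding is a subset of UTF-8 that only supports characters in the BMP (that is, characters U+0000 to U+FFFF inclusive).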
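To make the byte counts concrete, here is a minimal sketch that measures how many bytes a character occupies once encoded as UTF-8. It assumes the core Encode module is available; the helper name utf8_byte_length is invented for this illustration. MySQL's utf8 stores at most three bytes per character, so anything that needs four is rejected.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: number of bytes one character occupies in UTF-8.
sub utf8_byte_length {
    my ($char) = @_;
    return length(encode('UTF-8', $char));
}

# Characters inside the BMP need 1 to 3 bytes; U+1D700 lies outside it.
printf "U+%04X needs %d byte(s) in UTF-8\n", ord($_), utf8_byte_length($_)
    for "A", "\x{E9}", "\x{2030}", "\x{1D700}";
# U+0041 needs 1 byte(s) in UTF-8
# U+00E9 needs 2 byte(s) in UTF-8
# U+2030 needs 3 byte(s) in UTF-8
# U+1D700 needs 4 byte(s) in UTF-8
```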
Therefore, the following matches the problematic, unsupported characters:
/[^\N{U+0000}-\N{U+FFFF}]/
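Before changing any data, it may help to see which characters the pattern would catch. The sketch below reuses the same character class to list the offending code points; the function name unsupported_code_points is an assumption made for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: return the code points (as "U+XXXX" strings) of every
# character in $str that lies outside the BMP.
sub unsupported_code_points {
    my ($str) = @_;
    return map { sprintf 'U+%04X', ord($_) }
           $str =~ /([^\N{U+0000}-\N{U+FFFF}])/g;
}

# U+1D700 is outside the BMP; U+2030 (per mille sign) is inside it.
my $line = "\N{U+1D700}C = -2.4\N{U+2030}";
my @bad  = unsupported_code_points($line);
print "unsupported: @bad\n";   # prints "unsupported: U+1D700"
```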
You can sanitize your input using any of the following three techniques:
1: Remove the unsupported characters:
s/[^\N{U+0000}-\N{U+FFFF}]//g;
2: Replace the unsupported characters with U+FFFD:
s/[^\N{U+0000}-\N{U+FFFF}]/\N{REPLACEMENT CHARACTER}/g;
3: Replace the unsupported characters using a translation map:
my %translations = (
"\N{MATHEMATICAL ITALIC SMALL EPSILON}" => "\N{GREEK SMALL LETTER EPSILON}",
# ...
);
s{([^\N{U+0000}-\N{U+FFFF}])}{ $translations{$1} // "\N{REPLACEMENT CHARACTER}" }eg;
For example,
use utf8; # Source code is encoded using UTF-8
use open ':std', ':encoding(UTF-8)'; # Terminal and files use UTF-8.
use strict;
use warnings;
use 5.010; # say, //
use charnames ':full'; # Not needed in 5.16+
my %translations = (
"\N{MATHEMATICAL ITALIC SMALL EPSILON}" => "\N{GREEK SMALL LETTER EPSILON}",
# ...
);
$_ = "𝜀C = -2.4‰ ± 0.3‰; 𝜀H = -57‰";
say;
s{([^\N{U+0000}-\N{U+FFFF}])}{ $translations{$1} // "\N{REPLACEMENT CHARACTER}" }eg;
say;
Output:
𝜀C = -2.4‰ ± 0.3‰; 𝜀H = -57‰
εC = -2.4‰ ± 0.3‰; εH = -57‰
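Since the goal is to load a data file, the first technique could be applied to a whole stream line by line. This is only a sketch under stated assumptions: the function name clean_stream is made up, the data is assumed to be UTF-8 encoded, and in-memory handles stand in for the real input and output files.

```perl
use strict;
use warnings;

# Hypothetical helper: copy $in to $out line by line, dropping every
# character outside the BMP. Both handles must already decode/encode UTF-8.
sub clean_stream {
    my ($in, $out) = @_;
    while (my $line = <$in>) {
        $line =~ s/[^\N{U+0000}-\N{U+FFFF}]//g;
        print {$out} $line;
    }
    return;
}

# Demonstration with in-memory handles. The input holds the UTF-8 bytes of
# U+1D700, then "C = -2.4", then U+2030 and a newline.
my $input_bytes  = "\xF0\x9D\x9C\x80C = -2.4\xE2\x80\xB0\n";
my $output_bytes = '';
open my $in,  '<:encoding(UTF-8)', \$input_bytes  or die "open input: $!";
open my $out, '>:encoding(UTF-8)', \$output_bytes or die "open output: $!";
clean_stream($in, $out);
close $out or die "close output: $!";
close $in  or die "close input: $!";
# $output_bytes now holds "C = -2.4" followed by the UTF-8 bytes of U+2030.
```

For a real file, the same '<:encoding(UTF-8)' and '>:encoding(UTF-8)' layers can be used when opening the paths, and the cleaned copy is what gets loaded into the table.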