Convert matched pattern to lower case AWK

Question

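I want to convert a matched pattern to lower case. I am using the awk code below; it performs the replacement, but it also adds a newline after each replaced word.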

awk 'BEGIN{ FS = "[&]";RS = ";";  };{ $2 = tolower($2) }{print $0}' test.txt

The contents of the test file are:

This is test file &AMP; replacing &APOS; PATTERN

The output I get is:

    This is test file &amp; 
    replacing &apos; 
    PATTERN

Answer 1

As Ed Morton points out, this is broken.

~~You want to make sure OFS and ORS are the same as FS and RS, respectively.~~

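When you change RS, awk changes how it reads records; but unless you also change ORS, the write behavior stays at the default, which is to print a newline as the record separator.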
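To see that asymmetry in isolation, here is a minimal sketch using a made-up input, `a;b;c`, rather than the asker's file. The variable names are only for illustration.

```shell
# Only RS is changed: records are split on ";" but written with the default ORS (newline).
default_ors=$(printf 'a;b;c' | awk 'BEGIN { RS = ";" } { print $0 }')

# RS and ORS both set to ";": each record is written back followed by ";".
matched_ors=$(printf 'a;b;c' | awk 'BEGIN { ORS = RS = ";" } { print $0 }')

printf '%s\n' "$default_ors"   # a, b and c on three separate lines
printf '%s\n' "$matched_ors"   # a;b;c;
```

Note that the second form also writes a `;` after the last record, even though the input had none there.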

As Ed Morton points out, you also need to change FS to just the single character & for your program to work. With that fixed, I get the expected output.

    vnix$ awk 'BEGIN{ OFS = FS = "&"; ORS = RS = ";"; };{ $2 = tolower($2) }{print $0}' <<':'
    > This is test file &AMP; replacing &APOS; PATTERN
    > :
    This is test file &amp; replacing &apos; PATTERN
    &;

~~Kudos for a clever solution that was already 95% of the way there.~~

Answer 2

Perhaps this is closer to what you need:

awk '{for(i=1;i<=NF;i++) if("&"==substr($i,1,1)) $i=tolower($i)}1'

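This converts every word that starts with an ampersand to lower case.

Or, if you want to specify both the first and the last character of the match: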

 awk '{for(i=1;i<=NF;i++) if(match($i,"&.*;")) $i=tolower($i)}1'
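As a quick check, the second command can be run on the sample line from the question. This is only a sketch: the `line` and `result` variables are illustrative, and the extra spaces after `&AMP;` are added here to show a side effect.

```shell
# Sample line from the question, with extra spaces added after the first entity.
line='This is test file &AMP;   replacing &APOS; PATTERN'

# Lower-case every whitespace-separated field that contains "&...;".
result=$(printf '%s\n' "$line" | awk '{for(i=1;i<=NF;i++) if(match($i,"&.*;")) $i=tolower($i)}1')

printf '%s\n' "$result"   # This is test file &amp; replacing &apos; PATTERN
```

Because assigning to a field makes awk rebuild the record with OFS, runs of whitespace in the input collapse to single spaces in the output.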

Answer 3

I don't see a simple one-liner for this. Perhaps a short script:

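{
  # Repeat until the line has no upper-case entity such as &AMP; left.
  while (match($0, /&[A-Z]+;/)) {
    # tag: the entity without its leading "&", up to and including ";" (e.g. "AMP;").
    tag=substr($0,match($0,/&[A-Z]+;/)+1); tag=substr(tag,1,index(tag,";"));
    # Rebuild the line: text up to and including "&", the lower-cased tag, then the rest.
    $0=substr($0,1,match($0,/&[A-Z]+;/)) tolower(tag) substr($0,match($0,/&[A-Z]+;/)+length(tag)+1);
  }
}

# Print every line, modified or not.
1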

This steps through each line of input searching for upper-case tags; for each one it finds, it rewrites the line using a set of substr() calls.

A test:

$ echo "This is test file &AMP;   replacing &APOS; PATTERN" | gawk -f ~/doit.awk
This is test file &amp;   replacing &apos; PATTERN

If you want to run this standalone, you can put a "shebang" at the top. It works in gawk or BSD awk, so it should work on most operating systems.
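A rough sketch of the standalone form follows. It assumes awk is installed at /usr/bin/awk (adjust the shebang if it lives elsewhere) and writes to a temporary file purely for illustration. The loop body is a condensed variant that uses RSTART and RLENGTH, not the exact script above.

```shell
# Write the script to a temporary file (name and location are illustrative).
script=$(mktemp)
cat > "$script" <<'EOF'
#!/usr/bin/awk -f
# Lower-case every &UPPERCASE; entity on each input line.
{
  while (match($0, /&[A-Z]+;/))
    $0 = substr($0, 1, RSTART-1) tolower(substr($0, RSTART, RLENGTH)) substr($0, RSTART+RLENGTH)
}
# Print every line, modified or not.
1
EOF
chmod +x "$script"

# Run it directly; the shebang line hands the file to awk -f.
standalone=$(echo 'This is test file &AMP;   replacing &APOS; PATTERN' | "$script")
printf '%s\n' "$standalone"   # This is test file &amp;   replacing &apos; PATTERN
```

Since this version assigns to $0 as a whole rather than to individual fields, the original spacing of the line is kept.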

Answer 4

This is really a job for sed:

$ sed -r 's/&[^;]+/\L&/g' file
This is test file &amp; replacing &apos; PATTERN

If it has to be portable awk, then it would be:

$ awk '{rec=""; while(match($0,/&[^;]+/)) { rec = rec substr($0,1,RSTART-1) tolower(substr($0,RSTART,RLENGTH)); $0=substr($0,RSTART+RLENGTH)} print rec $0}' file
This is test file &amp; replacing &apos; PATTERN
