Random Letters using AWK

Question

I am writing a sh script that uses AWK on an AIX server to mask sensitive data.

Input:

CustomerName|somedata|phonenumber|Address
Roly|xyz|1234|London

Output:

CustomerName|somedata|phonenumber|Address
Atrm|xyz|8546|Xdfdtt

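Note: CustomerName, phonenumber and Address are sensitive, so those are the only fields I change.

I want to keep the same dictionary values (what gets replaced with what) until I decide to change the dictionary.

Here is my code on AIX. It works only on AIX systems (or so I hope):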

set -A randomalpha
set -A alphabets a b c d e f g h i j k l m n o p q r s t u v w x y z
set -A numbers 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9  0 1 2 3 4 5 6 7 8 9
randomalpha=`cat /dev/urandom| tr -dc 'A-Z'|head -c 26`
randonumber=`cat /dev/urandom| tr -dc '0-9'|head -c 26`

i=0
echo $randomalpha | awk -v ORS="" '{ gsub(/./,"&\n") ; print }' | \
while read char
do
   randomalpha[i]="$char"
   print "$char"
   ((i=i+1))
done

print ${randomalpha[4]}
print ${alphabets[25]}

integer max=1400
integer i=1

while [[ $i -lt $max ]]
do
# $(echo value of is : $i >> $file)
 #   echo value i: $i
  #  (( i = i + 1 ))
b=""
echo "Please enter your name:" 
read name

i=0

echo $name | awk -v ORS=" " '{ gsub(/./,"&\n") ; print }' | \
while read char
do
   name[i]="$char"   

   j=0
    while [ $j -le 36 ] 
    do

    if [[ "$char" == ${alphabets[j]} ]];then
        #print  "matched here:"${alphabets[j]}"matched here:"${randomalpha[j]} 
        b=$b${randomalpha[j]}

        if [ -z "${char}" ];then
             b=$b${randomalpha[j]}" "
        fi

        if [[ "$char" == ${numbers[j]+} ]];then
           print "$char"
        #print  "matched here:"${alphabets[j]}"matched here:"${randomalpha[j]} 
        b=$b${randonumber[j]}

        if [ -z "${numbers}" ];then
             b=$b${randonumber[j]}" "
        fi
        fi

    fi
    ((j=j+1))
    done
   ((i=i+1))
done

print $b

done

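What I did:

I assign a random character to each of A-Z, then match each character of the input value against the alphabet and replace it with that character's dictionary value.

In simple terms, say:

Actual value: ABCD Randomly generated: XUTY

So wherever it finds A, it replaces it with X. I know this is bad code, but I am trying this option to prove that it is possible.

Can someone shed some light on how to get this result with a few lines of AWK, in a simpler way than this big script?

This code is far too slow when I process a 20GB file.

Thanks!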

Answer 1

If I understand what you're trying to do, this will work with any awk on any system and will run orders of magnitude faster than your shell script:

$ cat tst.awk
function shuffle(oldStr,        newStr,len,array,i,j,t) {
    # logic copied from https://www.rosettacode.org/wiki/Knuth_shuffle#AWK
    # and tweaked to operate on a string as input instead of an array.
    len = length(oldStr)
    for (i=1; i<=len; i++) {
        array[i] = substr(oldStr,i,1)
    }

    for (i = len; i > 1; i--) {
        # j = random integer from 1 to i
        j = int(i * rand()) + 1

        # swap array[i], array[j]
        t = array[i]
        array[i] = array[j]
        array[j] = t
    }

    for (i=1; i<=len; i++) {
        newStr = newStr array[i]
    }
    return newStr
}

BEGIN {
    srand()

    ordrLets = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    randLets = shuffle(ordrLets)
    ordrDigs = "0123456789"
    randDigs = shuffle(ordrDigs)

    ordrChars = ordrLets ordrDigs
    randChars = randLets randDigs

    numChars = length(ordrChars)
    for (charNr=1; charNr<=numChars; charNr++) {
        oldChar = substr(ordrChars,charNr,1)
        newChar = substr(randChars,charNr,1)
        map[tolower(oldChar)] = tolower(newChar)
        map[toupper(oldChar)] = toupper(newChar)
        # Uncomment this to print the mappings to stderr:
        # print oldChar, newChar | "cat>&2"
    }

    split("1 3 4", fldIdx2nr)

    FS = OFS = "|"
}

NR==1 { print; next }
{
    for (fldIdx in fldIdx2nr) {
        fldNr  = fldIdx2nr[fldIdx]
        oldStr = $fldNr
        newStr = ""
        numChars = length(oldStr)
        for (charNr=1; charNr<=numChars; charNr++) {
            oldChar = substr(oldStr,charNr,1)
            newChar = (oldChar in map ? map[oldChar] : oldChar)
            newStr  = newStr newChar
        }
        $fldNr = newStr
    }
    print
}

$ cat file
CustomerName|somedata|phonenumber|Address
Roly|xyz|1234|London

$ awk -f tst.awk file
CustomerName|somedata|phonenumber|Address
Tcnj|xyz|2397|Ncoyco
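
The question asks for the mapping to stay the same until it is deliberately changed, whereas srand() with no argument seeds from the time of day, so each run of tst.awk produces a different mapping. One possible way to get a repeatable mapping is to pass a fixed seed in from the command line. The following is only a sketch under that assumption: the variable name seed and the wrapper function shuffled are made up for illustration and are not part of the script above. The shuffle loop is the same Knuth shuffle used in shuffle().

```shell
#!/bin/sh
# Hypothetical wrapper: prints the alphabet shuffled with a caller-supplied seed.
# "seed" and "shuffled" are illustrative names, not part of tst.awk.
shuffled() {
    awk -v seed="$1" '
    BEGIN {
        srand(seed)                          # fixed seed instead of time of day
        str = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        len = length(str)
        for (i = 1; i <= len; i++) {
            a[i] = substr(str, i, 1)
        }
        # Knuth shuffle, same logic as shuffle() in tst.awk
        for (i = len; i > 1; i--) {
            j = int(i * rand()) + 1          # random integer from 1 to i
            t = a[i]; a[i] = a[j]; a[j] = t
        }
        out = ""
        for (i = 1; i <= len; i++) {
            out = out a[i]
        }
        print out
    }'
}

# Two calls with the same seed print the same shuffled alphabet.
shuffled 42
shuffled 42
```

Applied to tst.awk, this would amount to changing srand() to srand(seed) and running awk -v seed=12345 -f tst.awk file, then picking a new seed whenever the dictionary should change. Note that the same seed should only be expected to reproduce the same mapping with the same awk implementation, since different awks may use different random number generators. If the mapping has to survive a change of awk, saving the printed mapping to a file and reading it back would be the safer route.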

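I'm assuming you actually want each letter mapped to a unique letter, unlike what your original code does. If that's not the case and you don't mind multiple original letters mapping to the same final letter, then you could create each individual mapping with a simple call to rand() instead of using the main algorithm in the shuffle() function.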
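
As a rough illustration of that rand()-only alternative, the sketch below draws every replacement letter independently, so two original letters may end up with the same replacement and the masking is no longer reversible. The names pickEach, random_letters and seed are hypothetical and chosen just for this example; in tst.awk the idea would be to call something like pickEach(ordrLets) where shuffle(ordrLets) is called now.

```shell
#!/bin/sh
# Hypothetical helper: prints 26 letters, each drawn independently from A-Z.
# Position N of the output is the replacement for the Nth letter of the alphabet.
random_letters() {
    awk -v seed="$1" '
    # pickEach() is an illustrative name, not part of the original answer.
    function pickEach(oldStr,    newStr, len, i) {
        len = length(oldStr)
        for (i = 1; i <= len; i++) {
            # int(len * rand()) + 1 is a random position from 1 to len
            newStr = newStr substr(oldStr, int(len * rand()) + 1, 1)
        }
        return newStr
    }
    BEGIN {
        srand(seed)
        print pickEach("ABCDEFGHIJKLMNOPQRSTUVWXYZ")
    }'
}

random_letters 7
```

Because letters can repeat in the output, this variant trades uniqueness for simplicity; the Knuth shuffle in shuffle() remains the choice when each letter must map to a distinct letter.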
