How do I split an integer using an "if" condition?

Question

For the past few days I have been looking for a way to solve my problem with awk, sed, cut and tr. I have a dataset delimited by "@" that looks like this...

1@2@11@11/8@11/8@11@11/2
2@4@31 1/2@31 1/2@31/2@21@21/2
3@10@116 1/4@98@911 3/4@410@38 1/2
4@1@21@21/8@21/8@33@49 1/4
5@11@74@75@67 1/2@511 1/2@511 1/2
6@9@106@108 1/4@89 1/4@613 1/2@616
7@7@96@118 1/4@1313 1/2@715@717 3/4
8@12@127 3/4@129 3/4@1212 1/2@816 1/2@817 3/4
9@6@63@ 64 1/2@79@916 1/2@918
10@13@139 3/4@1311 1/4@1112@1017@1019 3/4
11@3@42@42@43 1/2@1118 1/2@1126 1/4
12@5@84 1/2@87@1011 3/4@1219 1/2@1228 1/4
13@8@52 1/2@53 1/2@57@1324@1332 3/4

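What I want to do is separate the row numbers (ranks), i.e. values like those in the first column, from the rest of the integers in every column starting from column 3. The end result would look like this...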

1@2@1@1@1@1/8@1@1/8@1@1@1@1/2
2@4@3@1 1/2@3@1 1/2@3@1/2@2@1@2@1/2
3@10@11@6 1/4@9@8@9@11 3/4@4@10@3@8 1/2
4@1@2@1@2@1/8@2@1/8@3@3@4@9 1/4
5@11@7@4@7@5@6@7 1/2@5@11 1/2@5@11 1/2
6@9@10@6@10@8 1/4@8@9 1/4@6@13 1/2@6@16
7@7@9@6@11@8 1/4@13@13 1/2@7@15@7@17 3/4
8@12@12@7 3/4@12@9 3/4@12@12 1/2@8@16 1/2@8@17 3/4
9@6@6@3@6@4 1/2@7@9@9@16 1/2@9@18
10@13@13@9 3/4@13@11 1/4@11@12@10@17@10@19 3/4
11@3@4@2@4@2@4@3 1/2@11@18 1/2@11@26 1/4
12@5@8@4 1/2@8@7@10@11 3/4@12@19 1/2@12@28 1/4
13@8@5@2 1/2@5@3 1/2@5@7@13@24@13@32 3/4

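I was thinking I could use an "if statement". Something like "if integers start with [2-9] then split after one character, elif it starts with [1] and length is equal to 3 or more (before the space and fraction) then split after the first two characters." I don't know how to go about solving this. I have thousands of similar files whose structure all needs to change, so the solution has to be able to run in a loop.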
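
The rule above can be written almost word for word as a function. This Python sketch is only an illustration: the name `split_value` is hypothetical, and the third branch (a two-digit value starting with 1, such as `11`, is split after the first character) is an assumption the question does not state. Answer 1 below makes the same assumption.

```python
import re


def split_value(value: str) -> str:
    """Insert "@" after the leading position in one field (hypothetical helper)."""
    value = value.strip()  # tolerate a stray space such as " 64 1/2"
    # Integer part: everything before the first space or "/".
    integer_part = re.split(r"[ /]", value, maxsplit=1)[0]
    if re.match(r"[2-9]", value):
        cut = 1  # starts with 2-9: split after one character
    elif value.startswith("1") and len(integer_part) >= 3:
        cut = 2  # starts with 1, 3 or more digits: split after two characters
    elif value.startswith("1") and len(integer_part) == 2:
        cut = 1  # ASSUMPTION (not in the question): "11" -> "1@1"
    else:
        return value  # nothing to split, e.g. a lone "1"
    return value[:cut] + "@" + value[cut:]


for v in ["11/8", "31 1/2", "116 1/4", "1313 1/2"]:
    print(v, "->", split_value(v))
```

Fields 1 and 2 are left alone; only fields from column 3 onward would be passed through `split_value`.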

Answer 1

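Here is a nearly literal transcription into awk of the logic you described (I added one assumption: a value that starts with 1 and whose integer part is 2 digits long should be split after the first character). I also noticed that line 9 has a space after an @ delimiter, so I allowed for that in the field separator, as you can see in the BEGIN block. Your real data, which I haven't seen, may not need that, so just be aware of it. I got exactly your expected output, but you may want to desk-check it against a larger dataset in case there are cases that weren't accounted for.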
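
To see what the `" *@ *"` separator changes, you can split row 9 with and without it. This Python sketch only illustrates the separator (`re.split` with the same regular expression as the awk `FS`); it is not part of the awk solution.

```python
import re

# Row 9 of the sample data has a stray space after one "@".
row9 = "9@6@63@ 64 1/2@79@916 1/2@918"

plain = row9.split("@")              # split on "@" only
fields = re.split(r" *@ *", row9)    # same regex as FS = " *@ *"

print(repr(plain[3]))                # ' 64 1/2'  (space kept)
print(repr(fields[3]))               # '64 1/2'   (space absorbed by the separator)
print("@".join(fields))              # rejoined with "@", as OFS = "@" does
```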

$ cat jd.awk
BEGIN { FS = " *@ *"; OFS = "@" }

{
    for (i=3; i<=NF; ++i) {
        # if integers start with [2-9] then split after one character
        if (substr($i, 1, 1) ~ /[2-9]/) {
            $i = substr($i, 1, 1) "@" substr($i, 2)
        }
        else {
            split($i, parts, "[ /]")

            # else if it starts with [1] and length is equal to 2
            # (before the space and fraction) then split after the first character
            if (substr($i, 1, 1) == "1" && length(parts[1]) == 2) {
                $i = substr($i, 1, 1) "@" substr($i, 2)
            }

            # else if it starts with [1] and length is equal to 3 or more
            # (before the space and fraction) then split after the first two characters.
            else if (substr($i, 1, 1) == "1" && length(parts[1]) >= 3) {
                $i = substr($i, 1, 2) "@" substr($i, 3)
            }
        }
    }
    print
}

$ cat jd.txt
1@2@11@11/8@11/8@11@11/2
2@4@31 1/2@31 1/2@31/2@21@21/2
3@10@116 1/4@98@911 3/4@410@38 1/2
4@1@21@21/8@21/8@33@49 1/4
5@11@74@75@67 1/2@511 1/2@511 1/2
6@9@106@108 1/4@89 1/4@613 1/2@616
7@7@96@118 1/4@1313 1/2@715@717 3/4
8@12@127 3/4@129 3/4@1212 1/2@816 1/2@817 3/4
9@6@63@ 64 1/2@79@916 1/2@918
10@13@139 3/4@1311 1/4@1112@1017@1019 3/4
11@3@42@42@43 1/2@1118 1/2@1126 1/4
12@5@84 1/2@87@1011 3/4@1219 1/2@1228 1/4
13@8@52 1/2@53 1/2@57@1324@1332 3/4


$ awk -f jd.awk jd.txt 
1@2@1@1@1@1/8@1@1/8@1@1@1@1/2
2@4@3@1 1/2@3@1 1/2@3@1/2@2@1@2@1/2
3@10@11@6 1/4@9@8@9@11 3/4@4@10@3@8 1/2
4@1@2@1@2@1/8@2@1/8@3@3@4@9 1/4
5@11@7@4@7@5@6@7 1/2@5@11 1/2@5@11 1/2
6@9@10@6@10@8 1/4@8@9 1/4@6@13 1/2@6@16
7@7@9@6@11@8 1/4@13@13 1/2@7@15@7@17 3/4
8@12@12@7 3/4@12@9 3/4@12@12 1/2@8@16 1/2@8@17 3/4
9@6@6@3@6@4 1/2@7@9@9@16 1/2@9@18
10@13@13@9 3/4@13@11 1/4@11@12@10@17@10@19 3/4
11@3@4@2@4@2@4@3 1/2@11@18 1/2@11@26 1/4
12@5@8@4 1/2@8@7@10@11 3/4@12@19 1/2@12@28 1/4
13@8@5@2 1/2@5@3 1/2@5@7@13@24@13@32 3/4
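
The question also says the change has to run over thousands of files. With the awk script that is usually a shell `for` loop around `awk -f jd.awk`; the Python sketch below is an alternative driver, not part of this answer. It combines the `" *@ *"` separator from the awk script with the digit pattern from the Perl answer (`(1\d|\d)(\d+)`), and writes each converted file under the same name into a separate output directory. The function names, the directory arguments and the `*.txt` glob are hypothetical.

```python
import re
import sys
from pathlib import Path

# Position = "1" plus a digit, or one digit, followed by more digits
# (the pattern from the Perl answer).
SPLIT = re.compile(r"(1\d|\d)(\d+)")
# Field separator with optional spaces around "@" (the awk FS).
SEP = re.compile(r" *@ *")


def convert_line(line: str) -> str:
    fields = SEP.split(line.rstrip("\n"))
    rest = [SPLIT.sub(r"\1@\2", field) for field in fields[2:]]
    return "@".join(fields[:2] + rest)


def convert_dir(src_dir: Path, dst_dir: Path, pattern: str = "*.txt") -> int:
    """Convert every file matching `pattern` in src_dir into dst_dir.

    Use a dst_dir different from src_dir so the original files are kept.
    Returns the number of files converted.
    """
    dst_dir.mkdir(parents=True, exist_ok=True)
    count = 0
    for src in sorted(src_dir.glob(pattern)):
        lines = src.read_text().splitlines()
        converted = "".join(convert_line(line) + "\n" for line in lines)
        (dst_dir / src.name).write_text(converted)
        count += 1
    return count


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python convert_all.py input_dir output_dir
    print(convert_dir(Path(sys.argv[1]), Path(sys.argv[2])), "files converted")
```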

Answer 2

Here's a fun one:

perl -F@ -lape '$_ = join "@", shift(@F), shift(@F), map {s/(1\d|\d)(\d+)/$1\@$2/g; $_} @F' file

With some comments:

perl -F@ -lape '
    $_ = join "@",                # join the following things, using "@" 
              shift(@F),          #   the first field
              shift(@F),          #   the second field
              map {               #   then, transform the rest with this expr
                  s{              #     search for:
                      (1\d | \d)  #       1 plus a digit, or a digit
                      (\d+)       #       followed by one or more digits
                   }{$1\@$2}xg;  #     add an "@" between the two groups
                  $_              #     and return the new string
              } @F
' file

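Options:

-a and -F@ -- split each line into the @F array, using the @ character as the separator
-l -- handle line endings automatically
-p -- print $_ after each line is processed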
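
The interesting part of the one-liner is how `(1\d|\d)(\d+)` decides where the position ends. Python's `re` module tries alternatives left to right and backtracks the same way Perl's engine does, so this small Python sketch (not part of the answer) can be used to inspect what the two groups capture for a few values from the sample data.

```python
import re

# Same pattern as the one-liner. Alternatives are tried left to right and the
# engine backtracks, so "1\d" is preferred but "\d" is used when needed.
PATTERN = re.compile(r"(1\d|\d)(\d+)")

for value in ["11", "116 1/4", "911 3/4", "1313 1/2", "98"]:
    m = PATTERN.search(value)
    result = PATTERN.sub(r"\1@\2", value)
    # group(1) is the position, group(2) the digits that end up after the "@"
    print(f"{value:<10} {m.group(1):>2} + {m.group(2):<3} -> {result}")
```

For `11`, the first alternative `1\d` consumes both digits and leaves nothing for `(\d+)`, so the engine falls back to `\d` and the result is `1@1`; for `116 1/4` the first alternative succeeds and the result is `11@6 1/4`.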

Answer 3

This might work for you (GNU sed):

sed -r 's/^(([^@]*@){2})/\1\n/;ta;:a;/\n[0-9]?$/s/\n//;t;/\n(1[0-9]|[0-9])([0-9][0-9]?)/s//\1@\2\n/;ta;/\n([0-9]?[^0-9\n]) ?/s//\1\n/;ta' file

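This inserts a newline after the second field, then pattern-matches and loops: each successful match moves the newline forward until it reaches the end of the line, where the newline is removed.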
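
The moving-marker idea can be re-expressed outside sed. This Python sketch is a re-implementation for illustration, not part of the answer, and `split_with_cursor` is a hypothetical name: splitting the line into a finished part and a remaining part plays the role of the newline marker, and the three regular expressions mirror the three substitutions in the sed script (split a position off, step over a separator, stop when at most one digit remains).

```python
import re

SPLIT = re.compile(r"(1[0-9]|[0-9])([0-9][0-9]?)")  # position + 1-2 digits
SKIP = re.compile(r"([0-9]?[^0-9]) ?")              # separator, maybe a digit before it
END = re.compile(r"[0-9]?")                         # at most one digit left


def split_with_cursor(line: str) -> str:
    """Walk a cursor from the end of field 2 to the end of the line."""
    head = re.match(r"(?:[^@]*@){2}", line)
    if head is None:
        return line                      # fewer than two "@": leave unchanged
    done, rest = line[:head.end()], line[head.end():]
    while True:
        if END.fullmatch(rest):          # cursor reached the end: drop it
            return done + rest
        m = SPLIT.match(rest)
        if m:                            # split a position from its digits
            done += m.group(1) + "@" + m.group(2)
        else:
            m = SKIP.match(rest)
            if m is None:                # no rule applies: stop
                return done + rest
            done += m.group(1)           # copy the group; a space after it is dropped
        rest = rest[m.end():]
```

Every step consumes at least one character, so the loop always terminates; dropping the optional space after a separator is what removes the stray space in row 9.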

Answer 4

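Thank you for the quick replies.
They helped me parse the datasets I needed to finish before uploading. I ended up using the Perl-based solution because of its simplicity. Thanks again for your answers.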
