Unix command to check for missing files in a sequence

Question

Below is the format of the files in the folder.

File format - fact_type_<key>_partid
fact_type_123_1
fact_type_123_2
fact_type_123_3
fact_type_123_4
fact_type_124_1
fact_type_124_2
fact_type_124_3
fact_type_124_4
..
fact_type_130_1

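Each key should have 4 files (i.e. key 123 should have 4 files, ending with 1, 2, 3 and 4).

Keys should be in sequence: in the example above, the file after fact_type_124_4 should be fact_type_125_1.

These files are loaded by an external process, and the next process fails unless every file between the start and end keys is present (4 files for each key, and all keys from 123 to 130).

Right now I use the cut command, copy the output into Excel, and look for missing keys by hand: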

ls -1a | cut -d '_' -f3 | sort | uniq

Please help me with a command that validates this directly in the folder.

Answer 1

With bash and GNU sort:

for f1 in fact_type_*; do
  echo "${f1%_[0-9]}"
done | sort -u |\
while read -r f2; do
  for ((i=1; i<=4; i++)); do
    f="${f2}_${i}"
    [[ ! -e "$f" ]] && echo "missing $f"
  done
done

Output (e.g.):

missing fact_type_126_4
missing fact_type_127_1
missing fact_type_127_2
missing fact_type_127_4
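Note that the loop above only looks at keys that already have at least one file, so a key that is absent altogether (all four parts missing) goes unreported. A minimal sketch that also covers this case, by walking every key between the lowest and highest key found, could look like the following. The function name check_range is hypothetical, and the sketch assumes keys are plain decimal integers without zero padding:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the answer above): report every missing
# part file for every key between the lowest and highest key present.
# Assumption: keys are plain decimal integers without zero padding.
check_range() {
  local dir=${1:-.} f key part min='' max='' missing=0

  # Pass 1: find the smallest and largest key among the existing files.
  for f in "$dir"/fact_type_*_[1-4]; do
    [[ -e $f ]] || continue            # the glob matched nothing
    key=${f##*/fact_type_}             # drop directory and prefix: "123_1"
    key=${key%_*}                      # drop the part id: "123"
    [[ $key =~ ^[0-9]+$ ]] || continue # ignore non-numeric keys
    key=$((10#$key))                   # force base 10
    if [[ -z $min ]] || (( key < min )); then min=$key; fi
    if [[ -z $max ]] || (( key > max )); then max=$key; fi
  done

  if [[ -z $min ]]; then
    echo "no files found" >&2
    return 2
  fi

  # Pass 2: every key in [min, max] must have parts 1 to 4.
  for (( key = min; key <= max; key++ )); do
    for part in 1 2 3 4; do
      f="$dir/fact_type_${key}_${part}"
      if [[ ! -e $f ]]; then
        echo "missing ${f##*/}"
        missing=1
      fi
    done
  done
  return "$missing"
}
```

Calling check_range . prints one "missing ..." line per absent file and returns 1 if anything is missing. Because the range is inferred from the files that exist, a key missing at the very start or end of the expected range still cannot be detected this way.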

Answer 2

So, the constraints are:

Each key should have 4 files

Keys should be in sequence

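So I did this:

First we need to get all the files.
Then we need the maximum and minimum keys.
Then we need to generate the sequence of keys from minimum to maximum, each with the suffixes {1..4}.
Then we need to check whether a file exists for each entry.

The script: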

check() {
  local keys
  keys=$(
    # find all the files
    find "$1" -regex '.*/fact_type_[0-9]+_[1-4]' \
      -type f -printf "%f\n" |
    # extract the keys
    cut -d_ -f3
  )

  if [ -z "$keys" ]; then
    echo "No files found"
    return 255
  fi

  local nonexisting
  nonexisting=$(
    # sort it numerically, so that e.g. 99 sorts before 100
    <<<"$keys" sort -n |
    # extract first and last key only
    sed -n '1p;$p' |
    # generate sequence
    xargs seq |
    # append {1..4} to all keys
    xargs -i printf "%s\n" "fact_type_{}_"{1..4} |
    # print only nonexisting files
    xargs -l sh -c '[ ! -e "$1" ] && printf "%s\n" "$1"' --
  )

  if [ -n "$nonexisting" ]; then
    <<<"$nonexisting" xargs printf "File %s does not exists\n"
    return "$(<<<"$nonexisting" wc -l)"
  fi
}

touch fact_type_{123..130}_{1..4}

check .  # all ok

rm fact_type_130_1
rm fact_type_125_4
check .  # two files missing

This outputs the following (the first check . prints nothing; only the second one prints):

File fact_type_125_4 does not exists
File fact_type_130_1 does not exists

Tested on repl.
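The script derives the first and last key from the files that exist, so a key missing at either end of the range goes unnoticed. If the expected start and end keys are known up front (123 and 130 in the question), one possible variation is to build the expected file list and compare it with the actual one using comm. This is only a sketch under that assumption, and the function name missing_files is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: list expected files that are absent from a directory.
# Usage: missing_files DIR FIRST_KEY LAST_KEY
missing_files() {
  local dir=$1 first=$2 last=$3 key
  # comm -23 prints lines that appear only in the first (expected) list.
  # Both lists are sorted under LC_ALL=C so comm sees a consistent order.
  LC_ALL=C comm -23 \
    <(for key in $(seq "$first" "$last"); do
        # printf reuses the format for each pair of arguments
        printf 'fact_type_%s_%s\n' "$key" 1 "$key" 2 "$key" 3 "$key" 4
      done | LC_ALL=C sort) \
    <(cd "$dir" && printf '%s\n' fact_type_*_[1-4] | LC_ALL=C sort)
}
```

For example, missing_files . 123 130 prints one file name per line for each expected file that is not in the current directory, and prints nothing when the set is complete.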

Answer 3

Using GNU awk for arrays of arrays and sorted_in:

$ cat tst.awk
BEGIN {
    for (i=1; i<ARGC; i++) {
        fname = ARGV[i]
        split(fname,fparts,/_/)
        key = fparts[3]
        id  = fparts[4]
        ids[key][id]    # record that this part id was seen for this key
    }
    PROCINFO["sorted_in"] = "@ind_num_asc"
    for (key in ids) {
        if ( (prevKey != "") && (key != prevKey+1) ) {
            printf "key gap: %s -> %s\n", prevKey, key | "cat>&2"
        }
        prevId = ""
        idCnt = 0
        for (id in ids[key]) {
            if ( (prevId != "") && (id != prevId+1) ) {
                printf "id gap: %s, %s -> %s\n", key, prevId, id | "cat>&2"
            }
            if (id !~ /^[1234]$/) {
                printf "bad id: %s, %s\n", key, id | "cat>&2"
            }
            idCnt++
            prevId = id
        }
        if (idCnt != 4) {
            printf "bad id count: %s, %s\n", key, idCnt | "cat>&2"
        }
        prevKey = key
    }
}

$ awk -f tst.awk fact_type_*
