Unix command to check for missing files in a sequence
The files in the folder are named as follows.
File format - fact_type_<key>_partid
fact_type_123_1
fact_type_123_2
fact_type_123_3
fact_type_123_4
fact_type_124_1
fact_type_124_2
fact_type_124_3
fact_type_124_4
..
fact_type_130_1
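Each key should have 4 files (i.e. each key should have 4 files ending with 1, 2, 3 and 4).
Keys should be in sequence; in the example above, the next file should be fact_type_125_1.
The files above are loaded by an external process, and the next process fails unless every file between the start and end keys is present (4 files for each key, and all keys from 123 through 130).
Right now I use the cut command, copy the result into Excel, and look for missing keys there:
ls -1a | cut -d '_' -f3 | sort | uniq
Please help me with a command that validates this in the folder.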
With bash and GNU sort:
for f1 in fact_type_*; do
    echo "${f1%_[0-9]}"
done | sort -u |\
    while read -r f2; do
        for ((i=1; i<=4; i++)); do
            f="${f2}_${i}"
            [[ ! -e "$f" ]] && echo "missing $f"
        done
    done
Output (e.g.):
missing fact_type_126_4
missing fact_type_127_1
missing fact_type_127_2
missing fact_type_127_4
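Note that the loop above only visits keys that still have at least one file, so a key whose four files are all gone is not reported. Below is a minimal sketch that also walks the key range, assuming keys are decimal integers without leading zeros and that the range to check runs from the smallest to the largest key present. The function name list_missing is made up for illustration, and the body sticks to POSIX sh so it also runs under bash.

```shell
# list_missing DIR: print every expected fact_type_<key>_<part> absent from DIR.
# Hypothetical helper; assumes decimal keys without leading zeros, parts 1..4,
# and an expected range from the smallest to the largest key present.
list_missing() (
    dir=${1:-.}
    min=
    max=
    for f in "$dir"/fact_type_*_[1-4]; do
        [ -e "$f" ] || continue            # an unmatched glob stays literal
        key=${f##*/fact_type_}             # drop path and prefix: <key>_<part>
        key=${key%_*}                      # drop the part id: <key>
        case $key in
            ''|0*|*[!0-9]*) continue ;;    # skip names whose key is not such an integer
        esac
        if [ -z "$min" ] || [ "$key" -lt "$min" ]; then min=$key; fi
        if [ -z "$max" ] || [ "$key" -gt "$max" ]; then max=$key; fi
    done
    if [ -z "$min" ]; then
        echo "no files found" >&2
        exit 2                             # leaves the subshell, i.e. the function
    fi
    rc=0
    key=$min
    while [ "$key" -le "$max" ]; do
        for part in 1 2 3 4; do
            if [ ! -e "$dir/fact_type_${key}_${part}" ]; then
                echo "missing fact_type_${key}_${part}"
                rc=1
            fi
        done
        key=$((key + 1))
    done
    exit "$rc"
)
```

Called as list_missing . it prints one "missing ..." line per absent file and returns non-zero if anything is missing. Because the bounds are derived from the files themselves, a missing first or last key cannot be detected this way; passing the expected start and end keys explicitly would be needed for that.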
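So, the constraints:
Each key should have 4 files
Keys should be in sequence
So I did this:
- first we need to get all the files
- then we need the minimum and maximum keys
- then we need to generate the sequence from the minimum key to the maximum key, with each of the {1..4} suffixes
- then we need to check, for each entry, whether the file exists
The script: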
check() {
    local keys
    keys=$(
        # find all the files under the directory given as the first argument
        find "$1" -regex '.*/fact_type_[0-9]+_[1-4]' \
            -type f -printf "%f\n" |
        # extract the keys
        cut -d_ -f3
    )
    if [ -z "$keys" ]; then
        echo "No files found"
        return 255
    fi
    local nonexisting
    nonexisting=$(
        # sort it numerically
        <<<"$keys" sort -n |
        # extract first and last key only
        sed -n '1p;$p' |
        # generate sequence
        xargs seq |
        # append {1..4} to all keys
        xargs -i printf "%s\n" "fact_type_{}_"{1..4} |
        # print only nonexisting files (names are checked relative to the current directory)
        xargs -l sh -c '[ ! -e "$1" ] && printf "%s\n" "$1"' --
    )
    if [ -n "$nonexisting" ]; then
        <<<"$nonexisting" xargs printf "File %s does not exists\n"
        return "$(<<<"$nonexisting" wc -l)"
    fi
}
touch fact_type_{123..130}_{1..4}
check . # all ok
rm fact_type_130_1
rm fact_type_125_4
check . # two files missing
This outputs the following (the first check . prints nothing; the second prints only this):
File fact_type_125_4 does not exists
File fact_type_130_1 does not exists
Tested on repl.
Using GNU awk for arrays of arrays and sorted_in:
$ cat tst.awk
BEGIN {
    # index every file name given on the command line by key, then by part id
    for (i=1; i<ARGC; i++) {
        fname = ARGV[i]
        split(fname,fparts,/_/)
        key = fparts[3]
        id = fparts[4]
        ids[key][id]
    }
    # visit array indices in ascending numeric order
    PROCINFO["sorted_in"] = "@ind_num_asc"
    for (key in ids) {
        if ( (prevKey != "") && (key != prevKey+1) ) {
            printf "key gap: %s -> %s\n", prevKey, key | "cat>&2"
        }
        prevId = ""
        idCnt = 0
        for (id in ids[key]) {
            if ( (prevId != "") && (id != prevId+1) ) {
                printf "id gap: %s, %s -> %s\n", key, prevId, id | "cat>&2"
            }
            if (id !~ /^[1234]$/) {
                printf "bad id: %s, %s\n", key, id | "cat>&2"
            }
            idCnt++
            prevId = id
        }
        if (idCnt != 4) {
            printf "bad id count: %s, %s\n", key, idCnt | "cat>&2"
        }
        prevKey = key
    }
}
$ awk -f tst.awk *