Shell script to validate a csv file column by column
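How would I go about writing this in shell? I want to validate the fields of a CSV file column by column. For example, I only want to check that the first column is a number: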
Number,Letter
1,u
2,h
3,d
4,j
For the file above, the logic I have in mind is:
Loop - for all files (loop1)
    loop over rows 2..n (loop2)   # skipping first row since it's a header
        validate column 1
        validate column 2
        ...
    end loop2
    if (file passes validation)
        copy to goodFile directory
    else
        send to badFile directory
end loop1
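Below is my row-by-row validation. What do I need to change to make it work like the pseudocode above? I'm terrible at Unix and have only just started learning awk.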
#!/bin/sh
for file in /source/*.csv
do
awk -F"," '{ # awk -F", " {'print'} to get the fields.
$date_regex = '~(0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])[- /.](19|20)\d\d~';
if (length() == "")
break
if (length() == "") && (length() > 30)
break
if (length() == "") && ( !~ /$date_regex/)
break
if (length() == "") && (( != "S") || ( != "E")
break
if (length() == "") && ((length() < 9 || (length() > 11)))
break
}' file
#whatever you need with "$file"
done
Assuming there is no stray whitespace in the file, here's how I'd do it in bash.
# validate: first field is an integer
# validate: 2nd field is a lower-case letter
for file in *.csv; do
    good=true
    {
        IFS= read -r header    # consume the header row so it is not validated
        while IFS=, read -ra fields; do
            if [[ ! (
                ${fields[0]} =~ ^[+-]?[[:digit:]]+$
                && ${fields[1]} == [a-z]
            ) ]]
            then
                good=false
                break
            fi
        done
    } < "$file"
    if $good; then
        : # handle good file
    else
        : # handle bad file
    fi
done
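To make the two placeholders concrete, here is a minimal sketch of the same two checks wrapped in a function, with the good/bad handling filled in. It is an illustration, not the code above: the names `validate_csv` and `sort_csvs` and the three directory arguments are made up, it uses POSIX `case` patterns instead of `[[ ... =~ ... ]]` so it also runs under plain `sh`, and it assumes "send to badFile directory" means moving the file.

```shell
#!/bin/sh

# validate_csv FILE
# Return 0 if every data row has an integer in column 1 and a single
# lower-case letter in column 2; return 1 at the first row that fails.
validate_csv() {
    first=1
    # The "|| [ -n ... ]" part keeps a last line without a trailing newline.
    while IFS=, read -r num letter _rest || [ -n "$num$letter$_rest" ]; do
        if [ "$first" -eq 1 ]; then
            first=0          # header row: skip it
            continue
        fi
        digits=${num#[+-]}   # drop one optional leading sign
        case $digits in
            ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit
        esac
        case $letter in
            [a-z]) ;;                  # exactly one lower-case letter
            *) return 1 ;;
        esac
        # extra columns, if any, land in _rest and are not checked
    done < "$1"
    return 0
}

# sort_csvs SRC_DIR GOOD_DIR BAD_DIR  (hypothetical helper)
# Copy valid files to GOOD_DIR and move invalid ones to BAD_DIR.
sort_csvs() {
    src=$1
    good_dir=$2
    bad_dir=$3
    mkdir -p "$good_dir" "$bad_dir"
    for file in "$src"/*.csv; do
        [ -e "$file" ] || continue     # the glob matched nothing
        if validate_csv "$file"; then
            cp "$file" "$good_dir"/
        else
            mv "$file" "$bad_dir"/
        fi
    done
}
```

With the layout from the question this might be called as `sort_csvs /source /source/goodFile /source/badFile`; the directory names are only examples.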
I would combine two different ways of writing a loop. Lines starting with # are comments:
# Read all files (quoting "${file}" below keeps names with spaces safe)
for file in /source/*.csv ; do
    # init two variables before processing a new file
    FILESTATUS=GOOD
    FIRSTROW=true
    # process the file one line at a time, splitting each line on the
    # Internal Field Separator ,
    # The file is redirected into the loop rather than piped from cat:
    # a piped loop runs in a subshell and its FILESTATUS change would be lost
    while IFS=, read -r field1 field2; do
        # Skip first line, the header row
        if [ "${FIRSTROW}" = "true" ]; then
            FIRSTROW=false
            # skip processing of this line, continue with next record
            continue
        fi
        # Lots of different checks are possible here
        # They are easy to look up (e.g. "check field is integer")
        if [[ "${field1}" = somestringprefix* ]]; then
            FILESTATUS=BAD
            # Stop inner loop
            break
        fi
        # somecheckonField2 (placeholder: put a check on field2 here)
    done < "${file}"
    if [ "${FILESTATUS}" = "GOOD" ] ; then
        mv "${file}" /source/good
    else
        mv "${file}" /source/bad
    fi
done
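Since the question started from awk, the same validation can also be sketched there: awk splits each line into fields itself, so the inner loop disappears and the shell only looks at the exit status. This is a minimal sketch that assumes the two-column sample from the question (integer, then one lower-case letter); `csv_is_valid` is a made-up name.

```shell
#!/bin/sh

# csv_is_valid FILE
# Exit status 0 when every data row passes both column checks, 1 otherwise.
csv_is_valid() {
    awk -F',' '
        NR == 1               { next }            # skip the header row
        $1 !~ /^[+-]?[0-9]+$/ { bad = 1; exit }   # column 1 must be an integer
        $2 !~ /^[a-z]$/       { bad = 1; exit }   # column 2 must be one lower-case letter
        END                   { exit bad + 0 }    # exit in a rule jumps here
    ' "$1"
}
```

Inside the `for file in /source/*.csv` loop this could replace the whole `while` loop: `if csv_is_valid "${file}"; then ... else ... fi`. Note that `exit` inside an awk rule stops reading input, not a loop, which is the role `break` was meant to play in the question's script.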