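Add a column to a dataset from the division of two other columns
I have the following dataset on Ubuntu, and I would like to iterate over it in bash (with while or for) to generate a new column containing the quotient of the failed and passed subjects.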
id, name, country, Continent, grade, passed, failed
1, Louise Smith, UK, Europe, 7, 5, 1
2, Okio Kiomoto, Japan, Asia, 9, 5, 0
3, Ralph Watson, USA, Northern America, 5.6, 5, 2
4, Mary Mcaann, South Africa, Africa, 4.7, 5, 3
5, Jack Thomson, Australia, Oceania, 10, 5, 0
6, N'dongo Mbaye, Senegal, Africa, 7.9, 5, 1
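To do this, I tried the following code in a script. However, I get no results, because I cannot find a way to add the newly generated column to the current dataset. Any ideas?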
while IFS=, read _ _ _ _ _ passed failed; do
newcolumn=$($passed/$failed |bc)
done
As a guide, the desired output is as follows.
id, name, country, Continent, grade, passed, failed, new
1, Louise Smith, UK, Europe, 7, 5, 1, 0.2
2, Okio Kiomoto, Japan, Asia, 9, 5, 0, 0
3, Ralph Watson, USA, Northern America, 5.6, 5, 2, 0.4
4, Mary Mcaann, South Africa, Africa, 4.7, 5, 3, 0.6
5, Jack Thomson, Australia, Oceania, 10, 5, 0, 0
6, N'dongo Mbaye, Senegal, Africa, 7.9, 5, 1, 0.2
Thanks.
I refactored your code slightly and came up with the following:
#!/bin/bash
# create new header
header=$(awk 'NR==1 {print}' s.dat)
printf "%s, new\n" "${header}"
# read data file data rows
while IFS=, read a b c d e passed failed; do
newcolumn=0
# avoid divide-by-zero
if [[ "${passed}" -ne "0" ]] ; then
newcolumn=$(bc <<<"scale=2; ${failed} / ${passed}")
fi
# output data with new generated column
# (IFS is only the comma, so each field keeps its leading space; rejoin with bare commas)
printf "%s, %.2f\n" "${a},${b},${c},${d},${e},${passed},${failed}" "${newcolumn}"
done < <(awk 'NR!=1 {print}' s.dat)
Contents of s.dat:
id, name, country, Continent, grade, passed, failed
1, Louise Smith, UK, Europe, 7, 5, 1
2, Okio Kiomoto, Japan, Asia, 9, 5, 0
3, Ralph Watson, USA, Northern America, 5.6, 5, 2
4, Mary Mcaann, South Africa, Africa, 4.7, 5, 3
5, Jack Thomson, Australia, Oceania, 10, 5, 0
6, N'dongo Mbaye, Senegal, Africa, 7.9, 5, 1
Output when the script is run:
id, name, country, Continent, grade, passed, failed, new
1, Louise Smith, UK, Europe, 7, 5, 1, 0.20
2, Okio Kiomoto, Japan, Asia, 9, 5, 0, 0.00
3, Ralph Watson, USA, Northern America, 5.6, 5, 2, 0.40
4, Mary Mcaann, South Africa, Africa, 4.7, 5, 3, 0.60
5, Jack Thomson, Australia, Oceania, 10, 5, 0, 0.00
6, N'dongo Mbaye, Senegal, Africa, 7.9, 5, 1, 0.20
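As a side note on why the original attempt produced nothing: `$($passed/$failed | bc)` asks the shell to run the field values as a command instead of handing an expression to bc, and the loop neither reads from a file nor prints anything. The bc call in isolation can be sketched as below. The helper name `ratio` is hypothetical and not part of the script above, and integer counts are assumed.

```shell
#!/bin/bash
# ratio FAILED PASSED
# Prints FAILED/PASSED with two decimals, or 0.00 when PASSED is 0.
# "ratio" is a hypothetical helper name used only for this sketch.
ratio() {
    local failed=$1 passed=$2 result=0
    if (( passed != 0 )); then
        # bc reads the expression as text on stdin;
        # without scale=2 the division would be truncated to an integer
        result=$(bc <<<"scale=2; ${failed} / ${passed}")
    fi
    # bc prints .20 without a leading zero; printf normalizes it to 0.20
    printf '%.2f\n' "${result}"
}
```

Calling `ratio 1 5` should print 0.20, and `ratio 3 0` falls back to 0.00 instead of triggering a divide-by-zero error in bc.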
Update - in response to the OP's question in the comments:
Getting the header row without awk:
header=$(head -n 1 s.dat)
Processing the data rows without awk:
{
# extra read to skip first row
read
# read data file data rows
while IFS=, read a b c d e passed failed; do
newcolumn=0
# avoid divide-by-zero
if [[ "${passed}" -ne "0" ]] ; then
newcolumn=$(bc <<<"scale=2; ${failed} / ${passed}")
fi
# output data with new generated column
# (IFS is only the comma, so each field keeps its leading space; rejoin with bare commas)
printf "%s, %.2f\n" "${a},${b},${c},${d},${e},${passed},${failed}" "${newcolumn}"
done
} < s.dat
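If bc is to be avoided as well, the same loop can probably be written with bash integer arithmetic alone. The following is only a sketch under some assumptions: passed and failed are non-negative integers, the input arrives on stdin, and the function name `add_ratio_column` is made up for illustration. The quotient is computed in hundredths and truncated, not rounded.

```shell
#!/bin/bash
# add_ratio_column: hypothetical helper, reads the dataset on stdin and
# writes it to stdout with a trailing "new" column (failed / passed).
add_ratio_column() {
    local line a b c d e passed failed scaled
    # first line is the header: copy it and append the new column name
    IFS= read -r line && printf '%s, new\n' "$line"
    while IFS=, read -r a b c d e passed failed; do
        scaled=0
        # avoid divide-by-zero; arithmetic expansion ignores the leading space
        if (( passed != 0 )); then
            scaled=$(( failed * 100 / passed ))   # quotient in hundredths, truncated
        fi
        # fields still carry their leading space, so join them with bare commas
        printf '%s,%s,%s,%s,%s,%s,%s, %d.%02d\n' \
            "$a" "$b" "$c" "$d" "$e" "$passed" "$failed" \
            $(( scaled / 100 )) $(( scaled % 100 ))
    done
}
```

It would be called as `add_ratio_column < s.dat`.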
Using awk
$ awk 'BEGIN { FS=OFS=", " } NR == 1 { $(NF+1)="new" } NR > 1 { $(NF+1)=$NF/$(NF-1) }1' input_file
id, name, country, Continent, grade, passed, failed, new
1, Louise Smith, UK, Europe, 7, 5, 1, 0.2
2, Okio Kiomoto, Japan, Asia, 9, 5, 0, 0
3, Ralph Watson, USA, Northern America, 5.6, 5, 2, 0.4
4, Mary Mcaann, South Africa, Africa, 4.7, 5, 3, 0.6
5, Jack Thomson, Australia, Oceania, 10, 5, 0, 0
6, N'dongo Mbaye, Senegal, Africa, 7.9, 5, 1, 0.2
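The awk one-liner above divides unconditionally, so a row with passed equal to 0 would make awk abort with a division-by-zero error. A guarded variant might look like the sketch below. The wrapper name `add_ratio_awk` and the choice to print 0 for such rows are assumptions made for this example.

```shell
#!/bin/bash
# add_ratio_awk: hypothetical wrapper, reads the dataset on stdin.
add_ratio_awk() {
    awk '
        BEGIN { FS = OFS = ", " }
        # header row: append the name of the new column
        NR == 1 { $(NF + 1) = "new" }
        # data rows: compute failed / passed first, then append it;
        # a row with passed == 0 gets 0 instead of a fatal error
        NR > 1 {
            ratio = ($(NF - 1) != 0) ? $NF / $(NF - 1) : 0
            $(NF + 1) = ratio
        }
        # print every record, modified or not
        { print }
    '
}
```

Usage would be `add_ratio_awk < input_file`. Computing the quotient into a variable before touching `$(NF + 1)` keeps the meaning of `NF` unambiguous while the division is evaluated.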
Tested and confirmed working on gawk 5.1.1, mawk 1.3.4, mawk 1.9.9.6, and macOS nawk:
# The pair of empty double quotes at the start of the program is
# *** essential ***: it forces a string comparison, so rows that get
# a "0" value in the new column are still printed.
{m,n,g}awk '"" ($(_=NF += !+FS) = !/[0-9]/ ? "new" : \
($--_) / (+$--_ ? $_ : --_^--_^_))' FS='[,][ \t]*' OFS=', '
——————————————————————————————————
id, name, country, Continent, grade, passed, failed, new
1, Louise Smith, UK, Europe, 7, 5, 1, 0.2
2, Okio Kiomoto, Japan, Asia, 9, 5, 0, 0
3, Ralph Watson, USA, Northern America, 5.6, 5, 2, 0.4
4, Mary Mcaann, South Africa, Africa, 4.7, 5, 3, 0.6
5, Jack Thomson, Australia, Oceania, 10, 5, 0, 0
6, N'dongo Mbaye, Senegal, Africa, 7.9, 5, 1, 0.2
——————————————————————————————————
(condensed version :::)
mawk '""($(_=NF+=!+FS)=!/[0-9]/?"new":($--_)/(+$--_?$_:--_^--_^_) )' FS='[,][ \t]*' OFS=', '
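For readers who find the condensed form hard to follow, here is a more readable sketch of roughly the same idea: append one field, write "new" on the line that contains no digits (the header), and write failed / passed on every other line. The function name `add_ratio_readable` and the variable names are made up for this example, and printing 0 when passed is 0 is my own choice for the sketch, not necessarily what the one-liner does in that case.

```shell
#!/bin/bash
# add_ratio_readable: hypothetical wrapper, reads the dataset on stdin.
add_ratio_readable() {
    # split on a comma followed by optional blanks, rejoin with ", "
    awk -F',[ \t]*' -v OFS=', ' '
        {
            n = NF + 1                  # index of the field to append
            if ($0 !~ /[0-9]/) {
                # a line without any digit is treated as the header
                $n = "new"
            } else {
                failed = $(n - 1)       # last original field
                passed = $(n - 2)       # second to last original field
                $n = (passed + 0 != 0) ? failed / passed : 0
            }
            # explicit print, so a row whose new field is 0 is not skipped
            print
        }
    '
}
```

The explicit `print` plays the role that the leading `""` plays in the one-liner: without it, using the computed value as the pattern would suppress rows where that value is 0.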