Reading Columns and Finding the Median (Bash)

Question

I want to find the median of each column, but it isn't working the way I want.

1 2 3 
3 2 1
2 1 5

I expect

2 2 3

as the result, but it only gives a sum error and some "sum" of the columns. Below is the code snippet for "median in column":

while read -r line; do
    read -a array <<< "$line"
    for i in "${!array[@]}"
    do
      column[${i}]=${array[$i]}
      ((length[${i}]++))
      result=${column[*]} | sort -n
    done < file
 for i in ${!column[@]}
 do
   #some median calculation.....

Note: I want to practice bash, which is why I'm hard-coding this in bash. I'd really appreciate it if someone could help me, especially in BASH. Thanks.

Answer 1

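Bash really isn't suited to low-level text processing like this: the read command makes a system call for every character it reads (at least when its input is a pipe), which means it's slow and a CPU hog. It's fine for handling interactive input, but using it for general text processing is madness. It would be much better to use awk (or Python, Perl, etc.) for this.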
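As a rough illustration of the awk route, here is a minimal sketch (not part of the original answer) that reads the file in a single awk process and prints the lower median of each column, the same convention the Bash script in this answer uses. It assumes whitespace-separated numeric fields, and the function name `column_medians` is hypothetical. It uses an insertion sort because POSIX awk has no built-in sort (GNU awk's `asort()` would make it shorter).

```bash
#!/usr/bin/env bash
# Sketch only: print the lower median of each whitespace-separated numeric
# column of file "$1", using one awk process instead of a Bash read loop.
# column_medians is a hypothetical name, not from the original answer.
column_medians() {
    awk '
    {
        # Store each field by (column, row) and track the widest row.
        for (i = 1; i <= NF; i++) cell[i, NR] = $i
        if (NF > ncols) ncols = NF
    }
    END {
        for (i = 1; i <= ncols; i++) {
            # Copy the values present in column i into v[1..n] as numbers.
            n = 0
            for (r = 1; r <= NR; r++)
                if ((i, r) in cell) v[++n] = cell[i, r] + 0
            # Insertion sort v[1..n] in ascending numeric order.
            for (a = 2; a <= n; a++) {
                x = v[a]
                b = a - 1
                while (b > 0 && v[b] > x) {
                    v[b + 1] = v[b]
                    b--
                }
                v[b + 1] = x
            }
            # Lower median: the middle value, or the lower of the two middles.
            printf "%s%s", v[int((n + 1) / 2)], (i < ncols ? " " : "\n")
        }
    }' "$1"
}
```

Run on a file holding the three sample rows from the question, `column_medians file` should print `2 2 3`.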

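As an exercise for learning Bash I guess it's OK, but please try to avoid using read for bulk text processing in real programs. For details, see Why is using a shell loop to process text considered bad practice? on the Unix & Linux Stack Exchange site, especially the answer by Stéphane Chazelas (who discovered the Shellshock Bash bug).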

Anyway, back to your question... :)

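Most of your code is OK, but

result=${column[*]} | sort -n

doesn't do what you want. Each command in a pipeline runs in a subshell, so the assignment to result is thrown away when that subshell exits, and sort receives nothing on its standard input because an assignment produces no output.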
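A small sketch of that problem, using made-up values in place of the question's column array, together with one way to capture sorted output instead: print one value per line and capture the pipeline's output with command substitution. The variable name `sorted` is illustrative only.

```bash
#!/usr/bin/env bash
# Sketch: why `result=${column[*]} | sort -n` leaves result empty.
# The values in column are made up for illustration.
column=(3 1 2)

# The assignment runs in a pipeline subshell, so it is discarded when that
# subshell exits; sort gets empty input because an assignment prints nothing.
result=${column[*]} | sort -n
printf 'after the pipeline: [%s]\n' "$result"   # -> after the pipeline: []

# Instead, print one value per line, sort numerically, and capture the
# output with $( ). sorted holds 1, 2 and 3 on separate lines.
sorted=$(printf '%s\n' "${column[@]}" | sort -n)
printf 'sorted: %s\n' "${sorted//$'\n'/ }"      # -> sorted: 1 2 3
```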

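Here's one way to get the column medians in Bash (with the standard sort, head and tail utilities doing the sorting and selection):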

#!/usr/bin/env bash

# Find medians of columns of numeric data
# See
# Written by PM 2Ring 2015.10.13

#Name of the input data file, given as the first command-line argument
fname=$1

echo "input data:"
cat "$fname"
echo

#Read rows, saving into columns
numrows=1
while read -r -a array; do
    ((numrows++))
    for i in "${!array[@]}"; do
        #Separate column items with a newline
        column[i]+="${array[i]}"$'\n'
    done
done < "$fname"

#Calculate line number of middle value; which must be 1-based to use as `head`
#argument, and must compensate for extra newline added by 'here' string, `<<<`
midrow=$((1+numrows/2))
echo "midrow: $midrow"

#Get median of each column
result=''
for i in "${!column[@]}"; do
    median=$(sort -n <<<"${column[i]}" | head -n "$midrow" | tail -n 1)
    result+="$median "
done
echo "result: $result"
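To see what the script's comment about compensating for the extra newline means, here is a small sketch using column 1 of the sample data, built the way the script builds it (each value followed by a newline). The variable names are illustrative, and numrows is hard-coded to the value the script's loop reaches for three rows.

```bash
#!/usr/bin/env bash
# Sketch: why the script uses midrow=$((1+numrows/2)).
# col is column 1 of the sample data, one value per line, as the script builds it.
col=$'1\n3\n2\n'

# <<< appends its own newline, so sort sees a fourth, empty line.
# $(( )) strips any padding that some wc implementations add.
lines=$(( $(sort -n <<<"$col" | wc -l) ))
echo "lines seen by sort: $lines"            # -> lines seen by sort: 4

# Under -n the empty line compares as zero and sorts first,
# pushing each real value down one position.
first=$(sort -n <<<"$col" | head -n 1)
echo "first sorted line: [$first]"           # -> first sorted line: []

# numrows starts at 1 and gains 1 per row, so it ends at 4 for three rows;
# midrow = 1 + 4/2 = 3 skips the empty line and lands on the middle value.
numrows=4
midrow=$((1 + numrows / 2))
median=$(sort -n <<<"$col" | head -n "$midrow" | tail -n 1)
echo "midrow=$midrow median=$median"         # -> midrow=3 median=2
```

Note that with an even number of rows the same arithmetic picks the lower of the two middle values rather than averaging them.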

Output

input data:
1 2 3
3 2 1
2 1 5

midrow: 3
result: 2 2 3
