Bash script is super slow

Question

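I'm updating an old script that parses ARP data and extracts useful information from it. We added a new router, and although I can pull the ARP data from it, the data comes in a new format. I have a file, "zTempMonth", that contains all of the ARP data from both sets of routers, and I need to compile it into a new, normalized data format. The lines of code below do logically what I need them to do, but they are extremely slow: running these loops takes days, whereas the script previously took 20 to 30 minutes. Is there a way to speed it up, or to find out what is slowing it down?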

Thanks in advance,

    echo "Parsing zTempMonth"
    while read LINE
    do
            wc=`echo $LINE | wc -w`
            if [[ $wc -eq "6" ]]; then
                    true
                    out=$(echo $LINE | awk '{ print $2 " " $4 " " $6 }')
                    echo $out >> zTempMonth.tmp

            else
                    false
            fi

            if [[ $wc -eq "4" ]]; then
                    true
                    out=$(echo $LINE | awk '{ print $1 " " $3 " " $4 }')
                    echo $out >> zTempMonth.tmp
            else
                    false
            fi


    done < zTempMonth

Answer 1

1. while read loops are slow.
2. Subshells inside the loop are slow.
3. >> (the equivalent of open(f, 'a')) called inside the loop is slow.
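To see point 3 on its own, here is a minimal sketch that writes the same 1000 lines twice: once re-opening the file with >> on every iteration, and once redirecting the whole loop a single time. The function names and the temporary files are illustrative assumptions, and the timings you get will depend on your system.

```shell
#!/usr/bin/env bash
# Sketch: write the same 1000 lines to two temp files in two different ways.

# Re-opens the file with >> on every iteration (open, append, close each time).
append_per_line() {
    local out=$1 i
    for (( i = 0; i < 1000; i++ )); do
        echo "line $i" >> "$out"
    done
}

# Redirects the whole loop once, so the file is opened a single time.
append_once() {
    local out=$1 i
    for (( i = 0; i < 1000; i++ )); do
        echo "line $i"
    done >> "$out"
}

a=$(mktemp)
b=$(mktemp)
time append_per_line "$a"    # bash's time keyword reports on stderr
time append_once "$b"
cmp -s "$a" "$b" && echo "same output"
rm -f "$a" "$b"
```

Both functions produce identical files; only the number of open() calls differs.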

You can speed it up and still stay in pure bash just by getting rid of #2 and #3:

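    #!/usr/bin/env bash

    # read -a splits each line into the array "line"; -r keeps backslashes literal.
    while read -r -a line; do
        case "${#line[@]}" in
            6) printf '%s %s %s\n' "${line[1]}" "${line[3]}" "${line[5]}";;
            4) printf '%s %s %s\n' "${line[0]}" "${line[2]}" "${line[3]}";;
        esac
    done < zTempMonth >> zTempMonth.tmp   # the output file is opened only once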
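To try the loop without the real zTempMonth file, here is a minimal sketch that wraps the same case/printf logic in a function reading stdin. The function name normalize_arp is hypothetical, and the sample records are made up to have 6, 4 and 5 fields; they are not real output from either router.

```shell
#!/usr/bin/env bash
# Sketch: the case/printf loop wrapped in a function that reads stdin.
normalize_arp() {
    local -a line
    while read -r -a line; do
        case "${#line[@]}" in
            6) printf '%s %s %s\n' "${line[1]}" "${line[3]}" "${line[5]}";;
            4) printf '%s %s %s\n' "${line[0]}" "${line[2]}" "${line[3]}";;
        esac    # lines with any other number of fields produce no output
    done
}

# Made-up sample records: one 6-field line, one 4-field line,
# and one 5-field line that is skipped.
normalize_arp <<'EOF'
Internet 10.0.0.1 5 aabb.cc00.0100 ARPA Vlan10
10.0.0.2 ether 00:11:22:33:44:55 eth0
a line with five words
EOF
# Prints:
# 10.0.0.1 aabb.cc00.0100 Vlan10
# 10.0.0.2 00:11:22:33:44:55 eth0
```

Against the real data it would be run as normalize_arp < zTempMonth >> zTempMonth.tmp.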

But with more than a few lines, this will still be slower than pure awk. Consider an awk script as simple as this:

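    BEGIN {
        # Send the progress message to stderr so it does not end up in zTempMonth.tmp
        print "Parsing zTempMonth" > "/dev/stderr"
    }

    NF == 6 {
        print $2 " " $4 " " $6
    }

    NF == 4 {
        print $1 " " $3 " " $4
    }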

You can run it like this:

    awk -f thatAwkScript zTempMonth >> zTempMonth.tmp

to get the same append behavior as your current script.
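If you'd rather not keep a separate script file, the two rules can be inlined. A minimal sketch, assuming a hypothetical function name normalize_arp_awk and made-up sample records (not real router output). print with commas joins its arguments with OFS, which defaults to a single space, so it prints the same thing as $2 " " $4 " " $6.

```shell
#!/usr/bin/env bash
# Sketch: the two awk rules inlined in a shell function. awk reads the
# named files (or stdin if none are given) in a single process.
normalize_arp_awk() {
    awk 'NF == 6 { print $2, $4, $6 }
         NF == 4 { print $1, $3, $4 }' "$@"
}

# Made-up sample records; the 5-field line matches neither rule.
printf '%s\n' \
    'Internet 10.0.0.1 5 aabb.cc00.0100 ARPA Vlan10' \
    '10.0.0.2 ether 00:11:22:33:44:55 eth0' \
    'a line with five words' | normalize_arp_awk
```

With the real file this becomes normalize_arp_awk zTempMonth >> zTempMonth.tmp: one awk process for the whole file instead of one per line.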

Answer 2

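When writing shell scripts, it is almost always better to call a function directly than to call it through a subshell. The usual convention I've seen is to echo the function's return value and capture that output with a subshell. For example: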

    #!/bin/bash
    function get_path() {
        echo "/path/to/something"
    }
    mypath="$(get_path)"

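This works, but using a subshell carries a large speed overhead, and there is a faster alternative. Instead, you can adopt a convention in which a particular variable always holds the function's return value (I use retval). This has the added benefit of letting you return arrays from functions.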
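A minimal sketch of that convention, including returning an array. get_path is the example above rewritten to set retval; get_paths and its paths are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of the retval convention: functions assign to the global "retval"
# instead of echoing, so callers need no $( ) subshell.

get_path() {
    retval="/path/to/something"
}

# Hypothetical function returning an array through the same variable.
get_paths() {
    retval=("/path/to/one" "/path/with spaces/two")
}

get_path
mypath=$retval
echo "$mypath"

get_paths
paths=("${retval[@]}")    # copy it out before the next call overwrites retval
printf '<%s>\n' "${paths[@]}"
# Prints:
# /path/to/something
# </path/to/one>
# </path/with spaces/two>
```

The trade-off is that retval is a shared global: copy the result out immediately if you need to keep it across calls.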

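If you don't know what a subshell is: for the purposes of this blog post, a subshell is another bash shell that is spawned whenever you use $() or `` (backticks) to run the code you put inside them.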
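A minimal sketch of what "another shell" means in practice: code inside $() runs in a separate process, so variable changes made there never reach the caller. The names counter and bump are illustrative; $BASHPID needs bash 4.0 or later.

```shell
#!/usr/bin/env bash
# Sketch: $( ) runs its code in a subshell (a separate process), so
# assignments made inside it are lost when it exits.
counter=0
bump() { counter=$((counter + 1)); }

captured=$(bump; echo "$counter")   # the subshell's copy becomes 1
echo "inside \$( ): $captured"      # prints 1
echo "after \$( ): $counter"        # the parent still has 0

bump                                # direct call: runs in the current shell
echo "after direct call: $counter"  # prints 1

# $BASHPID is the PID of the current (sub)shell, unlike $$.
echo "parent: $BASHPID, subshell: $(echo "$BASHPID")"
```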

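I ran a few simple tests so you can see the overhead for yourself. Take two functionally equivalent scripts.

This one, saved as subshell, uses a subshell: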

    #!/bin/bash
    function a() {
        echo hello
    }
    for (( i = 0; i < 10000; i++ )); do
        echo "$(a)"
    done

This one, saved as variable, uses a variable:

    #!/bin/bash
    function a() {
        retval="hello"
    }
    for (( i = 0; i < 10000; i++ )); do
        a
        echo "$retval"
    done

The speed difference between the two is striking:

    $ for i in variable subshell; do
    > echo -e "\n$i"; time ./$i > /dev/null
    > done

    variable

    real 0m0.367s
    user 0m0.346s
    sys 0m0.015s

    subshell

    real 0m11.937s
    user 0m3.121s
    sys 0m0.359s

As you can see, the variable version runs in 0.367 seconds. The subshell version, however, takes a full 11.937 seconds!

Source: http://rus.har.mn/blog/2010-07-05/subshells/
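Applied to the question's loop, the same idea can replace wc=`echo $LINE | wc -w`, which starts a subshell plus echo and wc processes for every line. A minimal sketch, assuming a hypothetical helper count_words that returns its result in retval; read -a with a here-string splits the line using only builtins, without forking.

```shell
#!/usr/bin/env bash
# Sketch: count the words of a line with builtins only and return the
# count in retval, instead of wc=`echo $LINE | wc -w`.
# count_words is a hypothetical helper name.
count_words() {
    local -a words
    read -r -a words <<< "$1"    # split on IFS (spaces/tabs); no globbing
    retval=${#words[@]}
}

count_words "Internet 10.0.0.1 5 aabb.cc00.0100 ARPA Vlan10"
echo "$retval"    # prints 6
```

Inside the loop this becomes count_words "$LINE" followed by if [[ $retval -eq 6 ]], though at that point reading straight into an array with read -r -a, as in Answer 1, is simpler still.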
