How do I convert CSV data into an associative array using Bash 4?

Question

The file /tmp/file.csv contains the following:

name,age,gender
bob,21,m
jane,32,f

The CSV file will always have headers, but it may contain a different number of fields:

id,title,url,description
1,foo name,foo.io,a cool foo site
2,bar title,http://bar.io,a great bar site
3,baz heading,https://baz.io,some description

In either case, I want to convert my CSV data into an array of associative arrays.

What I need

So, I want a Bash 4.3 function that takes CSV as piped input and sends the array to stdout:

/tmp/file.csv:

name,age,gender
bob,21,m
jane,32,f

It needs to be used in my templating system, like this:

{{foo | csv_to_array | foo2}}

^ This is a fixed API; I must use that syntax. foo2 must receive the array as standard input.

The csv_to_array function must do its thing so that afterwards I can do this:

$ declare -p row1; declare -p row2; declare -p new_array;

which will give me this:

declare -A row1=([gender]="m" [name]="bob" [age]="21" )
declare -A row2=([gender]="f" [name]="jane" [age]="32" )
declare -a new_array=([0]="row1" [1]="row2")

Once I have that array structure (an indexed array of associative array names), I have a shell-based templating system to access them, like so:

{{#new_array}}
  Hi {{item.name}}, you are {{item.age}} years old.
{{/new_array}}

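But I'm struggling to generate the arrays I need.

Things I tried:

I have already tried using this as a starting point to get the array structure I need: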

while IFS=',' read -r -a my_array; do
    echo ${my_array[0]} ${my_array[1]} ${my_array[2]}
done <<< $(cat /tmp/file.csv)

(from Shell: CSV to array)

and this:

cat /tmp/file.csv | while read line; do
  line=( ${line//,/ } )
  echo "0: ${line[0]}, 1: ${line[1]}, all: ${line[@]}" 
done

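(from https://www.reddit.com/r/commandline/comments/1kym4i/bash_create_array_from_one_line_in_csv/cbu9o2o/)

But I didn't really make any progress in getting what I want out of the other end.

EDIT:

I accepted the second answer, but I had to hack the library I am using to make either solution work.

I'll be happy to look at other answers that do not export the declare commands as strings to be run in the current environment, but instead somehow hoist the arrays resulting from the declare commands into the current environment (the current environment being wherever the function is run from).

Example: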

$ cat file.csv | csv_to_array
$ declare -p row2 # gives the data

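So, to be clear, if the above ^ works in a terminal, it will work in the library I'm using without the hacks I had to add (which involved matching ^declare -a and using source <(cat); eval $STDIN... in other functions).

For more information, see my comments on the second answer.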

Answer 1

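The approach is straightforward:

Read the column headers into an array
Read the file line by line; for each line...
- Create a new associative array and register its name in the array of array names
- Read the fields and assign them according to the column headers

In the last step we cannot use read -a, mapfile, or anything similar, because they only create regular arrays with numbers as indices. We need an associative array, so we have to fill it manually.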
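To illustrate that last point for a single row, here is a minimal sketch. The sample strings and the variable names headerLine, dataLine, and row are made up for this example and are not part of the function below.

```shell
#!/usr/bin/env bash
# Hypothetical sample data, for illustration only.
headerLine='name,age,gender'
dataLine='bob,21,m'

# read -a can only fill indexed arrays: the keys are 0, 1, 2, ...
IFS=, read -ra header <<< "$headerLine"
IFS=, read -ra fields <<< "$dataLine"

# The associative array has to be declared and filled by hand,
# pairing each field with the header at the same position.
declare -A row
for i in "${!fields[@]}"; do
    row["${header[i]}"]=${fields[i]}
done

echo "${row[name]} is ${row[age]} years old"
```

This only handles one row with a fixed array name; creating a differently named array per row is where the quirks mentioned next come in.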

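However, the implementation is a bit convoluted because of bash's quirks.

The following function parses stdin and creates arrays accordingly. I took the liberty of renaming your array new_array to rowNames.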

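#! /bin/bash
csvToArrays() {
    # First line: split the column headers on commas
    IFS=, read -ra header
    rowIndex=0
    # Remaining lines: one associative array per row
    while IFS= read -r line; do
        ((rowIndex++))
        rowName="row$rowIndex"
        # -g makes the array global so it outlives the function
        declare -Ag "$rowName"
        IFS=, read -ra fields <<< "$line"
        fieldIndex=0
        for field in "${fields[@]}"; do
            # Quote the header so it is safe to use as an array subscript
            printf -v quotedFieldHeader %q "${header[fieldIndex++]}"
            # Assign rowN[header]=field without eval
            printf -v "$rowName[$quotedFieldHeader]" %s "$field"
        done
        rowNames+=("$rowName")
    done
    # Print the definitions of all row arrays and of rowNames
    declare -p "${rowNames[@]}" rowNames
}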

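Calling the function in a pipe has no effect. Bash executes the commands of a pipe in subshells, so you won't have access to the arrays created by someCommand | csvToArrays. Instead, call the function in one of the following ways: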

csvToArrays < <(someCommand) # when input comes from a command, except "cat file"
csvToArrays < someFile       # when input comes from a file
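The difference can be seen in a small sketch. readFirst and firstLine are hypothetical names used only here; the sketch assumes a non-interactive script with bash's default settings (the lastpipe option is off).

```shell
#!/usr/bin/env bash
# Hypothetical helper: stores the first input line in the global firstLine.
readFirst() { IFS= read -r firstLine; }

firstLine=unset
printf 'name,age\n' | readFirst        # readFirst runs in a subshell
afterPipe=$firstLine
echo "after pipe: $afterPipe"          # the assignment was lost

readFirst < <(printf 'name,age\n')     # readFirst runs in the current shell
echo "after redirect: $firstLine"      # the assignment is visible
```

With process substitution only the producing command runs in a separate process, so whatever the function assigns stays in the calling shell.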

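Bash scripts like this tend to be slow. That is why I didn't bother to extract printf -v quotedFieldHeader … from the inner loop, even though it does the same work over and over again.
I think the whole templating thing and everything related to it would be far easier to program and faster to execute in python, perl, or a similar language.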
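For completeness, here is a sketch of what pulling the header quoting out of the inner loop could look like. csvToArraysHoisted and quotedHeader are names invented for this variant; it also resets rowNames and skips the final declare -p, which the function above does not do. I have not measured whether it is noticeably faster.

```shell
#!/usr/bin/env bash
# Variant of csvToArrays that quotes each header only once.
# csvToArraysHoisted and quotedHeader are hypothetical names for this sketch.
csvToArraysHoisted() {
    local -a header fields quotedHeader=()
    local line h q rowName i rowIndex=0
    rowNames=()
    IFS=, read -ra header
    # Quote every header a single time, before any data row is read
    for h in "${header[@]}"; do
        printf -v q %q "$h"
        quotedHeader+=("$q")
    done
    while IFS= read -r line; do
        rowName="row$((++rowIndex))"
        declare -Ag "$rowName"
        IFS=, read -ra fields <<< "$line"
        for i in "${!fields[@]}"; do
            # Reuse the pre-quoted header as the subscript
            printf -v "$rowName[${quotedHeader[i]}]" %s "${fields[i]}"
        done
        rowNames+=("$rowName")
    done
}

# Sample input taken from the question
csvToArraysHoisted < <(printf '%s\n' 'name,age,gender' 'bob,21,m' 'jane,32,f')
echo "${row2[name]} ${row2[age]}"
```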

Answer 2

The following script:

csv_to_array() {
    local -a values
    local -a headers
    local counter

    IFS=, read -r -a headers
    declare -a new_array=()
    counter=1
    while IFS=, read -r -a values; do
        new_array+=( row$counter )
        declare -A "row$counter=($(
            paste -d '' <(
                printf "[%s]=\n" "${headers[@]}"
            ) <(
                printf "%q\n" "${values[@]}"
            )
        ))"
        (( counter++ ))
    done
    declare -p new_array ${!row*}
}

foo2() {
    source <(cat)
    declare -p new_array ${!row*} |
    sed 's/^/foo2: /'
}

echo "==> TEST 1 <=="

cat <<EOF |
id,title,url,description
1,foo name,foo.io,a cool foo site
2,bar title,http://bar.io,a great bar site
3,baz heading,https://baz.io,some description
EOF
csv_to_array |
foo2 

echo "==> TEST 2 <=="

cat <<EOF |
name,age,gender
bob,21,m
jane,32,f
EOF
csv_to_array |
foo2

will output:

==> TEST 1 <==
foo2: declare -a new_array=([0]="row1" [1]="row2" [2]="row3")
foo2: declare -A row1=([url]="foo.io" [description]="a cool foo site" [id]="1" [title]="foo name" )
foo2: declare -A row2=([url]="http://bar.io" [description]="a great bar site" [id]="2" [title]="bar title" )
foo2: declare -A row3=([url]="https://baz.io" [description]="some description" [id]="3" [title]="baz heading" )
==> TEST 2 <==
foo2: declare -a new_array=([0]="row1" [1]="row2")
foo2: declare -A row1=([gender]="m" [name]="bob" [age]="21" )
foo2: declare -A row2=([gender]="f" [name]="jane" [age]="32" )

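The output comes from the foo2 function.

The csv_to_array function first reads the headers. Then, for each line it reads, it adds a new element to the new_array array and creates a new associative array named row$counter, whose elements are built by joining each header name with the corresponding value read from the line. Finally, the function prints the output of declare -p.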
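To see what the paste step produces, here is a minimal sketch for a single row. The sample arrays and the names pairs and row are made up for this example; it assumes a paste that accepts an empty delimiter via -d '', as used in the script above.

```shell
#!/usr/bin/env bash
# Hypothetical sample header and row, for illustration only.
headers=(id title)
values=(1 'foo name')

# paste joins line N of the first stream with line N of the second;
# -d '' puts no delimiter between the two halves.
pairs=$(paste -d '' <(printf '[%s]=\n' "${headers[@]}") \
                    <(printf '%q\n' "${values[@]}"))
echo "$pairs"

# The joined text is then used as the initializer of an associative array,
# in the same quoted-argument form as in csv_to_array.
declare -A "row=($pairs)"
echo "${row[title]}"
```

The %q format is what keeps a value such as "foo name" together as one element when the text is parsed again by declare.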

The foo2 function sources its standard input, so the arrays come into its scope. It then outputs those values again, prefixing each line with foo2:.
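As a sketch of how a consumer might actually use the sourced arrays, the example below reads one row through a nameref. producer, consumer, rowName, and item are hypothetical names for this illustration, and local -n needs Bash 4.3 or later.

```shell
#!/usr/bin/env bash
# Hypothetical producer and consumer, named for this sketch only.
producer() {
    local -A row1=([name]=bob [age]=21)
    local -a new_array=(row1)
    declare -p new_array row1      # serialise the arrays as declare commands
}

consumer() {
    source <(cat)                  # run the declare commands read from stdin
    local rowName=${new_array[0]}
    local -n item=$rowName         # nameref to the associative array of that row
    echo "Hi ${item[name]}, you are ${item[age]} years old."
}

producer | consumer
```

Because declare inside a function creates local variables, the sourced arrays exist only while consumer is running, so each function in the pipeline has to source its own input.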

