How to parse YAML data into a custom Bash data array/hash structure?

Question

I have the following YAML file:

site:
  title: My blog
  domain: example.com
  author1:
    name: bob
    url: /author/bob
  author2:
    name: jane
    url: /author/jane
  header_links:
    about:
      title: About
      url: about.html
    contact:
      title: Contact Us
      url: contactus.html
  js_deps:
    - cashjs
    - jets

products:
  product1:
    name: Prod One
    price: 10
  product2:
    name: Prod Two
    price: 20

I'd like a Bash, Python or AWK function or script that can take the YAML file above as input, and generate then execute the following code (or something exactly equivalent):

unset site_title
unset site_domain
unset site_author1
unset site_author2
unset site_header_links
unset site_header_links_about
unset site_header_links_contact
unset js_deps

site_title="My blog"
site_domain="example.com"

declare -A site_author1
declare -A site_author2

site_author1=( [name]="bob" [url]="/author/bob" )
site_author2=( [name]="jane" [url]="/author/jane" )

declare -A site_header_links_about
declare -A site_header_links_contact

site_header_links_about=( [name]="About" [url]="about.html" )
site_header_links_contact=( [name]="Contact Us" [url]="contact.html" )

site_header_links=(site_header_links_about site_header_links_contact)

js_deps=(cashjs jets)

unset products
unset product1
unset product2

declare -A product1
declare -A product2

product1=( [name]="Prod One" [price]=10 )
product2=( [name]="Prod Two" [price]=20 )

products=(product1 product2)

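So, the logic is:

Walk through the YAML and create underscore-concatenated variable names with string values, except at the last (bottom) level, where the data should be created as associative arrays or indexed arrays wherever possible. Also, any associative array that is created should be listed by name in an indexed array.

So, in other words:

Wherever the last level of data can be turned into an associative array, it should be (foo.bar.hash => ${foo_bar_hash[@]})
Wherever the last level of data can be turned into an indexed array, it should be (foo.bar.list => ${foo_bar_list[@]})
Every associative array should be listed by name in an indexed array, which is named after its parent in the YAML data (see products in the example)
Otherwise, just make an underscore-concatenated var name and save the value as a string (foo.bar.string => ${foo_bar_string})

...The reason I need this specific Bash data structure is that I'm using a Bash-based templating system which requires it.

Once I have the function I need, I'll be able to use the YAML data easily in my templates, like so: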

{{site_title}}

...

{{#foreach link in site_header_links}}
  <a href="{{link.url}}">{{link.name}}</a>
{{/foreach}}

...

{{#js_deps}}
  {{.}}
{{/js_deps}}

...

{{#foreach item in products}}
  {{item.name}}
  {{item.price}}
{{/foreach}}

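What I tried:

This is closely related to a question I asked earlier:

That got so close, but I also need to generate an associative array for site_header_links, and, well... it fails, because site_header_links is nested too deeply.

I would still love to use https://github.com/azohra/yaml.sh in the solution, as it would also provide an easy handlebars-style lookup rip-off for the templating system :)

EDIT:

To be very clear: the solution cannot use pip, virtualenv or any other external dependency that needs to be installed separately. It must be a self-contained script/func (like https://github.com/azohra/yaml.sh is) that can live inside the CMS project directory... or I wouldn't need to be here.

...

Hopefully a nicely commented answer can help me avoid coming back here ;)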

Answer 1

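It is hard to work out the rules of a card game just by watching people play a single round. In a similar way, it is hard to see exactly what the "rules" for your YAML file are.

In the following I make assumptions about what output to generate based on the root-level key and on whether a node sits at the first, second or third level. It would also be valid to make assumptions about a node based on the level of its parents, which is more flexible (you could then add e.g. a sequence at the root level), but that would be somewhat harder to implement.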
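As a rough illustration of the more flexible alternative mentioned above (deciding per node from its shape rather than from a fixed level), here is a minimal sketch. It works on already-loaded data (plain dicts, lists and scalars). The function name emit and the naming scheme (every name carries its full parent path) are assumptions made for this sketch and are not part of the program below; values are not escaped for Bash.

```python
def emit(name, node, out):
    """Append Bash lines for `node` to the list `out`.

    Returns True if `node` was emitted as an associative array.
    Assumption: values are inserted verbatim, without shell escaping.
    """
    if isinstance(node, dict):
        if name and not any(isinstance(v, (dict, list)) for v in node.values()):
            # bottom-level mapping -> associative array
            body = ' '.join(f'[{k}]="{v}"' for k, v in node.items())
            out.append(f'declare -A {name}')
            out.append(f'{name}=( {body} )')
            return True
        # mapping with nested collections: recurse and remember which
        # children became associative arrays
        children = []
        for key, value in node.items():
            child = f'{name}_{key}' if name else key
            if emit(child, value, out):
                children.append(child)
        if name and children:
            # list those associative arrays by name in an indexed array
            # named after the parent
            out.append(f'{name}=({" ".join(children)})')
        return False
    if isinstance(node, list):
        # sequence -> indexed array
        out.append(f'{name}=({" ".join(str(item) for item in node)})')
        return False
    # scalar -> plain string variable
    out.append(f'{name}="{node}"')
    return False


lines = []
emit('', {'site': {'title': 'My blog',
                   'author1': {'name': 'bob'},
                   'js_deps': ['cashjs', 'jets']}}, lines)
print('\n'.join(lines))
```

Because the decision is made from the shape of each node, the same function handles any nesting depth, at the cost of less control over the exact variable names.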

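Keeping the declarations and compound array assignments interspersed with the other code, while grouping "similar" items, is a bit cumbersome. For that you would need to keep track of transitions between node types (str, dict, nested dict) and group on those. So per root-level key I dump all the unsets first, then all the declares, then all the assignments, then all the compound assignments. I think that falls under "exactly equivalent".

Because products -> product1/product2 is handled completely differently from site -> author1/author2, which has the same node structure, I made a separate function to handle each root-level key.

To run this you should set up a virtual environment for Python (3.7/3.6) and install the YAML library in it: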

$ python -m venv /opt/util/yaml2bash
$ /opt/util/yaml2bash/bin/pip install ruamel.yaml

Then store the following program, e.g. in /opt/util/yaml2bash/bin/yaml2bash, and make it executable (chmod +x /opt/util/yaml2bash/bin/yaml2bash)

#! /opt/util/yaml2bash/bin/python

import sys
from pathlib import Path
import ruamel.yaml

if len(sys.argv) > 1:
    input = Path(sys.argv[1])
else:
    input = sys.stdin


def bash_site(k0, v0, fp):
    """this function takes a root-level key and its value (v0 a dict), constructs the 
    list of unsets and outputs based on the keys, values and type of values of v0,
    then dumps these to fp
    """
    unsets = []
    declares = []
    assignments = []
    compounds = {}
    for k1, v1 in v0.items():
        if isinstance(v1, str):
            k = k0 + '_' + k1
            unsets.append(k)
            assignments.append(f'{k}="{v1}"')
        elif isinstance(v1, dict):
            first_val = list(v1.values())[0]
            if isinstance(first_val, str):
                k = k0 + '_' + k1
                unsets.append(k)
                declares.append(k)
                assignments.append(f'{k}=(')
                for k2, v2 in v1.items():
                    q = '"' if isinstance(v2, str) else ''
                    assignments.append(f'  [{k2}]={q}{v2}{q}')
                assignments.append(')')
            elif isinstance(first_val, dict):
                for k2, v2 in v1.items(): # assume all the same type
                    k = k0 + '_' + k1 + '_' + k2   
                    unsets.append(k)
                    declares.append(k)
                    assignments.append(f'{k}=(')
                    for k3, v3 in v2.items():
                        q = '"' if isinstance(v3, str) else ''
                        assignments.append(f'  [{k3}]={q}{v3}{q}')
                    assignments.append(')')
                    compounds.setdefault(k0 + '_' + k1, []).append(k)
            else:
                raise NotImplementedError("unknown val: " + repr(first_val))
        elif isinstance(v1, list):
            unsets.append(k1)
            compounds[k1] = v1
        else:
            raise NotImplementedError("unknown val: " + repr(v1))


    if unsets:
        for item in unsets:
            print('unset', item, file=fp)
        print(file=fp)
    if declares:
        for item in declares:
            print('declare -A', item, file=fp)
        print(file=fp)
    if assignments:
        for item in assignments:
            print(item, file=fp)
        print(file=fp)
    if compounds:
        for k in compounds:
            v = ' '.join(compounds[k])
            print(f'{k}=({v})', file=fp)
        print(file=fp)


def bash_products(k0, v0, fp):
    """this function takes a root-level key and its value (v0 a dict), constructs the 
    list of unsets and outputs based on the keys, values and type of values of v0,
    then dumps these to fp
    """
    unsets = [k0]
    declares = []
    assignments = []
    compounds = {}
    for k1, v1 in v0.items():
        if isinstance(v1, dict):
            first_val = list(v1.values())[0]
            if isinstance(first_val, str):
                unsets.append(k1)
                declares.append(k1)
                assignments.append(f'{k1}=(')
                for k2, v2 in v1.items():
                    q = '"' if isinstance(v2, str) else ''
                    assignments.append(f'  [{k2}]={q}{v2}{q}')
                assignments.append(')')
                compounds.setdefault(k0, []).append(k1)
            else:
                raise NotImplementedError("unknown val: " + repr(first_val))
        else:
            raise NotImplementedError("unknown val: " + repr(v1))


    if unsets:
        for item in unsets:
            print('unset', item, file=fp)
        print(file=fp)
    if declares:
        for item in declares:
            print('declare -A', item, file=fp)
        print(file=fp)
    if assignments:
        for item in assignments:
            print(item, file=fp)
        print(file=fp)
    if compounds:
        for k in compounds:
            v = ' '.join(compounds[k])
            print(f'{k}=({v})', file=fp)
        print(file=fp)




yaml = ruamel.yaml.YAML()
data = yaml.load(input)

output = sys.stdout  # make it easier to redirect to file if necessary at some point in the future

bash_site('site', data['site'], output)
bash_products('products', data['products'], output)
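Both functions above end with the same four printing loops. One possible way to factor that out is sketched here; the helper name dump_groups is hypothetical and not part of the program above.

```python
import io
import sys


def dump_groups(unsets, declares, assignments, compounds, fp=sys.stdout):
    """Write the four groups in a fixed order: unsets, declares,
    assignments, compound (indexed array) assignments.
    Every non-empty group is followed by a blank line."""
    groups = [
        [f'unset {name}' for name in unsets],
        [f'declare -A {name}' for name in declares],
        list(assignments),
        [f'{name}=({" ".join(items)})' for name, items in compounds.items()],
    ]
    for lines in groups:
        if lines:
            for line in lines:
                print(line, file=fp)
            print(file=fp)


# small demonstration writing to an in-memory buffer
buf = io.StringIO()
dump_groups(['x'], ['x'], ['x=(', '  [k]="v"', ')'], {'xs': ['x']}, buf)
print(buf.getvalue(), end='')
```

With such a helper, bash_site and bash_products would only need to build the four collections and call it once at the end.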

If you run this program and provide your YAML input file as an argument (/opt/util/yaml2bash/bin/yaml2bash input.yaml), it gives:

unset site_title
unset site_domain
unset site_author1
unset site_author2
unset site_header_links_about
unset site_header_links_contact
unset js_deps

declare -A site_author1
declare -A site_author2
declare -A site_header_links_about
declare -A site_header_links_contact

site_title="My blog"
site_domain="example.com"
site_author1=(
  [name]="bob"
  [url]="/author/bob"
)
site_author2=(
  [name]="jane"
  [url]="/author/jane"
)
site_header_links_about=(
  [title]="About"
  [url]="about.html"
)
site_header_links_contact=(
  [title]="Contact Us"
  [url]="contactus.html"
)

site_header_links=(site_header_links_about site_header_links_contact)
js_deps=(cashjs jets)

unset products
unset product1
unset product2

declare -A product1
declare -A product2

product1=(
  [name]="Prod One"
  [price]=10
)
product2=(
  [name]="Prod Two"
  [price]=20
)

products=(product1 product2)

You can use source <(/opt/util/yaml2bash/bin/yaml2bash input.yaml) to get all these values in bash.
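One caveat when sourcing generated code: the program above wraps string values in plain double quotes, so a value containing a double quote, a dollar sign or a backtick would be interpreted by Bash. A minimal sketch of a safer assignment, using shlex.quote from the Python standard library (the helper name bash_assign is an assumption for this sketch, not part of the program above):

```python
import shlex


def bash_assign(name, value):
    """Return a Bash assignment whose value is quoted so that the shell
    treats it as a literal string."""
    return f'{name}={shlex.quote(str(value))}'


print(bash_assign('site_title', 'My "blog" costs $5'))
print(bash_assign('site_domain', 'example.com'))
```

shlex.quote only adds single quotes when the value contains characters that are special to the shell, so simple values stay unquoted.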

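Note that all the double quotes in your YAML file are superfluous.

Using Python and ruamel.yaml (disclaimer: I am the author of that package) gives you a full YAML parser, which e.g. allows you to use comments and flow-style collections: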

jsdeps: [cashjs, jets]    # more compact

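If you are stuck on the almost end-of-life Python 2.7 and do not have full control over your machine (if you do, you should install/compile Python 3.7 for it), you can still use ruamel.yaml.

1. Decide where your program is going to live, e.g. ~/bin
2. Create ~/bin/ruamel (adjust according to 1.)
3. cd ~/bin/ruamel
4. touch __init__.py
5. Download the latest tar file from PyPI
6. Untar the tar file and rename the resulting directory from ruamel.yaml-X.Y.Z to yaml

ruamel.yaml should work without dependencies. On 2.7 there are ruamel.ordereddict and ruamel.yaml.clib, which provide C versions of the basic routines for speed-up.

The program above needs to be rewritten a bit (f-strings -> "".format() and pathlib.Path -> the old-fashioned with open(...) as fp:).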
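To illustrate those two rewrites, here is a small sketch. The function names assoc_item and read_text are hypothetical; they only show the pattern, not a full port of the program.

```python
def assoc_item(key, value):
    # f'  [{k2}]={q}{v2}{q}' rewritten with str.format()
    q = '"' if isinstance(value, str) else ''
    return '  [{0}]={1}{2}{1}'.format(key, q, value)


def read_text(path):
    # pathlib.Path(path).read_text() rewritten with a plain open()
    with open(path) as fp:
        return fp.read()


print(assoc_item('name', 'Prod One'))
print(assoc_item('price', 10))
```

The positional fields {0}, {1}, {2} let the quote character be reused on both sides of the value, just as {q} appears twice in the f-string.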

Answer 2

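I decided to use a combination of the following:

A hacked version of Yay:
- added support for simple lists
- fixed multiple indentation levels
A hacked version of this yaml parser:
- uses the prefix stuff borrowed from Yay, for consistency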

function yaml_to_vars {
   # find input file
   for f in "$1" "$1.yay" "$1.yml"
   do
     [[ -f "$f" ]] && input="$f" && break
   done
   [[ -z "$input" ]] && return 1

   # use given dataset prefix or imply from file name
   [[ -n "$2" ]] && local prefix="$2" || {
     local prefix=$(basename "$input"); prefix=${prefix%.*}; prefix="${prefix//-/_}_";
   }

   # s: optional whitespace, w: a key name, fs: field separator (ASCII 28)
   local s='[[:space:]]*' w='[a-zA-Z0-9_]*' fs=$(echo @|tr @ '\034')
   # 1st sed: expand flow-style lists, 2nd sed: expand flow-style maps,
   # 3rd sed: emit "indent<fs>key<fs>value" records
   sed -ne "s|,$s\]$s$|]|" \
        -e ":1;s|^\($s\)\($w\)$s:$s\[$s\(.*\)$s,$s\(.*\)$s\]|\1\2: [\3]\n\1  - \4|;t1" \
        -e "s|^\($s\)\($w\)$s:$s\[$s\(.*\)$s\]|\1\2:\n\1  - \3|;p" "$input" | \
   sed -ne "s|,$s}$s$|}|" \
        -e ":1;s|^\($s\)-$s{$s\(.*\)$s,$s\($w\)$s:$s\(.*\)$s}|\1- {\2}\n\1  \3: \4|;t1" \
        -e    "s|^\($s\)-$s{$s\(.*\)$s}|\1-\n\1  \2|;p" | \
   sed -ne "s|^\($s\):|\1|" \
        -e "s|^\($s\)-$s[\"']\(.*\)[\"']$s$|\1$fs$fs\2|p" \
        -e "s|^\($s\)-$s\(.*\)$s$|\1$fs$fs\2|p" \
        -e "s|^\($s\)\($w\)$s:$s[\"']\(.*\)[\"']$s$|\1$fs\2$fs\3|p" \
        -e "s|^\($s\)\($w\)$s:$s\(.*\)$s$|\1$fs\2$fs\3|p" | \
   awk -F$fs '{
      indent = length($1)/2;
      vname[indent] = $2;
      for (i in vname) {if (i > indent) {delete vname[i]; idx[i]=0}}
      if(length($2)== 0){  vname[indent]= ++idx[indent] };
      if (length($3) > 0) {
         vn=""; for (i=0; i<indent; i++) { vn=(vn)(vname[i])("_")}
         printf("%s%s%s=\"%s\"\n", "'$prefix'",vn, vname[indent], $3);
      }
   }'
}

yay_parse() {

   # find input file
   for f in "$1" "$1.yay" "$1.yml"
   do
     [[ -f "$f" ]] && input="$f" && break
   done
   [[ -z "$input" ]] && return 1

   # use given dataset prefix or imply from file name
   [[ -n "$2" ]] && local prefix="$2" || {
     local prefix=$(basename "$input"); prefix=${prefix%.*}; prefix=${prefix//-/_};
   }

   echo "unset $prefix; declare -g -a $prefix;"

   local s='[[:space:]]*' w='[a-zA-Z0-9_]*' fs=$(echo @|tr @ '\034')
   #sed -n -e "s|^\($s\)\($w\)$s:$s\"\(.*\)\"$s$|\1$fs\2$fs\3|p" \
   #       -e "s|^\($s\)\($w\)$s:$s\(.*\)$s$|\1$fs\2$fs\3|p" "$input" |
   sed -ne "s|,$s\]$s$|]|" \
        -e ":1;s|^\($s\)\($w\)$s:$s\[$s\(.*\)$s,$s\(.*\)$s\]|\1\2: [\3]\n\1  - \4|;t1" \
        -e "s|^\($s\)\($w\)$s:$s\[$s\(.*\)$s\]|\1\2:\n\1  - \3|;p" "$input" | \
   sed -ne "s|,$s}$s$|}|" \
        -e ":1;s|^\($s\)-$s{$s\(.*\)$s,$s\($w\)$s:$s\(.*\)$s}|\1- {\2}\n\1  \3: \4|;t1" \
        -e    "s|^\($s\)-$s{$s\(.*\)$s}|\1-\n\1  \2|;p" | \
   sed -ne "s|^\($s\):|\1|" \
        -e "s|^\($s\)-$s[\"']\(.*\)[\"']$s$|\1$fs$fs\2|p" \
        -e "s|^\($s\)-$s\(.*\)$s$|\1$fs$fs\2|p" \
        -e "s|^\($s\)\($w\)$s:$s[\"']\(.*\)[\"']$s$|\1$fs\2$fs\3|p" \
        -e "s|^\($s\)\($w\)$s:$s\(.*\)$s$|\1$fs\2$fs\3|p" | \
   awk -F$fs '{
      indent       = length($1)/2;
      key          = $2;
      value        = $3;

      # No prefix or parent for the top level (indent zero)
      root_prefix  = "'$prefix'_";
      if (indent == 0) {
        prefix = "";          parent_key = "'$prefix'";
      } else {
        prefix = root_prefix; parent_key = keys[indent-1];
      }

      keys[indent] = key;

      # remove keys left behind if prior row was indented more than this row
      for (i in keys) {if (i > indent) {delete keys[i]}}

      # if we have a value
      if (length(value) > 0) {

        # set values here

        # if the "key" is missing, make array indexed, not assoc..

        if (length(key) == 0) {
          # array item has no key, only a value..
          # so, if we didnt already unset the assoc array
          if (unsetArray == 0) {
            # unset the assoc array here
            printf("unset %s%s; ", prefix, parent_key);
            # switch the flag, so we only unset once, before adding values
            unsetArray = 1;
          }
          # array was unset, has no key, so add item using indexed array syntax
          printf("%s%s+=(\"%s\");\n", prefix, parent_key, value);

        } else {
          # array item has key and value, add item using assoc array syntax
          printf("%s%s[%s]=\"%s\";\n", prefix, parent_key, key, value);
        }

      } else {

        # declare arrays here

        # reset this flag for each new array we work on...
        unsetArray = 0;

        # if item has no key, declare indexed array
        if (length(key) == 0) {
          # indexed
          printf("unset %s%s; declare -g -a %s%s;\n", root_prefix, key, root_prefix, key);

        # if item has numeric key, declare indexed array
        } else if (key ~ /^[[:digit:]]/) {
          printf("unset %s%s; declare -g -a %s%s;\n", root_prefix, key, root_prefix, key);

        # else (item has a string for a key), declare associative array
        } else {
          printf("unset %s%s; declare -g -A %s%s;\n", root_prefix, key, root_prefix, key);
        }

        # set root level values here

        if (indent > 0) {
          # add to associative array
          printf("%s%s[%s]+=\"%s%s\";\n", prefix, parent_key , key, root_prefix, key);
        } else {
          # add to indexed array
          printf("%s%s+=( \"%s%s\");\n", prefix, parent_key , root_prefix, key);
        }

      }
   }'
}

# helper to load yay data file
yay() {
  # yaml_to_vars "$@"  ## uncomment to debug (prints data to stdout)
  eval $(yaml_to_vars "$@")

  # yay_parse "$@"  ## uncomment to debug (prints data to stdout)
  eval $(yay_parse "$@")
}
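For readers less familiar with awk, the flattening done by the awk stage of yaml_to_vars can be sketched in Python. This is an illustration only, assuming each input record has already been split into (indentation, key, value) the way the sed stages deliver it; the function name flatten is hypothetical.

```python
def flatten(records, prefix=''):
    """records: iterable of (indent_spaces, key, value) tuples, two spaces
    per nesting level. Returns lines like prefix_parent_child="value"."""
    vname = {}   # nesting level -> current key at that level
    idx = {}     # nesting level -> running number for keyless list items
    lines = []
    for spaces, key, value in records:
        indent = len(spaces) // 2
        vname[indent] = key
        # forget keys and list counters of deeper levels left over from
        # earlier, more deeply indented rows
        for level in [l for l in vname if l > indent]:
            del vname[level]
            idx[level] = 0
        if not key:
            # list item: use a running number as its name
            idx[indent] = idx.get(indent, 0) + 1
            vname[indent] = str(idx[indent])
        if value:
            parents = ''.join(f'{vname.get(i, "")}_' for i in range(indent))
            lines.append(f'{prefix}{parents}{vname[indent]}="{value}"')
    return lines


records = [('', 'product1', ''), ('  ', 'name', 'Foo'), ('  ', 'price', '100')]
print('\n'.join(flatten(records, 'products_')))
```

The key idea is the vname table: it always holds the chain of parent keys for the current row, so the variable name is just that chain joined with underscores.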

With the code above, when products.yml contains:

product1:
  name: Foo
  price: 100
product2:
  name: Bar
  price: 200

the parser can be called like this:

source path/to/yml-parser.sh
yay products.yml

which generates and evaluates this code:

products_product1_name="Foo"
products_product1_price="100"
products_product2_name="Bar"
products_product2_price="200"
unset products;
declare -g -a products;
unset products_product1;
declare -g -A products_product1;
products+=( "products_product1");
products_product1[name]="Foo";
products_product1[price]="100";
unset products_product2;
declare -g -A products_product2;
products+=( "products_product2");
products_product2[name]="Bar";
products_product2[price]="200";

So I get the following Bash arrays and variables:

declare -a products=([0]="products_product1" [1]="products_product2")
declare -A products_product1=([price]="100" [name]="Foo" )
declare -A products_product2=([price]="200" [name]="Bar" )

In my templating system, I can now access this yml data like so:

{{#foreach product in products}}
  Name:  {{product.name}}
  Price: {{product.price}}
{{/foreach}}

:)

Another example:

File site.yml

meta_info:
  title: My cool blog
  domain: foo.github.io
author1:
  name: bob
  url: /author/bob
author2:
  name: jane
  url: /author/jane
header_links:
  link1:
    title: About
    url: about.html
  link2:
    title: Contact Us
    url: contactus.html
js_deps:
  cashjs: cashjs
  jets: jets
Foo:
  - one
  - two
  - three

Produces:

declare -a site=([0]="site_meta_info" [1]="site_author1" [2]="site_author2" [3]="site_header_links" [4]="site_js_deps" [5]="site_Foo")
declare -A site_meta_info=([title]="My cool blog" [domain]="foo.github.io" )
declare -A site_author1=([url]="/author/bob" [name]="bob" )
declare -A site_author2=([url]="/author/jane" [name]="jane" )
declare -A site_header_links=([link1]="site_link1" [link2]="site_link2" )
declare -A site_link1=([url]="about.html" [title]="About" )
declare -A site_link2=([url]="contactus.html" [title]="Contact Us" )
declare -A site_js_deps=([cashjs]="cashjs" [jets]="jets" )
declare -a site_Foo=([0]="one" [1]="two" [2]="three")

In my templates, I can access site_header_links like this:

{{#foreach link in site_header_links}}
  * {{link.title}} - {{link.url}}
{{/foreach}}

and site_Foo (a dashed, or simple, list) like this:

{{#site_Foo}}
  * {{.}}
{{/site_Foo}}

