Split string array evenly into sub arrays whilst using a filter with jq

Given the following JSON:

[
    "/home/test-spa/src/components/modals/super-admin/tests/integration/index.test.tsx",
    "/home/test-spa/src/components/modals/delete-user/tests/index.test.tsx",
    "/home/test-spa/src/components/modals/edit-admin/tests/integration/index.test.tsx",
    "/home/test-spa/src/components/modals/delete-admin/tests/index.test.tsx",
    "/home/test-spa/src/components/modals/add-user/tests/integration/index.test.tsx",
    "/home/test-spa/src/components/modals/add-admin/tests/integration/index.test.tsx",
    "/home/test-spa/src/components/modals/edit-user/tests/index.test.tsx",
    "/home/test-spa/src/components/modals/change-user/tests/index.test.tsx",
    "/home/test-spa/src/other-directory/modals/tests/index.test.ts",
    "/home/test-spa/src/directory/modals/tests/index.test.ts"
]
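  1. I want to exclude anything containing the strings "directory" or "other-directory".
  2. Then I want to split the array into 4 arrays, but anything containing "integration" should be distributed evenly, i.e. I don't want all the integration tests to end up in one array. Any other strings can then be split across the 4 arrays.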

I'd like to do this filtering with jq. The following splits the JSON into 4, but it doesn't do the filtering described above:

jq -cM '[_nwise(length / 4 | floor)]'
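To see why that falls short, here is a rough Python model of [_nwise(length / 4 | floor)]. It assumes that _nwise(n), an internal jq helper rather than a documented builtin, emits consecutive slices of n elements with a possibly shorter final slice; the Python name nwise is illustrative. For the 10 paths above the chunk size is floor(10 / 4) = 2, so this yields 5 chunks instead of 4, and nothing spreads the integration tests out.

```python
# Rough model of jq's [_nwise(length / 4 | floor)], assuming _nwise(n)
# yields consecutive slices of n elements (the last one may be shorter).
import math

BASE = "/home/test-spa/src/"
paths = [BASE + p for p in [
    "components/modals/super-admin/tests/integration/index.test.tsx",
    "components/modals/delete-user/tests/index.test.tsx",
    "components/modals/edit-admin/tests/integration/index.test.tsx",
    "components/modals/delete-admin/tests/index.test.tsx",
    "components/modals/add-user/tests/integration/index.test.tsx",
    "components/modals/add-admin/tests/integration/index.test.tsx",
    "components/modals/edit-user/tests/index.test.tsx",
    "components/modals/change-user/tests/index.test.tsx",
    "other-directory/modals/tests/index.test.ts",
    "directory/modals/tests/index.test.ts",
]]


def nwise(items, size):
    """Split items into consecutive chunks of `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


chunk_size = math.floor(len(paths) / 4)  # floor(10 / 4) == 2
chunks = nwise(paths, chunk_size)

print(len(chunks))  # 5
# Integration tests per chunk: they are not spread out at all.
print([sum("integration" in p for p in chunk) for chunk in chunks])  # [1, 1, 2, 0, 0]
```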

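So I'm looking for output along the lines of the following (as long as the integration tests are split as evenly as possible, the other strings can fill in evenly, and order doesn't matter):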

[
    [
        "/home/test-spa/src/components/modals/super-admin/tests/integration/index.test.tsx",
        "/home/test-spa/src/components/modals/delete-user/tests/index.test.tsx"
    ],
    [
        "/home/test-spa/src/components/modals/edit-admin/tests/integration/index.test.tsx",
        "/home/test-spa/src/components/modals/delete-admin/tests/index.test.tsx"
    ],
    [
        "/home/test-spa/src/components/modals/add-admin/tests/integration/index.test.tsx",
        "/home/test-spa/src/components/modals/edit-user/tests/index.test.tsx"
    ],
    [
        "/home/test-spa/src/components/modals/add-user/tests/integration/index.test.tsx",
        "/home/test-spa/src/components/modals/change-user/tests/index.test.tsx"
    ]
]

If the number of buckets is predetermined

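Here is a generic "round-robin" function, written so that the strings "with" and "without" the substring can be distributed efficiently (i.e., without concatenating any arrays):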

# s is a stream, $n a predetermined number of buckets
def roundrobin(s; $n):
   reduce s as $s ({n: 0, a: []}; .a[.n % $n] += [$s] | .n+=1) | .a;
# First exclude the unwanted elements:
  map(select(test("(other-)?directory")|not))
# Perform the required round-robin:
  | roundrobin( (.[] | select(index("integration"))),
                (.[] | select(index("integration")|not));  4)
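As a sanity check outside jq, the following Python sketch models this roundrobin filter: the k-th item of the stream goes to bucket k % n, and the integration paths are streamed before the others. It is a model written for illustration, not part of jq; itertools.chain stands in for jq's comma-separated streams. With the sample input, each of the 4 buckets should receive exactly one integration test and one other test, which matches the desired output up to ordering.

```python
# Python model of the jq roundrobin(s; $n) filter: the k-th item of the
# stream goes to bucket k % n, mirroring `.a[.n % $n] += [$s] | .n += 1`.
import re
from itertools import chain

BASE = "/home/test-spa/src/"
paths = [BASE + p for p in [
    "components/modals/super-admin/tests/integration/index.test.tsx",
    "components/modals/delete-user/tests/index.test.tsx",
    "components/modals/edit-admin/tests/integration/index.test.tsx",
    "components/modals/delete-admin/tests/index.test.tsx",
    "components/modals/add-user/tests/integration/index.test.tsx",
    "components/modals/add-admin/tests/integration/index.test.tsx",
    "components/modals/edit-user/tests/index.test.tsx",
    "components/modals/change-user/tests/index.test.tsx",
    "other-directory/modals/tests/index.test.ts",
    "directory/modals/tests/index.test.ts",
]]


def roundrobin(stream, n):
    buckets = []
    for k, item in enumerate(stream):
        if k < n:
            buckets.append([])  # like jq, a bucket exists once it gets an item
        buckets[k % n].append(item)
    return buckets


# map(select(test("(other-)?directory")|not))
kept = [p for p in paths if not re.search(r"(other-)?directory", p)]

# Two lazy streams, integration paths first, as in the jq call.
buckets = roundrobin(
    chain((p for p in kept if "integration" in p),
          (p for p in kept if "integration" not in p)),
    4,
)

for bucket in buckets:
    print(bucket)
```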

If the number of buckets is data-driven

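If the number of buckets should depend on the number of occurrences of the specified substring, then, using the roundrobin filter defined above, a reasonably efficient solution can be written as follows: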

# First exclude the unwanted elements:
  map(select(test("(other-)?directory")|not))
# Form an array of the strings with the specified substring
  | map(select(index("integration"))) as $has
# Perform the required round-robin:
  | roundrobin( $has[], ((.-$has)[]); $has|length)
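The effect of this variant can be sketched in Python: the number of buckets equals the number of "integration" paths, so every bucket gets exactly one of them before the remaining paths are dealt out. The sample paths and the helper name split_by_integration are hypothetical; the list difference mirrors jq's . - $has, and concatenating the two lists stands in for the stream $has[], ((.-$has)[]). Note that with no integration paths there would be zero buckets; this sketch raises an error in that case rather than dividing by zero.

```python
# Sketch of the data-driven variant: one bucket per "integration" path.
# Sample paths and function names are hypothetical.


def roundrobin(stream, n):
    buckets = []
    for k, item in enumerate(stream):
        if k < n:
            buckets.append([])  # first pass creates each bucket
        buckets[k % n].append(item)
    return buckets


def split_by_integration(paths):
    has = [p for p in paths if "integration" in p]
    if not has:
        # $has|length would be 0; refuse instead of dividing by zero
        raise ValueError("no 'integration' paths, so no buckets to fill")
    rest = [p for p in paths if p not in has]  # like jq's `. - $has`
    return roundrobin(has + rest, len(has))


sample = ["a/integration/1", "b/unit", "c/integration/2", "d/unit", "e/unit"]
print(split_by_integration(sample))
# → [['a/integration/1', 'b/unit', 'e/unit'], ['c/integration/2', 'd/unit']]
```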

Here's what I came up with, splitting into N buckets:

def bucket_shift($n):
    # loop through all input, shift each elem into bucket 
    reduce .[] as $elem ( { count: 0, rv: [] };
                          (.rv[(.count % $n)] += [$elem] | .count += 1))
                           | .rv ;

# get rid of everything with directory or other-directory
[ .[] | select(test("directory|other-directory") | not) ]

# grab all lines with "integration" in an array
 | [ ([ .[] | select(test("integration")) ]),
# grab all lines without "integration" into a second array
     ([ .[] | select(test("integration") | not) ]) ]
# flatten and divide into buckets (arg passed in)
 | flatten | bucket_shift($num_buckets|tonumber)

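I tagged each line of your input with a letter so I could track them more easily, then added a couple of extra lines so that the number of results isn't evenly divisible by the number of buckets you want, to make sure the split still balances well. Lines I and J (tagged IX and JX) should be filtered out.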
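Why does this balance? bucket_shift deals elements out one at a time, so bucket sizes never differ by more than one: with m elements and n buckets, the first m mod n buckets get one extra element. And because the integration lines come first after flatten, they land in consecutive buckets. A small Python check of that arithmetic (function names are illustrative), using the 10 lines that survive the filter in the labelled example below:

```python
# Expected bucket sizes when m items are dealt round-robin into n buckets,
# compared with a direct simulation of `.rv[(.count % $n)] += [$elem]`.


def expected_sizes(m, n):
    """The first m % n buckets get one extra item."""
    q, r = divmod(m, n)
    return [q + 1 if i < r else q for i in range(min(m, n))]


def dealt_sizes(m, n):
    sizes = [0] * min(m, n)  # jq only creates buckets that receive an item
    for count in range(m):
        sizes[count % n] += 1
    return sizes


# A, C, E, F, L (integration) and B, D, G, H, K survive the filter: 10 items
print(expected_sizes(10, 4), dealt_sizes(10, 4))  # [3, 3, 2, 2] [3, 3, 2, 2]
print(expected_sizes(10, 3), dealt_sizes(10, 3))  # [4, 3, 3] [4, 3, 3]
```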

<~> $ jq . /tmp/so.json
[
  "A/home/test-spa/src/components/modals/super-admin/tests/integration/index.test.tsx",
  "B/home/test-spa/src/components/modals/delete-user/tests/index.test.tsx",
  "C/home/test-spa/src/components/modals/edit-admin/tests/integration/index.test.tsx",
  "D/home/test-spa/src/components/modals/delete-admin/tests/index.test.tsx",
  "E/home/test-spa/src/components/modals/add-user/tests/integration/index.test.tsx",
  "F/home/test-spa/src/components/modals/add-admin/tests/integration/index.test.tsx",
  "G/home/test-spa/src/components/modals/edit-user/tests/index.test.tsx",
  "H/home/test-spa/src/components/modals/change-user/tests/index.test.tsx",
  "IX/home/test-spa/src/other-directory/modals/tests/index.test.ts",
  "JX/home/test-spa/src/directory/modals/tests/index.test.ts",
  "K/home/test-spa/src/components/modals/change-user/tests/index.test.tsx",
  "L/home/test-spa/src/components/modals/add-admin/tests/integration/index.test.tsx"
]

The script, same as above:

<~> $ cat /tmp/so.jq
def bucket_shift($n):
    # loop through all input, shift each elem into bucket 
    reduce .[] as $elem ( { count: 0, rv: [] };
                          (.rv[(.count % $n)] += [$elem] | .count += 1))
                           | .rv ;

# get rid of everything with directory or other-directory
[ .[] | select(test("directory|other-directory") | not) ]

# grab all lines with "integration" in an array
 | [ ([ .[] | select(test("integration")) ]),
# grab all lines without "integration" into a second array
     ([ .[] | select(test("integration") | not) ]) ]
# flatten and divide into buckets (arg passed in)
 | flatten | bucket_shift($num_buckets|tonumber)

Splitting into 4 buckets:

<~> $ jq --arg num_buckets 4 -f /tmp/so.jq /tmp/so.json
[
  [
    "A/home/test-spa/src/components/modals/super-admin/tests/integration/index.test.tsx",
    "L/home/test-spa/src/components/modals/add-admin/tests/integration/index.test.tsx",
    "H/home/test-spa/src/components/modals/change-user/tests/index.test.tsx"
  ],
  [
    "C/home/test-spa/src/components/modals/edit-admin/tests/integration/index.test.tsx",
    "B/home/test-spa/src/components/modals/delete-user/tests/index.test.tsx",
    "K/home/test-spa/src/components/modals/change-user/tests/index.test.tsx"
  ],
  [
    "E/home/test-spa/src/components/modals/add-user/tests/integration/index.test.tsx",
    "D/home/test-spa/src/components/modals/delete-admin/tests/index.test.tsx"
  ],
  [
    "F/home/test-spa/src/components/modals/add-admin/tests/integration/index.test.tsx",
    "G/home/test-spa/src/components/modals/edit-user/tests/index.test.tsx"
  ]
]

Splitting into 3 buckets instead:

<~> $ jq --arg num_buckets 3 -f /tmp/so.jq /tmp/so.json
[
  [
    "A/home/test-spa/src/components/modals/super-admin/tests/integration/index.test.tsx",
    "F/home/test-spa/src/components/modals/add-admin/tests/integration/index.test.tsx",
    "D/home/test-spa/src/components/modals/delete-admin/tests/index.test.tsx",
    "K/home/test-spa/src/components/modals/change-user/tests/index.test.tsx"
  ],
  [
    "C/home/test-spa/src/components/modals/edit-admin/tests/integration/index.test.tsx",
    "L/home/test-spa/src/components/modals/add-admin/tests/integration/index.test.tsx",
    "G/home/test-spa/src/components/modals/edit-user/tests/index.test.tsx"
  ],
  [
    "E/home/test-spa/src/components/modals/add-user/tests/integration/index.test.tsx",
    "B/home/test-spa/src/components/modals/delete-user/tests/index.test.tsx",
    "H/home/test-spa/src/components/modals/change-user/tests/index.test.tsx"
  ]
]

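To fall back to a default number of buckets when num_buckets isn't passed, replace the bucket_shift call on the last line with the following (this relies on $ARGS, which requires jq 1.6 or later):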

bucket_shift($ARGS.named["num_buckets"] // 4|tonumber)
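Because | has the lowest precedence in jq, this parses as ($ARGS.named["num_buckets"] // 4) | tonumber. The Python sketch below is a simplified model of that expression for a single-valued left-hand side (jq's // also handles streams, which this ignores): a missing key yields null, // then substitutes 4, and tonumber turns the string from --arg into a number while leaving a number unchanged. The dict named is a hypothetical stand-in for $ARGS.named.

```python
# Simplified model of `$ARGS.named["num_buckets"] // 4 | tonumber`
# for a single value on the left of //.


def alternative(value, default):
    """jq's a // b for one value: only null and false fall through."""
    return default if value is None or value is False else value


def tonumber(value):
    """--arg always passes a string; numbers pass through unchanged."""
    return value if isinstance(value, (int, float)) else float(value)


def num_buckets(named):
    # `named` plays the role of $ARGS.named (a dict of --arg values)
    return tonumber(alternative(named.get("num_buckets"), 4))


print(num_buckets({"num_buckets": "3"}))  # 3.0, from --arg num_buckets 3
print(num_buckets({}))                    # 4, the default
```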