Ordering rows by JSON array column on MySQL & MariaDB

PostgreSQL allows rows to be sorted by arrays. It compares the first value of each array, then the second value and so on (fiddle):

select array[2, 4] as "array"
union
select array[10] as "array"
union
select array[2, 3, 4] as "array"
union
select array[10, 11] as "array"
order by "array"
array
[2, 3, 4]
[2, 4]
[10]
[10, 11]

The closest equivalent on MySQL and MariaDB seems to be JSON arrays.

MySQL apparently orders arrays by length, more or less randomly (fiddle):

select json_array(2, 4) as `array`
union
select json_array(10) as `array`
union
select json_array(2, 3, 4) as `array`
union
select json_array(10, 11) as `array`
order by `array`
array
[10]
[2, 4]
[10, 11]
[2, 3, 4]

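MariaDB somewhat orders by value, but does it incorrectly (fiddle). Integers are ordered like strings (10 before 2) and arrays with the same beginning are reversed ([10, 11] before [10]):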

select json_array(2, 4) as `array`
union
select json_array(10) as `array`
union
select json_array(2, 3, 4) as `array`
union
select json_array(10, 11) as `array`
order by `array`
array
[10, 11]
[10]
[2, 3, 4]
[2, 4]

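Is there a way to replicate PostgreSQL's array ordering on MySQL and/or MariaDB?

The arrays can have any length, and I don't know the maximum length.

The only workaround/hack I see at the moment is converting the array into a string and left-padding the values with 0s to the same length: 002.004, 010.011

Using JSON_VALUE: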

WITH cte AS (
  select json_array(2, 4) as `array`
  union
  select json_array(10) as `array`
  union
  select json_array(2, 3, 4) as `array`
  union
  select json_array(10, 11) as `array`
)
select *
from cte
order by CAST(JSON_VALUE(`array`, '$[0]') AS INT),
         CAST(JSON_VALUE(`array`, '$[1]') AS INT),
         CAST(JSON_VALUE(`array`, '$[2]') AS INT)
        -- ...;


-- MySQL 8.0.21+
select *
from cte
order by
 JSON_VALUE(`array`, '$[0]' RETURNING SIGNED),
 JSON_VALUE(`array`, '$[1]' RETURNING SIGNED),
 JSON_VALUE(`array`, '$[2]' RETURNING SIGNED)

db<>fiddle demo
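
A note on why this approach handles arrays of different lengths: when an array has no element at a given position, JSON_VALUE returns NULL, and both MySQL and MariaDB place NULLs first in an ascending ORDER BY, so a shorter array sorts before a longer one that shares its prefix. A minimal sketch, assuming a server where JSON_VALUE is available (the derived table s and the aliases k0 and k1 are made up for illustration):

```sql
-- '$[1]' does not exist in [10], so k1 is NULL for that row.
-- NULL sorts first in ascending order, which puts [10] ahead of [10, 11].
SELECT j,
       CAST(JSON_VALUE(j, '$[0]') AS SIGNED) AS k0,
       CAST(JSON_VALUE(j, '$[1]') AS SIGNED) AS k1
FROM (
    SELECT '[10, 11]' AS j
    UNION ALL
    SELECT '[10]'
) AS s
ORDER BY k0, k1;
```

The limitation stays the same as in the queries above: one ORDER BY term is needed per array position, so the maximum length has to be known when the query is written.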


This looks like a bug to me. According to the docs:

Two JSON arrays are equal if they have the same length and values in corresponding positions in the arrays are equal.

If the arrays are not equal, their order is determined by the elements in the first position where there is a difference. The array with the smaller value in that position is ordered first. If all values of the shorter array are equal to the corresponding values in the longer array, the shorter array is ordered first.

But ORDER BY does not appear to follow these rules at all.

Here is a DB fiddle for MySQL 8 & 5.7.

I am using a CROSS JOIN and an explicit comparison to get the expected order.

SELECT f.`array`, SUM(f.`array` > g.`array`) cmp
FROM jsons f
CROSS JOIN jsons g
GROUP BY f.`array`
ORDER BY cmp
;
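
The query above refers to a table named jsons that is not shown. A possible setup, plus the pairwise comparisons that the SUM adds up, might look like this (the table definition and the aliases a, b and a_gt_b are assumptions for illustration; this targets MySQL 8, where > on JSON columns compares JSON values):

```sql
-- Hypothetical setup for the jsons table used by the query above.
CREATE TABLE jsons (`array` JSON);
INSERT INTO jsons VALUES ('[2, 4]'), ('[10]'), ('[2, 3, 4]'), ('[10, 11]');

-- Every array is compared against every array in the table.
-- a_gt_b is 1 when a is greater than b, so summing it per a
-- counts how many arrays are smaller than a, which is a's rank.
SELECT f.`array` AS a,
       g.`array` AS b,
       f.`array` > g.`array` AS a_gt_b
FROM jsons f
CROSS JOIN jsons g;
```

If the comparisons follow the documented rules, the per-array totals for the sample data should be 0, 1, 2 and 3 for [2, 3, 4], [2, 4], [10] and [10, 11]. Note that the cross join compares every pair, so the work grows quadratically with the number of rows.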

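One more observation for MySQL 5.7: when using a subquery, > does something like a string comparison, and the values need to be cast to JSON again to get the correct result, whereas MySQL 8 does not need this.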

SELECT f.`array`, SUM(CAST(f.`array` AS JSON) > CAST(g.`array` AS JSON)) cmp
FROM (
 select json_array(2, 4) as `array`
 union
 select json_array(10) as `array`
 union
 select json_array(2, 3, 4) as `array`
 union
 select json_array(10, 11) as `array`
) f
CROSS JOIN (
 select json_array(2, 4) as `array`
 union
 select json_array(10) as `array`
 union
 select json_array(2, 3, 4) as `array`
 union
 select json_array(10, 11) as `array`
) g
GROUP BY f.`array`
ORDER BY cmp
;

The above doesn't work in MariaDB.

https://mariadb.com/kb/en/incompatibilities-and-feature-differences-between-mariadb-106-and-mysql-80/

In MySQL, JSON is compared according to json values. In MariaDB JSON strings are normal strings and compared as strings.
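
Comparing the serialized text character by character would explain the MariaDB ordering reported in the question. A small sketch, where the cast to BINARY is my assumption for approximating a plain byte-wise string comparison and the column aliases are illustrative:

```sql
-- As text, '[10]' and '[2, 4]' first differ at '1' vs '2', and '1' sorts first.
-- '[10, 11]' and '[10]' first differ at ',' vs ']', and ',' sorts first.
SELECT CAST('[10]' AS BINARY) < CAST('[2, 4]' AS BINARY)   AS ten_before_two,         -- 1
       CAST('[10, 11]' AS BINARY) < CAST('[10]' AS BINARY) AS longer_before_shorter;  -- 1
```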

The query below works in MariaDB:

WITH RECURSIVE jsons AS (
 select json_array(2, 4) as `array`
 union
 select json_array(10) as `array`
 union
 select json_array(2, 3, 4) as `array`
 union
 select json_array(10, 11) as `array`
),
maxlength AS (
 SELECT MAX(JSON_LENGTH(`array`)) maxlength
 FROM jsons
),
numbers AS (
 SELECT 0 AS n
 FROM maxlength
 UNION ALL
 SELECT n + 1
 FROM numbers
 JOIN maxlength ON numbers.n < maxlength.maxlength - 1
),
expanded AS (
 SELECT a.`array`, b.n, JSON_EXTRACT(a.`array`, CONCAT('$[', b.n, ']')) v
 FROM jsons a
 CROSS JOIN numbers b
),
maxpadding AS (
 SELECT MAX(LENGTH(v)) maxpadding
 FROM expanded
)
SELECT a.`array`
FROM expanded a
CROSS JOIN maxpadding b
GROUP BY a.`array`
ORDER BY GROUP_CONCAT(LPAD(a.v, b.maxpadding, '0') ORDER BY a.n ASC)

The documentation currently says that:

ORDER BY and GROUP BY for JSON values works according to these principles:

[...]

  • Sorting of nonscalar values is not currently supported and a warning occurs.

JSON arrays are nonscalar values, and your code does produce the following warning in MySQL 8:

Level Code Message
Warning 1235 This version of MySQL doesn't yet support 'sorting of non-scalar JSON values'

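Unfortunately, there is nothing you can do except wait for MySQL to implement this functionality. Or use a hack like this one, which requires MySQL 8's JSON_TABLE to split the JSON array into rows, then pads the values and group-concatenates them again to create a sortable string: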

select *, (
    select group_concat(lpad(jt.v, 8, '0') order by jt.i)
    from json_table(t.array, '$[*]' columns(i for ordinality, v int path '$')) as jt
) as sort_str
from t
order by sort_str

Demo on db<>fiddle
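
To see what the hack actually sorts on, it can help to look at the generated keys. A sketch assuming a table t with a JSON column named `array` holding the sample arrays (the table definition is my assumption, since the answer does not show it), non-negative integers only, and values of at most 8 digits:

```sql
-- Hypothetical setup for the table t used by the query above.
CREATE TABLE t (`array` JSON);
INSERT INTO t VALUES ('[2, 4]'), ('[10]'), ('[2, 3, 4]'), ('[10, 11]');

-- JSON_TABLE yields one row per element, with i as its 1-based position;
-- each value is zero-padded to 8 characters and the pieces are joined in order.
SELECT t.`array`, (
    SELECT GROUP_CONCAT(LPAD(jt.v, 8, '0') ORDER BY jt.i)
    FROM JSON_TABLE(t.`array`, '$[*]'
                    COLUMNS (i FOR ORDINALITY, v INT PATH '$')) AS jt
) AS sort_str
FROM t
ORDER BY sort_str;
-- Keys for the sample rows, in sorted order:
--   [2, 3, 4]   00000002,00000003,00000004
--   [2, 4]      00000002,00000004
--   [10]        00000010
--   [10, 11]    00000010,00000011
```

Because every element has the same width, string order matches numeric order position by position, and a key that is a prefix of another sorts first.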

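Here is a solution that offers:

  • support for negative numbers

  • support for floating-point numbers

  • no need to type long CTE queries*

* The advantage here shows when you have to type queries often; a CTE is nevertheless still a good option.

All you have to do is select * from data order by json_weight(json_column, base_value);

To be able to do this, create these four functions (json_max, json_weight, json_maxdigits and json_pad) and use them in the order by clause: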

delimiter //
create or replace function json_max(j json) returns float deterministic
  begin
    declare l int;
    declare mv float;
    declare v float;
    set l = json_length(j);
    for i in 0..l-1 do
      set v = abs(json_value(j,concat('$[',i,']')));
      if (mv is null) or (v > mv) then
        set mv = v;
      end if;
    end for;
    return mv;
  end
//
create or replace function json_weight(j json, base int) returns float deterministic
  begin
    declare l int;
    declare w float;
    set w = 0;
    set l = json_length(j);
    for i in 0..l-1 do
      set w = w + pow(base,-i) * json_value(j,concat('$[',i,']'));
    end for;
    return w;
  end
//
create or replace function json_maxdigits(j json) returns int deterministic
  return length(cast(floor(abs(json_max(j))) as char(16)))
//
create or replace function json_pad(j json, digitcount int) returns varchar(512) deterministic
  begin
    declare l int;
    declare v int;
    declare w varchar(512);
    set w = '';
    set l = json_length(j);
    for i in 0..l-1 do
      set v = json_value(j,concat('$[',i,']'));
      set w = concat(w, if(v>=0,'0','-'), lpad(abs(v), digitcount, 0));
    end for;
    return w;
  end
//
delimiter ;

Then use them as follows:

select * from (
select json_array(2, 4) as `array`
union
select json_array(10) as `array`
union
select json_array(2, 3, 4) as `array`
union
select json_array(10, 11) as `array`
) data order by json_weight(`array`,max(json_max(`array`)) over ());
-- or if you know that 11 is the max value:
--) data order by json_weight(`array`,11);
-- alternative method:
--) data order by json_pad(`array`,max(json_maxdigits(`array`)) over ());
-- alternative method and you know that only two digits are enough to represent numbers in the array:
--) data order by json_pad(`array`,2);

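Explanation:

json_max gives you the maximum absolute value in a JSON array:

select json_max('[22,33,-55]'); -- 55

json_maxdigits gives you the maximum number of digits (of the absolute values) in a JSON array:

select json_maxdigits('[21,151,-4]'); -- 3

json_weight converts your JSON array to a float-equivalent value, where each number of the array acts as one digit of a number written in the base you pass as the argument: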

select json_weight('[1,3,5,7]', 10); -- 1.357
select json_weight('[1,0,1]', 2); -- 1.25 (like binary floats)

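json_pad converts your JSON array to a string of zero-padded numbers, with the minus sign included as an extra symbol to guarantee negative ordering (or an extra symbol 0 otherwise, since + is less than - in ASCII order):

select json_pad('[1,-3,15,7]', 2); -- '001-03015007'

You can sort your query result set using either float weights or padded strings. Both options are provided because:

  • float weights lose precision when you have long JSON arrays, but they support floating-point values
  • padded strings have great precision, set here to 512 characters (you can even increase this number), but they do not support floating-point values (which you did not ask for anyway).

If you use float weights, you have to set the base. You can set it manually or use the largest number as the base, which you can get with max(json_max(column_name)) over (). If you use a base smaller than this maximum, you may get inconsistent results; if you use a number that is too high, you will lose precision.

Similarly, when sorting with padded strings, you have to provide the maximum number of digits taken up by the largest absolute value (-35 would be 2 absolute digits).

Note: these functions work on early versions of MariaDB that did not yet support the json_table function.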
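
On the choice of base for json_weight, the constraint can be checked with plain arithmetic. The sketch below is not from the answer; it evaluates the weight formula by hand for [1, 11] and [2], and the column aliases are made up. With a base of 11 (equal to the largest element) both arrays come out at about 2, so they would tie. With a base of 12, [1, 11] stays below [2]. This suggests that, at least for non-negative integers, a base strictly greater than the largest element is the safer choice:

```sql
-- weight = element0 + element1 * base^-1 + element2 * base^-2 + ...
SELECT 1 + 11 * POW(11, -1) AS w_1_11_base11,  -- [1, 11] with base 11
       2                    AS w_2,            -- [2] with any base
       1 + 11 * POW(12, -1) AS w_1_11_base12;  -- [1, 11] with base 12
```

In the usage example above, that would mean passing something like max(json_max(`array`)) over () + 1 as the base, which I have not tested against the functions as written.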

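If you can't make assumptions about the length of the array, and you don't want to use hacks like reformatting the array into a string of padded values, then you can't do this in a single query.

The expressions in the ORDER BY clause must be fixed before the query begins reading any rows, just like other parts of the query, for example the columns of the select-list.

But you can use a query to generate a dynamic SQL query with enough terms in the ORDER BY clause to account for the longest array.

Demo: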

create table mytable (array json);

insert into mytable values  ('[2, 3, 4]'), ('[2, 4]'), ('[10]'), ('[10, 11]');

select max(json_length(array)) as maxlength from mytable;
+-----------+
| maxlength |
+-----------+
|         3 |
+-----------+

Then make a recursive CTE that generates integers from 0 to the max length minus 1:

with recursive array as (
    select max(json_length(array)) as maxlength from mytable
),
num as (
    select 0 as num
    union
    select num+1 from num cross join array where num < maxlength-1
)   
select num from num;
+------+
| num  |
+------+
|    0 |
|    1 |
|    2 |
+------+

These integers can be used to format expressions for use in the ORDER BY clause:

with recursive array as (
    select max(json_length(array)) as maxlength from mytable
),
num as (
    select 0 as num
    union
    select num+1 from num cross join array where num < maxlength-1
)
select concat('CAST(JSON_EXTRACT(array, ', quote(concat('$[', num, ']')), ') AS UNSIGNED)') AS expr from num;
+-----------------------------------------------+
| expr                                          |
+-----------------------------------------------+
| CAST(JSON_EXTRACT(array, '$[0]') AS UNSIGNED) |
| CAST(JSON_EXTRACT(array, '$[1]') AS UNSIGNED) |
| CAST(JSON_EXTRACT(array, '$[2]') AS UNSIGNED) |
+-----------------------------------------------+

Then generate an SQL query with those expressions:

with recursive array as (
    select max(json_length(array)) as maxlength from mytable
),
num as (
    select 0 as num
    union
    select num+1 from num cross join array where num < maxlength-1
),
orders as (
    select num, concat('CAST(JSON_EXTRACT(array, ', quote(concat('$[', num, ']')), ') AS UNSIGNED)') AS expr from num
)
select concat(
    'SELECT array FROM mytable\nORDER BY \n  ',
    group_concat(expr order by num separator ',\n  '),
    ';'
) as query
from orders\G

query: SELECT array FROM mytable
ORDER BY 
  CAST(JSON_EXTRACT(array, '$[0]') AS UNSIGNED),
  CAST(JSON_EXTRACT(array, '$[1]') AS UNSIGNED),
  CAST(JSON_EXTRACT(array, '$[2]') AS UNSIGNED);

Finally, capture the result of that query and execute it as a new dynamic SQL query:

SELECT array FROM mytable
ORDER BY 
  CAST(JSON_EXTRACT(array, '$[0]') AS UNSIGNED),
  CAST(JSON_EXTRACT(array, '$[1]') AS UNSIGNED),
  CAST(JSON_EXTRACT(array, '$[2]') AS UNSIGNED);
+-----------+
| array     |
+-----------+
| [2, 3, 4] |
| [2, 4]    |
| [10]      |
| [10, 11]  |
+-----------+
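
One way to do the capture-and-execute step without leaving SQL is a prepared statement. This is a sketch under assumptions, not part of the demo above: it targets the same mytable table, the names @max_len, @sql and stmt are arbitrary, and `array` is backtick-quoted as a precaution in case the server treats it as a reserved word.

```sql
-- Longest array in the table decides how many ORDER BY terms are generated.
SET @max_len = (SELECT MAX(JSON_LENGTH(`array`)) FROM mytable);

-- Build the statement text: one CAST(JSON_EXTRACT(...)) term per array position.
WITH RECURSIVE num AS (
    SELECT 0 AS n
    UNION ALL
    SELECT n + 1 FROM num WHERE n < @max_len - 1
)
SELECT CONCAT(
    'SELECT `array` FROM mytable ORDER BY ',
    GROUP_CONCAT(
        CONCAT('CAST(JSON_EXTRACT(`array`, ''$[', n, ']'') AS UNSIGNED)')
        ORDER BY n SEPARATOR ', '
    )
)
FROM num
INTO @sql;

-- Run the generated statement, then release it.
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
```

For very long arrays the generated text may exceed the default group_concat_max_len, in which case that session variable would need to be raised before building @sql.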