Cast binary column to string in Azure SQL Data Warehouse

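I currently have some functions in Postgres and Redshift that take a randomly generated string, hash it, and then use part of the hash to generate a random number between 0 and 99. I am trying to replicate this functionality in Azure SQL Data Warehouse so that I get the same values in SQL DW as I do in Postgres and Redshift.

The problem I am running into is that when I cast the result to VARCHAR or use string functions on it, I get a wildly different string. I want to get the result of the md5 function as the same string, typed as VARCHAR.

To illustrate, here is a query in Azure SQL DW: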

SELECT
  'abc123' as random_string,
  HASHBYTES('md5', 'abc123') as md5,
  CAST(HASHBYTES('md5', 'abc123') AS VARCHAR) as md5_varchar,
  RIGHT(HASHBYTES('md5', 'abc123'), 5) as md5_right
;

This produces:

random_string,md5,md5_varchar,md5_right
abc123,0xE99A18C428CB38D5F260853678922E03,éšÄ(Ë8Õò`…6x’.,6x’.

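As you can see, the resulting varchar is very different from the output of the md5 function. Is there a way to convert the md5 result into the same string?

In Postgres and Redshift, the md5 function returns a VARCHAR, so converting it is straightforward.

Here are the queries in Redshift and Postgres: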

-- Redshift
SELECT
  'abc123' as random_string,
  right(strtol(right(md5('abc123'), 3), 16), 2)::INT as tranche
;

-- Postgres
SELECT
  'abc123' as random_string,
  right(('x' || lpad(right(md5('abc123'), 3), 4, '0')) :: BIT(16) :: INT :: VARCHAR, 2) :: INT AS tranche
;

Both queries return the value 87.

Using CONVERT should solve the problem:

CONVERT(VARCHAR(32),HashBytes('MD5', 'abc123'),2)

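Unlike CAST, CONVERT lets you specify a style argument, which is what we need when converting varbinary values. Style 2 renders each byte as two hexadecimal characters with no 0x prefix. It is described here: https://technet.microsoft.com/pl-pl/library/ms187928(v=sql.105).aspx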
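Putting this together, the tranche calculation from the question could be sketched in T-SQL roughly as follows. This is a minimal sketch, not a tested SQL DW query: it assumes CONVERT styles behave in Azure SQL Data Warehouse as they do in SQL Server, and the column aliases `md5_hex` and `tranche` are illustrative names. It takes the last three hex digits of the hash, pads them to four, converts them back to binary with style 2, reads that as an integer, and keeps the last two decimal digits with a modulo.

```sql
-- Sketch only: aliases md5_hex and tranche are illustrative, not from the question.
SELECT
  'abc123' AS random_string,
  -- Style 2 yields uppercase hex without the 0x prefix;
  -- LOWER() makes it match the lowercase output of md5() in Postgres/Redshift.
  LOWER(CONVERT(VARCHAR(32), HASHBYTES('MD5', 'abc123'), 2)) AS md5_hex,
  -- Last 3 hex digits, left-padded to 4 ('0E03'), converted to binary (0x0E03),
  -- then to INT (3587), then reduced to its last two decimal digits.
  CONVERT(INT,
    CONVERT(VARBINARY(2),
      '0' + RIGHT(CONVERT(VARCHAR(32), HASHBYTES('MD5', 'abc123'), 2), 3),
      2)
  ) % 100 AS tranche
;
```

For 'abc123' the hash ends in E03, which is 3587 in decimal, so the modulo should give 87, matching the Redshift and Postgres queries. One thing to watch: HASHBYTES hashes the bytes it is given, so an NVARCHAR input would hash differently from the same text passed as VARCHAR.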

Here is the remarks section from that documentation on binary conversions with CONVERT:

Binary Styles

When expression is binary(n), varbinary(n), char(n), or varchar(n), style can be one of the values shown in the following table. Style values that are not listed in the table return an error.

Style 0 (default)

Translates ASCII characters to binary bytes or binary bytes to ASCII characters. Each character or byte is converted 1:1. If the data_type is a binary type, the characters 0x are added to the left of the result.

Styles 1, 2

If the data_type is a binary type, the expression must be a character expression. The expression must be composed of an even number of hexadecimal digits (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F, a, b, c, d, e, f). If the style is set to 1 the characters 0x must be the first two characters in the expression. If the expression contains an odd number of characters or if any of the characters are invalid an error is raised. If the length of the converted expression is greater than the length of the data_type the result will be right truncated. Fixed length data_types that are larger then the converted result will have zeros added to the right of the result.

If the data_type is a character type, the expression must be a binary expression. Each binary character is converted into two hexadecimal characters. If the length of the converted expression is greater than the data_type length it will be right truncated. If the data_type is a fix sized character type and the length of the converted result is less than its length of the data_type; spaces are added to the right of the converted expression to maintain an even number of hexadecimal digits. The characters 0x will be added to the left of the converted result for style 1.
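To make the difference between the styles concrete, here is a small side-by-side sketch. It assumes the styles work in Azure SQL Data Warehouse as the SQL Server documentation above describes; the column aliases are illustrative only.

```sql
-- Sketch only: aliases are illustrative.
SELECT
  -- Style 1: hex digits with the 0x prefix, so 34 characters for a 16-byte hash.
  CONVERT(VARCHAR(34), HASHBYTES('MD5', 'abc123'), 1) AS style_1,
  -- Style 2: hex digits without the prefix, 32 characters.
  CONVERT(VARCHAR(32), HASHBYTES('MD5', 'abc123'), 2) AS style_2,
  -- Reverse direction: a hex string (no 0x prefix) back to binary with style 2.
  CONVERT(VARBINARY(16), 'E99A18C428CB38D5F260853678922E03', 2) AS back_to_binary
;
```

Style 1 should return 0xE99A18C428CB38D5F260853678922E03 as text and style 2 the same digits without the 0x. Style 0, the default that a plain CAST uses, maps each byte to a single character, which is where the garbled string in the question comes from.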