Collation mismatch between MariaDB and Java application

Question

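My application runs two separate queries to fetch two independent lists of data. Each list is sorted on several columns, which together form the sort key.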

// Get input1 from the DB (sorted on input1.row.sortkey)
// Get Input2 from the DB (sorted on input2.row.sortkey)
// Get the first row from input1 => input1.row
// Get the first row from input2 => input2.row
// Loop until input1 and input2 have been exhausted.
//    Compare input1.row.sortkey with input2.row.sortkey
//    if input1.row.sortkey == input2.row.sortkey
//       update existing data
//    else if input1.row.sortkey > input2.row.sortkey
//       insert new data
//    else // thus input1.row.sortkey < input2.row.sortkey
//       deprecate old data
//    endif
//    Get the next row from input1
//    Get the next row from input2

The problem occurs at this step:

Compare input1.row.sortkey with input2.row.sortkey

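The sort order of the two keys in the database differs from their order in the Java code.

In MariaDB we use the character set UTF8 (it should be UTF8mb4, but we cannot convert at the moment) with UTF8_general_ci as the collation.

For example:

In the database, a key like 0BSwN39hRWmg6goA0BGPDQ sorts before 0b_4GHGyQyKKyuXY-TBnwA, but in Java it is the other way around.

How can I adjust this behavior? Any solution is welcome. I looked into RuleBasedCollator, but that would require me to define the entire collation table.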

Answer 1

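(Too long for a comment)

Should B sort before b? If so, you need the utf8_bin collation.

Should S sort before '_'? Any utf8 collation does that.

But...

Your algorithm is (I think) incorrect... you should not fetch two new rows every time; cases 2 and 3 only need to fetch a new row from one of the input streams.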
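The point about advancing only one stream can be sketched in Java roughly as follows. This is a minimal illustration, not the asker's actual code: the class name SortedMerge, the merge method, and the use of plain String keys are assumptions made for the example, and keys are assumed to be non-null. Which stream to advance is inferred from the labels in the pseudocode: when input1's key is larger, the smaller key exists only in input2 ("insert new data"), so only input2 moves on; when input1's key is smaller, it exists only in input1 ("deprecate old data"), so only input1 moves on.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;

final class SortedMerge {

    /** Walks two sorted key lists in step and records the action chosen for each key. */
    static List<String> merge(List<String> input1, List<String> input2,
                              Comparator<String> order) {
        List<String> actions = new ArrayList<>();
        Iterator<String> it1 = input1.iterator();
        Iterator<String> it2 = input2.iterator();
        String k1 = next(it1);
        String k2 = next(it2);

        // Loop until both inputs have been exhausted.
        while (k1 != null || k2 != null) {
            int cmp;
            if (k1 == null) {
                cmp = 1;            // only input2 rows remain
            } else if (k2 == null) {
                cmp = -1;           // only input1 rows remain
            } else {
                cmp = order.compare(k1, k2);
            }

            if (cmp == 0) {
                actions.add("update " + k1);
                k1 = next(it1);     // same key on both sides: advance both
                k2 = next(it2);
            } else if (cmp > 0) {
                actions.add("insert " + k2);
                k2 = next(it2);     // key only in input2: advance input2 only
            } else {
                actions.add("deprecate " + k1);
                k1 = next(it1);     // key only in input1: advance input1 only
            }
        }
        return actions;
    }

    /** Returns the next key, or null once the iterator is exhausted. */
    private static String next(Iterator<String> it) {
        return it.hasNext() ? it.next() : null;
    }
}
```

The comparator passed as order has to agree with the ORDER BY used in both queries; otherwise the loop reports spurious inserts and deprecations, which is exactly the symptom described in the question.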

Answer 2

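To solve this, I simply translate the _ character to Character.MAX_VALUE in Java. That makes the comparison work for me, but only because of the limited set of characters my sort keys can contain (a-zA-Z, 0-9, - and _). There is no guarantee that this works for other symbols or special characters.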
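A minimal sketch of that workaround might look like the following. The class name SortKeyOrder, the helper translate, and the constant DB_LIKE are made up for illustration. The answer does not say which Java comparison it is combined with; a case-insensitive comparison via String.compareToIgnoreCase is assumed here because the database collation is case-insensitive. As far as I understand, the mismatch arises because compareToIgnoreCase ends up comparing lower-case letters, and '_' (0x5F) sorts before 'a' to 'z', whereas utf8_general_ci places '_' after the letters.

```java
import java.util.Comparator;

final class SortKeyOrder {

    /** Replaces '_' with Character.MAX_VALUE so it sorts after letters, digits and '-'. */
    static String translate(String key) {
        return key.replace('_', Character.MAX_VALUE);
    }

    /**
     * Case-insensitive order with '_' moved to the end.
     * Assumption: keys only contain a-z, A-Z, 0-9, '-' and '_'.
     */
    static final Comparator<String> DB_LIKE =
        (a, b) -> translate(a).compareToIgnoreCase(translate(b));

    public static void main(String[] args) {
        String first = "0BSwN39hRWmg6goA0BGPDQ";
        String second = "0b_4GHGyQyKKyuXY-TBnwA";
        // Negative means "first sorts before second", as in the database.
        System.out.println(Integer.signum(DB_LIKE.compare(first, second)));
        // Plain case-insensitive comparison, for contrast.
        System.out.println(Integer.signum(first.compareToIgnoreCase(second)));
    }
}
```

Keys that differ only in letter case compare as equal here, just as they do under a case-insensitive collation, so their relative order in the query result should not be relied on.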

Tags: java, collation, comparator, mariadb