多个键模式的高效 Redis SCAN

Efficient Redis SCAN of multiple key patterns

我正在尝试通过对我的数据进行 SCAN 操作来支持一些多选查询和筛选操作,但我不确定我的方向是否正确。

我正在使用 AWS ElastiCache (Redis 5.0.6)。

按键设计:<配方id>:<配方名称>:<配方类型>:<原产国>

示例:

13434:Guacamole:Dip:Mexico
34244:Gazpacho:Soup:Spain
42344:Paella:Dish:Spain
23444:HotDog:StreetFood:USA
78687:CustardPie:Dessert:Portugal
75453:Churritos:Dessert:Spain

如果我想使用复杂的多选过滤器(例如 return 所有键匹配来自两个不同国家的五种食谱类型)来支持查询,而 SCAN glob 样式匹配模式不能' t处理,对于生产场景来说,常见的处理方法是什么?

假设我将通过对所有场交替模式和多场滤波器进行笛卡尔积来计算所有可能的模式:

[[Guacamole, Gazpacho], [Soup, Dish, Dessert], [Portugal]]
*:Guacamole:Soup:Portugal
*:Guacamole:Dish:Portugal
*:Guacamole:Dessert:Portugal
*:Gazpacho:Soup:Portugal
*:Gazpacho:Dish:Portugal
*:Gazpacho:Dessert:Portugal

我应该使用什么机制在 Redis 中实现这种模式匹配?

  1. 按顺序对每个可扫描模式执行多个 SCAN 并合并结果?
  2. LUA 脚本在扫描键时对每个模式使用改进的模式匹配,并在单个 SCAN?
  3. 中获取所有匹配键
  4. 建立在排序集合之上的索引,支持快速查找匹配单个字段的键,用ZUNIONSTORE解决同一字段的匹配交替,用ZINTERSTORE解决不同字段的交集?

<recipe name>:: => key1, key2, keyN
:<recipe type>: => key1, key2, keyN
::<country of origin> => key1, key2, keyN

  1. 建立在排序集之上的索引支持快速查找匹配所有维度组合的键,从而避免并集和交集但会浪费更多存储空间并扩展我的索引键空间足迹?

<recipe name>:: => key1, key2, keyN
<recipe name>:<recipe type>: => key1, key2, keyN
<recipe name>::<country of origin> => key1, key2, keyN
:<recipe type>: => key1, key2, keyN
:<recipe type>:<country of origin> => key1, key2, keyN
::<country of origin> => key1, key2, keyN

  1. 利用 RedisSearch? (虽然对我的用例来说不可能,但请参阅 Tug Grall 答案,这似乎是一个非常好的解决方案。)
  2. 其他?

我已经实现了 1),但性能很糟糕。

private static HashSet<String> redisScan(Jedis jedis, String pattern, int scanLimitSize) {

    ScanParams params = new ScanParams().count(scanLimitSize).match(pattern);

    ScanResult<String> scanResult;
    List<String> keys;
    String nextCursor = "0";
    HashSet<String> allMatchedKeys = new HashSet<>();

    do {
        scanResult = jedis.scan(nextCursor, params);
        keys = scanResult.getResult();
        allMatchedKeys.addAll(keys);
        nextCursor = scanResult.getCursor();
    } while (!nextCursor.equals("0"));

    return allMatchedKeys;

}

private static HashSet<String> redisMultiScan(Jedis jedis, ArrayList<String> patternList, int scanLimitSize) {

    HashSet<String> mergedHashSet = new HashSet<>();
    for (String pattern : patternList)
        mergedHashSet.addAll(redisScan(jedis, pattern, scanLimitSize));

    return mergedHashSet;
}

对于 2) 我已经创建了一个 Lua 脚本来帮助服务器端 SCAN 并且性能并不出色但比 1) 快得多,甚至考虑到 Lua 不支持交替匹配模式,我必须通过模式列表循环每个键以进行验证:

local function MatchAny( str, pats )
    for pat in string.gmatch(pats, '([^|]+)') do
        local w = string.match( str, pat )
        if w then return w end
    end
end

-- ARGV[1]: Scan Count
-- ARGV[2]: Scan Match Glob-Pattern
-- ARGV[3]: Patterns

local cur = 0
local rep = {}
local tmp

repeat
  tmp = redis.call("SCAN", cur, "MATCH", ARGV[2], "count", ARGV[1])
  cur = tonumber(tmp[1])
  if tmp[2] then
    for k, v in pairs(tmp[2]) do
      local fi = MatchAny(v, ARGV[3])
      if (fi) then
        rep[#rep+1] = v
      end
    end
  end
until cur == 0
return rep

这样调用:

private static ArrayList<String> redisLuaMultiScan(Jedis jedis, String luaSha, List<String> KEYS, List<String> ARGV) {
    Object response = jedis.evalsha(luaSha, KEYS, ARGV);
    if(response instanceof List<?>)
        return (ArrayList<String>) response;
    else
        return new ArrayList<>();
}  

对于 3) 我已经使用 Sorted Sets 为 3 个字段中的每一个实现并维护了一个二级索引更新,并在单个字段和多字段匹配模式上实现了交替匹配模式的查询,如下所示:

private static Set<String> redisIndexedMultiPatternQuery(Jedis jedis, ArrayList<ArrayList<String>> patternList) {

    ArrayList<String> unionedSets = new ArrayList<>();
    String keyName;
    Pipeline pipeline = jedis.pipelined();

    for (ArrayList<String> subPatternList : patternList) {
        if (subPatternList.isEmpty()) continue;
        keyName = "un:" + RandomStringUtils.random(KEY_CHAR_COUNT, true, true);
        pipeline.zunionstore(keyName, subPatternList.toArray(new String[0]));
        unionedSets.add(keyName);
    }

    String[] unionArray = unionedSets.toArray(new String[0]);
    keyName = "in:" + RandomStringUtils.random(KEY_CHAR_COUNT, true, true);
    pipeline.zinterstore(keyName, unionArray);
    Response<Set<String>> response = pipeline.zrange(keyName, 0, -1);
    pipeline.del(unionArray);
    pipeline.del(keyName);
    pipeline.sync();

    return response.get();
}

我的压力测试用例的结果在请求延迟方面明显支持 3):

我会投票给选项 3,但我可能会开始使用 RediSearch

你也看过 RediSearch 吗?此模块允许您创建二级索引并进行复杂查询和全文搜索。

这可能会简化您的开发。

我请你看project and Getting Started

安装完成后,您可以使用以下命令实现它:


HSET recipe:13434 name "Guacamole" type "Dip" country "Mexico" 

HSET recipe:34244 name "Gazpacho" type "Soup" country "Spain"

HSET recipe:42344 name "Paella"  type "Dish" country "Spain"

HSET recipe:23444 name "Hot Dog"  type "StreetFood" country "USA"

HSET recipe:78687  name "Custard Pie"  type  "Dessert" country "Portugal"

HSET recipe:75453  name "Churritos" type "Dessert" country "Spain"

FT.CREATE idx:recipe ON HASH PREFIX 1 recipe: SCHEMA name TEXT SORTABLE type TAG SORTABLE country TAG SORTABLE

FT.SEARCH idx:recipe "@type:{Dessert}"

FT.SEARCH idx:recipe "@type:{Dessert} @country:{Spain}" RETURN 1 name

FT.AGGREGATE idx:recipe "*" GROUPBY 1 @type REDUCE COUNT 0 as nb_of_recipe

我不会在这里详细解释所有命令,因为您可以在教程中找到解释,但这里是基础知识:

  • 使用哈希存储食谱
  • 创建RediSearch索引,索引你要查询的字段
  • 运行查询,例如:
    • 获得所有西班牙沙漠:FT.SEARCH idx:recipe "@type:{Dessert} @country:{Spain}" RETURN 1 name
    • 按类型统计菜谱数量:FT.AGGREGATE idx:recipe "*" GROUPBY 1 @type REDUCE COUNT 0 as nb_of_recipe

我最终使用一个简单的策略在创建键时更新每个字段的每个二级索引:

protected static void setKeyAndUpdateIndexes(Jedis jedis, String key, String value, int idxDimSize) {
    String[] key_arr = key.split(":");
    Pipeline pipeline = jedis.pipelined();

    pipeline.set(key, value);
    for (int y = 0; y < key_arr.length; y++)
        pipeline.zadd(
                "idx:" +
                    StringUtils.repeat(":", y) +
                    key_arr[y] +
                    StringUtils.repeat(":", idxDimSize-y),
                java.time.Instant.now().getEpochSecond(),
                key);

    pipeline.sync();
}

查找与包括交替模式和 multi-field 过滤器的模式匹配的多个键的搜索策略是这样实现的:

private static Set<String> redisIndexedMultiPatternQuery(Jedis jedis, ArrayList<ArrayList<String>> patternList) {

    ArrayList<String> unionedSets = new ArrayList<>();
    String keyName;
    Pipeline pipeline = jedis.pipelined();

    for (ArrayList<String> subPatternList : patternList) {
        if (subPatternList.isEmpty()) continue;
        keyName = "un:" + RandomStringUtils.random(KEY_CHAR_COUNT, true, true);
        pipeline.zunionstore(keyName, subPatternList.toArray(new String[0]));
        unionedSets.add(keyName);
    }

    String[] unionArray = unionedSets.toArray(new String[0]);
    keyName = "in:" + RandomStringUtils.random(KEY_CHAR_COUNT, true, true);
    pipeline.zinterstore(keyName, unionArray);
    Response<Set<String>> response = pipeline.zrange(keyName, 0, -1);
    pipeline.del(unionArray);
    pipeline.del(keyName);
    pipeline.sync();

    return response.get();
}