Perform a google.cloud.happybase Bigtable RowKeyRegexFilter Scan

UPDATE: This only happens with the Google Cloud Bigtable emulator, not with actual development or production Bigtable instances (Google Cloud SDK 149.0.0).

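I'm trying to filter rows with a row key regex filter. Everything else works fine (filtering by prefix, by key start and stop ranges, by a single key, by several keys), but I can't get RowKeyRegexFilter to work as the scan filter. It just returns every key, exactly like a scan with no filter: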

from google.cloud.bigtable.row_filters import RowKeyRegexFilter

# all the boilerplate to create a happybase connection is skipped
t = connection.table("sometable")
t.put(
    b'row1',
    {
       b"family1:col2": b".1",
       b"family2:col2": b".12",
    }
)
t.put(
    b'row2',
    {
       b"family1:col2": b".2",
       b"family2:col2": b".22",
    }
)
t.put(
    b'row3',
    {
       b"family1:col2": b".3",
       b"family2:col2": b".32",
    }
)
rows = t.scan(
    filter=RowKeyRegexFilter(b'.+3')
)
print(len(list(rows)))

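This always prints 3, even if I pass something like (nomatchforsure)+ as the regex. I can't find any documentation with a working example. What surprises me most is that google.cloud.happybase.table.Table.rows itself always uses RowKeyRegexFilter to filter by row keys, yet passing a regex to the rows method instead of real row keys does not give regex filtering either. You can see it

here: https://github.com/GoogleCloudPlatform/google-cloud-python-happybase/blob/master/src/google/cloud/happybase/table.py#L197

and here: https://github.com/GoogleCloudPlatform/google-cloud-python-happybase/blob/master/src/google/cloud/happybase/table.py#L971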

Any help will be appreciated.
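For reference, here is a minimal local sketch of what I would expect the scan to return for the three row keys above. It uses Python's re module as a stand-in for the RE2 engine that Bigtable uses, and it assumes the filter has to match the whole row key; expected_matches is a hypothetical helper of mine, not part of happybase or the Bigtable client.

```python
import re


def expected_matches(pattern, row_keys):
    """Return the row keys that the regex matches in full.

    Assumption: Python's `re` only approximates RE2, and the whole key
    must match. For simple patterns like these the two engines agree.
    """
    compiled = re.compile(pattern)
    return [key for key in row_keys if compiled.fullmatch(key)]


row_keys = [b"row1", b"row2", b"row3"]

# Only the key ending in 3 should survive the first pattern.
print(expected_matches(rb".+3", row_keys))

# A pattern that cannot match any of the keys should give an empty list.
print(expected_matches(rb"(nomatchforsure)+", row_keys))
```

So a working filter should give 1 row for the first pattern and 0 rows for the second, not 3.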

UPDATE: This is actually noted in the docs, as pointed out by @gary-elliott: https://cloud.google.com/bigtable/docs/emulator#filters "Regular expressions must contain only valid UTF-8 characters, unlike the actual Cloud Bigtable service which can process regular expressions as arbitrary bytes." That said, something as simple as (notmatchforsure)+ does not work either, even though it seems to contain only valid UTF-8 characters. From my testing I would say regex filtering in the emulator is not just limited but, generally speaking, not working. In any case, the docs do warn about it.

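The actual problem is a bug in the emulator. I have updated the answer to avoid misleading feedback. The solution is to create a development instance to test the code. So for now, if you want to do any development with regex filters in Bigtable, you need to create (and pay for...) at least a development instance ($0.65/hour and $0.17/GB at the time of writing). I hope this helps, because anyone hoping to play with the emulator may end up stuck for a few hours like I was.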
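To avoid falling into the same trap again, one option is a small guard in the test code that warns when it is talking to the emulator. This is only a sketch: it assumes the client is pointed at the emulator through the BIGTABLE_EMULATOR_HOST environment variable (the one that gcloud's env-init command exports), and using_bigtable_emulator is a hypothetical helper name, not a library function.

```python
import os


def using_bigtable_emulator(environ=None):
    """Return True when BIGTABLE_EMULATOR_HOST is set to a non-empty value.

    Assumption: the emulator is selected via this environment variable;
    adjust the check if your setup configures the client differently.
    """
    environ = os.environ if environ is None else environ
    return bool(environ.get("BIGTABLE_EMULATOR_HOST"))


if using_bigtable_emulator():
    # Regex row filters were unreliable on the emulator in my tests.
    print("Warning: running against the Bigtable emulator; "
          "RowKeyRegexFilter results may not be trustworthy.")
```

Tests that depend on regex filters can then be skipped, or run against a development instance, whenever the guard returns True.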