BigTable: 2 Writes to the same key, but 3 versions

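Sometimes, when I write multiple versions to the same row key, using multiple column families across multiple batched mutations (each version is batched together with multiple writes), I end up with more versions than writes: for example, 3 versions after 2 writes.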

Is this expected behavior caused by data compaction? Will the extra versions be removed over time?

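The problem here is that you are putting the two columns in two separate entries in the batch, which means that they are not applied atomically, even though they target the same row.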

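Batch entries can succeed or fail individually, and the client then retries only the failed entries. For example, if one entry succeeds and another times out but later silently succeeds, retrying the "failed" entry can produce the partial-write result you are seeing.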
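To make the retry scenario concrete, here is a toy in-memory simulation. It is only a sketch and does not use the real Bigtable client: `ToyTable`, `batch_write`, and the `lost_ack` parameter are hypothetical names, and the model assumes the server stamps each applied entry with a fresh timestamp, so a retried entry creates a new version.

```python
from collections import defaultdict
from itertools import count


class ToyTable:
    """Hypothetical in-memory stand-in for a table (not the Bigtable client)."""

    def __init__(self):
        # (row_key, column) -> {timestamp: value}
        self.versions = defaultdict(dict)
        self._clock = count(1)  # assumed server-side timestamp source

    def apply(self, entry):
        """Apply all cells of one entry together under one timestamp."""
        ts = next(self._clock)
        for row_key, column, value in entry:
            self.versions[(row_key, column)][ts] = value


def batch_write(table, entries, lost_ack=()):
    """Send a batch; entries whose index is in lost_ack were applied on the
    server but looked like timeouts to the client, so they get retried."""
    for i, entry in enumerate(entries):
        table.apply(entry)
        if i in lost_ack:
            table.apply(entry)  # the retry of the "failed" entry


def version_counts(table):
    return {key: len(cells) for key, cells in table.versions.items()}


# Two writes, each sending the two columns as two separate entries.
split = ToyTable()
batch_write(split, [[("r1", "a", "v1")], [("r1", "b", "v1")]], lost_ack={1})
batch_write(split, [[("r1", "a", "v2")], [("r1", "b", "v2")]])
split_counts = version_counts(split)

# The same two writes, each sending both columns in one entry.
merged = ToyTable()
batch_write(merged, [[("r1", "a", "v1"), ("r1", "b", "v1")]], lost_ack={0})
batch_write(merged, [[("r1", "a", "v2"), ("r1", "b", "v2")]])
merged_counts = version_counts(merged)

print(split_counts)
print(merged_counts)
```

In this toy model the split batch leaves column `a` with 2 versions and column `b` with 3, while the single-entry batch keeps both columns in step, because a retry always re-applies the whole entry.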

So in Python you should do something like the following (adapted from cloud.google.com/bigtable/docs/samples-python-hello):

import datetime

# Assumes `table` and `column_family_id` are set up as in the linked sample.
print('Writing some greetings to the table.')
greetings = ['Hello World!', 'Hello Cloud Bigtable!', 'Hello Python!']
rows = []
column1 = 'greeting1'.encode()
column2 = 'greeting2'.encode()
for i, value in enumerate(greetings):
    # Note: This example uses sequential numeric IDs for simplicity,
    # but this can result in poor performance in a production
    # application.  Since rows are stored in sorted order by key,
    # sequential keys can result in poor distribution of operations
    # across nodes.
    #
    # For more information about how to design a Bigtable schema for
    # the best performance, see the documentation:
    #
    #     https://cloud.google.com/bigtable/docs/schema-design
    row_key = 'greeting{}'.format(i).encode()
    row = table.row(row_key)

    # Multiple calls to 'set_cell()' are allowed on the same batch
    # entry. Each entry will be applied atomically, but a separate
    # 'row' in the same batch will be applied separately even if it
    # shares its row key with another entry.
    row.set_cell(column_family_id,
                 column1,
                 value,
                 timestamp=datetime.datetime.utcnow())
    row.set_cell(column_family_id,
                 column2,
                 value,
                 timestamp=datetime.datetime.utcnow())
    rows.append(row)
table.mutate_rows(rows)
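Since entries succeed or fail individually, it can also help to inspect the per-entry results. As far as I know, `table.mutate_rows(rows)` returns one status per entry, in the same order, with `code == 0` meaning success. The helper below is a minimal sketch under that assumption: `failed_entries` is a hypothetical name, and the demo uses plain stand-in objects instead of a real table.

```python
from types import SimpleNamespace


def failed_entries(rows, statuses):
    """Pair each row with its status and keep those that did not succeed.

    Assumes each status exposes an integer `code` (0 means OK) and a
    `message`, as in the statuses returned by `table.mutate_rows(rows)`.
    """
    return [(row, status) for row, status in zip(rows, statuses)
            if status.code != 0]


# Stand-in data for illustration only; with a real table this would be:
#     statuses = table.mutate_rows(rows)
demo_rows = ['greeting0', 'greeting1', 'greeting2']
demo_statuses = [
    SimpleNamespace(code=0, message=''),
    SimpleNamespace(code=4, message='deadline exceeded'),
    SimpleNamespace(code=0, message=''),
]

failures = failed_entries(demo_rows, demo_statuses)
for row, status in failures:
    print('Entry {} failed: {} ({})'.format(row, status.message, status.code))
```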