BigTable: 2 writes to the same key, but 3 versions
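Sometimes this happens when I write multiple versions to the same row key, using multiple column families across several batched mutations (each version is batched together with multiple writes).
Is this expected behavior due to data compaction? Will the extra versions be removed over time?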
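The issue here is that you're putting the two columns in two separate entries in the batch, which means that even though they share the same row, they won't be applied atomically.
Batch entries can succeed or fail individually, and the client will then retry only the failed entries. For example, if one entry succeeds and the other times out but later silently succeeds, retrying the "failed" entry can produce the partially written result you're seeing.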
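To make the retry scenario concrete, here is a toy in-memory model. It is not the Bigtable API: `ToyTable`, `send_batch`, and `lost_acks` are hypothetical names, and the model assumes every applied entry gets a fresh timestamp (as it would with server-assigned timestamps). It only illustrates how independently applied entries plus a retry can turn 2 writes into 3 versions.

```python
import itertools
from collections import defaultdict


class ToyTable:
    """Hypothetical in-memory stand-in for a table that keeps every cell version."""

    def __init__(self):
        # (row_key, column) -> {timestamp: value}
        self.cells = defaultdict(dict)
        self._clock = itertools.count(1)  # fake clock, one tick per applied entry

    def apply_entry(self, row_key, mutations):
        """Apply one batch entry; all of its mutations share one timestamp."""
        ts = next(self._clock)
        for column, value in mutations:
            self.cells[(row_key, column)][ts] = value

    def versions(self, row_key):
        """Return {column: number of stored versions} for one row."""
        return {col: len(by_ts)
                for (key, col), by_ts in self.cells.items() if key == row_key}


def send_batch(table, entries, lost_acks):
    """Apply each entry independently, then retry entries whose ack was lost.

    `lost_acks` holds the indexes of entries that were applied but whose
    response never reached the client (a timeout), so the client retries them.
    """
    to_retry = []
    for i, (row_key, mutations) in enumerate(entries):
        table.apply_entry(row_key, mutations)
        if i in lost_acks:
            to_retry.append((row_key, mutations))
    for row_key, mutations in to_retry:
        table.apply_entry(row_key, mutations)  # the retry adds another version


# Two separate entries for the same row; the second one "times out".
split = ToyTable()
send_batch(split,
           [(b"greeting0", [("greeting1", "Hello")]),
            (b"greeting0", [("greeting2", "Hello")])],
           lost_acks={1})
print(split.versions(b"greeting0"))     # {'greeting1': 1, 'greeting2': 2}

# One entry holding both columns; even when retried, the columns stay in step.
combined = ToyTable()
send_batch(combined,
           [(b"greeting0", [("greeting1", "Hello"), ("greeting2", "Hello")])],
           lost_acks={0})
print(combined.versions(b"greeting0"))  # {'greeting1': 2, 'greeting2': 2}
```

In the split case the row ends up with three cell versions from two writes, and the two columns disagree on how many versions they hold. In the combined case a retry still rewrites data, but both columns always change together.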
So, in Python, you should do something like the following (adapted from cloud.google.com/bigtable/docs/samples-python-hello):
import datetime

# `table` and `column_family_id` are assumed to be set up as in the hello sample.
print('Writing some greetings to the table.')
greetings = ['Hello World!', 'Hello Cloud Bigtable!', 'Hello Python!']
rows = []
column1 = 'greeting1'.encode()
column2 = 'greeting2'.encode()
for i, value in enumerate(greetings):
    # Note: This example uses sequential numeric IDs for simplicity,
    # but this can result in poor performance in a production
    # application. Since rows are stored in sorted order by key,
    # sequential keys can result in poor distribution of operations
    # across nodes.
    #
    # For more information about how to design a Bigtable schema for
    # the best performance, see the documentation:
    #
    # https://cloud.google.com/bigtable/docs/schema-design
    row_key = 'greeting{}'.format(i).encode()
    row = table.row(row_key)
    # Multiple calls to 'set_cell()' are allowed on the same batch
    # entry. Each entry will be applied atomically, but a separate
    # 'row' in the same batch will be applied separately even if it
    # shares its row key with another entry.
    row.set_cell(column_family_id,
                 column1,
                 value,
                 timestamp=datetime.datetime.utcnow())
    row.set_cell(column_family_id,
                 column2,
                 value,
                 timestamp=datetime.datetime.utcnow())
    rows.append(row)
table.mutate_rows(rows)
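Because entries succeed or fail individually, it can also help to look at the per-entry results. As far as I know, `table.mutate_rows(rows)` in the google-cloud-bigtable Python client returns one status object per entry, with `code == 0` meaning OK; treat that as an assumption and check it against your client version. `failed_entries` below is a hypothetical helper, and the demo uses stand-in status objects rather than a real table.

```python
from types import SimpleNamespace


def failed_entries(rows, statuses):
    """Pair each batch entry with its status and keep the non-OK ones.

    Assumes `statuses` is in the same order as `rows` and that each status
    exposes an integer `code` attribute where 0 means success.
    """
    return [(row, status) for row, status in zip(rows, statuses)
            if status.code != 0]


# Stand-in data: with a real table this would be
#     statuses = table.mutate_rows(rows)   # assumed return shape, see above
demo_rows = ["greeting0", "greeting1", "greeting2"]
demo_statuses = [
    SimpleNamespace(code=0, message=""),
    SimpleNamespace(code=4, message="deadline exceeded"),
    SimpleNamespace(code=0, message=""),
]

for row, status in failed_entries(demo_rows, demo_statuses):
    print("entry {} failed: code={} {}".format(row, status.code, status.message))
```

Logging the failed entries this way makes it easier to tell whether an unexpected extra version lines up with an entry that was reported as failed and then retried.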