Bigtable rowkey design for real-time sensor data?

Question

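Your company is streaming real-time sensor data from the factory floor into Bigtable, and they have noticed extremely poor performance. How should the row key be redesigned to improve Bigtable performance on the queries that populate the real-time dashboards?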

a) Use a row key of the form <timestamp>
b) Use a row key of the form <sensorid>
c) Use a row key of the form <timestamp>#<sensorid>
d) Use a row key of the form <sensorid>#<timestamp>

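According to the documentation, what is the ideal row key in this case? I think it should be a row key that combines sensorid and timestamp, but I saw an article online where the answer to this exercise mentioned only 'timestamp'. Please help.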

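I have the following conflicting theories about this specific use case:

- Because rows are sorted lexicographically, using only the timestamp as the row key is unwise. (From the docs: using the timestamp by itself as the row key is not recommended, as most writes would be pushed onto a single node.)
- In this use case the requirement is a real-time dashboard, which could also mean that the data for all sensor IDs is stored under a single timestamp, so the real-time query could be based on the timestamp alone.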

Please help me settle on the ideal row key for this use case.

Answer 1

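The problem is that the question specifies neither the queries the real-time dashboard runs nor much detail about the performance issue. See the schema design for time series data documentation, which includes some example scenarios. If the key is only a timestamp, you will probably run into hotspotting. The ideal key is <sensorid>#<timestamp> (option D), but it always depends on the use case, which the question does not make very clear.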
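To make the hotspotting point concrete, here is a small in-memory sketch. It does not use the Bigtable client at all; it only relies on the fact that rows are sorted lexicographically by key. The sensor names, the base timestamp, and the helper names (`timestamp_first`, `sensor_first`, `insert_positions`) are made up for illustration.

```python
from bisect import bisect_left

def timestamp_first(sensor_id: str, ts_ms: int) -> str:
    # Option C style key: <timestamp>#<sensorid> (timestamp zero-padded so it sorts numerically)
    return f"{ts_ms:013d}#{sensor_id}"

def sensor_first(sensor_id: str, ts_ms: int) -> str:
    # Option D style key: <sensorid>#<timestamp>
    return f"{sensor_id}#{ts_ms:013d}"

# Hypothetical data: 4 sensors, 5 readings each, one second apart.
sensors = [f"sensor_{i:03d}" for i in range(4)]
base = 1425330757000
existing = [(s, base + t) for t in range(0, 5000, 1000) for s in sensors]

# The next batch of readings, one per sensor, all with the newest timestamp.
new_writes = [(s, base + 5000) for s in sensors]

def insert_positions(make_key):
    # Where would each new row land in the sorted key space?
    table = sorted(make_key(s, t) for s, t in existing)
    return [bisect_left(table, make_key(s, t)) for s, t in new_writes]

ts_positions = insert_positions(timestamp_first)
sf_positions = insert_positions(sensor_first)

print(ts_positions)  # [20, 20, 20, 20]: every new write lands at the tail
print(sf_positions)  # [5, 10, 15, 20]: new writes are spread across the key space
```

With timestamp-first keys every new write targets the same end of the sorted key range, which is the pattern the docs warn about. With sensor-first keys the writes land in a separate region per sensor.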

Answer 2

According to the Bigtable schema design documentation:

"Using the timestamp by itself as the row key is not recommended, as most writes would be pushed onto a single node". So that rules out option A.
"For the same reason, avoid placing a timestamp at the start of the row key." There goes option C.

In addition, the page says "Your row key for this data could combine an identifier for the machine with a timestamp for the data (for example, machine_4223421#1425330757685)." This leads us to option D as the best choice.

In theory option B would also work, but option D seems better.
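As a rough illustration of why option D also suits a dashboard query, the sketch below simulates a per-sensor time-range scan over sorted keys. It is an in-memory stand-in, not Bigtable client code, and it assumes the dashboard reads recent data one sensor at a time, which the question does not actually state. `row_key` and `scan_range` are hypothetical helpers, and the second machine ID is invented.

```python
from bisect import bisect_left

def row_key(sensor_id: str, ts_ms: int) -> bytes:
    # <sensorid>#<timestamp>, with the timestamp zero-padded so that
    # lexicographic order matches chronological order within one sensor.
    return f"{sensor_id}#{ts_ms:013d}".encode("utf-8")

def scan_range(table, sensor_id, start_ms, end_ms):
    # Return the keys for one sensor with start_ms <= timestamp < end_ms.
    # Because the keys are sorted, the result is one contiguous slice.
    lo = bisect_left(table, row_key(sensor_id, start_ms))
    hi = bisect_left(table, row_key(sensor_id, end_ms))
    return table[lo:hi]

# Hypothetical table: two machines, five readings each, one second apart.
t0 = 1425330757685
table = sorted(
    row_key(s, t0 + i * 1000)
    for s in ("machine_4223421", "machine_4223422")
    for i in range(5)
)

# "Last two seconds" for a single machine.
recent = scan_range(table, "machine_4223421", t0 + 3000, t0 + 5000)
print(recent)
# [b'machine_4223421#1425330760685', b'machine_4223421#1425330761685']
```

Option B alone (sensor ID only) would give one row per sensor, so a time-range read like this would have to rely on cell versions or column qualifiers instead of the row key.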


bigtable

google-cloud-bigtable