Fact table updatable/deletable rows
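AFAIK, best practice says you should never update fact table rows, at least for the transaction and periodic snapshot grains. While reading about Fact Table Surrogate Key, I came across the notion of updates: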
Certain ETL techniques for updating fact rows are only feasible if a
surrogate key is assigned to the fact rows. Specifically, one
technique for loading updates to fact rows is to insert the rows to be
updated as new rows, then to delete the original rows as a second step
as a single transaction. The advantages of this technique from an ETL
perspective are improved load performance, improved recovery
capability and improved audit capabilities. The surrogate key for the
fact table rows is required as multiple identical primary keys will
often exist for the old and new versions of the updated fact rows
between the time of the insert of the updated row and the delete of
the old row.
Does Bob Becker mean updates/deletes on the fact table? Is this a common practice?
Sometimes you may need to update a fact table for the simple reason that wrong data was loaded.
I am not sure how a surrogate key helps here - you still have to find the rows to change based on the natural key.
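But yes, an INSERT plus a DELETE (possibly only a logical delete that sets a canceled flag) may be preferable to a simple UPDATE, basically for auditability and recovery reasons. Again, I am not sure how this would affect performance.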
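As a rough illustration of the insert-then-logically-delete idea, here is a minimal sketch using Python's built-in `sqlite3` module. The table `fact_sales`, its columns (`sales_key`, `order_id`, `date_key`, `amount`, `is_canceled`) and the helper `correct_fact_row` are all hypothetical names made up for this example, and SQLite only stands in for whatever warehouse engine is actually used. The row is located by natural key; the surrogate key is then what tells the old and new versions apart while both exist.

```python
import sqlite3


def correct_fact_row(conn, old_key, new_amount):
    """Insert a corrected copy of a fact row and flag the original as canceled.

    Hypothetical helper: both statements run in one transaction, so either
    both versions change or neither does.
    """
    with conn:  # commits on success, rolls back if an exception is raised
        # Step 1: insert the corrected version as a new row (new surrogate key).
        conn.execute(
            """
            INSERT INTO fact_sales (order_id, date_key, amount, is_canceled)
            SELECT order_id, date_key, ?, 0
            FROM fact_sales
            WHERE sales_key = ?
            """,
            (new_amount, old_key),
        )
        # Step 2: logically delete the original, addressed by its surrogate key.
        conn.execute(
            "UPDATE fact_sales SET is_canceled = 1 WHERE sales_key = ?",
            (old_key,),
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE fact_sales (
        sales_key   INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key (assumed)
        order_id    INTEGER NOT NULL,                   -- natural key part (assumed)
        date_key    INTEGER NOT NULL,                   -- natural key part (assumed)
        amount      REAL    NOT NULL,
        is_canceled INTEGER NOT NULL DEFAULT 0          -- logical-delete flag
    )
    """
)
# A row that was loaded with the wrong amount.
conn.execute(
    "INSERT INTO fact_sales (order_id, date_key, amount) VALUES (1001, 20240105, 99.0)"
)
conn.commit()

# Find the row to correct by its natural key, then fix it.
(old_key,) = conn.execute(
    "SELECT sales_key FROM fact_sales "
    "WHERE order_id = ? AND date_key = ? AND is_canceled = 0",
    (1001, 20240105),
).fetchone()
correct_fact_row(conn, old_key, 89.0)

rows = conn.execute(
    "SELECT sales_key, order_id, amount, is_canceled FROM fact_sales ORDER BY sales_key"
).fetchall()
print(rows)
```

Both versions stay in the table, which is where the audit and recovery benefit would come from; queries then have to filter on `is_canceled = 0`. Nothing here says anything about load performance on a real warehouse.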
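Most importantly, a best practice from 2006 is not necessarily a current best practice. Nowadays non-trivial fact tables often have no primary key, because a unique index makes the rolling partition window concept harder. (If required, check uniqueness in the ETL process.)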
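If the table has no unique index, a uniqueness check in the ETL process could look something like the sketch below. It only looks for duplicate natural keys inside one incoming batch; the function name `find_duplicate_keys` and the column names are assumptions for illustration, and checking against rows already loaded would need an additional lookup against the fact table.

```python
from collections import Counter


def find_duplicate_keys(rows, key_columns):
    """Return the natural-key tuples that occur more than once in a batch."""
    counts = Counter(tuple(row[col] for col in key_columns) for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)


# Hypothetical batch about to be loaded into the fact table.
batch = [
    {"order_id": 1001, "date_key": 20240105, "amount": 99.0},
    {"order_id": 1002, "date_key": 20240105, "amount": 15.5},
    {"order_id": 1001, "date_key": 20240105, "amount": 89.0},
]

duplicates = find_duplicate_keys(batch, ("order_id", "date_key"))
if duplicates:
    # A real pipeline would decide here whether to reject or quarantine the batch.
    print("duplicate natural keys in batch:", duplicates)
```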