Why does a hive table need to be bucketed to support ACID transactions?

I'm wondering why a Hive table needs to be bucketed to support ACID transactions. Is this just a Hive quirk, or is there a reason behind it?

Here is some information about the Hive compactor:

The compactor runs background MapReduce jobs to compact the delta and base files. There are two types of compaction: major and minor. The minor compaction merges many small delta files into one big delta file. The major compaction is more expensive: it takes the delta files and merges them with the base files. All merging happens by creating a new file and removing the old ones. There is a special cleaning process to do so. The compaction is done for each bucket separately. Base and Delta files are created per bucket.
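To make the per-bucket behaviour described above concrete, here is a toy model. It is a hypothetical sketch, not Hive's actual ORC ACID file layout or compactor code: a bucket holds a base (a dict mapping row id to value) and an ordered list of delta files, and a delta record whose value is None stands for a delete. Minor compaction concatenates the deltas into one; major compaction folds them into a new base. Neither function reads or writes any bucket other than the one it is given.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical, simplified model (not Hive's real format).
# A delta record is (row_id, value); value None marks a delete.
Record = Tuple[int, Optional[str]]


@dataclass
class Bucket:
    base: Dict[int, str] = field(default_factory=dict)        # compacted rows
    deltas: List[List[Record]] = field(default_factory=list)  # ordered delta files


def minor_compact(bucket: Bucket) -> None:
    """Merge all delta files into a single delta file; the base is untouched."""
    merged: List[Record] = []
    for delta in bucket.deltas:
        merged.extend(delta)
    # "Create a new file and remove the old ones": replace the delta list wholesale.
    bucket.deltas = [merged] if merged else []


def major_compact(bucket: Bucket) -> None:
    """Apply every delta to the base and write a new base with no deltas left."""
    new_base = dict(bucket.base)
    for delta in bucket.deltas:
        for row_id, value in delta:
            if value is None:
                new_base.pop(row_id, None)  # delete
            else:
                new_base[row_id] = value    # insert or update
    bucket.base = new_base
    bucket.deltas = []


# A table is modelled as a list of buckets; each one is compacted on its own.
table = [
    Bucket(base={1: "a"}, deltas=[[(2, "b")], [(1, None)]]),
    Bucket(base={3: "c"}, deltas=[[(4, "d")], [(3, "c2")]]),
]
for b in table:
    minor_compact(b)
print([len(b.deltas) for b in table])  # [1, 1]
for b in table:
    major_compact(b)
print([b.base for b in table])  # [{2: 'b'}, {3: 'c2', 4: 'd'}]
```

Because the base and its deltas live inside the same bucket, compacting a bucket never has to look at rows stored in other buckets.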

More information: https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions

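So the more buckets a table has, the faster compaction can run: because each bucket is compacted separately, the buckets can be processed in parallel.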
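A minimal sketch of why per-bucket compaction parallelizes, under the same hypothetical model as before (base dict plus ordered delta files, None meaning delete). A thread pool stands in for the separate compaction jobs Hive would schedule; this is an assumption for illustration only, and in CPython threads show the independence of the work rather than a real CPU speedup. Any actual gain in Hive depends on how many jobs the cluster can run at once.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Optional, Tuple

# Hypothetical model: each bucket is (base rows, list of delta files).
# A delta record is (row_id, value); value None marks a delete.
Record = Tuple[int, Optional[str]]
BucketState = Tuple[Dict[int, str], List[List[Record]]]


def major_compact(bucket: BucketState) -> Dict[int, str]:
    """Fold one bucket's deltas into a new base; touches no other bucket."""
    base, deltas = bucket
    new_base = dict(base)  # copy, so the input base is left unchanged
    for delta in deltas:
        for row_id, value in delta:
            if value is None:
                new_base.pop(row_id, None)
            else:
                new_base[row_id] = value
    return new_base


def compact_table(buckets: List[BucketState], workers: int = 4) -> List[Dict[int, str]]:
    """Compact every bucket independently; results keep bucket order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(major_compact, buckets))


buckets = [
    ({1: "a"}, [[(2, "b")], [(1, None)]]),
    ({3: "c"}, [[(3, "c2")]]),
    ({}, [[(5, "e"), (6, "f")]]),
]
print(compact_table(buckets))  # [{2: 'b'}, {3: 'c2'}, {5: 'e', 6: 'f'}]
```

Since no bucket's result depends on another bucket's files, each bucket is an independent unit of work; more buckets means more units that can run side by side, up to the number of available workers.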