Why is my Code Repo warning me not to use union and instead use unionByName?

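I'm seeing a warning in my repository telling me that I'm using union and should use unionByName instead. Aren't these the same thing? Why should I care which one I use?

The PySpark docs note the following about union: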

Also as standard in SQL, this function resolves columns by position (not by name).

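This is dangerous in most cases: if your schemas have the same types but different names or purposes, you can silently union different, incompatible schemas. For example, if schema1 is [('col1', T.IntegerType()), ('col2', T.StringType())] and schema2 is [('col3', T.IntegerType()), ('col4', T.StringType())], they can be merged successfully with union even though col1 and col3 have fundamentally different meanings, as do col2 and col4.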
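To make the failure mode concrete, here is a minimal plain-Python sketch of position-based resolution. It models the behavior rather than calling Spark: `union_by_position` and the sample rows are hypothetical, made up for illustration, and the type check is a simplification of what Spark actually does. The two schemas are the ones described above.

```python
# Plain-Python model of position-based union; this is not Spark code.
# Schemas are lists of (column name, type name); rows are tuples.
schema1 = [("col1", "int"), ("col2", "string")]
schema2 = [("col3", "int"), ("col4", "string")]

rows1 = [(1, "a")]   # hypothetical sample data
rows2 = [(99, "z")]  # hypothetical sample data


def union_by_position(left_schema, left_rows, right_schema, right_rows):
    """Hypothetical helper: append rows if the column types line up by position."""
    left_types = [col_type for _, col_type in left_schema]
    right_types = [col_type for _, col_type in right_schema]
    # Simplified check: only the types at each position are compared.
    if left_types != right_types:
        raise ValueError("column types do not line up by position")
    # Column names are never compared; the left schema's names win.
    return left_schema, left_rows + right_rows


result_schema, result_rows = union_by_position(schema1, rows1, schema2, rows2)
print(result_schema)  # [('col1', 'int'), ('col2', 'string')]
print(result_rows)    # [(1, 'a'), (99, 'z')]
```

The values that belonged to col3 and col4 now sit under col1 and col2, and nothing signals that the names never matched.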

unionByName differs in that:

The difference between this function and union() is that this function resolves columns by name (not by position)

In most cases this is a safer way to union, and it is therefore preferred.
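A minimal plain-Python sketch of name-based resolution shows why it is safer. It models the behavior rather than calling Spark, and `union_by_name` and the sample rows are hypothetical: the helper reorders the right-hand rows to match the left-hand column names and refuses to merge when the names differ.

```python
# Plain-Python model of name-based union; this is not Spark code.
def union_by_name(left_cols, left_rows, right_cols, right_rows):
    """Hypothetical helper: align right-hand rows to the left-hand column names."""
    if sorted(left_cols) != sorted(right_cols):
        raise ValueError(f"column names differ: {left_cols} vs {right_cols}")
    # For each left column, find where it sits in the right-hand rows.
    positions = [right_cols.index(name) for name in left_cols]
    aligned = [tuple(row[i] for i in positions) for row in right_rows]
    return left_cols, left_rows + aligned


# Same names in a different order: values end up under the correct column.
cols, rows = union_by_name(["id", "name"], [(1, "a")], ["name", "id"], [("b", 2)])
print(rows)  # [(1, 'a'), (2, 'b')]

# Different names: the merge fails loudly instead of mixing unrelated data.
try:
    union_by_name(["col1", "col2"], [(1, "a")], ["col3", "col4"], [(99, "z")])
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True
print(mismatch_rejected)  # True
```

In PySpark itself the corresponding calls are `df1.union(df2)` and `df1.unionByName(df2)`.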

palantir-foundry

foundry-code-repositories

foundry-python-transform