Power BI dictionary size for string columns all over 1 MB despite small data

Question

I've suddenly run into an unusual situation today, both in newly created Power BI files and in existing files that previously worked fine.

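When importing string data, the dictionary size of every column is over one megabyte, regardless of the data imported. For any small table with a large number of columns, this obviously bloats the model size significantly.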
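To make the scale of the bloat concrete, here is a minimal sketch of the arithmetic, assuming each string column reports a dictionary of at least 1 MiB as described above. The function name and the column count of 30 are hypothetical and chosen only for illustration; they are not taken from the actual model.

```python
# Rough lower bound on reported dictionary size when every string column
# shows at least ~1 MiB of dictionary, regardless of its contents.
MIB = 1024 * 1024  # bytes in one mebibyte


def min_dictionary_overhead_bytes(string_column_count: int,
                                  per_column_floor: int = MIB) -> int:
    """Return the minimum total dictionary size in bytes, assuming each
    string column reports at least `per_column_floor` bytes."""
    if string_column_count < 0:
        raise ValueError("string_column_count must be non-negative")
    return string_column_count * per_column_floor


if __name__ == "__main__":
    # Hypothetical table with 30 string columns (illustrative value only).
    overhead = min_dictionary_overhead_bytes(30)
    print(f"{overhead / MIB:.1f} MiB")  # prints "30.0 MiB"
```

Under this assumption the reported size grows with the number of string columns rather than with the amount of data, which is why a wide but tiny table is hit hardest.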

The issue occurs with data imported from SQL Server, Synapse, Data Lake Gen2, and local file storage.

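The effect can be seen below in the Col Size values for all string columns, which bear no resemblance to the differences in Cardinality. As a result, importing a single 1,206 KB csv file produces a model size of 38.15 MB.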

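Has anyone else experienced this issue, or does anyone know how to fix it? The only change I can think of (other than a minor background update) is upgrading to the new model view, although I did that about a week ago and the problem only appeared today...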

VertiPaq Analyzer metrics for a new model with one small csv loaded:

Power BI details:

Release:
December 2020

Product Version:
2.88.1144.0 (20.12) (x64)

OS Version:
Microsoft Windows NT 10.0.18363.0 (x64 en-GB)

CLR Version:
4.7 or later [Release Number = 528040]

Model Default Mode:
Import

Model Version:
PowerBI_V3

Is Report V3 Models Enabled:
True

Enabled Preview Features:
PBI_NewWebTableInference
PBI_v3ModelsPreview

Disabled Preview Features:
PBI_shapeMapVisualEnabled
PBI_SpanishLinguisticsEnabled
PBI_JsonTableInference
PBI_ImportTextByExample
PBI_ExcelTableInference
PBI_qnaLiveConnect
PBI_eimInformationProtectionForDesktop
PBI_azureMapVisual
PBI_dataPointLassoSelect
PBI_compositeModelsOverAS
PBI_narrativeTextBox
PBI_dynamicParameters
PBI_anomalyDetection
PBI_newFieldList
PBI_cartesianMultiplesAuthoring

Disabled DirectQuery Options:
TreatHanaAsRelationalSource

Answer 1

This behavior has been explained in depth by Marco Russo at SQLBI.

To get correct dictionary measures from VertiPaq Analyzer, connect it immediately after opening the Power BI file, without hitting Refresh or modifying any calculated table expressions. If you have already refreshed the data or changed a calculated table, save the file, close Power BI, and open the file again in Power BI before running VertiPaq Analyzer over it again.

sql-server

powerquery

powerbi

azure-data-lake-gen2

azure-synapse