Determine whether a measure is aggregable in a data warehouse

Can anyone tell me a 'general use' step-by-step method for finding the non-aggregable fields in a data mart? Here is an example I found:

Note: italics denote a key, bold denotes an abbreviation ('shortening'), and the colon in a column definition is an alias for 'referencing' (e.g. FROM:S).

Relational schema:

CALL(COD,DATE,FROM:S,TO:S,LEN)

SIM(SIM, USER:USER, TARIFF:T, BONUS)

TARIFF(TARIFF, CARRIER:CAR)

USER(USER, TOWN:TOW, LAST_TARIFF:TAR)

ROAMING_CALL(COD:CAL, FOREIGN_CARRIER:CAR)

PROMO_CALL(COD:CAL, PROMO_TARIFF:P_TA)

PROMO_TARIFF(TARIFF:TAR)

TOWN(TOWN,NATION)

CARRIER(CARRIER, NATION)

REQUESTS: build a fact-schema for 'CALL' with the following

dimensions: DATE, SIM_FROM, CALLED_CARRIER, FOREIGN_CARRIER, PROMO_TARIFF and

measures: AVG_CALL_LENGTH, NUM_OUTGOING_SIM (as count distinct FROM), NUM_INCOMING_SIM (as count distinct TO)

Now I can draw the fact schema, but I am having trouble working out which measures can be aggregated along which dimensions.

Edit: this is a PDF of the fact schema I have (sorry for not using strict syntax, but it includes reading notes).

Measures:

Standard [obtained by the operational schema]:  
NUM_INCOMING_CALLS = COUNT DISTINCT (TO)    
UN-AGGREGABILITIES ->*THIS IS MY ISSUE*

Calculated [obtained by the operational schema, need partial data to add properly]:  
AVG_CALL_LENGTH = CL_SUM/CL_COUNT  
where  
CL_SUM = SUM (LENGTH), CL_COUNT = COUNT(LENGTH)  
UN-AGGREGABILITIES ->*THIS IS MY ISSUE*
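
To make the "need partial data" remark concrete, here is a minimal Python sketch of why AVG_CALL_LENGTH has to be rolled up from CL_SUM and CL_COUNT rather than from already computed averages. The call records and the helper name `partials_by` are made up for illustration and are not part of the schema above.

```python
from collections import defaultdict

# Hypothetical call records: (date, from_sim, length_in_seconds)
calls = [
    ("2024-01-01", "S1", 60),
    ("2024-01-01", "S1", 120),
    ("2024-01-01", "S2", 30),
    ("2024-01-02", "S1", 90),
]

def partials_by(rows, key_index):
    """Return {key: (CL_SUM, CL_COUNT)} for the chosen grouping column."""
    acc = defaultdict(lambda: [0, 0])
    for row in rows:
        acc[row[key_index]][0] += row[2]  # running sum of lengths
        acc[row[key_index]][1] += 1       # running count of calls
    return {k: tuple(v) for k, v in acc.items()}

per_day = partials_by(calls, 0)

# Roll-up from partial data: add the sums and counts, then divide once.
total_sum = sum(s for s, _ in per_day.values())
total_count = sum(c for _, c in per_day.values())
avg_from_partials = total_sum / total_count

# Naive roll-up: average the per-day averages (each day weighted equally).
avg_of_avgs = sum(s / c for s, c in per_day.values()) / len(per_day)

print(avg_from_partials, avg_of_avgs)
```

On this toy data the two results differ, because the first day holds three calls and the second only one.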

Derived  [can be found as a dimension]:  
NUM_OUTGOING_CALLS = COUNT DISTINCT ( FROM )  
UN-AGGREGABILITIES ->*THIS IS MY ISSUE*
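
A small Python sketch of what goes wrong when a COUNT DISTINCT measure is summed along a dimension. The records and the function name `distinct_from_by_date` are hypothetical; the point is only that per-group distinct counts cannot, in general, be added up.

```python
# Hypothetical call records: (date, from_sim)
calls = [
    ("2024-01-01", "S1"),
    ("2024-01-01", "S2"),
    ("2024-01-02", "S1"),
    ("2024-01-02", "S3"),
]

def distinct_from_by_date(rows):
    """COUNT DISTINCT (FROM) computed separately for each date."""
    groups = {}
    for date, sim in rows:
        groups.setdefault(date, set()).add(sim)
    return {date: len(sims) for date, sims in groups.items()}

per_day_counts = distinct_from_by_date(calls)

# Summing the daily distinct counts counts S1 twice.
summed_counts = sum(per_day_counts.values())

# The distinct count over the whole period, computed from the raw rows.
true_distinct = len({sim for _, sim in calls})

print(per_day_counts, summed_counts, true_distinct)
```

Grouping by the sim itself would behave differently: each sim lands in exactly one group, so those partial counts do add up to the overall distinct count.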

OK, I went and asked my teacher: he gave me a simple algorithm:

Given a schema with dimensions D = {D1, D2, D3, ..., Dn} and a measure M = COUNT DISTINCT (A),

for each Di, test whether the dependency X U A -> Di holds (non-trivially), where X is a subset of D:

X U A -> D1 (True)
X U A -> D2 (False)
X U A -> D3 (True)
...
X U A -> Dn-1 (False)

then NA = {D2, Dn-1},
where NA is the set of non-aggregabilities, i.e. the dimensions whose dependency came out False.
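
Here is one way the check above could be run mechanically, as a sketch of my reading of the algorithm rather than a definitive implementation. It assumes the functional dependencies are already known and uses the standard attribute-closure procedure to decide whether X U A -> Di holds. The function names and the two dependencies in the example are hypothetical, and the "not trivial" condition is not handled separately because the algorithm as given does not spell it out.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under the functional dependencies `fds`.

    `fds` is a list of (lhs, rhs) pairs of frozensets, meaning lhs -> rhs.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire the dependency if its left side is covered and it adds something.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def non_aggregable(dimensions, a, x, fds):
    """Return NA: the dimensions Di for which X U {A} -> Di does not hold."""
    reachable = closure(set(x) | {a}, fds)
    return [d for d in dimensions if d not in reachable]

# Hypothetical dependencies, assumed only for this example.
fds = [
    (frozenset({"FROM"}), frozenset({"SIM_FROM"})),
    (frozenset({"TO"}), frozenset({"CALLED_CARRIER"})),
]
dimensions = ["DATE", "SIM_FROM", "CALLED_CARRIER"]

na_empty_x = non_aggregable(dimensions, "FROM", set(), fds)
na_with_date = non_aggregable(dimensions, "FROM", {"DATE"}, fds)
print(na_empty_x, na_with_date)
```

With X empty, only SIM_FROM is determined by FROM under these assumed dependencies, so the other two dimensions end up in NA; adding DATE to X removes DATE from NA.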