Determine whether a measure is aggregable in a data warehouse
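Can anyone tell me a 'general use', step-by-step method for finding the non-aggregable fields in a data mart? Here is an example I found.
Note: italics mark a 'key', bold marks 'shortenings', and a colon (':') stands for 'referencing'.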
Relational schema:
CALL(COD,DATE,FROM:S,TO:S,LEN)
SIM(SIM, USER:USER, TARIFF:T, BONUS)
TARIFF(TARIFF, CARRIER:CAR)
USER(USER, TOWN:TOW, LAST_TARIFF:TAR)
ROAMING_CALL(COD:CAL, FOREIGN_CARRIER:CAR)
PROMO_CALL(COD:CAL, PROMO_TARIFF:P_TA)
PROMO_TARIFF(TARIFF:TAR)
TOWN(TOWN,NATION)
CARRIER(CARRIER, NATION)
REQUESTS:
build a fact schema for 'CALL' with the following
dimensions: DATE, SIM_FROM, CALLED_CARRIER, FOREIGN_CARRIER, PROMO_TARIFF
measures: AVG_CALL_LENGTH, NUM_OUTGOING_SIM (as count distinct FROM), NUM_INCOMING_SIM (as count distinct TO)
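I can draw the fact schema now, but I am having a hard time working out which measures can be aggregated along which dimensions.
EDIT:
this is a PDF of the fact schema I have (sorry for not using a strict syntax, but it includes notes for reading it)
Measures: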
Standard [obtained by the operational schema]:
NUM_INCOMING_CALLS = COUNT DISTINCT (TO)
UN-AGGREGABILITIES ->*THIS IS MY ISSUE*
Calculated [obtained by the operational schema, need partial data to add properly]:
AVG_CALL_LENGTH = CL_SUM/CL_COUNT
where
CL_SUM = SUM (LENGTH), CL_COUNT = COUNT(LENGTH)
UN-AGGREGABILITIES ->*THIS IS MY ISSUE*
Derived [can be found as a dimension]:
NUM_OUTGOING_CALLS = COUNT DISTINCT ( FROM )
UN-AGGREGABILITIES ->*THIS IS MY ISSUE*
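To make the issue concrete, here is a minimal Python sketch of why these measures cannot simply be re-aggregated from their already-aggregated values. The call records and the helper names `per_date_cells` and `roll_up` are made up for illustration; only CL_SUM, CL_COUNT and the COUNT DISTINCT (FROM) definition come from the notes above.

```python
from collections import defaultdict

# Hypothetical call records: (date, from_sim, to_sim, length)
calls = [
    ("2024-01-01", "A", "B", 10),
    ("2024-01-01", "A", "C", 20),
    ("2024-01-02", "A", "B", 60),
]

def per_date_cells(rows):
    """Aggregate raw calls into one cell per DATE, keeping partial data."""
    cells = defaultdict(lambda: {"cl_sum": 0, "cl_count": 0, "from_sims": set()})
    for date, from_sim, _to_sim, length in rows:
        cell = cells[date]
        cell["cl_sum"] += length          # CL_SUM = SUM(LENGTH)
        cell["cl_count"] += 1             # CL_COUNT = COUNT(LENGTH)
        cell["from_sims"].add(from_sim)   # values behind COUNT DISTINCT (FROM)
    return dict(cells)

def roll_up(cells):
    """Roll up over DATE using the partial data, not the finished measures."""
    cl_sum = sum(c["cl_sum"] for c in cells.values())
    cl_count = sum(c["cl_count"] for c in cells.values())
    from_sims = set().union(*(c["from_sims"] for c in cells.values()))
    return {"avg_call_length": cl_sum / cl_count, "num_outgoing": len(from_sims)}

cells = per_date_cells(calls)
correct = roll_up(cells)

# Naive re-aggregation of the finished measures gives different numbers.
naive_avg = sum(c["cl_sum"] / c["cl_count"] for c in cells.values()) / len(cells)
naive_distinct = sum(len(c["from_sims"]) for c in cells.values())

print(correct, naive_avg, naive_distinct)
```

On this toy data the average of the daily averages differs from CL_SUM/CL_COUNT, and adding up the daily distinct counts counts SIM "A" twice, which is the behaviour the UN-AGGREGABILITIES entries are meant to record.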
OK, I went and asked my teacher, and he gave me a simple algorithm:
Given a schema D = {D1, D2, D3, ..., Dn} and a measure M = COUNT DISTINCT A,
for each Di, with X a subset of D, check whether the non-trivial dependency X U A -> Di holds:
X U A -> D1 (True)
X U A -> D2 (False)
X U A -> D3 (True)
...
X U A -> Dn-1 (False)
The dimensions for which the dependency fails form NA, so here NA = {D2, Dn-1}
NA: set of non-aggregabilities
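The check above can be sketched in Python with a standard attribute-closure computation. This is only my reading of the algorithm, under two assumptions: X is the set of dimensions kept in the grouping (with Di itself removed so the dependency is not trivial), and the functional dependencies listed in `fds` are ones I derived by hand from the relational schema (TO -> CALLED_CARRIER via SIM, TARIFF and CARRIER; FROM -> SIM_FROM). The function names `closure` and `non_aggregable` are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under the functional dependencies `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply lhs -> rhs whenever lhs is already implied.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def non_aggregable(dimensions, a, fds, kept=()):
    """Return NA: the dimensions Di for which (X U {A}) -> Di does not hold."""
    x = set(kept)
    na = []
    for d in dimensions:
        # Drop Di from X so that a trivial dependency is never tested.
        if d not in closure((x - {d}) | {a}, fds):
            na.append(d)
    return na

DIMENSIONS = ["DATE", "SIM_FROM", "CALLED_CARRIER", "FOREIGN_CARRIER", "PROMO_TARIFF"]

# Assumed dependencies, read off the relational schema (illustrative only).
fds = [
    (frozenset({"FROM"}), frozenset({"SIM_FROM"})),
    (frozenset({"TO"}), frozenset({"CALLED_CARRIER"})),
]

na_incoming = non_aggregable(DIMENSIONS, "TO", fds)    # M = COUNT DISTINCT (TO)
na_outgoing = non_aggregable(DIMENSIONS, "FROM", fds)  # M = COUNT DISTINCT (FROM)
print(na_incoming)
print(na_outgoing)
```

With these assumed dependencies, COUNT DISTINCT (TO) comes out aggregable only along CALLED_CARRIER and COUNT DISTINCT (FROM) only along SIM_FROM; a different set of dependencies, or a different choice of X, would change the result.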