avgOver in QuickSight
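I have data for every month of 2019, but only through September 2020. Each row contains a MonthNo. value, which corresponds to the calendar month, and a user ID entry. It looks like this: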
| Month | Year | ID | MonthNo. |
|-----------|------|--------|----------|
| January | 2019 | 611330 | 01 |
| January | 2019 | 174519 | 01 |
| January | 2019 | 380747 | 01 |
| February | 2019 | 882347 | 02 |
| February | 2019 | 633797 | 02 |
| February | 2019 | 863219 | 02 |
| March | 2019 | 189924 | 03 |
| March | 2019 | 241922 | 03 |
| March | 2019 | 563335 | 03 |
| April | 2019 | 648660 | 04 |
| April | 2019 | 363710 | 04 |
| April | 2019 | 606284 | 04 |
| May | 2019 | 296508 | 05 |
| May | 2019 | 287650 | 05 |
| May | 2019 | 599909 | 05 |
| June | 2019 | 513844 | 06 |
| June | 2019 | 891633 | 06 |
| June | 2019 | 138250 | 06 |
| July | 2019 | 126235 | 07 |
| July | 2019 | 853840 | 07 |
| July | 2019 | 713104 | 07 |
| August | 2019 | 180511 | 08 |
| August | 2019 | 451735 | 08 |
| August | 2019 | 818095 | 08 |
| September | 2019 | 512621 | 09 |
| September | 2019 | 674079 | 09 |
| September | 2019 | 914015 | 09 |
| October | 2019 | 132859 | 10 |
| October | 2019 | 560572 | 10 |
| October | 2019 | 272557 | 10 |
| November | 2019 | 984001 | 11 |
| November | 2019 | 815688 | 11 |
| November | 2019 | 902748 | 11 |
| December | 2019 | 880285 | 12 |
| December | 2019 | 167629 | 12 |
| December | 2019 | 772039 | 12 |
| January | 2020 | 116886 | 01 |
| January | 2020 | 386078 | 01 |
| February | 2020 | 291060 | 02 |
| February | 2020 | 970032 | 02 |
| March | 2020 | 907555 | 03 |
| March | 2020 | 560827 | 03 |
| April | 2020 | 938039 | 04 |
| April | 2020 | 721640 | 04 |
| May | 2020 | 131719 | 05 |
| May | 2020 | 415596 | 05 |
| June | 2020 | 589375 | 06 |
| June | 2020 | 623663 | 06 |
| July | 2020 | 577748 | 07 |
| July | 2020 | 999572 | 07 |
| August | 2020 | 630975 | 08 |
| August | 2020 | 442278 | 08 |
| September | 2020 | 993318 | 09 |
| September | 2020 | 413214 | 09 |
This example table has exactly 3 records for each month of 2019 and exactly 2 records for each month of 2020. So when I add a calculated field named MonthNotYearTraffic, defined by
// Averages ID count by month number only, intentionally ignoring year.
avgOver(count(ID), [{MonthNo.}])
I expect to get the following result:
| MonthNo. | MonthNotYearTraffic |
|----------|---------------------|
| 01 | 2.5 |
| 02 | 2.5 |
| 03 | 2.5 |
| 04 | 2.5 |
| 05 | 2.5 |
| 06 | 2.5 |
| 07 | 2.5 |
| 08 | 2.5 |
| 09 | 2.5 |
| 10 | 3 |
| 11 | 3 |
| 12 | 3 |
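since October through December only have the three 2019 entries shown above. However, the result is:
I have tried several different approaches, and combinations of the following (some of which I know are crazy, others I'm not sure about):
- Not relying on a custom calculated field in the first place
- Partitioning by both month and year in the calculated field definition
- Messing with level-aware aggregations
- Making sure the data types are strings/dimensions
No dice.
This seems like it should be a simple technique, so any pointers would be great. Thanks.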
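I think the problem is that avgOver only works if you display the data the way you did in the first table in your question, where the values are defined. Since you are only displaying the MonthNo. field, the visual does not contain multiple rows with the same MonthNo. value. Each partition therefore holds only one row per month, so avgOver is just dividing the count by 1.
Maybe try count(ID) / count("MonthNo.")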
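The difference between the two groupings can be sketched outside QuickSight. The Python below is only a model of the behavior this answer describes, not QuickSight's actual implementation: it assumes that avgOver averages the already-aggregated rows of the visual within each partition. The `rows` list is a stand-in that mirrors the shape of the sample table (year and month number only, IDs omitted), and all variable names are illustrative.

```python
from collections import Counter, defaultdict

# Stand-in for the sample table: 3 rows per month in 2019,
# 2 rows per month for January through September 2020.
# Each row is (year, month_no); the IDs are omitted because only counts matter.
rows = [(2019, m) for m in range(1, 13) for _ in range(3)]
rows += [(2020, m) for m in range(1, 10) for _ in range(2)]

# Visual grouped by MonthNo. only: one aggregated row per month.
count_by_month = Counter(m for _, m in rows)

# Averaging over a partition that holds a single row returns that row's value,
# i.e. the count divided by 1.
avg_single_row = {m: c / 1 for m, c in count_by_month.items()}

# Visual grouped by (Year, MonthNo.): one aggregated row per year and month,
# so each MonthNo. partition can hold up to two rows to average.
count_by_year_month = Counter(rows)
counts_per_month = defaultdict(list)
for (year, m), c in count_by_year_month.items():
    counts_per_month[m].append(c)
avg_across_years = {m: sum(cs) / len(cs) for m, cs in counts_per_month.items()}

print(avg_single_row[1], avg_single_row[10])      # 5.0 3.0
print(avg_across_years[1], avg_across_years[10])  # 2.5 3.0
```

Under that assumption, the 2.5 the question expects only appears when Year is part of the grouping, which matches the point made above.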
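It looks like you need to partition the count of IDs by month, and then divide that count by the number of years in which you have user IDs for that month.
Using your sample data, I was able to get the output you want.
MonthNotYearTraffic = countOver(ID,[Month],PRE_FILTER)/distinctCountOver(Year,[Month],PRE_FILTER)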
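As a sanity check on the arithmetic, here is a minimal Python sketch of the same idea: count the rows per month, then divide by the number of distinct years seen for that month. It is not QuickSight code, and it partitions by month number rather than month name, which is equivalent for this data. The `rows` list is a stand-in for the sample table (IDs omitted), and the variable names are illustrative.

```python
from collections import defaultdict

# Stand-in for the sample table: 3 rows per month in 2019,
# 2 rows per month for January through September 2020.
rows = [(2019, m) for m in range(1, 13) for _ in range(3)]
rows += [(2020, m) for m in range(1, 10) for _ in range(2)]

ids_per_month = defaultdict(int)     # plays the role of countOver(ID, [Month])
years_per_month = defaultdict(set)   # plays the role of distinctCountOver(Year, [Month])
for year, month_no in rows:
    ids_per_month[month_no] += 1
    years_per_month[month_no].add(year)

month_not_year_traffic = {
    m: ids_per_month[m] / len(years_per_month[m])
    for m in sorted(ids_per_month)
}

for m, value in month_not_year_traffic.items():
    print(f"{m:02d} {value}")
```

For months 01 through 09 this gives 5 / 2 = 2.5, and for months 10 through 12 it gives 3 / 1 = 3.0, which matches the table the question expects.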