How can I find the data points in a column with the highest information (gain)?
Suppose I have this data frame:
> df1
date count
1 2012-07-01 2.867133
2 2012-08-01 2.018745
3 2012-09-01 5.237515
4 2012-10-01 8.320493
5 2012-11-01 4.119850
6 2012-12-01 3.648649
7 2013-01-01 3.172867
8 2013-02-01 4.065041
9 2013-03-01 2.914798
10 2013-04-01 4.735683
11 2013-05-01 3.775411
12 2013-06-01 3.825717
13 2013-07-01 3.273427
14 2013-08-01 2.716469
15 2013-09-01 2.687296
16 2013-10-01 3.674121
17 2013-11-01 3.325942
18 2013-12-01 2.524038
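I now want to split df1$count in such a way that I get the groups/ranges where the information is highest. My idea was information gain, but I know IG is used for attributes, not columns.
If you plot the data, you can make out sharp rises and falls, so my goal is to always find these significant increases/decreases, which carry high information gain.
Any ideas on how to do this?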
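For anyone who wants to reproduce this, here is a minimal sketch that rebuilds df1 from the values printed above and ranks rows by the size of the month-over-month change. The `change` column is an illustrative addition, not part of the original data, and ranking by absolute difference is only one assumed reading of "significant".

```r
# Rebuild df1 from the values printed in the question.
df1 <- data.frame(
  date  = seq(as.Date("2012-07-01"), by = "month", length.out = 18),
  count = c(2.867133, 2.018745, 5.237515, 8.320493, 4.119850, 3.648649,
            3.172867, 4.065041, 2.914798, 4.735683, 3.775411, 3.825717,
            3.273427, 2.716469, 2.687296, 3.674121, 3.325942, 2.524038)
)

# Month-over-month change; the first row has no predecessor, hence NA.
# (`change` is a hypothetical column name used only for this sketch.)
df1$change <- c(NA, diff(df1$count))

# Rows with the three largest absolute changes (NA sorts last by default).
print(head(df1[order(-abs(df1$change)), ], 3))
```

The largest jumps all sit around the 2012-10-01 peak, which matches what the plot suggests.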
Something like this?
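library(dplyr)
df1 %>%
  # dif: 0 if count fell from the previous row, 1 otherwise (NA for the first row)
  mutate(dif = ifelse((lag(count) - count) > 0, 0, 1)) %>%
  # group: consecutive rows with the same dif share one run id
  mutate(group = rle(dif) %>% magrittr::extract2("lengths") %>% rep(seq_along(.), .))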
date count dif group
1 2012-07-01 2.867133 NA 1
2 2012-08-01 2.018745 0 2
3 2012-09-01 5.237515 1 3
4 2012-10-01 8.320493 1 3
5 2012-11-01 4.119850 0 4
6 2012-12-01 3.648649 0 4
7 2013-01-01 3.172867 0 4
8 2013-02-01 4.065041 1 5
9 2013-03-01 2.914798 0 6
10 2013-04-01 4.735683 1 7
11 2013-05-01 3.775411 0 8
12 2013-06-01 3.825717 1 9
13 2013-07-01 3.273427 0 10
14 2013-08-01 2.716469 0 10
15 2013-09-01 2.687296 0 10
16 2013-10-01 3.674121 1 11
17 2013-11-01 3.325942 0 12
18 2013-12-01 2.524038 0 12
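The group column comes from run-length encoding: each run of identical dif values gets one id. Here is a minimal base R sketch of just that step, using the first eight dif values from the output above. The variable names `runs` and `group` are only for illustration.

```r
# First eight dif values from the output above (NA for the first row).
dif <- c(NA, 0, 1, 1, 0, 0, 0, 1)

# rle() collapses consecutive equal values into runs; an NA always forms its own run.
runs <- rle(dif)

# Repeat each run's index over the run's length to get one id per row.
group <- rep(seq_along(runs$lengths), runs$lengths)

print(data.frame(dif, group))
# group is 1 2 3 3 4 4 4 5
```

This is the same thing the `magrittr::extract2("lengths") %>% rep(seq_along(.), .)` part of the pipeline does, written out step by step.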
Update
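df1 %>%
  # nxt holds the previous row's count (lag), despite its name
  mutate(nxt = lag(count),
         # dif: 1 if the change exceeds 2 in absolute terms or a factor of 3, else 0
         dif = ifelse(abs(count - lag(count)) > 2 | count / lag(count) > 3 | lag(count) / count > 3, 1, 0)) %>%
  mutate(group = rle(dif) %>% magrittr::extract2("lengths") %>% rep(seq_along(.), .))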
date count nxt dif group
1 2012-07-01 2.867133 NA NA 1
2 2012-08-01 2.018745 2.867133 0 2
3 2012-09-01 5.237515 2.018745 1 3
4 2012-10-01 8.320493 5.237515 1 3
5 2012-11-01 4.119850 8.320493 1 3
6 2012-12-01 3.648649 4.119850 0 4
7 2013-01-01 3.172867 3.648649 0 4
8 2013-02-01 4.065041 3.172867 0 4
9 2013-03-01 2.914798 4.065041 0 4
10 2013-04-01 4.735683 2.914798 0 4
11 2013-05-01 3.775411 4.735683 0 4
12 2013-06-01 3.825717 3.775411 0 4
13 2013-07-01 3.273427 3.825717 0 4
14 2013-08-01 2.716469 3.273427 0 4
15 2013-09-01 2.687296 2.716469 0 4
16 2013-10-01 3.674121 2.687296 0 4
17 2013-11-01 3.325942 3.674121 0 4
18 2013-12-01 2.524038 3.325942 0 4
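The cut-offs 2 and 3 above are hard-coded. If you want to experiment with them, the same logic can be wrapped in a small base R function. This is only a sketch: `flag_jumps`, `abs_thresh` and `ratio_thresh` are hypothetical names, and the defaults simply mirror the values used above.

```r
# Hypothetical helper: flag values whose change from the previous value is large,
# either in absolute terms or as a ratio, then number the resulting runs.
flag_jumps <- function(x, abs_thresh = 2, ratio_thresh = 3) {
  prev <- c(NA, head(x, -1))             # previous value, like dplyr::lag()
  big  <- abs(x - prev) > abs_thresh |
          x / prev > ratio_thresh |
          prev / x > ratio_thresh
  dif  <- as.integer(big)                # 1 = large change, 0 = not, NA for the first value
  runs <- rle(dif)
  data.frame(x, prev, dif,
             group = rep(seq_along(runs$lengths), runs$lengths))
}

# First six counts from df1.
counts <- c(2.867133, 2.018745, 5.237515, 8.320493, 4.119850, 3.648649)
res <- flag_jumps(counts)
print(res)
```

With the default thresholds this reproduces the dif and group columns of the first six rows shown above. Raising `abs_thresh` makes the grouping less sensitive to small swings.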