How to create a column with values based on specific quantiles calculated per group?
Everything is in the title. To illustrate, I built the following example.
I have the following data frame:
date <- c("01.02.2011","01.02.2011","01.02.2011","01.02.2011","01.02.2011","01.02.2011",
"01.02.2011","01.02.2011","01.02.2011","01.02.2011",
"02.02.2011","02.02.2011","02.02.2011","02.02.2011","02.02.2011","02.02.2011",
"02.02.2011","02.02.2011","02.02.2011","02.02.2011")
date <- as.Date(date, format="%d.%m.%Y")
ID <- c("A","B","C","D","E","F","G","H","I","J",
"A","B","C","D","E","F","G","H","I","J")
values <- as.numeric(c("1","8","2","3","5","13","2","4","1","16",
"4","2","12","16","8","1","7","11","2","10"))
df <- data.frame(ID, date, values)
It looks like this:
ID date values
1 A 2011-02-01 1
2 B 2011-02-01 8
3 C 2011-02-01 2
4 D 2011-02-01 3
5 E 2011-02-01 5
6 F 2011-02-01 13
7 G 2011-02-01 2
8 H 2011-02-01 4
9 I 2011-02-01 1
10 J 2011-02-01 16
11 A 2011-02-02 4
12 B 2011-02-02 2
13 C 2011-02-02 12
14 D 2011-02-02 16
15 E 2011-02-02 8
16 F 2011-02-02 1
17 G 2011-02-02 7
18 H 2011-02-02 11
19 I 2011-02-02 2
20 J 2011-02-02 10
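I want to create a new column "QF" that takes the following values:
- 1 if the value is <= the 40th percentile (computed by date)
- 2 if the value is > the 40th percentile and < the 70th percentile (computed by date)
- 3 if the value is > the 70th percentile (computed by date)
I would like to obtain: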
ID date values QF
1 A 2011-02-01 1 1
2 B 2011-02-01 8 3
3 C 2011-02-01 2 1
4 D 2011-02-01 3 2
5 E 2011-02-01 5 2
6 F 2011-02-01 13 3
7 G 2011-02-01 2 1
8 H 2011-02-01 4 2
9 I 2011-02-01 1 1
10 J 2011-02-01 16 3
11 A 2011-02-02 4 1
12 B 2011-02-02 2 1
13 C 2011-02-02 12 3
14 D 2011-02-02 16 3
15 E 2011-02-02 8 2
16 F 2011-02-02 1 1
17 G 2011-02-02 7 2
18 H 2011-02-02 11 3
19 I 2011-02-02 2 1
20 J 2011-02-02 10 2
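The thresholds behind the expected QF column can be checked directly. The base-R sketch below rebuilds the same data with rep() and computes the 40th and 70th percentiles separately for each date with tapply(). It assumes the question means R's default percentile definition, quantile(type = 7). Under that assumption the cut points are 2.6 and 5.9 for 2011-02-01 and 5.8 and 10.3 for 2011-02-02, so, for example, D (3) on 2011-02-01 lies between them and gets QF = 2.

```r
# Same data as in the question, built more compactly
df <- data.frame(
  ID     = rep(LETTERS[1:10], times = 2),
  date   = rep(as.Date(c("2011-02-01", "2011-02-02")), each = 10),
  values = c(1, 8, 2, 3, 5, 13, 2, 4, 1, 16,
             4, 2, 12, 16, 8, 1, 7, 11, 2, 10)
)

# 40th and 70th percentiles per date; quantile() defaults to type = 7,
# i.e. linear interpolation between order statistics
thresholds <- tapply(df$values, df$date, quantile, probs = c(0.4, 0.7))
print(thresholds)
```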
Feel free to let me know if my question needs any edits.
One dplyr option could be:
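library(dplyr)
df %>%
  group_by(date) %>%
  # Right-closed intervals: (-Inf, q40] -> 1, (q40, q70] -> 2, (q70, max] -> 3;
  # -Inf as the lowest break keeps values <= 0 from becoming NA
  mutate(QF = cut(values, c(-Inf, quantile(values, probs = c(0.4, 0.7, 1))),
                  labels = 1:3))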
ID date values QF
<fct> <date> <dbl> <fct>
1 A 2011-02-01 1 1
2 B 2011-02-01 8 3
3 C 2011-02-01 2 1
4 D 2011-02-01 3 2
5 E 2011-02-01 5 2
6 F 2011-02-01 13 3
7 G 2011-02-01 2 1
8 H 2011-02-01 4 2
9 I 2011-02-01 1 1
10 J 2011-02-01 16 3
11 A 2011-02-02 4 1
12 B 2011-02-02 2 1
13 C 2011-02-02 12 3
14 D 2011-02-02 16 3
15 E 2011-02-02 8 2
16 F 2011-02-02 1 1
17 G 2011-02-02 7 2
18 H 2011-02-02 11 3
19 I 2011-02-02 2 1
20 J 2011-02-02 10 2
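For comparison, the same cut()-based idea can be written in base R without dplyr. This is a minimal sketch that rebuilds the question's df so it runs on its own; qf_by_date is a hypothetical helper name. ave() applies the helper to values within each date and returns a vector aligned with the original rows.

```r
df <- data.frame(
  ID     = rep(LETTERS[1:10], times = 2),
  date   = rep(as.Date(c("2011-02-01", "2011-02-02")), each = 10),
  values = c(1, 8, 2, 3, 5, 13, 2, 4, 1, 16,
             4, 2, 12, 16, 8, 1, 7, 11, 2, 10)
)

# Hypothetical helper: classify a numeric vector by its own 40th/70th percentiles.
# cut() intervals are right-closed by default:
#   (-Inf, q40] -> 1, (q40, q70] -> 2, (q70, Inf) -> 3
qf_by_date <- function(v) {
  breaks <- c(-Inf, quantile(v, probs = c(0.4, 0.7)), Inf)
  cut(v, breaks = breaks, labels = FALSE)  # integer codes instead of a factor
}

# Apply the helper within each date; the result lines up with the rows of df
df$QF <- ave(df$values, df$date, FUN = qf_by_date)
print(df)
```

Unlike the dplyr version, labels = FALSE yields numeric codes rather than a factor. Both versions put a value exactly equal to the 70th percentile in class 2, a case the question's criteria leave open. Note also that cut() stops with an error when two breaks coincide, which can happen in groups with many tied values.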
We can use findInterval:
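library(dplyr)
df %>%
  group_by(date) %>%
  # left.open = TRUE gives intervals (a, b], matching "<= 40th percentile -> 1";
  # only the two inner cut points are used so the group maximum is not put in a 4th class
  mutate(QF = findInterval(values, quantile(values, probs = c(0.4, 0.7)),
                           left.open = TRUE) + 1)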
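One pitfall worth checking with findInterval(): by default its intervals are closed on the left, [a, b), and any value greater than or equal to the last break gets that break's index. So if the breaks end with the group maximum, as in c(0, quantile(values, probs = c(0.4, 0.7, 1))), the largest value in each date lands in a fourth class. The sketch below shows this on the 2011-02-01 values alone and compares it with passing only the two inner cut points together with left.open = TRUE, where adding 1 maps the result onto classes 1 to 3.

```r
# Values for 2011-02-01 from the question
v <- c(1, 8, 2, 3, 5, 13, 2, 4, 1, 16)

# Breaks 0, 40th, 70th, 100th percentile (2.6, 5.9, 16):
# 16 equals the last break, so findInterval() returns 4 for it
naive <- findInterval(v, c(0, quantile(v, probs = c(0.4, 0.7, 1))))

# Only the inner cut points, with intervals open on the left:
# (-Inf, q40] -> 0, (q40, q70] -> 1, (q70, Inf) -> 2, then shift by 1
fixed <- findInterval(v, quantile(v, probs = c(0.4, 0.7)), left.open = TRUE) + 1

print(rbind(v, naive, fixed))
```

The same left.open = TRUE expression works unchanged inside mutate() after group_by(date), and unlike cut() it tolerates tied percentiles because findInterval() only requires non-decreasing breaks.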