How to produce the 6th worst value based on 2 criteria and insert the results into a separate column?
Hoping someone can help.
I am trying to add another column: 6th Worst. What I want is for it to produce the 6th worst y result based on a specified criterion: Date.
Here is an example of my df:
Key Date y x1 x2 x3
1 1/10/2018 12:00:00 AM 2 3 2 5
1 1/11/2018 12:00:00 AM 3 5 7 2
1 1/12/2018 12:00:00 AM 5 7 4 7
1 1/13/2018 12:00:00 AM 7 2 7 6
2 1/10/2018 12:00:00 AM 2 6 3 8
2 1/11/2018 12:00:00 AM 3 7 7 3
2 1/12/2018 12:00:00 AM 3 2 3 4
2 1/13/2018 12:00:00 AM 7 6 2 7
3 1/10/2018 12:00:00 AM 2 3 2 5
3 1/11/2018 12:00:00 AM 3 5 7 2
3 1/12/2018 12:00:00 AM 5 7 4 7
3 1/13/2018 12:00:00 AM 7 2 7 6
3 1/10/2018 12:00:00 AM 2 6 3 8
3 1/11/2018 12:00:00 AM 3 7 7 3
3 1/12/2018 12:00:00 AM 3 2 3 4
3 1/13/2018 12:00:00 AM 7 6 2 7
4 1/10/2018 12:00:00 AM 2 3 2 5
4 1/11/2018 12:00:00 AM 3 5 7 2
4 1/12/2018 12:00:00 AM 5 7 4 7
4 1/13/2018 12:00:00 AM 7 2 7 6
4 1/10/2018 12:00:00 AM 2 6 3 8
4 1/11/2018 12:00:00 AM 3 7 7 3
5 1/12/2018 12:00:00 AM 3 2 3 4
5 1/13/2018 12:00:00 AM 7 6 2 7
5 1/10/2018 12:00:00 AM 2 3 2 5
5 1/11/2018 12:00:00 AM 3 5 7 2
5 1/12/2018 12:00:00 AM 5 7 4 7
5 1/13/2018 12:00:00 AM 7 2 7 6
6 1/10/2018 12:00:00 AM 2 6 3 8
6 1/11/2018 12:00:00 AM 3 7 7 3
6 1/12/2018 12:00:00 AM 3 2 3 4
6 1/13/2018 12:00:00 AM 7 6 2 7
So for 1/10/2018 the 6th worst value would be 3. The dataset would therefore look like this:
Key Date y x1 x2 x3 6th worst
1 1/10/2018 12:00:00 AM 2 3 2 5 3
1 1/11/2018 12:00:00 AM 3 5 7 2 ... (would have values)
1 1/12/2018 12:00:00 AM 5 7 4 7 ... (would have values)
1 1/13/2018 12:00:00 AM 7 2 7 6 ... (would have values)
2 1/10/2018 12:00:00 AM 2 6 3 8 3
2 1/11/2018 12:00:00 AM 3 7 7 3 etc.
2 1/12/2018 12:00:00 AM 3 2 3 4
2 1/13/2018 12:00:00 AM 7 6 2 7
3 1/10/2018 12:00:00 AM 2 3 2 5
3 1/11/2018 12:00:00 AM 3 5 7 2
3 1/12/2018 12:00:00 AM 5 7 4 7
3 1/13/2018 12:00:00 AM 7 2 7 6
3 1/10/2018 12:00:00 AM 2 6 3 8
3 1/11/2018 12:00:00 AM 3 7 7 3
3 1/12/2018 12:00:00 AM 3 2 3 4
3 1/13/2018 12:00:00 AM 7 6 2 7
4 1/10/2018 12:00:00 AM 2 3 2 5
4 1/11/2018 12:00:00 AM 3 5 7 2
4 1/12/2018 12:00:00 AM 5 7 4 7
4 1/13/2018 12:00:00 AM 7 2 7 6
4 1/10/2018 12:00:00 AM 2 6 3 8
4 1/11/2018 12:00:00 AM 3 7 7 3
5 1/12/2018 12:00:00 AM 3 2 3 4
5 1/13/2018 12:00:00 AM 7 6 2 7
5 1/10/2018 12:00:00 AM 2 3 2 5
5 1/11/2018 12:00:00 AM 3 5 7 2
5 1/12/2018 12:00:00 AM 5 7 4 7
5 1/13/2018 12:00:00 AM 7 2 7 6
6 1/10/2018 12:00:00 AM 2 6 3 8
6 1/11/2018 12:00:00 AM 3 7 7 3
6 1/12/2018 12:00:00 AM 3 2 3 4
6 1/13/2018 12:00:00 AM 7 6 2 7
Here is what I have so far:
# Get the 6th worst value in the dataset
n=length(df$y)
df$`6th Worst`= df$`6th Worst`= "-"
df[1,3] = round(-sort(subset(df,c(unique(Date), "y")), partial=n-5)[n-5], digits = 2)
I get the following error:
Error in subset.data.frame(reg_predict, unique(reg_predict2$Date)) :
'subset' must be logical
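Edit:
This question differs in several ways from the one it was marked as a duplicate of. In particular, I need a conditional 6th worst value, not just the worst/best value.
An option using the data.table package: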
library(data.table)
## Generate data
set.seed(1)
RowCount <- 100
DT <- data.table(Date = Sys.Date() + sample.int(3,RowCount,TRUE),
y = sample.int(100,RowCount,TRUE))
## Sort by y
setkey(DT,y)
## Too much to unpack here in inline comments, will expand further down
SixthWorst_DT <- DT[DT[,.I[6],by = .(Date)]$V1,.(Sixth_Worst = y), keyby = .(Date)]
print(SixthWorst_DT)
# Date Sixth_Worst
# 1: 2018-06-27 42
# 2: 2018-06-28 11
# 3: 2018-06-29 22
## Set DT Key to be date for update-join
setkey(DT,Date)
## Temporarily join `SixthWorst_DT` to `DT` (without making a full copy)
## and then create a column in `DT` based on the column `Sixth_Worst` in `SixthWorst_DT`
DT[SixthWorst_DT, Sixth_Worst := i.Sixth_Worst]
## Results
head(DT)
# Date y Sixth_Worst
# 1: 2018-06-27 18 42
# 2: 2018-06-27 18 42
# 3: 2018-06-27 19 42
# 4: 2018-06-27 19 42
# 5: 2018-06-27 39 42
# 6: 2018-06-27 42 42
The real meat of the operation is a single line:
SixthWorst_DT <- DT[DT[,.I[6],by = .(Date)]$V1,.(Sixth_Worst = y), keyby = .(Date)]
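- DT[,.I[6],by = .(Date)] uses the special symbol .I to extract the row number of the 6th row for each date
- The appended $V1 extracts those row numbers as a vector
- This vector is then used to subset DT
- The subset is then keyed (and implicitly sorted) and grouped by Date to create a summary table with a new column, Sixth_Worst, based on y
To really understand what is happening, I suggest running the following statements one at a time.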
DT[,.I[6],by = .(Date)]
DT[,.I[6],by = .(Date)]$V1
DT[DT[,.I[6],by = .(Date)]$V1]
DT[DT[,.I[6],by = .(Date)]$V1,.(Sixth_Worst = y), keyby = .(Date)]
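The steps listed above can also be traced on a table small enough to check by hand. The sketch below is only an illustration: the table toy, its values, and the column Sixth_Worst_direct are made up for this example and are not part of the answer. The last statement shows a by-group := assignment as a possible shortcut, on the assumption that a separate summary table is not needed.

```r
library(data.table)

# Hypothetical toy data: two dates with seven y values each
toy <- data.table(
  Date = rep(as.Date(c("2018-01-10", "2018-01-11")), each = 7),
  y    = c(9, 4, 7, 1, 8, 3, 6, 20, 50, 10, 70, 40, 60, 30)
)

# Sort by y so that the 6th row within each Date holds the 6th smallest y
setkey(toy, y)

# Step 1: row number (in the whole table) of the 6th row within each Date
idx <- toy[, .I[6], by = .(Date)]$V1

# Step 2: subset to those rows and build the keyed summary table
sixth <- toy[idx, .(Sixth_Worst = y), keyby = .(Date)]
print(sixth)

# Possible shortcut (an assumption, not the answer's code):
# compute the value per group and assign it to every row in one step
toy[, Sixth_Worst_direct := sort(y)[6], by = Date]
```

With this toy data, the 6th smallest y is 8 for the first date and 60 for the second, and both routes should agree.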
An option with dplyr and sort could be:
Note: the Date column can be converted to POSIXct format before grouping, but I did not notice any advantage in doing so.
library(dplyr)
df %>% group_by(Date) %>%
mutate(Worst6th = sort(y)[6])
# A tibble: 32 x 7
# Groups: Date [4]
Key Date y x1 x2 x3 Worst6th
<int> <chr> <int> <int> <int> <int> <int>
1 1 1/10/2018 12:00:00 AM 2 3 2 5 2
2 1 1/11/2018 12:00:00 AM 3 5 7 2 3
3 1 1/12/2018 12:00:00 AM 5 7 4 7 5
4 1 1/13/2018 12:00:00 AM 7 2 7 6 7
5 2 1/10/2018 12:00:00 AM 2 6 3 8 2
6 2 1/11/2018 12:00:00 AM 3 7 7 3 3
7 2 1/12/2018 12:00:00 AM 3 2 3 4 5
8 2 1/13/2018 12:00:00 AM 7 6 2 7 7
9 3 1/10/2018 12:00:00 AM 2 3 2 5 2
10 3 1/11/2018 12:00:00 AM 3 5 7 2 3
# ... with 22 more rows
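Two details of sort(y)[6] are worth checking against your own definition of "worst": it returns the 6th smallest value, and it returns NA for any group with fewer than six rows. The base R sketch below illustrates both, using ave() in place of group_by/mutate. The helper nth_sorted, the table toy, and the column names are hypothetical and exist only for this example.

```r
# Hypothetical helper: n-th value of x after sorting;
# indexing past the end of the sorted vector yields NA
nth_sorted <- function(x, n = 6, decreasing = FALSE) {
  sort(x, decreasing = decreasing)[n]
}

# Hypothetical toy data: one date with 7 rows, one with only 3
toy <- data.frame(
  Date = rep(c("1/10/2018", "1/11/2018"), times = c(7, 3)),
  y    = c(9, 4, 7, 1, 8, 3, 6, 5, 2, 4)
)

# 6th smallest y per Date, repeated on every row of the group
toy$Worst6th_low <- ave(toy$y, toy$Date, FUN = nth_sorted)

# 6th largest y per Date, in case "worst" means the highest values
toy$Worst6th_high <- ave(toy$y, toy$Date,
                         FUN = function(v) nth_sorted(v, decreasing = TRUE))

print(toy)
```

Here the first date gets 8 (6th smallest) and 3 (6th largest), while the second date, having only three rows, gets NA in both columns.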
Data:
df <- read.table(text="
Key Date y x1 x2 x3
1 '1/10/2018 12:00:00 AM' 2 3 2 5
1 '1/11/2018 12:00:00 AM' 3 5 7 2
1 '1/12/2018 12:00:00 AM' 5 7 4 7
1 '1/13/2018 12:00:00 AM' 7 2 7 6
2 '1/10/2018 12:00:00 AM' 2 6 3 8
2 '1/11/2018 12:00:00 AM' 3 7 7 3
2 '1/12/2018 12:00:00 AM' 3 2 3 4
2 '1/13/2018 12:00:00 AM' 7 6 2 7
3 '1/10/2018 12:00:00 AM' 2 3 2 5
3 '1/11/2018 12:00:00 AM' 3 5 7 2
3 '1/12/2018 12:00:00 AM' 5 7 4 7
3 '1/13/2018 12:00:00 AM' 7 2 7 6
3 '1/10/2018 12:00:00 AM' 2 6 3 8
3 '1/11/2018 12:00:00 AM' 3 7 7 3
3 '1/12/2018 12:00:00 AM' 3 2 3 4
3 '1/13/2018 12:00:00 AM' 7 6 2 7
4 '1/10/2018 12:00:00 AM' 2 3 2 5
4 '1/11/2018 12:00:00 AM' 3 5 7 2
4 '1/12/2018 12:00:00 AM' 5 7 4 7
4 '1/13/2018 12:00:00 AM' 7 2 7 6
4 '1/10/2018 12:00:00 AM' 2 6 3 8
4 '1/11/2018 12:00:00 AM' 3 7 7 3
5 '1/12/2018 12:00:00 AM' 3 2 3 4
5 '1/13/2018 12:00:00 AM' 7 6 2 7
5 '1/10/2018 12:00:00 AM' 2 3 2 5
5 '1/11/2018 12:00:00 AM' 3 5 7 2
5 '1/12/2018 12:00:00 AM' 5 7 4 7
5 '1/13/2018 12:00:00 AM' 7 2 7 6
6 '1/10/2018 12:00:00 AM' 2 6 3 8
6 '1/11/2018 12:00:00 AM' 3 7 7 3
6 '1/12/2018 12:00:00 AM' 3 2 3 4
6 '1/13/2018 12:00:00 AM' 7 6 2 7",
header = TRUE, stringsAsFactors = FALSE)