Extracting data from one data frame to another in R
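I have a data frame containing several years of daily stock exchange prices together with their dates. For every month, I want to extract the last 3 observations of that month and the first 5 observations of the following month, and store them in a new data frame.
Besides the date (in the format "%Y-%m-%d"), I also have a column containing a counter of the trading days within each month. The sample data looks like this: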
df <- data.frame(
  date = as.Date(c("2017-03-25","2017-03-26","2017-03-27","2017-03-29","2017-03-30",
                   "2017-03-31","2017-04-03","2017-04-04","2017-04-05","2017-04-06",
                   "2017-04-07","2017-04-08","2017-04-09")),
  DayofMonth = c(18,19,20,21,22,23,1,2,3,4,5,6,7),
  price = c(100, 100.53, 101.3, 100.94, 101.42, 101.40, 101.85, 102, 101.9, 102, 102.31, 102.1, 102.23)
)
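Now I want to extract the last 3 observations of March and the first 5 observations of April (then the last 3 observations of April and the first 5 observations of May, and so on, including all columns of the corresponding rows) and store them in a new data frame. The only question is: how do I do that?
Thanks for your help!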
A first idea:
library(data.table)
library(stringr)

date <- c("2017-03-25","2017-03-26","2017-03-27","2017-03-29","2017-03-30",
          "2017-03-31","2017-04-03","2017-04-04","2017-04-05","2017-04-06",
          "2017-04-07","2017-04-08","2017-04-09")
df <- data.table(Date = date)
df[, YearMonth := str_sub(Date, 1, 7)]            # "YYYY-MM" key for grouping
df[, DayofMonth := seq(.N), by = YearMonth]       # trading-day counter within each month
first <- df[, head(.SD, 5), by = YearMonth]       # first (up to) 5 trading days of each month
last <- df[, tail(.SD, 3), by = YearMonth]        # last (up to) 3 trading days of each month
final <- rbind(first, last)
setorder(final, Date)
# Be aware that a month with fewer than 8 trading days yields duplicate rows,
# because its first-5 and last-3 windows overlap; unique() removes them.
final <- unique(final)
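For readers who would rather avoid extra packages, the same first-5 / last-3 selection can be sketched in base R. This is a swapped-in variant, not the code from the answer above: the helper names `ym`, `from_start`, `from_end` and `window_rows` are made up for illustration, and the sketch assumes the rows can be sorted by date. Because the two conditions are combined with a logical OR, overlapping windows do not produce duplicate rows.

```r
# Sample data from the question, rebuilt as a plain data.frame
df <- data.frame(
  date = as.Date(c("2017-03-25", "2017-03-26", "2017-03-27", "2017-03-29",
                   "2017-03-30", "2017-03-31", "2017-04-03", "2017-04-04",
                   "2017-04-05", "2017-04-06", "2017-04-07", "2017-04-08",
                   "2017-04-09")),
  price = c(100, 100.53, 101.3, 100.94, 101.42, 101.40, 101.85,
            102, 101.9, 102, 102.31, 102.1, 102.23)
)
df <- df[order(df$date), ]                 # make sure rows are in date order

ym  <- format(df$date, "%Y-%m")            # month key, e.g. "2017-03"
idx <- seq_len(nrow(df))

# Position of each row counted from the start and from the end of its month
from_start <- ave(idx, ym, FUN = seq_along)
from_end   <- ave(idx, ym, FUN = function(i) rev(seq_along(i)))

# Keep the first 5 and the last 3 trading days of every month
window_rows <- df[from_start <= 5 | from_end <= 3, ]
```

With the short sample above, March has only 6 rows and April only 7, so every row falls into one of the two windows; on several years of daily data the filter would drop the middle of each month.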
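Quick and dirty:
Add a column that is like the DayofMonth column but shifted by 3 rows, so that each row holds the DayofMonth value found three rows further down: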
df$dom2 <- df$DayofMonth[4:(nrow(df)+3)]
subset(df, DayofMonth<=5 | dom2<=3)
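The only reason we still filter on the actual DayofMonth column (rather than just writing dom2<=8) is that, for your example, dom2 ends in NA values. I don't know what your real data looks like, but better safe than sorry.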
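To see why the NA values matter, here is a small sketch that applies both filters to the DayofMonth values from the question. The variable names `day_of_month`, `keep_or` and `keep_8` are introduced only for this illustration.

```r
# Trading-day counter from the question's sample data
day_of_month <- c(18, 19, 20, 21, 22, 23, 1, 2, 3, 4, 5, 6, 7)
n <- length(day_of_month)

# Value found three rows further down; indexing past the end yields NA
dom2 <- day_of_month[4:(n + 3)]

# Filter used in the answer: which() drops NA results, as subset() does
keep_or <- which(day_of_month <= 5 | dom2 <= 3)   # rows 4 to 11

# Filter on the shifted column alone
keep_8 <- which(dom2 <= 8)                        # rows 4 to 10
```

Row 11 is the fifth trading day of April. Its shifted value is NA, so the `dom2 <= 8` filter loses it, while the combined condition keeps it because `TRUE | NA` evaluates to `TRUE`.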