How can I put zero or NA in a data frame that has two types of dates, using R?
The basic idea is to add 0 or NA: for every day on which a variable has no value, add a row with NA (or 0) in that variable.
The function should align the two variables, but it should also work for the price, adding NA to whichever variable has no value.
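Aligning two daily series so that each one gets NA (or 0) on the days it lacks is essentially a full outer join on the date. A minimal base-R sketch, using two hypothetical series `a` and `b` with made-up values (not the question's data):

```r
# Two hypothetical daily series, each observed on its own set of days
a <- data.frame(date = as.Date(c("2021-12-16", "2021-12-17", "2021-12-20")),
                spot = c(1, 2, 3))
b <- data.frame(date = as.Date(c("2021-12-17", "2021-12-18", "2021-12-20")),
                futures = c(10, 20, 30))

# Full outer join on the date: days missing from one series get NA there
merged <- merge(a, b, by = "date", all = TRUE)

# Variant with 0 instead of NA in the value columns
merged0 <- merged
merged0$spot[is.na(merged0$spot)] <- 0
merged0$futures[is.na(merged0$futures)] <- 0

print(merged)
print(merged0)
```

`merge()` sorts the result by the join key, so the rows come out in increasing date order.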
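This is my dataset. I want to add 0 and NA for the periods in which a value is missing, for example in the futures and date variables.

| | data | date | futures |
|---|---|---|---|
| 1 | 2021-12-23 | 2021-12-23 | 1388.17 |
| 2 | 2021-12-22 | 2021-12-22 | 1432.36 |
| 3 | 2021-12-21 | 2021-12-21 | 1508.98 |
| 4 | 2021-12-20 | 2021-12-20 | 1493.13 |
| 5 | 2021-12-19 | 2021-12-17 | 1379.97 |
| 6 | 2021-12-18 | 2021-12-16 | 1597.91 |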
The function should work roughly like this:

| | data | date | futures |
|---|---|---|---|
| 1 | 2021-12-23 | 2021-12-23 | 1388.17 |
| 2 | 2021-12-22 | 2021-12-22 | 1432.36 |
| 3 | 2021-12-21 | 2021-12-21 | 1508.98 |
| 4 | 2021-12-20 | 2021-12-20 | 1493.13 |
| 5 | 2021-12-19 | NA | NA |
| 6 | 2021-12-18 | NA | NA |
I thought of using a for loop, but I couldn't get it to work.
Thank you very much for your help.
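One way to read the expected output is as a lookup: for each date in `data`, take the `date`/`futures` pair whose `date` equals it, and NA otherwise. A minimal for-loop sketch of that reading, assuming the `date` values are unique; note that under this reading the futures recorded on 2021-12-17 and 2021-12-16 are dropped, because no `data` row carries those dates:

```r
# The question's sample data, transcribed from the table above
df <- data.frame(
  data    = as.Date(c("2021-12-23", "2021-12-22", "2021-12-21",
                      "2021-12-20", "2021-12-19", "2021-12-18")),
  date    = as.Date(c("2021-12-23", "2021-12-22", "2021-12-21",
                      "2021-12-20", "2021-12-17", "2021-12-16")),
  futures = c(1388.17, 1432.36, 1508.98, 1493.13, 1379.97, 1597.91)
)

# Start from the reference dates; date/futures are NA until a match is found
aligned <- data.frame(data = df$data, date = as.Date(NA), futures = NA_real_)
for (i in seq_len(nrow(df))) {
  hit <- which(df$date == df$data[i])   # row whose date equals this reference date
  if (length(hit) == 1) {
    aligned$date[i]    <- df$date[hit]
    aligned$futures[i] <- df$futures[hit]
  }
}
print(aligned)

# The same lookup without a loop: match() returns NA where there is no match
idx <- match(df$data, df$date)
```

To get 0 instead of NA in the price, add `aligned$futures[is.na(aligned$futures)] <- 0` after the loop.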
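I created some arbitrary data and randomly removed rows. This answer has been edited to reflect the changes you made to the question. You didn't specify this, but I assumed that if the first date field is NA, you want to keep the price.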
library(tidyverse)
set.seed(1) # make the random example reproducible
daten <- seq(as.Date("2020/10/23"), as.Date("2021/10/12"), "days")
price <- round(runif(350, 1000, 1500), digits = 2)
dff <- data.frame(dateOne = sort(sample(daten, size = 325, replace = FALSE),
                                 decreasing = TRUE),
                  dateTwo = sort(sample(daten, size = 325, replace = FALSE),
                                 decreasing = TRUE),
                  futures = sample(price, size = 325, replace = TRUE))
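The function below pads each side with NA rows and then binds the two sides column-wise, which only lines up if both sides gain the same number of rows. A small check of that property, assuming each date column holds unique dates (true here because `sample()` is called with `replace = FALSE`); the seed is arbitrary:

```r
set.seed(42)  # arbitrary seed, only to make this sketch reproducible
daten <- seq(as.Date("2020/10/23"), as.Date("2021/10/12"), "days")
dateOne <- sort(sample(daten, size = 325), decreasing = TRUE)
dateTwo <- sort(sample(daten, size = 325), decreasing = TRUE)

# Dates present in one column but absent from the other
onlyInOne <- dateOne[!dateOne %in% dateTwo]
onlyInTwo <- dateTwo[!dateTwo %in% dateOne]

# Equal counts of unique dates imply equal numbers of NA rows on each side
cat(length(onlyInOne), length(onlyInTwo), "\n")
# The aligned result has one row per date in the union of both columns
cat(length(union(dateOne, dateTwo)), "\n")
```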
This assumes that the dates are sorted.
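Since the function relies on the input being sorted, it can help to check the direction first. A minimal sketch built on base R's `is.unsorted()`; `sortDirection()` is a hypothetical helper, not part of the answer:

```r
# Hypothetical helper: reports the sort direction of a date vector
sortDirection <- function(x) {
  x <- x[!is.na(x)]                              # ignore NA placeholders
  if (!is.unsorted(x)) return("increasing")      # non-decreasing
  if (!is.unsorted(rev(x))) return("decreasing") # non-increasing
  "unsorted"
}

sortDirection(as.Date(c("2021-12-16", "2021-12-17", "2021-12-20")))  # "increasing"
sortDirection(as.Date(c("2021-12-23", "2021-12-22", "2021-12-19")))  # "decreasing"
sortDirection(as.Date(c("2021-12-23", "2021-12-16", "2021-12-19")))  # "unsorted"
```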
ordering <- function(d, dT, df1) { # names of the two date fields and the data frame
  # remember whether the dates were sent in decreasing order
  decreasing <- df1[[d]][1] > df1[[d]][nrow(df1)]
  # separate and sort (ascending) the date columns
  # the first date field on its own
  dc <- df1[, d, drop = FALSE] %>% arrange(.data[[d]])
  # everything *except* the first date field (second date, prices, ...)
  dTc <- df1[, setdiff(colnames(df1), d), drop = FALSE] %>% arrange(.data[[dT]])
  # find differences
  # dates missing in the first date field
  dfd2 <- dTc[[dT]][!dTc[[dT]] %in% dc[[d]]]
  # dates missing in the second date field
  dfd <- dc[[d]][!dc[[d]] %in% dTc[[dT]]]
  # find indices after which the NA rows need to be placed
  dcInt <- findInterval(dfd2, dc[[d]])
  dTcInt <- findInterval(dfd, dTc[[dT]])
  # insert one all-NA row after each index; every earlier insertion
  # shifts the later positions by one, hence the offset of i - 1
  insertNA <- function(x, at) {
    for (i in seq_along(at)) { # seq_along() is safe when nothing is missing
      k <- at[[i]] + i - 1
      x <- rbind(x[seq_len(k), , drop = FALSE],               # everything before
                 x[NA_integer_, , drop = FALSE],              # one all-NA row
                 x[k + seq_len(nrow(x) - k), , drop = FALSE]) # everything after
    }
    rownames(x) <- NULL
    x
  }
  # reassemble the data with the original column order
  df2 <- cbind(insertNA(dc, dcInt), insertNA(dTc, dTcInt))[, colnames(df1)]
  # return the rows in the order they were sent (increasing or decreasing)
  if (decreasing) df2 <- df2[rev(seq_len(nrow(df2))), ]
  rownames(df2) <- NULL
  return(df2)
}
Now you can use this function on your data.
tryIt <- ordering("dateOne", "dateTwo", dff)
head(tryIt)
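This returns a data frame. Whether the data are sorted by date in increasing or decreasing order when they are sent to the function, they are returned in the order in which they were sent.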
tail(tryIt, n = 15)
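If both date columns hold unique dates, the same alignment can also be written as a full outer join with base R's `merge()`, which avoids the index bookkeeping. A minimal sketch under that assumption; `alignDates()` and the small `toy` frame are hypothetical, not part of the answer above, and `merge()` returns the rows in increasing date order:

```r
# Hypothetical alternative: align two date columns with a full outer join
alignDates <- function(df, d, dT) {
  # first date column, keyed by itself
  left <- data.frame(key = df[[d]])
  left[[d]] <- df[[d]]
  # all other columns (second date, prices, ...), keyed by the second date
  right <- df[, setdiff(names(df), d), drop = FALSE]
  right$key <- df[[dT]]
  # all = TRUE keeps dates found in only one column and fills the gaps with NA
  out <- merge(left, right, by = "key", all = TRUE)
  out <- out[, names(df)] # drop the key and restore the column order
  rownames(out) <- NULL
  out
}

toy <- data.frame(
  dateOne = as.Date(c("2021-12-20", "2021-12-21", "2021-12-23")),
  dateTwo = as.Date(c("2021-12-20", "2021-12-22", "2021-12-23")),
  futures = c(1, 2, 3)
)
res <- alignDates(toy, "dateOne", "dateTwo")
print(res)
```

For data sent in decreasing order, `res[nrow(res):1, ]` flips the result back.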