Linear interpolation or resampling in R
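I have a question related to interpolation. I have 2 columns ($1 is time in seconds and the other is sea level). The examples I tried mostly worked from a date column, e.g. 1970-11-11, but my records are in seconds and I would like to linearly interpolate them to minutes. Sampling was originally every 0.3 seconds. Any suggestions as to which package is best? The code below generates a large matrix but does not reduce the number of values as expected. The format is just 2 columns. I am trying to use the data in further analysis, sampled not every 0.1 seconds but every 1 minute.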
set.seed(1);
time <- rep(seq(0,180,by=0.1));
sl <-runif(1801,-0.1,4.0);
data1 <- cbind2(time,sl);
#Output needed...
# time(min) sl(cm)
#Examples tried:
time<-data1$V1
SL<-data1$V2
seq1 <- zoo(order.by=((seq(min(time), max(time), by=30))))
mer1 <- merge(zoo(x=data1[1:2],order.by=time), seq1)
#Linear interpolation
dataL <- na.approx(mer1)
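Here is one solution. This approach does not use any linear interpolation; instead it takes the average of the readings centered on each minute.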
library(dplyr) # for group_by and summarize
colnames(data1) <- c("time", "sl") # makes it easier to call variables by names
data1 <- as.data.frame(data1)
data1$minute <- round(data1$time/60,0) # nearest whole minute for each reading
head(data1)
# time sl minute
# 1 0.0 0.9885855 0
# 2 0.1 1.4257080 0
# 3 0.2 2.2486988 0
# 4 0.3 3.6236519 0
# 5 0.4 0.7268959 0
# 6 0.5 3.5833977 0
data_by_minute <- data1 %>%
group_by(minute) %>%
summarize(sl_avg = mean(sl))
data_by_minute
# # A tibble: 4 x 2
# minute sl_avg
# <dbl> <dbl>
# 1 0 1.91
# 2 1 1.98
# 3 2 1.87
# 4 3 1.96
If you only want the actual reading taken at each minute rather than an average, another option is:
data1[data1$time%%60==0,] # keeps only the observations that fall exactly on a minute; everything else is dropped
# time sl
# 1 0 0.9885855
# 601 60 3.2384322
# 1201 120 1.4027590
# 1801 180 0.1525986
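One caveat with the `%%` test above: it relies on the time stamps being exact floating-point multiples of 60. That happens to hold for this simulated grid, but it may not hold for measured time stamps. A minimal sketch of a tolerance-based version follows; the helper name `on_minute` and the tolerance `tol` are illustrative assumptions, not part of the original answer.

```r
# Rebuild the example data from the question as a data frame
set.seed(1)
time <- seq(0, 180, by = 0.1)
sl <- runif(1801, -0.1, 4.0)
data1 <- data.frame(time = time, sl = sl)

# Hypothetical helper: TRUE where a time stamp lies within `tol` seconds
# of a whole minute (tol is an assumed value, adjust to your sampling)
on_minute <- function(t, tol = 1e-6) {
  abs(t - 60 * round(t / 60)) < tol
}

# Keep only the readings taken on (or within tol of) each minute
data_on_minute <- data1[on_minute(data1$time), ]
data_on_minute
```

With the 0.1-second grid used here this keeps the same four rows as the exact comparison; the tolerance only matters when the recorded times carry rounding noise.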
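Or, if you are looking for an interpolation, you can use a loess fit, which gives a locally smoothed estimate rather than a strict linear interpolation: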
data1$minutes <- data1$time/60 # convert the time variable from seconds to minutes
mod_loess <- loess(sl ~ minutes, data = data1) # fit a loess model of sl as a function of time; this is essentially a smoothed version of your sl data
Minute <- c(0,1,2,3) # minutes for which you want a prediction
SL_Preds <- predict(mod_loess, newdata = data.frame(minutes = Minute)) # evaluate the smoothed curve at those minutes
tableA <- cbind(Minute, SL_Preds)
tableA # 4 x 2 matrix with columns Minute and SL_Preds
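Since the question asks specifically for linear interpolation, a minimal sketch with base R's `approx()` (from the `stats` package) may be closer to what was wanted. It evaluates the piecewise-linear curve through the readings at a chosen set of times. The one-minute grid `grid_sec` and the output column names are assumptions made for illustration.

```r
# Rebuild the example data from the question
set.seed(1)
time <- seq(0, 180, by = 0.1)   # seconds
sl <- runif(1801, -0.1, 4.0)    # simulated sea level

# Target times: one value per minute, expressed in seconds (assumed grid)
grid_sec <- seq(0, 180, by = 60)

# Linear interpolation of sl at the target times
interp <- approx(x = time, y = sl, xout = grid_sec, method = "linear")

# Two-column result: time in minutes and interpolated sea level
data_interp <- data.frame(time_min = interp$x / 60, sl = interp$y)
data_interp
```

Where a grid point coincides with an existing reading, linear interpolation simply returns that reading, so the result only differs from subsetting when the target times fall between samples. The same call should work unchanged for unevenly sampled data, because `approx()` only needs the `x` and `y` vectors.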