R: smooth df values by columns
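I have a df that I want to turn into a line chart.
The charts have a lot of spikes, so I decided to smooth the values of each variable.
Is there a way to replace every cell of the df with the average of 20 rows without using a loop?
Sample data: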
df = structure(list(Date = c(0, 10, 20, 30, 40, 50, 60, 70, 80, 90,
100, 110, 120, 130, 140, 150, 160, 170, 180, 190), `0_3` = c(2.96069423175089,
2.98934234468417, 3.0166710770045, 2.93318848451928, 2.9029582526956,
2.93122886133033, 2.95467584624211, 2.92056074766355, 2.9673590504451,
2.99909118448955, 3.0678648899907, 3.08758664146188, 3.16639741518578,
3.1981536432575, 3.23886639676113, 3.32871012482663, 3.2554847841472,
3.33575054387237, 3.25720703856234, 3.28495034377387), `0_6` = c(2.65441551812149,
2.70340525084481, 2.75205080709182, 2.71591526344378, 2.76472214542438,
2.73393461104848, 2.75387263339071, 2.77453271028037, 2.7299703264095,
2.66585883065738, 2.69600247908274, 2.67800882167612, 2.7140549273021,
2.63765248928454, 2.69905533063428, 2.66990291262136, 2.689313517339,
2.75562001450326, 2.77049794084613, 2.78838808250573)), .Names = c("Date",
"0_3", "0_6"), row.names = c(NA, 20L), class = "data.frame")
So far I have only managed to smooth it with a loop:
smooth_factor = 5
smooth_df = df[smooth_factor:nrow(df),]
for (i in rownames(smooth_df)) {
  row = as.numeric(i)
  for (j in colnames(smooth_df)[2:ncol(smooth_df)]){
    # The first column contains Date that should not be smoothed
    # Average the current row and the smooth_factor - 1 rows before it
    smooth_df[i,j] = mean(df[(row-smooth_factor+1):row,j])
  }
}
# Date is already carried over by the row subset above
When I apply this method to a larger dataset, the spiky plot becomes a much smoother one.
You are looking for apply (base) and rollmean (from the zoo package):
library(zoo)
npoints <- 5
apply(df,2,function(x){rollmean(x,npoints)})
Date 0_3 0_6
[1,] 20 2.960571 2.718102
[2,] 30 2.954678 2.734006
[3,] 40 2.947745 2.744099
[4,] 50 2.928522 2.748595
[5,] 60 2.935357 2.751406
[6,] 70 2.954583 2.731634
[7,] 80 2.981910 2.724047
[8,] 90 3.008493 2.708875
[9,] 100 3.057660 2.696779
[10,] 110 3.103819 2.678316
[11,] 120 3.151774 2.684955
[12,] 130 3.203943 2.679735
[13,] 140 3.237522 2.681996
[14,] 150 3.271393 2.690309
[15,] 160 3.283204 2.716878
[16,] 170 3.292421 2.734744
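Note that apply() returns a matrix and averages the Date column along with the values. If you would rather keep a data frame with Date untouched, something like the following might work. This is only a sketch: the toy data frame and the names value_cols and smooth_df are made up for illustration, and it assumes you want a trailing window (the current row plus the rows before it), as in the question's loop.

```r
library(zoo)

# Hypothetical toy data with the same layout as the question's df
df <- data.frame(Date  = seq(0, 90, by = 10),
                 `0_3` = c(3, 5, 4, 6, 8, 7, 9, 11, 10, 12),
                 `0_6` = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
                 check.names = FALSE)

npoints <- 5
value_cols <- setdiff(names(df), "Date")  # every column except Date

smooth_df <- df
# Smooth only the value columns. fill = NA pads the result so it keeps
# nrow(df) rows; align = "right" makes the window end at the current row.
smooth_df[value_cols] <- lapply(df[value_cols], rollmean,
                                k = npoints, fill = NA, align = "right")
```

The first npoints - 1 rows of each smoothed column are NA because there are not enough earlier rows to fill the window; drop them or leave them, depending on how the plot should start.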
With data.table (df has to be converted to a data.table first) it would be:
library(data.table)
as.data.table(df)[, lapply(.SD, function(x) rollmean(x, npoints)), .SDcols = names(df)]
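If installing a package is not an option, a similar result can probably be had in base R with stats::filter. The sketch below is not part of the answer above: the toy data and the helper moving_avg are hypothetical, and it assumes a trailing, equally weighted window.

```r
# Hypothetical toy data with the same layout as the question's df
df <- data.frame(Date  = seq(0, 90, by = 10),
                 `0_3` = c(3, 5, 4, 6, 8, 7, 9, 11, 10, 12),
                 `0_6` = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
                 check.names = FALSE)

npoints <- 5

# Hypothetical helper: trailing moving average with equal weights.
# sides = 1 uses only the current and earlier values, so the first
# k - 1 results are NA. stats:: is spelled out because dplyr masks filter().
moving_avg <- function(x, k) {
  as.numeric(stats::filter(x, rep(1 / k, k), sides = 1))
}

value_cols <- setdiff(names(df), "Date")  # leave Date unsmoothed
smooth_df <- df
smooth_df[value_cols] <- lapply(df[value_cols], moving_avg, k = npoints)

# Drop the leading rows that have no full window behind them
smooth_df <- smooth_df[complete.cases(smooth_df), ]
```

Using sides = 2 instead would centre the window on each row, which is what rollmean does by default.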