Handling NAs in the aggregate function in R
I'm trying to use the aggregate function to get daily sums from a CSV file, but I get the following error:
Error in Summary.factor(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), na.rm = FALSE) : ‘sum’ not meaningful for factors
Calls: aggregate ... aggregate.data.frame -> lapply -> FUN -> lapply -> Summary.factor
Execution halted
Here is a link to the data: Data
This is my code:
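# read the raw 6-hourly data
dat <- read.csv("Laoag_tc_induced.csv", header = TRUE, sep = ",")
# replace sentinel codes: -999 -> NA, -888 -> 0
dat[dat == -999] <- NA
dat[dat == -888] <- 0
# turn keys like "1994_8_19_0" into calendar dates
dat$Date <- as.Date(strptime(dat$key, '%Y_%m_%d_%H'))
df <- data.frame(dat$Date, dat$RR, dat$dist)
# daily rainfall totals (this is the line that fails)
df <- aggregate(RR ~ Date, dat, sum)
names(df)[1] <- "Date"
names(df)[2] <- "Rain"
write.table(df, file = "test.csv", sep = ",")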
I tried:
df <- aggregate(RR ~ Date, dat,sum,na.rm=TRUE)
and
df <- aggregate(RR ~ Date,dat,sum,na.rm=TRUE,na.action=na.pass)
but the error persists:
‘sum’ not meaningful for factors
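A minimal sketch reproducing the error, using a small hypothetical data frame `toy` (not the OP's data): once `RR` is a factor, `sum()` fails whatever `na.rm` is set to, because the problem is the column's type rather than its missing values.

```r
# Hypothetical toy data: one RR value is the string " NA", and the column is
# stored as a factor, mimicking what happened to the OP's RR column.
toy <- data.frame(
  Date = as.Date(c("1994-08-19", "1994-08-19", "1994-08-20")),
  RR   = factor(c("28.8", " NA", "0.3"))
)

class(toy$RR)
# [1] "factor"

# sum() has no method for factors, so na.rm = TRUE cannot help here.
res <- tryCatch(
  aggregate(RR ~ Date, toy, sum, na.rm = TRUE),
  error = function(e) conditionMessage(e)
)
res
```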
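Some elements of 'RR' are the string " NA" (with a leading space), which changes the class of the whole column to factor. One option is to list that string in na.strings so it is read as NA, and also to set stringsAsFactors = FALSE: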
dat <- read.csv("Laoag_tc_induced.csv", header = TRUE, stringsAsFactors = FALSE,
                na.strings = " NA", strip.white = TRUE)
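A self-contained sketch of the same idea, assuming a hypothetical inline CSV (`csv_text`) and a made-up missing-value marker `n/a` in place of the OP's file: any non-numeric token keeps read.csv() from reading the column as numeric, and listing that token in `na.strings` turns it into NA so the column becomes numeric.

```r
# Hypothetical inline CSV standing in for the OP's file: one RR value is a
# non-numeric marker ("n/a"), so read.csv() cannot read the column as numeric.
csv_text <- "key,RR
1994_8_19_0,28.8
1994_8_19_6,n/a
1994_8_20_0,0.3"

# Default read: RR comes back as text (character in R >= 4.0, factor before).
bad <- read.csv(text = csv_text)
class(bad$RR)

# Declaring the marker as a missing-value string lets RR be read as numeric,
# with NA where the marker was.
good <- read.csv(text = csv_text, na.strings = "n/a", stringsAsFactors = FALSE)
class(good$RR)
# [1] "numeric"
good$RR
# [1] 28.8   NA  0.3
```

On R 4.0.0 and later, stringsAsFactors already defaults to FALSE, so an unreadable column comes back as character rather than factor; sum() still fails on it, just with a different error message.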
After applying the OP's transformations/replacements:
res <- aggregate(RR ~ Date, dat,sum)
head(res, 5)
# Date RR
#1 1994-08-09 0.0
#2 1994-08-10 0.0
#3 1994-08-11 0.0
#4 1994-08-12 0.3
#5 1994-08-13 0.0
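A sketch of what the OP's na.action = na.pass attempt changes once RR is numeric, using a hypothetical data frame `rain`: the formula method of aggregate() defaults to na.action = na.omit, which drops rows with a missing RR before grouping, so a day with only missing values disappears from the result.

```r
# Hypothetical daily data; 1994-08-21 has only a missing RR value.
rain <- data.frame(
  Date = as.Date(c("1994-08-19", "1994-08-19", "1994-08-20", "1994-08-21")),
  RR   = c(28.8, NA, 0.3, NA)
)

# Default na.action = na.omit: rows with NA in RR are dropped before grouping,
# so 1994-08-21 does not appear in the result at all.
omitted <- aggregate(RR ~ Date, data = rain, FUN = sum)
nrow(omitted)
# [1] 2

# na.action = na.pass keeps those rows; na.rm = TRUE is forwarded to sum(),
# so 1994-08-21 appears with a total of 0.
passed <- aggregate(RR ~ Date, data = rain, FUN = sum,
                    na.rm = TRUE, na.action = na.pass)
nrow(passed)
# [1] 3
```

Which behaviour is right depends on whether an all-missing day should count as zero rain or be left out.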
The OP mentioned that the dates were changing, but with the data provided the dates come out correctly:
dat[78:81,]
# X.1 key SN CY Lat.x Lon.x X RR Lat.y Lon.y dist Date
#78 78 1994_8_19_0 199419 19 0.3700098 2.230531 49133 28.8 0.3176499 2.104727 824.8680 1994-08-19
#79 79 1994_8_19_6 199419 19 0.3787364 2.214823 49134 28.8 0.3176499 2.104727 765.4631 1994-08-19
#80 80 1994_8_19_12 199419 19 0.3857178 2.200860 49135 28.8 0.3176499 2.104727 720.0335 1994-08-19
#81 81 1994_8_19_18 199419 19 0.3926991 2.190388 49136 28.8 0.3176499 2.104727 700.1729 1994-08-19
which matches the CSV data.
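A quick check of the date conversion, using keys in the same format as rows 78-81 (the `keys` vector is typed in by hand, not read from the file): strptime() parses the hour, and as.Date() keeps only the calendar day, so all four 6-hourly records fall into the same daily group.

```r
# Keys in the OP's "year_month_day_hour" format, all on the same day.
keys <- c("1994_8_19_0", "1994_8_19_6", "1994_8_19_12", "1994_8_19_18")

# Parse the full timestamp, then drop the hour by converting to Date.
days <- as.Date(strptime(keys, "%Y_%m_%d_%H"))
days
# [1] "1994-08-19" "1994-08-19" "1994-08-19" "1994-08-19"
```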