Calculate line coefficients for each row of a dataset with NA values in R
I have a dataset with about 75 rows and 25 columns. Each row represents a student, and each column holds a score between 1 and 5.
     S1   S2  .....  S24
x1    0    2  .....    2
x2    1    3  .....   NA
x3   NA    4  .....    4
x4    4   NA  .....    2
x5    4    3  .....    2
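I want to get the intercept and slope of the fitted line for each row, ignoring that row's NA values, and add them to the original dataset. I am using the code below, but it still includes the NA values. I am using R.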
df = read.csv('exc.csv')
Slope = function(x) {
TempDF = data.frame(x, survey=1:ncol(df))
lm(x ~ survey, data=TempDF,na.rm=TRUE)$coefficients[2]
}
Intercept = function(x) {
TempDF = data.frame(x, survey=1:ncol(df))
lm(x ~ survey, data=TempDF,na.rm=TRUE)$coefficients[1]
}
TData = as.data.frame(t(df))
dataset$Intercept = sapply(TData, Intercept)
dataset$slope = sapply(TData, Slope)
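The regression itself only uses pairs in which both values are non-NA: by default, lm() drops incomplete cases before fitting. So anything with an NA value does not affect the slope or intercept in your case: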
set.seed(100)
y = rnorm(100)
x = rnorm(100)
y[1:10] = NA
x[91:100] = NA
df = data.frame(x,y)
lm(y ~x,data=df)
Call:
lm(formula = y ~ x, data = df)
Coefficients:
(Intercept) x
0.02871 -0.15929
Now we keep only the pairs where neither x nor y is NA, and the coefficients are identical:
df = df[!is.na(df$x) & !is.na(df$y),]
lm(y ~x,data=df)
Call:
lm(formula = y ~ x, data = df)
Coefficients:
(Intercept) x
0.02871 -0.15929
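To confirm how many observations lm() actually used, you can inspect the fitted object. This is a minimal sketch, assuming the session's `na.action` option is still the default (`na.omit`); the names `d` and `fit` are only for illustration:

```r
set.seed(100)
y <- rnorm(100)
x <- rnorm(100)
y[1:10] <- NA      # 10 rows with a missing response
x[91:100] <- NA    # 10 other rows with a missing predictor
d <- data.frame(x, y)

fit <- lm(y ~ x, data = d)

nobs(fit)               # rows used in the fit: 80
length(fit$na.action)   # rows dropped because of an NA: 20
sum(complete.cases(d))  # rows with neither x nor y missing: 80
```

The first and last numbers agree, which is why filtering the NA rows by hand gives the same coefficients.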
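If you still need to drop the NA values explicitly, for example to also report how many observations went into each fit, you can do it like this: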
#simulate your data
df = data.frame(matrix(sample(1:5,25*5,replace=TRUE),ncol=25))
colnames(df) = paste("S",1:25,sep="")
#make some NAs
df[cbind(c(1,3,5),c(2,3,4))] <- NA
# fit once per student; return intercept, slope, and the number of non-NA scores used
Coef = function(x) {
TempDF = data.frame(x, survey=1:ncol(df))
TempDF = TempDF[!is.na(x),]
c(lm(x ~ survey, data=TempDF)$coefficients,n=nrow(TempDF))
}
TData = as.data.frame(t(df))
dataset = data.frame(t(sapply(TData, Coef)))
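The question also asks for the coefficients to be added to the original dataset. One possible way, sketched here under assumptions, is to fit row by row with `apply()` and `cbind()` the result back on. The names `scores`, `row_fit`, `coefs`, and `result` are hypothetical, and the data are simulated rather than read from `exc.csv`:

```r
# Simulated stand-in for the real data: 5 students, 25 surveys, scores 1 to 5
set.seed(1)
m <- matrix(sample(1:5, 25 * 5, replace = TRUE), ncol = 25)
m[cbind(c(1, 3, 5), c(2, 3, 4))] <- NA   # make some NAs
scores <- as.data.frame(m)
colnames(scores) <- paste0("S", 1:25)

# Fit one regression per row; lm() drops the NA scores by default
row_fit <- function(x) {
  survey <- seq_along(x)                 # survey index 1..25
  fit <- lm(x ~ survey)
  c(Intercept = unname(coef(fit)[1]),
    Slope     = unname(coef(fit)[2]),
    n         = nobs(fit))               # number of non-NA scores used
}

coefs  <- t(apply(as.matrix(scores), 1, row_fit))  # one row per student
result <- cbind(scores, coefs)                     # original columns plus 3 new ones
```

A row with fewer than two non-NA scores cannot determine a slope, so `Slope` would come back as `NA` for that student.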