Calculate line coefficients for each row of a dataset with NA values in R

I have a dataset with about 75 rows and 25 columns. Each row represents a student, and each column holds a score between 1 and 5.

            S1   S2      ..... S24
x1           0   2       ..... 2
x2           1   3       ..... NA
x3           NA  4       ..... 4
x4           4   NA      ..... 2
x5           4   3       ..... 2

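I want to get the intercept and slope of the fitted line for each row, ignoring the NA values in that row, and add them to the original dataset. I am using the code below, but it still seems to include the NA values. I am using R.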

df = read.csv('exc.csv')

Slope = function(x) {
  TempDF = data.frame(x, survey=1:ncol(df))
  lm(x ~ survey, data=TempDF,na.rm=TRUE)$coefficients[2]
}

Intercept = function(x) {
  TempDF = data.frame(x, survey=1:ncol(df))
  lm(x ~ survey, data=TempDF,na.rm=TRUE)$coefficients[1]
}

TData = as.data.frame(t(df))

dataset$Intercept = sapply(TData, Intercept)
dataset$slope = sapply(TData, Slope)

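The regression itself only uses pairs in which both values are non-NA. lm() has no na.rm argument; missing values are handled through na.action, whose default (na.omit) drops incomplete pairs. So anything with an NA does not affect the slope or intercept in your case: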

set.seed(100)
y = rnorm(100)
x = rnorm(100)
y[1:10] = NA
x[91:100] = NA
df = data.frame(x,y)
lm(y ~x,data=df)

Call:
lm(formula = y ~ x, data = df)

Coefficients:
(Intercept)            x
    0.02871     -0.15929

Now we keep only the pairs in which neither x nor y is NA, and the coefficients are the same:

df = df[!is.na(df$x) & !is.na(df$y),]
lm(y ~x,data=df)

Call:
lm(formula = y ~ x, data = df)

Coefficients:
(Intercept)            x
    0.02871     -0.15929
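
One way to confirm this without comparing printed output by eye is to count the observations that lm() actually used. The following is a minimal sketch, assuming R's factory-default setting options(na.action = "na.omit"); the names d, fit_all and fit_cc are illustrative only.

```r
set.seed(100)
y <- rnorm(100)
x <- rnorm(100)
y[1:10] <- NA
x[91:100] <- NA
d <- data.frame(x, y)

# Fit on the data that still contains NAs. With the default na.action
# (na.omit), rows where x or y is NA are dropped before fitting.
fit_all <- lm(y ~ x, data = d)

# Fit on the complete cases only, after removing incomplete rows by hand.
fit_cc <- lm(y ~ x, data = d[complete.cases(d), ])

nobs(fit_all)                           # rows actually used in the fit: 80
sum(complete.cases(d))                  # rows with both x and y present: 80
all.equal(coef(fit_all), coef(fit_cc))  # TRUE if both fits agree
```

If na.action has been changed globally (for example to na.fail), the first fit would behave differently, which is one reason to filter explicitly.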

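If you still need to drop the NAs explicitly for some other purpose (for example, to count how many scores went into each fit), you can do it like this: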

#simulate your data
df = data.frame(matrix(sample(1:5,25*5,replace=TRUE),ncol=25))
colnames(df) = paste("S",1:25,sep="")
#make some NAs
df[cbind(c(1,3,5),c(2,3,4))] <- NA

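# fit once per row; return intercept, slope and the number of non-NA scores used
Coef = function(x) {
  TempDF = data.frame(x, survey=seq_along(x))
  TempDF = TempDF[!is.na(TempDF$x),]
  c(lm(x ~ survey, data=TempDF)$coefficients, n=nrow(TempDF))
}

# transpose so that each student (row of df) becomes a column for sapply
TData = as.data.frame(t(df))

# one row per student; columns are the intercept, the slope (named "survey") and n
dataset = data.frame(t(sapply(TData, Coef)))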
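
To get back to what the question asked for (intercept and slope as extra columns of the original data), the same idea can be sketched with apply() over the rows. This is a minimal sketch on simulated scores, not the only way to do it. The names m, scores, row_fit, Intercept and Slope are illustrative, and it relies on lm() dropping NA pairs under the default na.action.

```r
set.seed(1)

# Simulated scores: 5 students (rows) by 25 surveys (columns), with a few NAs.
m <- matrix(sample(1:5, 5 * 25, replace = TRUE), ncol = 25)
m[cbind(c(1, 3, 5), c(2, 3, 4))] <- NA
scores <- as.data.frame(m)
colnames(scores) <- paste0("S", 1:25)

# Fit score ~ survey index for one student and return c(intercept, slope).
# Pairs with an NA score are dropped by lm() (default na.action = na.omit).
row_fit <- function(x) {
  survey <- seq_along(x)
  coef(lm(x ~ survey))
}

# apply() returns one column per student, so transpose to one row per student.
coefs <- t(apply(as.matrix(scores), 1, row_fit))

# Attach the results to the original data frame.
scores$Intercept <- coefs[, 1]
scores$Slope <- coefs[, 2]
```

Computing coefs before adding the new columns matters here: otherwise Intercept would be treated as a 26th score when fitting the slope.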