Log transformation for lm() in R not working
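I am trying to transform some data so that it meets the assumptions of a linear model (independence, linearity, homogeneity of variance, normality). I want to do this so that I can run an ANOVA or something similar. A square-root transformation of the response variable in my linear model worked, but I get an error when I try a log transformation.
I tried: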
logCC_emergent_biomass.lm <- lm(log(Total_CC_noAcari_Biomass)~ Dungfauna*Water*Earthworms, data= biomass)
But I get this error:
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : NA/NaN/Inf in 'y'
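Log-transforming this way usually works for me, so I am not sure what is wrong here. The response values are all small decimals (for example 0.001480370); could that be the cause? If so, could anyone point me in the right direction for transforming these data?
Here is the residual plot for the untransformed data: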
You may have zeros in the variable you want to log-transform: log(0) is -Inf, which lm.fit() rejects. Example:
df1[1, 1] <- 0
lm(Y ~ log(X1) + X2 + X3, df1)
# Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
# NA/NaN/Inf in 'x'
# In addition: Warning message:
# In log(X1) : NaNs produced
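One way to confirm this diagnosis before refitting is to count the values that log() cannot map to a finite number. The sketch below uses a made-up vector x (hypothetical, not the asker's data); with the real data the same checks would be applied to biomass$Total_CC_noAcari_Biomass. It also shows that a small decimal such as 0.00148037 is not a problem in itself.

```r
# Hypothetical stand-in for the variable to be log-transformed
x <- c(0, 0.00148037, 0.25, -0.1, NA)

# log() gives -Inf for 0, NaN (with a warning) for negatives, NA for NA
logged <- suppressWarnings(log(x))

n_zero     <- sum(x == 0, na.rm = TRUE)  # these become -Inf
n_negative <- sum(x < 0, na.rm = TRUE)   # these become NaN
n_missing  <- sum(is.na(x))              # these stay NA
n_bad      <- sum(!is.finite(logged))    # total non-finite values after log()

print(c(zero = n_zero, negative = n_negative, missing = n_missing, bad = n_bad))

# A small positive decimal has a perfectly finite log
is.finite(log(0.00148037))
```

With the default na.action (na.omit), lm() drops rows containing NA or NaN before fitting, whereas -Inf is kept and reaches lm.fit(), so a nonzero count of zeros is the thing to look for.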
You could consider log1p, which computes log(x + 1).
lm(Y ~ log1p(X1) + X2 + X3, df1)
# Call:
# lm(formula = Y ~ log1p(X1) + X2 + X3, data = df1)
#
# Coefficients:
# (Intercept) log1p(X1) X2 X3
# 0.9963 -0.8648 0.5293 1.0904
However, this changes the interpretation; see the related post on Cross Validated. Either way, you should decide how to handle the zeros.
See also the post: How should I transform non-negative data including zeros?
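As a rough illustration of what "deciding how to handle the zeros" can look like, the sketch below fits the same model two ways on a small hand-made data frame d (hypothetical; the names y and g are placeholders, not the asker's variables): once with log1p() on all rows, and once with log() restricted to strictly positive responses through lm()'s subset argument. Neither is presented as the right choice; they answer different questions.

```r
# Hypothetical data: a non-negative response y with one exact zero
d <- data.frame(
  y = c(0, 0.0015, 0.02, 0.3, 0.05, 0.8),
  g = factor(c("a", "a", "a", "b", "b", "b"))
)

# Option 1: keep every row and model log(y + 1)
fit_log1p <- lm(log1p(y) ~ g, data = d)

# Option 2: model log(y) using only the rows where y > 0
fit_pos <- lm(log(y) ~ g, data = d, subset = y > 0)

nobs(fit_log1p)  # number of rows used by option 1
nobs(fit_pos)    # number of rows used by option 2 (the zero is excluded)

# For very small values, log1p(y) is nearly equal to y itself
log1p(0.00148037) - 0.00148037
```

Note that for responses as small as 0.00148, log(1 + y) is almost identical to y, so log1p() may change the shape of the response very little; whether that is acceptable depends on the residual diagnostics.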
Data:
df1 <- structure(list(X1 = c(0, -0.564698171396089, 0.363128411337339,
0.63286260496104, 0.404268323140999, -0.106124516091484, 1.51152199743894,
-0.0946590384130976, 2.01842371387704, -0.062714099052421), X2 = c(1.30486965422349,
2.28664539270111, -1.38886070111234, -0.278788766817371, -0.133321336393658,
0.635950398070074, -0.284252921416072, -2.65645542090478, -2.44046692857552,
1.32011334573019), X3 = c(-0.306638594078475, -1.78130843398,
-0.171917355759621, 1.2146746991726, 1.89519346126497, -0.4304691316062,
-0.25726938276893, -1.76316308519478, 0.460097354831271, -0.639994875960119
), Y = c(2.00627879909717, 1.08150911284604, 1.41465103918476,
1.37787039819613, 3.04863502238068, -0.828228728348569, 0.198328716326719,
-2.34295203837687, -1.61863179473641, 1.03962922460575)), row.names = c(NA,
-10L), class = "data.frame")