How to export regression equations for grouped data?
I have a data frame PlotData_df with 3 columns: Velocity (numeric), Height (numeric) and Gender (categorical).
Velocity Height Gender
1 4.1 3.0 Male
2 3.1 4.0 Female
3 3.9 2.4 Female
4 4.6 2.8 Male
5 4.1 3.3 Female
6 3.1 3.2 Female
7 3.7 3.0 Male
8 3.6 2.4 Male
9 3.2 2.7 Female
10 4.2 2.5 Male
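To make the code in this thread runnable, the printed table can be rebuilt as a data frame. This is a minimal sketch that copies the ten rows shown above; storing Gender as a factor is an assumption about how the original data was typed. The coefficients quoted further down do not appear to come from these ten rows alone, so refitting will give somewhat different numbers.

```r
# Rebuild the ten rows printed above as a data frame.
PlotData_df <- data.frame(
  Velocity = c(4.1, 3.1, 3.9, 4.6, 4.1, 3.1, 3.7, 3.6, 3.2, 4.2),
  Height   = c(3.0, 4.0, 2.4, 2.8, 3.3, 3.2, 3.0, 2.4, 2.7, 2.5),
  # Assumption: Gender is a factor. Its levels default to alphabetical
  # order, so "Female" becomes the reference level in lm().
  Gender   = factor(c("Male", "Female", "Female", "Male", "Female",
                      "Female", "Male", "Male", "Female", "Male"))
)

str(PlotData_df)
```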
I used the following to get the regression equation for the full data:
c <- lm(Height ~ Velocity, data = PlotData_df)
summary(c)
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 4.1283 1.0822 3.815 0.00513 **
# Velocity -0.3240 0.2854 -1.135 0.28915
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# Residual standard error: 0.4389 on 8 degrees of freedom
# Multiple R-squared: 0.1387, Adjusted R-squared: 0.03108
# F-statistic: 1.289 on 1 and 8 DF, p-value: 0.2892
a <- round(coef(c)[1], digits = 2)
b <- round(coef(c)[2], digits = 2)
Regression <- paste0("Velocity = ",b," * Height + ",a)
print(Regression)
# [1] "Velocity = -0.32 * Height + 4.13"
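One thing to watch in the code above: lm(Height ~ Velocity) treats Height as the response, so coef(c)[2] is the slope on Velocity, yet the label reads "Velocity = ... * Height". The answer below fits Velocity as the response. A minimal sketch that reads the variable names off the model formula, so the label cannot disagree with the model; fit_all and eqn_label are hypothetical names, and the helper only handles a single-predictor lm().

```r
# Same ten rows as in the question.
PlotData_df <- data.frame(
  Velocity = c(4.1, 3.1, 3.9, 4.6, 4.1, 3.1, 3.7, 3.6, 3.2, 4.2),
  Height   = c(3.0, 4.0, 2.4, 2.8, 3.3, 3.2, 3.0, 2.4, 2.7, 2.5),
  Gender   = factor(c("Male", "Female", "Female", "Male", "Female",
                      "Female", "Male", "Male", "Female", "Male"))
)

# Velocity is the response here, matching the label the question wants.
fit_all <- lm(Velocity ~ Height, data = PlotData_df)

# Hypothetical helper: build "response = slope * predictor + intercept"
# for a one-predictor lm(), taking the names from the model formula.
eqn_label <- function(model, digits = 2) {
  vars <- all.vars(formula(model))   # c(response, predictor)
  cf <- coef(model)
  # formatC() keeps trailing zeros, e.g. -0.30 instead of -0.3
  fmt <- function(v) formatC(v, format = "f", digits = digits)
  paste0(vars[1], " = ", fmt(cf[[2]]), " * ", vars[2], " + ", fmt(cf[[1]]))
}

print(eqn_label(fit_all))
```

Calling eqn_label(lm(Height ~ Velocity, data = PlotData_df)) instead yields a label that starts with "Height = ", which is what the question's model actually estimates.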
How can I extend this to display two regression equations (depending on whether Gender is Male or Female)?
You first need a linear model with an interaction between Height and Gender. Try:
fit <- lm(formula = Velocity ~ Height * Gender, data = PlotData_df)
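Then, if you want to display the fitted regression function/equation, you need two equations, one for Male and one for Female. There is no way around this: once the numeric coefficients are plugged in, each group has its own intercept and slope. Here is how to get them.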
## coefficients rounded to 2 decimal places
beta <- round(coef(fit), digits = 2)
# (Intercept) Height GenderMale Height:GenderMale
# 4.42 -0.30 -1.01 0.54
## equation for Female:
eqn.female <- paste0("Velocity = ", beta[2], " * Height + ", beta[1])
# [1] "Velocity = -0.3 * Height + 4.42"
## equation for Male:
eqn.male <- paste0("Velocity = ", beta[2] + beta[4], " * Height + ", beta[1] + beta[3])
# [1] "Velocity = 0.24 * Height + 3.41"
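The same idea generalizes to a factor with any number of levels. A minimal sketch, assuming R's default treatment contrasts and the coefficient names lm() produces for Velocity ~ Height * Gender (GenderMale, Height:GenderMale); group_equations is a hypothetical helper, not a base R function. It sums the unrounded coefficients and rounds only when formatting, and uses formatC() so trailing zeros such as -0.30 are kept (paste0() on a number prints -0.3).

```r
PlotData_df <- data.frame(
  Velocity = c(4.1, 3.1, 3.9, 4.6, 4.1, 3.1, 3.7, 3.6, 3.2, 4.2),
  Height   = c(3.0, 4.0, 2.4, 2.8, 3.3, 3.2, 3.0, 2.4, 2.7, 2.5),
  Gender   = factor(c("Male", "Female", "Female", "Male", "Female",
                      "Female", "Male", "Male", "Female", "Male"))
)

fit <- lm(Velocity ~ Height * Gender, data = PlotData_df)

# Hypothetical helper: one equation string per level of `group`.
# Assumes treatment contrasts (R's default for unordered factors).
group_equations <- function(fit, group = "Gender", x = "Height", digits = 2) {
  b <- coef(fit)
  resp <- all.vars(formula(fit))[1]   # name of the response variable
  lev <- fit$xlevels[[group]]         # lev[1] is the reference level
  fmt <- function(v) formatC(v, format = "f", digits = digits)
  vapply(lev, function(l) {
    intercept <- b[["(Intercept)"]]
    slope <- b[[x]]
    if (l != lev[1]) {
      # Non-reference level: add its dummy (intercept shift)
      # and its interaction term (slope shift).
      intercept <- intercept + b[[paste0(group, l)]]
      slope <- slope + b[[paste0(x, ":", group, l)]]
    }
    paste0(resp, " = ", fmt(slope), " * ", x, " + ", fmt(intercept))
  }, character(1))
}

print(group_equations(fit))
```

The result is a character vector named by factor level, so group_equations(fit)[["Male"]] picks out one equation, for example to use as a plot annotation.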
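If it is not clear to you why, for the Male group, the intercept is beta[1] + beta[3] and the slope is beta[2] + beta[4], you need to read up on ANOVA and treatment contrasts for factor variables. Under R's default treatment contrasts, Female (the first level alphabetically) is the reference group, and GenderMale is a dummy variable that is 1 for males and 0 for females; for males, the GenderMale term therefore shifts the intercept and the Height:GenderMale term shifts the slope. This question on Cross Validated: How to interpret dummy and ratio variable interactions in R is very similar to your setup. I wrote a very short answer there on interpreting the coefficients, so you may want to take a look.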
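The coding can be checked directly. A minimal sketch, reusing the ten sample rows: contrasts() shows that Female is coded 0 and Male 1, and fitting each group separately with split() recovers the same intercepts and slopes as the interaction model. The standard errors are not the same, because the interaction model pools the residual variance across both groups.

```r
PlotData_df <- data.frame(
  Velocity = c(4.1, 3.1, 3.9, 4.6, 4.1, 3.1, 3.7, 3.6, 3.2, 4.2),
  Height   = c(3.0, 4.0, 2.4, 2.8, 3.3, 3.2, 3.0, 2.4, 2.7, 2.5),
  Gender   = factor(c("Male", "Female", "Female", "Male", "Female",
                      "Female", "Male", "Male", "Female", "Male"))
)

# Treatment contrasts: the reference level (Female) is coded 0, Male is coded 1.
print(contrasts(PlotData_df$Gender))

fit <- lm(Velocity ~ Height * Gender, data = PlotData_df)
beta <- coef(fit)

# Velocity = b1 + b2*Height + b3*GenderMale + b4*Height*GenderMale
# Female (GenderMale = 0): intercept b1,      slope b2
# Male   (GenderMale = 1): intercept b1 + b3, slope b2 + b4
from_interaction <- rbind(
  Female = c(beta[[1]], beta[[2]]),
  Male   = c(beta[[1]] + beta[[3]], beta[[2]] + beta[[4]])
)

# One simple regression per group; rows come out in level order (Female, Male).
separate <- t(sapply(split(PlotData_df, PlotData_df$Gender),
                     function(d) coef(lm(Velocity ~ Height, data = d))))

print(all.equal(unname(from_interaction), unname(separate)))
```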