Why is my summary in R only including some of my variables?

Question

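I want to see whether there is a relationship between the number of bat calls and the timing of the pup-rearing season. The Pup variable has three categories: "Pre", "Middle", and "Post". When I ask for a summary, it only includes p-values for Pre and Post pup production. I have created a sample dataset below. For the sample dataset I just get an error; for my actual dataset I get the output described above.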

Sample dataset:

 Calls<- c("55","60","180","160","110","50") 
 Pup<-c("Pre","Middle","Post","Post","Middle","Pre")
 q<-data.frame(Calls, Pup)
 q
 q1<-lm(Calls~Pup, data=q)
 summary(q1)

Output and error message for the sample:

> Calls    Pup
1    55    Pre
2    60 Middle
3   180   Post
4   160   Post
5   110 Middle
6    50    Pre

Error in as.character.factor(x) : malformed factor
In addition: Warning message:
In Ops.factor(r, 2) : ‘^’ not meaningful for factors

The actual input for my analysis:

> pupint <- lm(Calls ~ Pup, data = park2)
summary(pupint)

This is the output I get from my actual dataset:

Residuals:
   Min     1Q Median     3Q    Max 
-66.40 -37.63 -26.02  -5.39 299.93 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)    66.54      35.82   1.858   0.0734 .
PupPost       -51.98      48.50  -1.072   0.2927  
PupPre        -26.47      39.86  -0.664   0.5118  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 80.1 on 29 degrees of freedom
Multiple R-squared:  0.03822,   Adjusted R-squared:  -0.02811 
F-statistic: 0.5762 on 2 and 29 DF,  p-value: 0.5683

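Overall, I just want to know why the output above does not show "Middle". Sorry that my sample dataset does not give the same result, but perhaps the error message helps to understand the problem better.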

Answer 1

For R to handle a dummy variable correctly, use factor to indicate that Pup is a dummy (categorical) variable:

> Pup <- factor(Pup)
> q<-data.frame(Calls, Pup)
> q1<-lm(Calls~Pup, data=q)
> summary(q1)

Call:
lm(formula = Calls ~ Pup, data = q)

Residuals:
    1     2     3     4     5     6 
  2.5 -25.0  10.0 -10.0  25.0  -2.5 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)    85.00      15.61   5.444   0.0122 *
PupPost        85.00      22.08   3.850   0.0309 *
PupPre        -32.50      22.08  -1.472   0.2374  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 22.08 on 3 degrees of freedom
Multiple R-squared:  0.9097,    Adjusted R-squared:  0.8494 
F-statistic:  15.1 on 2 and 3 DF,  p-value: 0.02716
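The output above can only be reproduced if Calls is numeric. In the question's sample, Calls is entered as quoted strings, and in R versions before 4.0.0 data.frame() turned such character vectors into factors by default, which appears to be what triggers the "not meaningful for factors" warning. A minimal sketch, assuming the quotes around the call counts were unintentional:

```r
# Assumption: the call counts were meant to be numbers, not strings.
Calls <- c(55, 60, 180, 160, 110, 50)
Pup   <- factor(c("Pre", "Middle", "Post", "Post", "Middle", "Pre"))
q     <- data.frame(Calls, Pup)

str(q)          # Calls should be num, Pup a factor with 3 levels
levels(q$Pup)   # levels are sorted alphabetically; the first one is the baseline

q1 <- lm(Calls ~ Pup, data = q)
coef(q1)        # (Intercept) is the mean of the baseline level, "Middle"
```

Because "Middle" sorts first alphabetically, it becomes the reference level: its mean is the intercept, and PupPost and PupPre are differences from that mean. That is why no separate "PupMiddle" row appears.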

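If you want R to display every category of the dummy variable, you have to remove the intercept from the regression; otherwise you would fall into the dummy variable trap.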

summary(lm(Calls~Pup-1, data=q))

Call:
lm(formula = Calls ~ Pup - 1, data = q)

Residuals:
    1     2     3     4     5     6 
  2.5 -25.0  10.0 -10.0  25.0  -2.5 

Coefficients:
          Estimate Std. Error t value Pr(>|t|)   
PupMiddle    85.00      15.61   5.444  0.01217 * 
PupPost     170.00      15.61  10.889  0.00166 **
PupPre       52.50      15.61   3.363  0.04365 * 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 22.08 on 3 degrees of freedom
Multiple R-squared:  0.9815,    Adjusted R-squared:  0.9631 
F-statistic: 53.17 on 3 and 3 DF,  p-value: 0.004234
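Dropping the intercept changes what the coefficients and p-values mean: each row now tests whether that group's mean differs from zero, and the R-squared is computed relative to zero rather than to the overall mean (hence 0.9815 here versus 0.9097 with the intercept), so the two summaries are not directly comparable. If the goal is to compare the seasons with each other, one alternative is to keep the intercept and pick the baseline explicitly with relevel(). A sketch on the sample data, where the column name PupRef is a hypothetical name chosen for illustration:

```r
Calls <- c(55, 60, 180, 160, 110, 50)
Pup   <- factor(c("Pre", "Middle", "Post", "Post", "Middle", "Pre"))
q     <- data.frame(Calls, Pup)

# Make "Pre" the reference level instead of the alphabetical default "Middle".
q$PupRef <- relevel(q$Pup, ref = "Pre")

fit_pre <- lm(Calls ~ PupRef, data = q)
coef(fit_pre)   # intercept = mean of Pre; other rows = difference from Pre

# Overall test of whether Pup matters at all, independent of the baseline chosen.
anova(lm(Calls ~ Pup, data = q))
```

With "Pre" as the baseline, the summary shows rows for Middle and Post instead, each measured against Pre.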

Answer 2

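If you include a categorical variable like Pup in a regression, R includes a dummy variable for every value of that variable except one, which by default is absorbed into the intercept. You can display the coefficient for PupMiddle by omitting the intercept, like this: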

q1<-lm(Calls~Pup - 1, data=q)
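To see which dummy columns R actually builds in each case, model.matrix() can be inspected directly. The sketch below uses the sample data with numeric Calls (an assumption, since the question's sample quotes the values) and also checks that the no-intercept coefficients are simply the group means:

```r
Calls <- c(55, 60, 180, 160, 110, 50)
Pup   <- factor(c("Pre", "Middle", "Post", "Post", "Middle", "Pre"))
q     <- data.frame(Calls, Pup)

# With an intercept: one column is dropped (the baseline, "Middle").
X_int <- model.matrix(~ Pup, data = q)
colnames(X_int)

# Without an intercept: one indicator column per level.
X_noint <- model.matrix(~ Pup - 1, data = q)
colnames(X_noint)

# The no-intercept coefficients equal the per-group means of Calls.
group_means <- tapply(q$Calls, q$Pup, mean)
fit_noint   <- lm(Calls ~ Pup - 1, data = q)
group_means
coef(fit_noint)
```

Both parameterizations describe the same fitted values; they differ only in whether each row reports a group mean or a difference from the baseline.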


r

missing-data

p-value