Problem with GLM to regress on the product of one numeric variable and one categorical variable

I want to run a logistic regression on the following model:

regression <- Y ~ netSales + size + CashAssetRatio + FRNG +
  I(insolvency * countryCode)

using the following code:

tbmodel <- glm(regression, data = trainSplit,
               weights = NULL, family = binomial(link = "logit"),
               na.action = na.omit)
# resume here after the break

However, when I fit the regression I get the following error:

Error in contrasts<-(*tmp*, value = contr.funs[1 + isOF[nn]]) :
  les contrastes ne peuvent être appliqués qu'aux facteurs ayant au moins deux niveaux
In addition: Warning message:
In Ops.factor(insolvency, countryIsoCode) : ‘*’ not meaningful for factors

(The French part of the error reads: "contrasts can be applied only to factors with 2 or more levels".)

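The thing is, I don't know where this could come from, because my variable countryCode is a factor with more than 2 levels and I have no NAs. Here is some of the data: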

      countryCode insolvency    netSales Y size CashAssetRatio         FRNG
47091          FR     0.0491 -0.04042249 0    2       1.123095 -0.001679786
24460          IT     0.0115 -0.04343820 0    1       1.078720 -0.001130815
11921          FR     0.0029 -0.04227984 0    2       1.076595 -0.001097954
1657           FR     0.0016 -0.04242885 0    2       1.075237 -0.001075071
37572          IT     0.0006 -0.04355702 0    1       1.077884 -0.001122143
8155           FR     0.0270 -0.04058710 0    2       1.076638 -0.001067854

Do you have any ideas? Thanks.

From ?formula:

While formulae usually involve just variable and factor names, they can also involve arithmetic expressions. The formula log(y) ~ a + log(x) is quite legal. When such arithmetic expressions involve operators which are also used symbolically in model formulae, there can be confusion between arithmetic and symbolic operator use.

To avoid this confusion, the function I() can be used to bracket those portions of a model formula where the operators are used in their arithmetic sense. For example, in the formula y ~ a + I(b+c), the term b+c is to be interpreted as the sum of b and c.
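To see the difference the quoted passage describes, here is a small sketch that inspects how R parses each form. The variable names y, a, b and c are placeholders taken from the documentation's example, not from the question's data; terms() only parses the formula, so no data set is needed.

```r
# terms() parses a formula symbolically; no data is required.
# Inside I(), "+" is arithmetic, so b + c stays a single term.
arith <- attr(terms(y ~ a + I(b + c)), "term.labels")

# Outside I(), "*" is a formula operator: main effects plus interaction.
symb <- attr(terms(y ~ a + b * c), "term.labels")

print(arith)
print(symb)
```

The first formula keeps I(b + c) as one term, while the second expands b * c into b, c and the interaction b:c.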

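So the formula you wrote is literally asking for an arithmetic multiplication, which is not meaningful for a factor (hence the Ops.factor warning). Since what you want is an interaction, remove the I().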
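As a rough sketch of the fix, the snippet below fits the model with the interaction written without I(). The data are simulated stand-ins (the values, the third country level "DE" and the sample size are assumptions made for illustration; only the column names follow the question), and the model is trimmed to a few predictors to keep it short.

```r
# Simulated stand-in data: column names follow the question, values are made up.
set.seed(1)
n <- 200
trainSplit <- data.frame(
  countryCode = factor(sample(c("FR", "IT", "DE"), n, replace = TRUE)),
  insolvency  = runif(n, 0, 0.05),
  netSales    = rnorm(n)
)
trainSplit$Y <- rbinom(n, 1, 0.3)

# Without I(), insolvency * countryCode expands to
# insolvency + countryCode + insolvency:countryCode.
regression <- Y ~ netSales + insolvency * countryCode

tbmodel <- glm(regression, data = trainSplit,
               family = binomial(link = "logit"), na.action = na.omit)

# One slope adjustment per non-reference country level
print(names(coef(tbmodel)))
```

If only the interaction term is wanted, without the main effects, insolvency:countryCode can be used in place of insolvency * countryCode.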