Why do I get more coefficients than features when using multinom() in R?
I have a dataset with about 20 samples and 4 features.
I want to fit a model with multinom(), but the function returns about 50 coefficients with strange names.
> library(nnet)
> model <- multinom(types ~ LD1 + LD2 + LD3 + LD4, data = t)
> colnames(coef(model))
[1] "(Intercept)" "LD1-0.924675250911259" "LD1-0.996017404791012" "LD1-11.0091236817909" "LD1-11.0470069995094" "LD1-11.1382649674021" "LD1-11.1449776356607"
[8] "LD1-1.11507632119743" "LD1-11.4100167287132" "LD1-1.15405541868851" "LD1-1.42692764536373" "LD11.45075731787807" "LD1-1.562329638922" "LD1-2.03752025992806"
[15] "LD132.7387270807495" "LD133.0932516010117" "LD135.0760659080006" "LD1-3.57028123573125" "LD1-5.22424301205266" "LD1-5.95754635904308" "LD1-6.39430959506567"
[22] "LD1-6.8622462443044" "LD1-7.03073614006179" "LD1-8.00430359650879" "LD1-8.17057054273565" "LD1-9.02013723266161" "LD20.0761110897194115" "LD20.83307548406597"
[29] "LD210.9301821277818" "LD21.2118957034112" "LD2-1.7139684831726" "LD2-1.85478166588227" "LD2-2.11785431701449" "LD2-2.19678883756181" "LD2-2.43688626054258"
[36] "LD22.71656669882489" "LD23.17377132687911" "LD23.25781591451936" "LD2-3.4433493942635" "LD2-3.5203090034966" "LD2-3.71418994994738" "LD2-3.8380001046407"
[43] "LD2-3.87686665511689" "LD2-3.9100454768453" "LD2-3.95942532853135" "LD2-4.04744180009915" "LD2-4.12030177266551" "LD24.17412372599923" "LD24.75169238888003"
[50] "LD2-4.91414969791761" "LD29.19759557325694"
Why does this happen, and what does it mean?
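A multinomial model is an extension of logistic regression that predicts the probability of each response level. So if your response has 11 levels, you get 10 prediction equations, each with an intercept and one coefficient per numeric predictor. (One response level serves as the baseline.)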
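To make the expected shape concrete, here is a minimal sketch using the built-in iris data as a stand-in (it is not the asker's data): the response has 3 levels and there are 4 numeric predictors, so the coefficient matrix should have 2 rows and 5 columns.

```r
library(nnet)  # multinom() lives in the nnet package

# iris is only a stand-in: Species has 3 levels, and the 4 predictors are numeric
fit <- multinom(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
                data = iris, trace = FALSE)

# One row per non-baseline response level,
# one column per predictor plus the intercept
dim(coef(fit))        # 2 rows (3 levels - 1), 5 columns (4 predictors + intercept)
rownames(coef(fit))   # the non-baseline levels; the first level is the baseline
```

If a fit on your own data has far more columns than this rule predicts, some predictor is probably not being treated as numeric.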
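In this case, though, you probably have a different problem. R is treating your LD1 and LD2 predictors as factors, even though they look numeric, so every distinct value becomes its own level with its own dummy-variable coefficient. That is why the names look like "LD1-0.924675250911259": the variable name followed by a factor level. Check that you imported the data correctly.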
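A rough way to check for this and repair it, sketched on a small made-up data frame (`dat` and its values are hypothetical, not the asker's data):

```r
# Hypothetical data: numeric-looking values that ended up stored as a factor
dat <- data.frame(
  types = factor(c("a", "b", "c", "a", "b", "c")),
  LD1   = factor(c("-0.92", "-11.0", "1.45", "32.7", "-3.57", "-5.22"))
)

str(dat)             # LD1 is reported as a Factor, not as num
sapply(dat, class)   # quick per-column overview of the classes

# A factor predictor expands to one dummy column per non-reference level
ncol_before <- ncol(model.matrix(~ LD1, data = dat))

# as.numeric() directly on a factor would return the internal level codes,
# so convert to character first to recover the original values
dat$LD1 <- as.numeric(as.character(dat$LD1))

# A numeric predictor contributes a single column
ncol_after <- ncol(model.matrix(~ LD1, data = dat))

c(before = ncol_before, after = ncol_after)
```

If the conversion produces NA values with a warning, the column most likely contains non-numeric entries such as a different decimal separator or a stray text value. In that case it is usually better to fix the import itself, for example through the `dec`, `na.strings`, `colClasses`, or `stringsAsFactors` arguments of `read.table()` / `read.csv()`.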