r - levels added to dataframe, why?

Question

This post is meant to build a better understanding of how "levels" work in R. The other answers I have found do not fully explain it.

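Consider the following short script, in which I calculate the RMSE of each column of a random data frame df and store that value as a row of a new data frame bestcombo: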

df = as.data.frame(matrix(rbinom(10*5, 1, .5), nrow = 10, ncol=5))

#generate empty dataframe and assign col names
bestcombo = data.frame(matrix(ncol = 2, nrow = 0))
colnames(bestcombo) = c("RMSE", "Row Number")

#for each col of df calculate RMSE and store together with col name
for (i in 1:5){
  RMSE = sqrt(mean(df[,i] ^ 2))
  row_num = i

  row = as.data.frame(cbind( RMSE, toString(row_num) ))
  colnames(row) = c("RMSE", "Row Number")
  bestcombo = rbind(bestcombo, row)
}

The problem is that "Levels" are generated. Why?

bestcombo$RMSE
             RMSE              RMSE              RMSE              RMSE              RMSE 
0.547722557505166 0.774596669241483 0.707106781186548 0.836660026534076 0.707106781186548 
Levels: 0.547722557505166 0.774596669241483 0.707106781186548 0.836660026534076

bestcombo$RMSE[1]
             RMSE 
0.547722557505166 
Levels: 0.547722557505166 0.774596669241483 0.707106781186548 0.836660026534076

Why does this happen, and how can I avoid it? Is it caused by incorrect use of rbind()?

It also causes other problems. For example, order() does not sort the rows by value:

bestcombo[order(bestcombo$RMSE),]

               RMSE Random Vector
1 0.547722557505166             1
2 0.774596669241483             2
3 0.707106781186548             3
5 0.707106781186548             5
4 0.836660026534076             4

Answer 1

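You want something more like this, building each row with data.frame() instead of as.data.frame(cbind(...)) so that RMSE stays numeric: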

#for each col of df calculate RMSE and store together with col name
for (i in 1:5){
    RMSE = sqrt(mean(df[,i] ^ 2))
    row_num = i

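    # check.names = FALSE keeps the name "Row Number" as written;
    # stringsAsFactors = FALSE keeps the string column a character on R < 4.0
    row = data.frame(RMSE = RMSE, `Row Number` = as.character(row_num),
                     check.names = FALSE, stringsAsFactors = FALSE)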
    #colnames(row) = c("RMSE", "Row Number")
    bestcombo = rbind(bestcombo, row)
}

Or, if you really want to set the column names on a second line, you can do this:

for (i in 1:5){
    RMSE = sqrt(mean(df[,i] ^ 2))
    row_num = i

    row = data.frame(RMSE,as.character(row_num) )
    colnames(row) = c("RMSE", "Row Number")
    bestcombo = rbind(bestcombo, row)
}
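
To see where the levels come from, the coercion can be reproduced in isolation. This is a minimal sketch rather than the asker's exact script: the values and the names rmse, m, row, and f are made up for illustration, and stringsAsFactors = TRUE is passed explicitly to reproduce the default behaviour of R versions before 4.0.

```r
# Illustrative value; any number would do.
rmse <- sqrt(0.3)

# cbind() on a number and a string yields a character matrix,
# so the RMSE is already text at this point.
m <- cbind(rmse, toString(1))
is.character(m)                     # TRUE

# Converting that matrix turns each character column into a factor
# (requested explicitly here; it was the default before R 4.0).
row <- as.data.frame(m, stringsAsFactors = TRUE)
class(row[[1]])                     # "factor"

# order() on a factor sorts by level position, not by numeric value.
# The levels are given in order of appearance to mimic the rbind() loop.
f <- factor(c("0.9", "0.1"), levels = c("0.9", "0.1"))
order(f)                            # 1 2
order(as.numeric(as.character(f)))  # 2 1

# Going through character recovers the numbers; as.numeric(f) alone
# would return the integer level codes instead.
as.numeric(as.character(f))         # 0.9 0.1
```

A column that has already become a factor can be repaired the same way, for example with bestcombo$RMSE <- as.numeric(as.character(bestcombo$RMSE)).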

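For completeness, I'll add that, although it is not the focus of your question, growing a data frame one row at a time with rbind() like this will start to incur a significant speed penalty once the data frame becomes fairly large.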
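
One way to avoid the repeated rbind() calls is to compute all the values first and build the data frame once. The sketch below is an alternative to the loop, not something proposed in the answer above; the seed is arbitrary and the name rmse is introduced only for illustration.

```r
set.seed(1)  # arbitrary seed, only to make the sketch reproducible
df <- as.data.frame(matrix(rbinom(10 * 5, 1, 0.5), nrow = 10, ncol = 5))

# One RMSE per column; vapply() iterates over the columns of df
# and guarantees a numeric vector as the result.
rmse <- vapply(df, function(col) sqrt(mean(col^2)), numeric(1))

# Build the whole result in a single data.frame() call.
bestcombo <- data.frame(
  RMSE = unname(rmse),
  `Row Number` = as.character(seq_along(rmse)),
  check.names = FALSE,       # keep the space in "Row Number"
  stringsAsFactors = FALSE   # no factors, even on R < 4.0
)

# RMSE is numeric, so order() sorts by value.
bestcombo[order(bestcombo$RMSE), ]
```

Because no factor is ever created, no levels appear when printing bestcombo$RMSE.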

Tags: r, levels, rbind