How do I create a factor with three levels in a data frame in R?

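I created a factor for 19 different distances, and I need to define three levels: one for the direct impact (DirImp), another for my respective indirect-impact distances (Dist = "1km_","2km_","3km_","4km_","5km_","6km_","7km_","8km_","9km_","10km_","10km","20km","30km","40km","50km","60km","70km"), and another for my control area (Contrl). The scale starts at distance 0 (DirImp) and increases one kilometer at a time until it reaches 10 km; from that point it increases in steps of ten kilometers until it reaches 70 km, and the last distance is the control.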

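So, to clarify: in my data frame I have a column (Dist) containing these distances, plus other columns with other information, and I used this code to convert it to a factor: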

Structure of column Dist:


levels(MY.DATAFRAME$Dist)
[1] "DirImp"   "10km"  "10km_" "1km_"  "20km"  "2km_"  "30km" 
[8] "3km_"  "40km"  "4km_"  "50km"  "5km_"  "60km"  "6km_" 
[15] "70km"  "7km_"  "8km_"  "9km_"  "control" 

How I would like it to be:
level 1 = Direct impact ("DirImp")
level 2 = Distances ("1km_","2km_","3km_","4km_","5km_","6km_","7km_","8km_","9km_","10km_","10km","20km","30km","40km","50km","60km","70km")
level 3 = Contrl Area  ("Contrl")

Column Dist = ("DirImp", "1km_","2km_","3km_","4km_","5km_","6km_","7km_","8km_","9km_","10km_","10km","20km","30km","40km","50km","60km","70km", "control")

  MY.DATAFRAME$DistFact <- factor(MY.DATAFRAME$Dist, ordered = TRUE)


  levels(MY.DATAFRAME$DistFact)
  [1] "DirImp"   "10km"  "10km_" "1km_"  "20km"  "2km_"  "30km" 
  [8] "3km_"  "40km"  "4km_"  "50km"  "5km_"  "60km"  "6km_" 
  [15] "70km"  "7km_"  "8km_"  "9km_"  "control" 

Is something like the following what the question is asking for?

forcats::fct_collapse(y, 
                      DirImp = grep("DirImp", y, ignore.case = TRUE, value = TRUE), 
                      Distances = grep("km", y, ignore.case = TRUE, value = TRUE),
                      Control = grep("control", y, ignore.case = TRUE, value = TRUE)
                      )
# [1] Distances Distances Distances Distances Distances Distances
# [7] Distances Distances Distances Distances Distances Distances
#[13] Distances Distances Distances Distances Distances Distances
#[19] Distances Distances Distances Distances Distances Distances
#[25] Distances Distances Distances Distances Control   Distances
#Levels: DirImp Distances Control

Or, perhaps more readable:

grep_tmp <- function(pattern, x){
  grep(pattern, x, ignore.case = TRUE, value = TRUE)
}

forcats::fct_collapse(y,
                      DirImp = grep_tmp("DirImp", y), 
                      Distances = grep_tmp("^\\d+km", y),
                      Control = grep_tmp("control", y)
                      )
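
If adding a forcats dependency is not desirable, the same collapse can probably be done in base R by assigning a named list to `levels()`. The sketch below is only an illustration: the names `dist_levels`, `y2` and `grouped` are made up here, and the level strings are assumed to match the ones listed in the question.

```r
# Hypothetical stand-in for the 19 distance labels from the question
dist_levels <- c("DirImp",
                 paste0(1:10, "km_"),
                 paste0(seq(10, 70, by = 10), "km"),
                 "control")

# One observation per level, just to keep the example small
y2 <- factor(dist_levels, levels = dist_levels)

# Assigning a named list to levels() merges the listed old levels
# under each new name; the match is done on the levels, not the values
grouped <- y2
levels(grouped) <- list(
  DirImp    = "DirImp",
  Distances = grep("km", levels(y2), value = TRUE),
  Control   = "control"
)

table(grouped)
```

Matching on `levels(y2)` rather than on the values avoids passing repeated strings, since every level appears there exactly once.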

Data

The question posted only the levels, so here is some sample data.

set.seed(1234)
x <- scan(text = '"DirImp"   "10km"  "10km_" "1km_"  "20km"  "2km_"  "30km" 
"3km_"  "40km"  "4km_"  "50km"  "5km_"  "60km"  "6km_" 
"70km"  "7km_"  "8km_"  "9km_" "control"', what = character())

y <- factor(sample(x, 30, TRUE), levels = x)
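
Since the question is about a column in a data frame, here is a minimal base R sketch of storing the three-level grouping as a new column next to the original one. The data frame `df` and the column name `DistGroup` are hypothetical, and the sketch assumes the labels are spelled exactly as in the question; any label that matches none of the three rules is left as `NA` rather than guessed.

```r
set.seed(1234)

# Hypothetical data frame with a character column of distance labels
dist_levels <- c("DirImp",
                 paste0(1:10, "km_"),
                 paste0(seq(10, 70, by = 10), "km"),
                 "control")
df <- data.frame(Dist = sample(dist_levels, 30, replace = TRUE),
                 stringsAsFactors = FALSE)

# Assign each row to one of the three groups; unmatched labels stay NA
group <- rep(NA_character_, nrow(df))
group[df$Dist == "DirImp"]           <- "DirImp"
group[grepl("^[0-9]+km", df$Dist)]   <- "Distances"
group[tolower(df$Dist) == "control"] <- "Control"

# Ordered factor: DirImp < Distances < Control
df$DistGroup <- factor(group,
                       levels = c("DirImp", "Distances", "Control"),
                       ordered = TRUE)

# Cross-tabulate to check which original labels fell into which group
table(df$Dist, df$DistGroup)
```

Keeping the original `Dist` column alongside `DistGroup` makes it easy to check the mapping and to still model individual distances later.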