How do I create a factor with three levels in a dataframe in R?
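I created a factor for 19 different distances, and I need to define three levels: one for the direct impact (DirImp), another for my respective indirect-impact distances (Dist = "1km_", "2km_", "3km_", "4km_", "5km_", "6km_", "7km_", "8km_", "9km_", "10km_", "10km", "20km", "30km", "40km", "50km", "60km", "70km"), and another for my control area (Contrl). The distances start at 0 (DirImp) and increase in 1 km steps up to 10 km. From there they increase in 10 km steps up to 70 km, and the last "distance" is the control.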
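Because the labels follow a regular scheme (1 km steps up to 10 km with a trailing `_`, then 10 km steps up to 70 km), they can be generated rather than typed. This is a minimal sketch; `dist_levels` is a hypothetical name, and it assumes the control area is labelled "control", as it is in the Dist column.

```r
# Labels in distance order, built from the scheme described in the question:
# DirImp (distance 0), 1-10 km in 1 km steps (labels end in "_"),
# 10-70 km in 10 km steps, and finally the control area.
dist_levels <- c(
  "DirImp",
  paste0(1:10, "km_"),                 # "1km_" ... "10km_"
  paste0(seq(10, 70, by = 10), "km"),  # "10km" ... "70km"
  "control"
)

length(dist_levels)  # 19, one per distance label
print(dist_levels)
```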
So, to clarify: my data frame has a column (Dist) containing these distances, plus other columns with other information. I converted Dist to a factor.
Structure of column Dist:
levels(MY.DATAFRAME$Dist)
[1] "DirImp" "10km" "10km_" "1km_" "20km" "2km_" "30km"
[8] "3km_" "40km" "4km_" "50km" "5km_" "60km" "6km_"
[15] "70km" "7km_" "8km_" "9km_" "control"
How I would like it to be:
level 1 = Direct impact ("DirImp")
level 2 = Distances ("1km_","2km_","3km_","4km_","5km_","6km_","7km_","8km_","9km_","10km_","10km","20km","30km","40km","50km","60km","70km")
level 3 = Control area ("control")
Column Dist = ("DirImp", "1km_","2km_","3km_","4km_","5km_","6km_","7km_","8km_","9km_","10km_","10km","20km","30km","40km","50km","60km","70km", "control")
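In base R, the three-level mapping listed above can be sketched by assigning a named list to `levels()`; the factor method of `levels<-` merges every old level listed under a name into that new level. This is a minimal sketch, assuming a hypothetical data frame `dat` in place of MY.DATAFRAME and the label "control" for the control area.

```r
# Hypothetical example data standing in for MY.DATAFRAME (the name `dat`
# is illustrative only); Dist is a factor holding some of the question's labels.
dat <- data.frame(
  Dist  = factor(c("DirImp", "1km_", "10km_", "10km", "70km", "control", "5km_")),
  value = 1:7
)

# Copy Dist so the original 19-label column is kept alongside the new one.
dat$DistFact <- dat$Dist

# A *named list* merges old levels into new ones: each old label listed
# under a name is relabelled to that name. Old levels that appear in none
# of the list entries would become NA.
levels(dat$DistFact) <- list(
  DirImp    = "DirImp",
  Distances = c(paste0(1:10, "km_"), paste0(seq(10, 70, by = 10), "km")),
  Control   = "control"
)

levels(dat$DistFact)  # now DirImp, Distances, Control (in list order)
table(dat$DistFact)
```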
MY.DATAFRAME$DistFact <- factor(MY.DATAFRAME$Dist, ordered = TRUE)
levels(MY.DATAFRAME$DistFact)
[1] "DirImp" "10km" "10km_" "1km_" "20km" "2km_" "30km"
[8] "3km_" "40km" "4km_" "50km" "5km_" "60km" "6km_"
[15] "70km" "7km_" "8km_" "9km_" "control"
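The levels above keep their sorted order because `factor()` only uses a custom order when `levels` is supplied; without it, the unique values are sorted (locale-dependent), so "10km" can land before "1km_". A minimal sketch of fixing the order explicitly; `dist` and `dist_order` are hypothetical names, and the order is the one listed for column Dist in the question.

```r
# Hypothetical stand-in for MY.DATAFRAME$Dist.
dist <- c("10km", "1km_", "DirImp", "control", "2km_", "10km_", "70km")

# Desired order, copied from the question's list for column Dist.
dist_order <- c("DirImp",
                "1km_", "2km_", "3km_", "4km_", "5km_",
                "6km_", "7km_", "8km_", "9km_", "10km_",
                "10km", "20km", "30km", "40km", "50km", "60km", "70km",
                "control")

# Without `levels =`, the unique values are sorted (order depends on locale).
levels(factor(dist))

# With `levels =`, the order comes from dist_order. Any label not in
# dist_order (e.g. one with a stray trailing space) would become NA.
dist_fact <- factor(dist, levels = dist_order, ordered = TRUE)
levels(dist_fact)

# ordered = TRUE makes comparisons follow that order.
dist_fact[dist_fact < "10km"]
```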
Is something like the following what you are asking for?
forcats::fct_collapse(y,
DirImp = grep("DirImp", y, ignore.case = TRUE, value = TRUE),
Distances = grep("km", y, ignore.case = TRUE, value = TRUE),
Control = grep("control", y, ignore.case = TRUE, value = TRUE)
)
# [1] Distances Distances Distances Distances Distances Distances
# [7] Distances Distances Distances Distances Distances Distances
#[13] Distances Distances Distances Distances Distances Distances
#[19] Distances Distances Distances Distances Distances Distances
#[25] Distances Distances Distances Distances Control Distances
#Levels: DirImp Distances Control
Or, perhaps more readably:
# Helper: return the elements of x that match pattern, ignoring case
grep_tmp <- function(pattern, x){
  grep(pattern, x, ignore.case = TRUE, value = TRUE)
}
forcats::fct_collapse(y,
DirImp = grep_tmp("DirImp", y),
Distances = grep_tmp("^\\d+km", y),
Control = grep_tmp("control", y)
)
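If forcats is not available, the same regex-based grouping can be sketched in base R with `grepl()` and nested `ifelse()`. This swaps in a different technique from the forcats answer; the pattern `^\\d+km_?$` and the names `f` and `dist3` are assumptions for illustration. Note the doubled backslash: in an R string literal, `"\\d"` is the regex `\d`.

```r
# Hypothetical example factor with some of the question's labels.
f <- factor(c("DirImp", "1km_", "10km", "control", "70km", "10km_", "5km_"))

# Classify each observation by regular expression (grepl() coerces the
# factor to its character labels). Unmatched values fall through to NA.
dist3 <- ifelse(grepl("^DirImp$", f, ignore.case = TRUE), "DirImp",
         ifelse(grepl("^\\d+km_?$", f), "Distances",
         ifelse(grepl("^control$", f, ignore.case = TRUE), "Control", NA)))

# Fix the level order explicitly.
dist3 <- factor(dist3, levels = c("DirImp", "Distances", "Control"))
table(dist3, useNA = "ifany")
```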
Data
Since the question only posted the levels, here is some example data built from them.
set.seed(1234)
x <- scan(text = '"DirImp" "10km" "10km_" "1km_" "20km" "2km_" "30km"
"3km_" "40km" "4km_" "50km" "5km_" "60km" "6km_"
"70km" "7km_" "8km_" "9km_" "control"', what = character())
y <- factor(sample(x, 30, TRUE), levels = x)