Deleting specific characters in a data frame in R
I have the following data frame:
>sample_df
dd_mav2_6541_0_10
dd_mav2_12567_0_2
dd_mav2_43_1_341
dd_mav2_19865_2_13
dd_mav2_1_0_1
I need to remove the last "_" and all the digits after it. I want the following output:
>sample_df
dd_mav2_6541_0
dd_mav2_12567_0
dd_mav2_43_1
dd_mav2_19865_2
dd_mav2_1_0
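I tried the following code, but it only extracts a fixed number of characters from the end of each string, so it does not give the output above.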
substr(sample_df,nchar(sample_df)-2,nchar(sample_df))
How can I get this output?
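The attempt above keeps characters rather than deleting them: `substr(x, start, stop)` returns the characters from `start` to `stop`, so it returns the last three characters of each string. If you want to stay with `substr()`, the number of characters to keep has to be computed per string. A minimal sketch, assuming every string contains at least one `_` (a string with no `_` would come back empty):

```r
sample_df <- c("dd_mav2_6541_0_10", "dd_mav2_12567_0_2", "dd_mav2_43_1_341",
               "dd_mav2_19865_2_13", "dd_mav2_1_0_1")

# What the original attempt does: return the LAST three characters
last3 <- substr(sample_df, nchar(sample_df) - 2, nchar(sample_df))
print(last3)
# [1] "_10" "0_2" "341" "_13" "0_1"

# Length of the piece after the last "_" (the greedy ".*_" strips everything up to it)
# Assumption: every string contains at least one "_"
tail_len <- nchar(sub(".*_", "", sample_df))

# Keep everything before that final "_"
trimmed <- substr(sample_df, 1, nchar(sample_df) - tail_len - 1)
print(trimmed)
```

Unlike the regex answer, this drops whatever follows the last `_`, digits or not.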
You can try this:
gsub("_\\d+$", "", sample_df)
It removes an underscore followed by one or more digits at the end of each string.
Your data:
sample_df <- c("dd_mav2_6541_0_10","dd_mav2_12567_0_2","dd_mav2_43_1_341","dd_mav2_19865_2_13","dd_mav2_1_0_1")
gsub("_\\d+$", "", sample_df)
#[1] "dd_mav2_6541_0" "dd_mav2_12567_0" "dd_mav2_43_1" "dd_mav2_19865_2" "dd_mav2_1_0"
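Since the question mentions a data frame, the same substitution can be applied to a column. A minimal sketch, assuming a hypothetical data frame `df` whose strings sit in a column named `id` (both names are made up for illustration). Note that in an R string literal the regex `\d` must be written `"\\d"`; a single backslash is a parse error.

```r
# Hypothetical data frame; the column name "id" is an assumption
df <- data.frame(
  id = c("dd_mav2_6541_0_10", "dd_mav2_12567_0_2", "dd_mav2_43_1_341",
         "dd_mav2_19865_2_13", "dd_mav2_1_0_1"),
  stringsAsFactors = FALSE
)

# "_\\d+$": an underscore, one or more digits, then the end of the string.
# Because the pattern is anchored at "$" it matches at most once per string,
# so sub() gives the same result as gsub() here.
df$id <- sub("_\\d+$", "", df$id)

# Equivalent pattern without the backslash escape: "_[0-9]+$"
print(df$id)
```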
Alternatively, split each string on "_" and paste back the first four pieces:
# Create the vector (I added one more element
# at the end, with fewer than 4 pieces)
sample_df <- c("dd_mav2_6541_0_10",
"dd_mav2_12567_0_2",
"dd_mav2_43_1_341",
"dd_mav2_19865_2_13",
"dd_mav2_1_0_1",
"dd_mav2")
# Split by "_"
xx <- strsplit(x = sample_df, split = "_")
xx
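[[1]]
[1] "dd"   "mav2" "6541" "0"    "10"
[[2]]
[1] "dd"    "mav2"  "12567" "0"     "2"
[[3]]
[1] "dd"   "mav2" "43"   "1"    "341"
[[4]]
[1] "dd"    "mav2"  "19865" "2"     "13"
[[5]]
[1] "dd"   "mav2" "1"    "0"    "1"
[[6]]
[1] "dd"   "mav2"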
# Loop through each element and reconnect the pieces
yy <- lapply(xx, function(a) {
if(length(a) < 4) {
return(paste(a, collapse = "_"))
} else {
return(paste(a[1:4], collapse = "_"))
}
})
yy
[[1]]
[1] "dd_mav2_6541_0"
[[2]]
[1] "dd_mav2_12567_0"
[[3]]
[1] "dd_mav2_43_1"
[[4]]
[1] "dd_mav2_19865_2"
[[5]]
[1] "dd_mav2_1_0"
[[6]]
[1] "dd_mav2"
# Re-create the vector
do.call("c", yy)
[1] "dd_mav2_6541_0" "dd_mav2_12567_0" "dd_mav2_43_1" "dd_mav2_19865_2" "dd_mav2_1_0" "dd_mav2"
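The loop plus `do.call("c", yy)` can be collapsed into a single call. A sketch using `vapply()` and `head()`: `head(a, 4)` returns at most four pieces, so the `length(a) < 4` branch is no longer needed, and `vapply()` returns a character vector directly. Keep in mind that "keep the first four pieces" is a different rule from the regex answer ("drop one trailing `_digits` group"); the two can disagree when a string does not have exactly five `_`-separated pieces. The string `"dd_mav2_1_0_1_7"` below is a made-up example, not from the question.

```r
sample_df <- c("dd_mav2_6541_0_10", "dd_mav2_12567_0_2", "dd_mav2_43_1_341",
               "dd_mav2_19865_2_13", "dd_mav2_1_0_1", "dd_mav2")

# Split on "_", keep at most the first four pieces, paste them back together;
# vapply() checks that each result is a single string
first4 <- vapply(strsplit(sample_df, "_"),
                 function(a) paste(head(a, 4), collapse = "_"),
                 character(1))
print(first4)

# Hypothetical input with six pieces: the two rules give different answers
x <- "dd_mav2_1_0_1_7"
by_pieces <- paste(head(strsplit(x, "_")[[1]], 4), collapse = "_")  # "dd_mav2_1_0"
by_regex  <- sub("_\\d+$", "", x)                                  # "dd_mav2_1_0_1"
```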