Using gsub in R for multiple changes
I have a data.frame whose column names I want to "clean":
>names(Data)
[1] "tBodyAcc.mean...X"
[2] "angle.X.gravityMean."
[3] "fBodyBodyGyroJerkMag.mean.."
[4] "fBodyAccMag.meanFreq.."
.
.
I'm using the following code:
names(Data) <- gsub('[mM]ean', ' Mean ', names(Data))
names(Data) <- gsub('[Ff]req', ' Frequency ', names(Data))
names(Data) <- gsub('^t', 'Time ', names(Data))
names(Data) <- gsub('\\.', ' ', names(Data))   # the dot must be escaped as '\\.' in an R string
This gives:
[1] "Time BodyAcc  Mean    X"
[2] "angle X gravity Mean  "
[3] "fBodyBodyGyroJerkMag  Mean   "
[4] "fBodyAccMag  Mean  Frequency   "
Is there a way to do this in one line, or some other way that is more elegant than this?
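Because each regular expression has to be applied to the whole vector in turn, you can't avoid some kind of loop, whether you write it explicitly or a function does it for you. In the example below, n is your names(Data) vector: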
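n <- c("tBodyAcc.mean...X", "angle.X.gravityMean.", "fBodyBodyGyroJerkMag.mean..", "fBodyAccMag.meanFreq..")
# patterns and their replacements, applied in this order
p <- c('[mM]ean', '[Ff]req', '^t', '\\.')
r <- c(' Mean ', ' Frequency ', 'Time ', ' ')
# loop over the patterns (not over the names), feeding each result into the next gsub()
for (i in seq_along(p)) {
  n <- gsub(p[i], r[i], n)
}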
Result:
> n
[1] "Time BodyAcc  Mean    X"         "angle X gravity Mean  "
[3] "fBodyBodyGyroJerkMag  Mean   "   "fBodyAccMag  Mean  Frequency   "
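If you want the loop folded into a single expression, base R's Reduce() can thread the vector through one gsub() call per pattern/replacement pair. This is a sketch of that alternative rather than part of the answer above; n_clean is just an illustrative name, and p and r hold the same patterns and replacements as the loop.

```r
n <- c("tBodyAcc.mean...X", "angle.X.gravityMean.",
       "fBodyBodyGyroJerkMag.mean..", "fBodyAccMag.meanFreq..")
p <- c("[mM]ean", "[Ff]req", "^t", "\\.")
r <- c(" Mean ", " Frequency ", "Time ", " ")

# Reduce() calls f(accumulated, i) for i = 1, 2, ..., starting from init = n,
# so pattern i is applied to the output of pattern i - 1, just like the loop.
n_clean <- Reduce(function(x, i) gsub(p[i], r[i], x), seq_along(p), init = n)
n_clean
```

Because the patterns are applied in sequence, their order matters here exactly as it does in the explicit loop.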
You could also try stri_replace_all_regex from the stringi package:
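library(stringi)
# vectorize_all = FALSE applies every pattern to every string, one pattern after another
stri_replace_all_regex(names(Data),
                       pattern = c("mean", "freq", "^t", "\\."),
                       replacement = c(' Mean ', ' Frequency ', 'Time ', ' '),
                       vectorize_all = FALSE,
                       opts_regex = stri_opts_regex(case_insensitive = TRUE))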
# [1] "Time BodyAcc  Mean    X"         "angle X gravity Mean  "
# [3] "fBodyBodyGyroJerkMag  Mean   "   "fBodyAccMag  Mean  Frequency   "
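Setting vectorize_all to FALSE is what makes stri_replace_all_regex() behave like the loop. With the default (TRUE), stringi recycles the strings, patterns and replacements against each other, so string i is matched only against pattern i. The base-R sketch below is an analogue for illustration, not stringi itself; it contrasts the two behaviours on two of the names, and pairwise and cumulative are hypothetical names.

```r
x <- c("tBodyAcc.mean...X", "angle.X.gravityMean.")
p <- c("mean", "\\.")
r <- c(" Mean ", " ")

# Pairwise (like vectorize_all = TRUE): string i is matched only against pattern i
pairwise <- mapply(function(s, pat, repl) gsub(pat, repl, s, ignore.case = TRUE),
                   x, p, r, USE.NAMES = FALSE)

# Cumulative (like vectorize_all = FALSE): every pattern is applied to every string, in order
cumulative <- x
for (i in seq_along(p)) {
  cumulative <- gsub(p[i], r[i], cumulative, ignore.case = TRUE)
}

pairwise
cumulative
```

In the pairwise result the first name keeps its dots and the second keeps "gravityMean" unsplit, which is why the cumulative behaviour is the one you want for cleaning names.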
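What you have is already fine, but the first two regular expressions can be simplified a little with ignore.case = TRUE. Also, every pattern except the last should only be replaced once, so sub is a better fit than gsub: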
nms <- c("tBodyAcc.mean...X", "angle.X.gravityMean.",
"fBodyBodyGyroJerkMag.mean..", "fBodyAccMag.meanFreq..")
nms <- sub('mean', ' Mean ', nms, ignore.case = TRUE)
nms <- sub('freq', ' Frequency ', nms, ignore.case = TRUE)
nms <- sub('^t', 'Time ', nms)
nms <- gsub('\\.', ' ', nms)
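To see why sub is enough for the first three steps while the last one still needs gsub, compare the two on a single name. The hypothetical helper clean_names() then chains the four steps of this answer and, as an extra step that is not part of the answer, collapses runs of spaces with gsub(" +", " ", ...) and trims the ends with trimws() (available in base R since 3.2.0). Treat it as a sketch; adjust or drop the tidying step if you want the spacing kept.

```r
x <- "fBodyAccMag.meanFreq.."
sub("\\.", " ", x)   # only the first "." is replaced
gsub("\\.", " ", x)  # every "." is replaced

# Hypothetical helper: this answer's four steps plus whitespace tidying
clean_names <- function(nms) {
  nms <- sub("mean", " Mean ", nms, ignore.case = TRUE)
  nms <- sub("freq", " Frequency ", nms, ignore.case = TRUE)
  nms <- sub("^t", "Time ", nms)
  nms <- gsub("\\.", " ", nms)
  trimws(gsub(" +", " ", nms))  # collapse repeated spaces, drop leading/trailing ones
}

clean_names(c("tBodyAcc.mean...X", "angle.X.gravityMean.",
              "fBodyBodyGyroJerkMag.mean..", "fBodyAccMag.meanFreq.."))
```

Applied to the data frame, this would be names(Data) <- clean_names(names(Data)).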