Conditional shift of substring position in R
> Df1
[1] "HM_004_T" "HM_004_T2" "HM_005_T" "HMFN_005_T2" "HM_007_T" "HM_007_T2" "HM_088_TR"
[8] "HM_088_T3"
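For reference, I have a slightly different problem. I first want to remove _T when it appears on its own, and then take a trailing _T2, _T3, or _TR and move it in front of all the other text.
My ideal output is: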
Df1 <- c("HM_004", "T2_HM_004", "HM_005", "T2_HM_005", "HM_007", "T2_HM_007", "TR_HM_088", "T3_HM_088")
Input data
Df1 <- c("HM_004_T", "HM_004_T2", "HM_005_T", "HM_005_T2", "HM_007_T", "HM_007_T2", "HM_088_TR", "HM_088_T3")
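You can do this fairly easily with the stringr package and its functions str_remove() and str_replace().
I assume that the patterns of interest always occur at the end of the text and that they always start with _.
Please see the updated code below. It handles the pattern _T*, where * can now also be a letter, so that targets such as _TR are matched as well.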
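The assumption above can be checked before any replacement is done. The following is a minimal sketch in base R, using the input data from the question; the variable name ends_with_suffix is only illustrative.

```r
Df1 <- c("HM_004_T", "HM_004_T2", "HM_005_T", "HM_005_T2",
         "HM_007_T", "HM_007_T2", "HM_088_TR", "HM_088_T3")

# TRUE where the string ends in "_T", optionally followed by one
# letter, digit, or underscore (that is what \w matches)
ends_with_suffix <- grepl("_T\\w?$", Df1)

# If this is FALSE, some elements do not follow the assumed layout
all(ends_with_suffix)
```

If all() returns FALSE for your own data, the patterns in the answer will leave the non-conforming elements unchanged.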
library(stringr)
Df1 <- c("HM_004_T", "HM_004_T2", "HM_005_T", "HM_005_T2",
"HM_007_T", "HM_007_T2", "HM_088_TR", "HM_088_T3")
# Remove a bare trailing "_T"; suffixes such as "_T2" or "_TR" are left untouched
df2 <- str_remove(Df1, "_T$")
# Move the remaining suffix to the front using group references
# (backslashes must be doubled inside R string literals)
final <- str_replace(df2, "(^.*)_(T\\d+$|T\\w+$)", "\\2_\\1")
final
#> [1] "HM_004" "T2_HM_004" "HM_005" "T2_HM_005" "HM_007" "T2_HM_007"
#> [7] "TR_HM_088" "T3_HM_088"
# A more concise way is the following, where \w does the work.
final <- str_replace(df2, "(^.*)_(T\\w$)", "\\2_\\1")
final
#> [1] "HM_004" "T2_HM_004" "HM_005" "T2_HM_005" "HM_007" "T2_HM_007"
#> [7] "TR_HM_088" "T3_HM_088"
Created on 2021-02-16 by the reprex package (v1.0.0)
Does this work for you?
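The two patterns in this answer are not quite equivalent: T\w$ accepts exactly one character after the T, while T\w+$ accepts one or more. The sketch below illustrates the difference. It uses base R sub() so that it runs without extra packages, and "HM_010_T10" is a hypothetical value that does not occur in the original data.

```r
# Hypothetical input: "HM_010_T10" is not part of the question's data
x <- c("HM_004_T2", "HM_010_T10")

# Exactly one character after "T": "_T10" does not match, so the
# second element is returned unchanged
one_char <- sub("(.*)_(T\\w)$", "\\2_\\1", x)

# One or more characters after "T": both suffixes are moved to the front
many_char <- sub("(.*)_(T\\w+)$", "\\2_\\1", x)

one_char
many_char
```

For the data in the question both variants give the same result, so the choice only matters if longer suffixes can appear.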
You can do this with nested sub calls and backreferences:
DF1 <- sub("(.*)_(T\\w)$", "\\2_\\1", sub("_T$", "", DF1))
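Here the first (inner) sub call removes a string-final _T. Its result is passed to the second (outer) sub call, which swaps the order of (i) everything before the last underscore _ and (ii) T followed by a digit or letter (\w), using the backreferences \1 and \2. Inside an R string literal each of these backslashes has to be doubled.
Result: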
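To make the two stages easier to follow, the nested call can be unrolled into separate steps. This is only a sketch of the same approach; the names step1 and step2 are illustrative and not part of the original answer.

```r
DF1 <- c("HM_004_T", "HM_004_T2", "HM_005_T", "HM_005_T2",
         "HM_007_T", "HM_007_T2", "HM_088_TR", "HM_088_T3")

# Step 1: drop a bare "_T" at the end of the string;
# "_T2", "_TR" and "_T3" do not end in "_T" and are kept
step1 <- sub("_T$", "", DF1)

# Step 2: capture everything before the last "_" as group 1 and
# "T" plus one word character as group 2, then swap them
step2 <- sub("(.*)_(T\\w)$", "\\2_\\1", step1)

step1
step2
```

Elements that lost their _T in step 1 no longer match the pattern in step 2, so sub() returns them unchanged.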
DF1
[1] "HM_004" "T2_HM_004" "HM_005" "T2_HM_005" "HM_007" "T2_HM_007" "TR_HM_088" "T3_HM_088"
Data:
DF1 <- c("HM_004_T", "HM_004_T2", "HM_005_T", "HM_005_T2",
"HM_007_T", "HM_007_T2", "HM_088_TR", "HM_088_T3")