Standardizing the character strings in multiple rows (R or Unix)
I want to standardize all of the _xxxxxx strings in column V1 to the xxxxxxH format.
V1 V2 V3
122223H 20 Test kits
122224H 23 Test kits
122225H 42 Test kits
122227H 31 Test kits
_122228 23 Test kits
_122229 57 Test kits
_122231 21 Test kits
122232H 33 Test kits
122234H 22 Test kits
....... .. .... ....
....... .. .... ....
....... .. .... ....
122250H 33 Test kits
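I tried to solve this with the gsub function in R, but could not produce the exact format I need. Any suggestions would be appreciated! Unix-based commands would also be useful.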
df <- gsub("_","H",c(file$V1))
Output (for the underscore rows):
"H122228" "H122229" "H122231"
Desired output:
V1 V2 V3
122223H 20 Test kits
122224H 23 Test kits
122225H 42 Test kits
122227H 31 Test kits
122228H 23 Test kits
122229H 57 Test kits
122231H 21 Test kits
122232H 33 Test kits
122234H 22 Test kits
....... .. .... ....
....... .. .... ....
....... .. .... ....
122250H 33 Test kits
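Where a string starts with an underscore, capture the digits that follow it and replace the match with those digits followed by H:
file <- data.frame(v1 = c("122227H", "_122231"))
# Backslashes must be doubled inside R string literals; \\1 is the captured digits.
file$v1 <- gsub("^_(\\d+)$", "\\1H", file$v1)
Output: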
"122227H" "122231H"
Try the following, though a more elegant solution may exist:
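df <- data.frame(v1 = c("122223H","122224H","122225H","122227H","_122228","_122229"),
                 v2 = c(21,23,42,31,23,57),
                 v3 = rep("Test Kits", times = 6))
# Step 1: strip the underscore (as.character guards against v1 being a factor).
df$newstring <- gsub("_", "", as.character(df$v1))
# Step 2: append "H" only to the values that do not already contain one.
df$newstring <- ifelse(grepl("H", df$newstring, fixed = TRUE), df$newstring, paste0(df$newstring, "H"))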
# > df
# v1 v2 v3 newstring
# 1 122223H 21 Test Kits 122223H
# 2 122224H 23 Test Kits 122224H
# 3 122225H 42 Test Kits 122225H
# 4 122227H 31 Test Kits 122227H
# 5 _122228 23 Test Kits 122228H
# 6 _122229 57 Test Kits 122229H
$ sed 's/^_\([0-9]*\)/\1H/' file
V1 V2 V3
122223H 20 Test kits
122224H 23 Test kits
122225H 42 Test kits
122227H 31 Test kits
122228H 23 Test kits
122229H 57 Test kits
122231H 21 Test kits
122232H 33 Test kits
122234H 22 Test kits
....... .. .... ....
....... .. .... ....
....... .. .... ....
122250H 33 Test kits
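If an underscore could also appear in the other columns, the rewrite can be restricted to the first field. The following is only a sketch using awk, assuming whitespace-separated columns and a header on the first line; the function name standardize_v1 and the file name kits.txt are made up for illustration.

```shell
# Hypothetical helper: rewrite only column 1 of a whitespace-separated file.
standardize_v1() {
  # Skip the header (NR == 1). For rows whose first field is an underscore
  # followed by digits, drop the underscore and append H.
  awk 'NR > 1 && $1 ~ /^_[0-9]+$/ { $1 = substr($1, 2) "H" } { print }' "$1"
}

# Small sample file (assumed name) to try it on.
printf '%s\n' \
  'V1 V2 V3' \
  '122227H 31 Test kits' \
  '_122228 23 Test kits' > kits.txt

standardize_v1 kits.txt
```

Note that assigning to $1 makes awk rebuild the line with single spaces between fields, so any column alignment done with repeated spaces or tabs is not preserved on the modified rows.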