Trim part of a string in a data frame
If I have a data frame column like this:
AA1_123.zip
BB2_456.txt
CCC_789.doc
How can I change it to this:
AA1
BB2
CCC
You can try sub, which here removes the first underscore and everything after it:
sub('_.*', '', df1$Col)
#[1] "AA1" "BB2" "CCC"
Data
df1 <- structure(list(Col = c("AA1_123.zip", "BB2_456.txt",
"CCC_789.doc"
)), .Names = "Col", class = "data.frame", row.names = c(NA, -3L))
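To keep the trimmed values, the result of sub can be assigned back to the data frame. The following is a minimal sketch, assuming the column is called Col as in the data above; the new column name Prefix is hypothetical and only used for illustration.

```r
# Example data with the same shape as df1 above
df1 <- data.frame(Col = c("AA1_123.zip", "BB2_456.txt", "CCC_789.doc"),
                  stringsAsFactors = FALSE)

# Remove the first underscore and everything after it, then store the
# result in a new column (Prefix is a hypothetical name)
df1$Prefix <- sub("_.*", "", df1$Col)

df1$Prefix
# [1] "AA1" "BB2" "CCC"
```

Strings that contain no underscore are returned unchanged, because the pattern does not match them.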
If the strings all start with the same pattern, three characters before the underscore, this works:
df1 <- structure(list(Col = c("AA1_123.zip", "BB2_456.txt",
"CCC_789.doc"
)), .Names = "Col", class = "data.frame", row.names = c(NA, -3L))
> substr(df1$Col, 1, 3)
[1] "AA1" "BB2" "CCC"
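The fixed-width approach only holds while every prefix is exactly three characters long. As a rough sketch of what happens otherwise, and of one possible workaround, the example below uses a made-up vector x (not from the question) and regexpr to find the underscore position in each string.

```r
# Hypothetical input with prefixes of different lengths
x <- c("AA1_123.zip", "B_456.txt", "CCCC_789.doc")

# A fixed width of 3 cuts in the wrong place for the last two strings
fixed_width <- substr(x, 1, 3)
fixed_width
# [1] "AA1" "B_4" "CCC"

# Locate the first underscore and cut just before it.
# Note: a string without any underscore yields "" with this approach.
pos <- regexpr("_", x, fixed = TRUE)
by_position <- substr(x, 1, pos - 1)
by_position
# [1] "AA1"  "B"    "CCCC"
```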
You could also read the column in again, using comment.char = "_" to discard the rest of each line.
df <- data.frame(x = c("AA1_123.zip", "BB2_456.txt", "CCC_789.doc"))
read.table(text = as.character(df$x), comment.char="_")
# V1
# 1 AA1
# 2 BB2
# 3 CCC
Or you can use scan()
scan(text = as.character(df$x), what = "", comment.char="_")
# Read 3 items
# [1] "AA1" "BB2" "CCC"