Trim ending hyphens in a column
I have a data.frame column that looks like this:
Lake-and-Peninsula--
Matanuska-Susitna---
Nome----
North-Slope---
Northwest-Arctic---
Prince-of-Wales-Outer-
Sitka----
Skagway-Hoonah-Angoon--
Southeast-Fairbanks---
Valdez-Cordova---
Wade-Hampton---
Wrangell-Petersburg---
Yakutat----
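Every cell ends with some number of hyphens. I want to remove all the hyphens at the end of each cell but keep the hyphens between words. How can I do this? There are at most 4 trailing hyphens, and sometimes none.
Desired output: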
Lake-and-Peninsula
Matanuska-Susitna
Nome
North-Slope
Northwest-Arctic
Prince-of-Wales-Outer
Sitka
Skagway-Hoonah-Angoon
Southeast-Fairbanks
Valdez-Cordova
Wade-Hampton
Wrangell-Petersburg
Yakutat
We can use sub to match one or more - (-+) at the end of the string ($) and replace them with an empty string.
df1$Col <- sub("-+$", "", df1$Col)
df1
# Col
#1 Lake-and-Peninsula
#2 Matanuska-Susitna
#3 Nome
#4 North-Slope
#5 Northwest-Arctic
#6 Prince-of-Wales-Outer
#7 Sitka
#8 Skagway-Hoonah-Angoon
#9 Southeast-Fairbanks
#10 Valdez-Cordova
#11 Wade-Hampton
#12 Wrangell-Petersburg
#13 Yakutat
Data
df1 <- structure(list(Col = c("Lake-and-Peninsula--", "Matanuska-Susitna---",
"Nome----", "North-Slope---", "Northwest-Arctic---", "Prince-of-Wales-Outer-",
"Sitka----", "Skagway-Hoonah-Angoon--", "Southeast-Fairbanks---",
"Valdez-Cordova---", "Wade-Hampton---", "Wrangell-Petersburg---",
"Yakutat----")), .Names = "Col", class = "data.frame", row.names = c(NA, -13L))
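To see why the $ anchor matters, here is a small sketch on an illustrative vector (not the original data). It checks that internal hyphens survive and that a string with no trailing hyphen is left unchanged. The trimws() alternative is an assumption about your setup: it relies on the whitespace argument, which should be available in R 3.6.0 and later.

```r
# Illustrative vector (not the original data): trailing, internal, and no hyphens
x <- c("Nome----", "Prince-of-Wales-Outer-", "Yakutat", "a--b--")

# "-+$" is anchored at the end of the string, so only the trailing run is removed
trimmed <- sub("-+$", "", x)
trimmed

# Alternative, assuming R >= 3.6.0 where trimws() accepts a `whitespace` argument
trimmed_ws <- trimws(x, which = "right", whitespace = "-")
identical(trimmed, trimmed_ws)
```

sub is enough here because the pattern can match at most once per string; gsub would give the same result.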
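Judging by the number of trailing hyphens, I am guessing these strings arose because the initial data frame had some blank cells, and the columns were then pasted together with a hyphen as the separator.
Instead, exclude the blanks before pasting so the extra hyphens never appear, e.g.: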
# data
x <- c("Lake", "and", "Peninsula", "", "")
# original paste, blanks included
paste(x, collapse = "-")
# [1] "Lake-and-Peninsula--"
# paste after removing blanks
paste(x[ x != ""], collapse = "-")
# [1] "Lake-and-Peninsula"
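The same idea could be applied row by row if the name parts still live in separate columns. This is only a sketch under that assumption: the data frame parts, its column names, and the helper join_nonblank are hypothetical and not from the question.

```r
# Hypothetical data frame of name parts; "" marks a blank cell (assumed layout)
parts <- data.frame(
  p1 = c("Lake", "Nome", "North"),
  p2 = c("and", "", "Slope"),
  p3 = c("Peninsula", "", ""),
  stringsAsFactors = FALSE
)

# Paste one row with "-" after dropping blank and NA cells
join_nonblank <- function(r) paste(r[!is.na(r) & nzchar(r)], collapse = "-")

# apply() passes each row as a character vector
joined <- unname(apply(parts, 1, join_nonblank))
joined
```

Dropping the blanks before pasting means no cleanup with sub is needed afterwards.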