Adding leading 0s in R
I have a large data frame filled with character values like these:
x <- c("Y188","Y204" ,"Y221","EP121_1" ,"Y233" , "Y248" ,"Y268", "BB2","BB20",
"BB32" ,"BB044" ,"BB056" , "Y234" , "Y249" ,"Y271" ,"BB3", "BB21", "BB33",
"BB045","BB057" ,"Y236", "Y250", "Y272" , "BB4", "BB22" )
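As you can see, some labels (e.g. BB20) have only two digits. I would like every label in the vector to have at least 3 digits, like this (if it helps, the problem only occurs in the BB labels):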
Y188, Y204, Y221, EP121_1, Y233, Y248, Y268, BB002, BB020, BB032, BB044,
BB056, Y234, Y249, Y271, BB003, BB021, BB033, BB045, BB057, Y236, Y250,
Y272, BB004, BB022
I have looked at the sprintf and formatC functions, but still no luck.
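Both sprintf and formatC pad numeric values rather than whole labels such as BB2, which may be why applying them directly did not help. A minimal sketch, assuming a label is a letter prefix followed by digits (the variable names are illustrative only):

```r
# sprintf() and formatC() zero-pad numbers, not mixed labels such as "BB2"
padded_sprintf <- sprintf("%03d", c(2L, 20L, 188L))
padded_formatC <- formatC(c(2L, 20L, 188L), width = 3, format = "d", flag = "0")
print(padded_sprintf)
# [1] "002" "020" "188"
print(padded_formatC)
# [1] "002" "020" "188"

# So split the label first (assumes letters followed by digits),
# pad the numeric part, then paste the prefix back on
label <- "BB2"
prefix <- sub("[0-9]+$", "", label)
number <- as.integer(sub("^[A-Za-z]+", "", label))
result <- paste0(prefix, sprintf("%03d", number))
print(result)
# [1] "BB002"
```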
A brute-force approach with nested gsub calls:
gsub("(.*[A-Z])(\\d{1}$)", "\\100\\2",
     gsub("(.*[A-Z])(\\d{2}$)", "\\10\\2", x))
# [1] "Y188" "Y204" "Y221" "EP121_1" "Y233" "Y248" "Y268" "BB002" "BB020"
# [10] "BB032" "BB044" "BB056" "Y234" "Y249" "Y271" "BB003" "BB021" "BB033"
# [19] "BB045" "BB057" "Y236" "Y250" "Y272" "BB004" "BB022"
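To see how the nested calls divide the work, they can be run one at a time on a few of the question's labels. The inner gsub inserts one "0" when exactly two digits follow a letter at the end of the label; the outer gsub inserts "00" when a single digit follows a letter. EP121_1 matches neither pattern, because its last digit follows an underscore.

```r
labels <- c("BB2", "BB20", "BB044", "EP121_1")

# Inner call: a letter followed by exactly two trailing digits -> insert "0"
step1 <- gsub("(.*[A-Z])(\\d{2}$)", "\\10\\2", labels)
# step1 is c("BB2", "BB020", "BB044", "EP121_1")

# Outer call: a letter followed by one trailing digit -> insert "00"
step2 <- gsub("(.*[A-Z])(\\d{1}$)", "\\100\\2", step1)
# step2 is c("BB002", "BB020", "BB044", "EP121_1")
```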
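There is surely a more general way to do this, but for a localized task like this two simple sub calls are enough: add one leading zero to two-digit numbers and two leading zeros to one-digit numbers.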
x <- sub("^BB(\\d{1})$", "BB00\\1", x)
x <- sub("^BB(\\d{2})$", "BB0\\1", x)
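One possible more general version is sketched below, assuming the rule is "zero-pad the run of digits at the end of a label, but only when it directly follows a letter". The helper name pad_trailing_digits and its width argument are hypothetical, not part of any package; the sketch uses base R's regexpr and regmatches with a Perl lookbehind, and pads the digit string with strrep so existing leading zeros are kept.

```r
# Hypothetical helper: zero-pad trailing digits that directly follow a letter
pad_trailing_digits <- function(x, width = 3) {
  # Perl lookbehind: a run of digits at the end, preceded by a letter
  m <- regexpr("(?<=[A-Za-z])[0-9]+$", x, perl = TRUE)
  digits <- regmatches(x, m)
  # Prepend as many "0"s as needed; labels without a match are left untouched
  regmatches(x, m) <- paste0(strrep("0", pmax(0L, width - nchar(digits))), digits)
  x
}

labels <- c("Y188", "EP121_1", "BB2", "BB20", "BB044")
padded <- pad_trailing_digits(labels)
# padded is c("Y188", "EP121_1", "BB002", "BB020", "BB044")
wider <- pad_trailing_digits("Y4", width = 4)
# wider is "Y0004"
```

Because the lookbehind requires a letter immediately before the final digits, EP121_1 is not modified, and because the padding works on the digit string rather than a parsed number, labels that already have enough digits come back unchanged.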
Another approach works, but it has edge cases:
# indicator for numeric of length less than three
num <- gsub("[^0-9]", "", x)
id <- nchar(num) < 3
# overwrite relevant values with the reformatted ones
x[id] <- paste0(gsub("[0-9]", "", x)[id],
formatC(as.numeric(num[id]), width = 3, flag = "0"))
# [1] "Y188" "Y204" "Y221" "EP121_1" "Y233" "Y248" "Y268" "BB002" "BB020" "BB032"
# [11] "BB044" "BB056" "Y234" "Y249" "Y271" "BB003" "BB021" "BB033" "BB045" "BB057"
# [21] "Y236" "Y250" "Y272" "BB004" "BB022"
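To make the edge cases concrete, here is the same formatC code run on three made-up labels (they are not from the question). Because it builds the number from every digit in the label and the prefix from every non-digit, digits on both sides of another character are merged:

```r
# Hypothetical labels chosen to expose the edge cases
x <- c("BB2", "A1_2", "C12D")
num <- gsub("[^0-9]", "", x)
id <- nchar(num) < 3
x[id] <- paste0(gsub("[0-9]", "", x)[id],
                formatC(as.numeric(num[id]), width = 3, flag = "0"))
# x is now c("BB002", "A_012", "CD012")
```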
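You can also use the sprintf and gsub functions. The first step extracts the numeric part of each label and changes its format. Only labels made of letters followed by digits are touched: applied to EP121_1, stripping the non-digits would give 1211 and the label would be rebuilt as EP1211.
# only letters-then-digits labels; EP121_1 is left alone
ok <- grepl("^[[:alpha:]]+[[:digit:]]+$", x)
num <- sprintf("%03d", as.integer(gsub("[^[:digit:]]", "", x[ok])))
The next step is to paste the reformatted numbers back onto the letter prefixes:
x[ok] <- paste0(gsub("[^[:alpha:]]", "", x[ok]), num)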