Homogenize use of single and double digit numbers in string
I have a very large data.table in which a large number of items are identified by strings that combine text and numbers.
library(data.table)
dd <- data.table(x = c("A4","A4","A4","A14","A14","A14","B4","B4","B4"),y = c("A4","A14","B4","A4","A14","B4","A4","A14","B4"), z = c(1,2,3,4,5,6,7,8,9))
x y z
A4 A4 1
A4 A14 2
A4 B4 3
A14 A4 4
A14 A14 5
A14 B4 6
B4 A4 7
B4 A14 8
B4 B4 9
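The numbers can have one or two digits, and because R sorts these strings character by character (lexicographically), A14 always comes before A4. mixedsort can handle this. However, when I reshape the data from long to wide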
wide <- dcast(dd, x ~ y, value.var = "z")
the result is sorted again using the default string order:
x A14 A4 B4
A14 5 4 6
A4 2 1 3
B4 8 7 9
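But I need the original order for subsequent matrix calculations. Is there an efficient way to rename string + single digit to string + two digits (A4 -> A04), or another approach I have missed?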
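To make the ordering problem concrete, here is a small base-R sketch. natural_order() is a hypothetical helper written only for this illustration (it is not part of gtools); it assumes every ID is a letter prefix followed only by digits, and orders by the prefix first and then by the numeric value, which is roughly what mixedorder does for IDs of this shape.

```r
# Hypothetical helper (not from gtools): order IDs of the form <letters><digits>
# by the letter prefix, then by the numeric value of the digits.
natural_order <- function(s) {
  prefix <- sub("[0-9]+$", "", s)               # "A14" -> "A"
  number <- as.integer(sub("^[^0-9]*", "", s))  # "A14" -> 14
  order(prefix, number)
}

ids <- c("A4", "A14", "B4")
sort(ids)               # lexicographic: "A14" "A4" "B4"
ids[natural_order(ids)] # natural:       "A4" "A14" "B4"
```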
You can use sprintf() to pad the numbers with a leading 0:
sprintf("%s%02d", "A", 1:20)
# [1] "A01" "A02" "A03" "A04" "A05" "A06" "A07" "A08" "A09" "A10" "A11" "A12" "A13" "A14" "A15" "A16" "A17" "A18" "A19" "A20"
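The same idea can be applied to IDs that already exist. The sketch below uses a hypothetical pad_id() helper and assumes every ID is a letter prefix followed only by digits: it splits the two parts with sub() and re-joins them with sprintf(), so the padded strings then sort correctly with plain sort().

```r
# Hypothetical helper: zero-pad the numeric part of IDs such as "A4" or "A14".
# Assumes each ID is a letter prefix followed only by digits.
pad_id <- function(s, width = 2) {
  prefix <- sub("[0-9]+$", "", s)               # "A14" -> "A"
  number <- as.integer(sub("^[^0-9]*", "", s))  # "A14" -> 14
  sprintf(paste0("%s%0", width, "d"), prefix, number)  # format "%s%02d"
}

pad_id(c("A4", "A14", "B4"))
#> [1] "A04" "A14" "B04"
sort(pad_id(c("B4", "A14", "A4")))
#> [1] "A04" "A14" "B04"
```

The width argument should be at least the largest number of digits in the data; otherwise longer numbers are printed unpadded and the ordering breaks again.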
You can add the 0 to your data with:
dd[nchar(x) == 2, x := paste0(substr(x, 1, 1), 0, substr(x, 2, 2))]
dd[nchar(y) == 2, y := paste0(substr(y, 1, 1), 0, substr(y, 2, 2))]
# x y z
# 1: A04 A04 1
# 2: A04 A14 2
# 3: A04 B04 3
# 4: A14 A04 4
# 5: A14 A14 5
# 6: A14 B04 6
# 7: B04 A04 7
# 8: B04 A14 8
# 9: B04 B04 9
Alternatively, if you need to apply this to more columns:
to.change <- c('x', 'y')
dd[, (to.change) := lapply(.SD, function(x) ifelse(nchar(x) > 2, x
, paste0(substr(x, 1, 1), 0, substr(x, 2, 2))))
, .SDcols = to.change]
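As a quick check that padding alone fixes the dcast() ordering, the sketch below pads both ID columns in place with sprintf() and data.table's set() instead of paste0(). Like the substr() code above, it assumes each ID is a single letter followed by one or two digits.

```r
library(data.table)

dd <- data.table(x = c("A4","A4","A4","A14","A14","A14","B4","B4","B4"),
                 y = c("A4","A14","B4","A4","A14","B4","A4","A14","B4"),
                 z = 1:9)

# Zero-pad x and y in place (assumes one letter followed by 1-2 digits)
for (col in c("x", "y")) {
  v <- dd[[col]]
  set(dd, j = col, value = sprintf("%s%02d", substr(v, 1, 1),
                                   as.integer(substring(v, 2))))
}

# Plain string sorting inside dcast() now gives the intended order
wide <- dcast(dd, x ~ y, value.var = "z")
wide
```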
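This solution does not require adding zeros.
# Data frame
df <- data.frame(x = c("A4","A4","A4","A14","A14","A14","B4","B4","B4"),
y = c("A4","A14","B4","A4","A14","B4","A4","A14","B4"),
z = c(1,2,3,4,5,6,7,8,9),
stringsAsFactors = FALSE)
library(dplyr)   # %>%, select(), slice(), all_of()
library(gtools)  # mixedsort()
# Reorder columns and rows using `mixedsort`.
# reshape2::dcast() is the dcast() method for plain data frames.
wide <- reshape2::dcast(df, x ~ y, value.var = "z") %>%
select(x, all_of(mixedsort(unique(df$y)))) %>%  # value columns in natural order
slice(match(mixedsort(unique(df$x)), x))        # row indices in natural order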
gives:
# x A4 A14 B4
# 1 A4 1 2 3
# 2 A14 4 5 6
# 3 B4 7 8 9
Another, probably the simplest, option is to use mixedorder from the gtools package:
wide <- dcast(dd, x ~ y, value.var = "z")[gtools::mixedorder(x)]
gives:
> wide
x A14 A4 B4
1: A4 2 1 3
2: A14 5 4 6
3: B4 8 7 9
If you also want to order the columns the same way, you can additionally use setcolorder:
setcolorder(wide, c(1, gtools::mixedorder(names(wide)[-1]) + 1))
which then gives:
> wide
x A4 A14 B4
1: A4 1 2 3
2: A14 4 5 6
3: B4 7 8 9
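Since the goal is matrix calculations, the ordered table can then be turned into a matrix. This sketch repeats the mixedorder()/setcolorder() steps so it runs on its own, and uses the rownames argument of data.table's as.matrix() to keep the x IDs as row names. The product m %*% m is only a placeholder for whatever computation follows.

```r
library(data.table)

dd <- data.table(x = c("A4","A4","A4","A14","A14","A14","B4","B4","B4"),
                 y = c("A4","A14","B4","A4","A14","B4","A4","A14","B4"),
                 z = 1:9)

wide <- dcast(dd, x ~ y, value.var = "z")
# Natural order for the rows ...
wide <- wide[gtools::mixedorder(x)]
# ... and for the value columns (column 1 is x, so it stays first)
setcolorder(wide, c(1, gtools::mixedorder(names(wide)[-1]) + 1))

# Numeric matrix with the x IDs as row names
m <- as.matrix(wide, rownames = "x")
m %*% m  # placeholder for the actual matrix calculation
```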
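You may want to consider enforcing this order directly in the data through factors, so that you don't have to fix it later with data wrangling.
If you already have these unique values stored in sorted order somewhere, you won't need mixedsort; just use them as the levels when converting the columns to factors.
Otherwise you can recover the order with: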
library(gtools)
dd[, c("x", "y") := lapply(.SD, function(v) factor(v, levels = mixedsort(unique(v)))),
   .SDcols = c("x", "y")]  # factor levels follow the mixed (natural) order
and proceed as usual:
dcast(dd, x ~ y, value.var = "z")
# x A4 A14 B4
# 1: A4 1 2 3
# 2: A14 4 5 6
# 3: B4 7 8 9
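The factor approach works because functions such as sort() and order() (and, in this answer, dcast()) follow the order of the factor levels rather than alphabetical order. A minimal base-R illustration, assuming the desired order "A4", "A14", "B4" is already known:

```r
ids <- c("A14", "A4", "B4", "A4")

# Levels listed in the desired (natural) order
f <- factor(ids, levels = c("A4", "A14", "B4"))

sort(ids)              # character: "A14" "A4" "A4" "B4"
as.character(sort(f))  # factor:    "A4" "A4" "A14" "B4"
```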