Reformatting a complex factor vector with comma separation after thousands
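I would like to reformat a factor vector so that the numbers it contains have a thousands separator. The vector contains integers and real numbers, without any specific rule with respect to the values or their order.
Data
In particular, I'm working with a vector vec similar to the one generated below: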
content <- c("0 - 100", "0 - 100", "0 - 100", "0 - 100",
"150.22 - 170.33",
"1000 - 2000","1000 - 2000", "1000 - 2000", "1000 - 2000",
"7000 - 10000", "7000 - 10000", "7000 - 10000", "7000 - 10000",
"7000 - 10000", "1000000 - 22000000", "1000000 - 22000000",
"1000000 - 22000000",
"44000000 - 66000000.8989898989")
vec <- factor(x = content, levels = unique(content))
Desired results
My goal is to reformat this vector so that the numbers contain an Excel-like 1,000 separator, as in the examples below:
100.00
1,000.00
1,000,000.00
1,000,000.56
24,564,000,000.56
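Tried approach
I was thinking of using gsubfn with a proto object that would pass the digits, then perhaps creating another proto object for groups of 3 digits and replacing them, as in the code below: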
gsubfn(pattern = "[0-9][0-9][0-9]", replacement = ~paste0(x, ','),
x = as.character(vec))
This works only partly, because the commas end up in the wrong places:
"150,.22 - 170,.33"
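This is obviously wrong. I would also have to convert the character vector back to a factor. Consequently, my question boils down to two elements:
- How can I work around the comma problem?
- How can I maintain the original structure of the factor? I need a factor vector ordered the same way as the original one, but with commas in the correct places.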
Using a regular expression based on a positive lookahead...
content <- c("0 - 100", "0 - 100", "0 - 100", "0 - 100",
"1000 - 2000","1000 - 2000", "1000 - 2000", "1000 - 2000",
"7000 - 10000", "7000 - 10000", "7000 - 10000", "7000 - 10000",
"7000 - 10000", "1000000 - 22000000", "1000000 - 22000000",
"1000000 - 22000000")
gsub("(\\d)(?=(?:\\d{3})+\\b)", "\\1,", content, perl = TRUE)
# [1] "0 - 100" "0 - 100" "0 - 100"
# [4] "0 - 100" "1,000 - 2,000" "1,000 - 2,000"
# [7] "1,000 - 2,000" "1,000 - 2,000" "7,000 - 10,000"
# [10] "7,000 - 10,000" "7,000 - 10,000" "7,000 - 10,000"
# [13] "7,000 - 10,000" "1,000,000 - 22,000,000" "1,000,000 - 22,000,000"
# [16] "1,000,000 - 22,000,000"
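Note that the sample above leaves out the decimal values from the question. With a value such as 66000000.8989898989, the lookahead would also match inside the fractional digits and put commas there. A minimal sketch of one possible workaround, using only base R, is to apply the same lookahead to the integer part of each number only. The helper name add_commas is hypothetical and not part of the original answer.

```r
# Hypothetical helper: insert thousands separators into the integer part
# of every number in a string, leaving any fractional digits untouched.
add_commas <- function(s) {
  # Locate each number: digits, optionally followed by "." and more digits.
  m <- gregexpr("\\d+(\\.\\d+)?", s)
  regmatches(s, m) <- lapply(regmatches(s, m), function(nums) {
    int_part <- sub("\\..*$", "", nums)   # text before the decimal point
    dec_part <- sub("^\\d+", "", nums)    # "" or ".digits"
    # Same lookahead idea as above, anchored at the end of the integer part.
    int_part <- gsub("(\\d)(?=(?:\\d{3})+$)", "\\1,", int_part, perl = TRUE)
    paste0(int_part, dec_part)
  })
  s
}

add_commas(c("150.22 - 170.33", "44000000 - 66000000.8989898989"))
# [1] "150.22 - 170.33"
# [2] "44,000,000 - 66,000,000.8989898989"
```

Because the numbers are handled as text, no rounding takes place and the original precision is kept.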
Maybe you could use formatC:
sapply(
X = lapply(
X = strsplit(x = content, split = " - "),
FUN = function(x) {
formatC(x = as.numeric(x), format = "f", flag = "#", big.mark = ",",
decimal.mark = ".", digits = 2, drop0trailing = FALSE)
}
),
FUN = paste, collapse = " - "
)
# [1] "0.00 - 100.00" "0.00 - 100.00" "0.00 - 100.00"
# [4] "0.00 - 100.00" "150.22 - 170.33" "1,000.00 - 2,000.00"
# [7] "1,000.00 - 2,000.00" "1,000.00 - 2,000.00" "1,000.00 - 2,000.00"
# [10] "7,000.00 - 10,000.00" "7,000.00 - 10,000.00" "7,000.00 - 10,000.00"
# [13] "7,000.00 - 10,000.00" "7,000.00 - 10,000.00" "1,000,000.00 - 22,000,000.00"
# [16] "1,000,000.00 - 22,000,000.00" "1,000,000.00 - 22,000,000.00" "44,000,000.00 - 66,000,000.90"
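The call above returns a plain character vector. To keep the factor, the same formatting could be applied to the levels only, as in the rough sketch below. It assumes every level has the form "a - b" with numeric a and b. The helper name fmt_range and the shortened sample data are illustrative only. One caveat: if rounding to two decimals makes two different levels identical, assigning them would merge those levels.

```r
# Shortened sample data in the same shape as the question's vector.
content <- c("0 - 100", "150.22 - 170.33", "1000 - 2000", "0 - 100")
vec <- factor(content, levels = unique(content))

# Hypothetical helper: format both ends of each "a - b" range with
# two decimals and a comma as the thousands separator.
fmt_range <- function(lv) {
  parts <- strsplit(lv, " - ", fixed = TRUE)
  vapply(parts, function(p) {
    paste(formatC(as.numeric(p), format = "f", digits = 2, big.mark = ","),
          collapse = " - ")
  }, character(1))
}

# Relabel the levels; the underlying integer codes are left as they are.
levels(vec) <- fmt_range(levels(vec))
levels(vec)
# [1] "0.00 - 100.00"       "150.22 - 170.33"     "1,000.00 - 2,000.00"
```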
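Operating only on the levels seems to preserve your precision, since the numbers stay as text instead of being converted to numeric. It is also more efficient, because it reduces the data you operate on to the unique values only (rather than the whole vector):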
levels(vec) <- sapply(strsplit(levels(vec), " - "),
function(x) paste(prettyNum(x,
big.mark = ",",
preserve.width = "none"),
collapse = " - "))
vec
# [1] 0 - 100 0 - 100 0 - 100 0 - 100 150.22 - 170.33
# [6] 1,000 - 2,000 1,000 - 2,000 1,000 - 2,000 1,000 - 2,000 7,000 - 10,000
# [11] 7,000 - 10,000 7,000 - 10,000 7,000 - 10,000 7,000 - 10,000 1,000,000 - 22,000,000
# [16] 1,000,000 - 22,000,000 1,000,000 - 22,000,000 44,000,000 - 66,000,000.8989898989
# Levels: 0 - 100 150.22 - 170.33 1,000 - 2,000 7,000 - 10,000 1,000,000 - 22,000,000 44,000,000 - 66,000,000.8989898989
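To check that relabelling the levels really keeps the factor's structure, one option is to compare the integer codes before and after. This is a small sketch on shortened sample data; the variable name codes_before is only illustrative.

```r
# Shortened sample data in the same shape as the question's vector.
content <- c("0 - 100", "1000 - 2000", "0 - 100", "1000000 - 22000000")
vec <- factor(content, levels = unique(content))

# Remember the integer codes that tie each element to its level.
codes_before <- as.integer(vec)

levels(vec) <- sapply(strsplit(levels(vec), " - "),
                      function(x) paste(prettyNum(x,
                                                  big.mark = ",",
                                                  preserve.width = "none"),
                                        collapse = " - "))

# Still a factor, same codes, same level order; only the labels changed.
stopifnot(is.factor(vec), identical(as.integer(vec), codes_before))
levels(vec)
# [1] "0 - 100"                "1,000 - 2,000"          "1,000,000 - 22,000,000"
```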