Format numbers with million (M) and billion (B) suffixes
I have many numbers, e.g. currency amounts in dollars:
1 6,000,000
2 75,000,400
3 743,450,000
4 340,000
5 4,300,000
I would like to format them with suffixes, such as M (million) and B (billion):
1 6.0 M
2 75.0 M
3 743.5 M
4 0.3 M
5 4.3 M
If you begin with this numeric vector x,
x <- c(6e+06, 75000400, 743450000, 340000, 4300000)
you could do the following.
paste(format(round(x / 1e6, 1), trim = TRUE), "M")
# [1] "6.0 M" "75.0 M" "743.5 M" "0.3 M" "4.3 M"
If you're not concerned about trailing zeros, just remove the format() call.
paste(round(x / 1e6, 1), "M")
# [1] "6 M" "75 M" "743.5 M" "0.3 M" "4.3 M"
Alternatively, you could assign an S3 class with a print method and keep x as numeric underneath. Here I use paste0() to make the result a bit more legible.
print.million <- function(x, quote = FALSE, ...) {
  x <- paste0(round(x / 1e6, 1), "M")
  NextMethod(x, quote = quote, ...)
}
## assign the 'million' class to 'x'
class(x) <- "million"
x
# [1] 6M 75M 743.5M 0.3M 4.3M
x[]
# [1] 6000000 75000400 743450000 340000 4300000
You could do the same for billions and trillions. To put such a vector into a data frame, you will also need format() and as.data.frame() methods for the class.
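The same pattern for billions might look like the sketch below. The class name billion and the method print.billion are illustrative names, not part of the original answer. This variant calls unclass() and print() directly instead of NextMethod(), which is a design choice rather than a requirement.

```r
# Hypothetical print method for a 'billion' class (names are illustrative).
print.billion <- function(x, quote = FALSE, ...) {
  # Format a character copy for display; the stored values are untouched.
  print(paste0(round(unclass(x) / 1e9, 1), "B"), quote = quote, ...)
  invisible(x)
}

y <- c(1.5e9, 453123634432)
class(y) <- "billion"
y            # displayed with a B suffix
unclass(y)   # still the original numbers
```

Because only the print method changes, arithmetic on y keeps working with the underlying numeric values.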
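Another option, which starts with numeric (rather than character) values and works for millions and billions (and below). You can pass more arguments to formatC to customize the output, and extend it to trillions if need be.
m_b_format = function(x) {
  b.index = x >= 1e9
  # values from 1e5 up are shown in millions, e.g. 340000 -> "0.3 M"
  m.index = x >= 1e5 & x < 1e9
  output = formatC(x, format = "d", big.mark = ",")
  output[b.index] = paste(formatC(x[b.index] / 1e9, digits = 1, format = "f"), "B")
  output[m.index] = paste(formatC(x[m.index] / 1e6, digits = 1, format = "f"), "M")
  return(output)
}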
your_x = c(6e6, 75e6 + 400, 743450000, 340000, 43e6)
> m_b_format(your_x)
[1] "6.0 M" "75.0 M" "743.5 M" "0.3 M" "43.0 M"
big_x = c(123, 500, 999, 1050, 9000, 49000, 105400, 998000,
1.5e6, 2e7, 313402182, 453123634432)
> m_b_format(big_x)
[1] "123" "500" "999" "1,050" "9,000" "49,000"
[7] "0.1 M" "1.0 M" "1.5 M" "20.0 M" "313.4 M" "453.1 B"
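Extending this to trillions could be sketched as follows. The function name m_b_t_format and the "T" suffix are assumptions made for illustration; the thresholds otherwise mirror m_b_format above.

```r
# Hypothetical extension of m_b_format() with a trillion tier (assumed suffix "T").
m_b_t_format <- function(x) {
  t.index <- x >= 1e12
  b.index <- x >= 1e9 & x < 1e12
  m.index <- x >= 1e5 & x < 1e9
  # Default: whole numbers with thousands separators
  output <- formatC(x, format = "d", big.mark = ",")
  output[t.index] <- paste(formatC(x[t.index] / 1e12, digits = 1, format = "f"), "T")
  output[b.index] <- paste(formatC(x[b.index] / 1e9, digits = 1, format = "f"), "B")
  output[m.index] <- paste(formatC(x[m.index] / 1e6, digits = 1, format = "f"), "M")
  output
}

m_b_t_format(c(9000, 2e7, 5e9, 2.5e12))
# [1] "9,000"  "20.0 M" "5.0 B"  "2.5 T"
```

The billion index now needs an upper bound so that it no longer claims values that belong to the trillion tier.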
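Obviously you first need to get rid of the commas in the formatted numbers, and gsub(",", "", ...) is the way to do it. This uses findInterval to select the appropriate suffix for labeling and to determine the denominator for a more compact display. It can easily be extended in either direction if you want to go below 1.0 or above 1 trillion:
comprss <- function(tx) {
  div <- findInterval(as.numeric(gsub(",", "", tx)),
                      c(0, 1e3, 1e6, 1e9, 1e12))  # modify this if negative numbers are possible
  paste(round(as.numeric(gsub(",", "", tx)) / 10^(3 * (div - 1)), 2),
        c("", "K", "M", "B", "T")[div])
}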
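You don't need to remove the as.numeric or gsub calls if the input is already numeric. They are admittedly superfluous in that case, but the function still succeeds. This is the result with Gregor's example: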
> comprss (big_x)
[1] "123 " "500 " "999 " "1.05 K" "9 K"
[6] "49 K" "105.4 K" "998 K" "1.5 M" "20 M"
[11] "313.4 M" "453.12 B"
And with the original input (which was probably a factor variable if it was read with read.table or read.csv, or created with data.frame):
comprss (dat$V2)
[1] "6 M" "75 M" "743.45 M" "340 K" "4.3 M"
Of course, these can be printed without quotes, either with an explicit print call with quote = FALSE or by using cat.
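The code comment in comprss notes that it needs modifying if negative numbers are possible. One way to do that, as a minimal sketch, is to pick the interval from the absolute value while dividing the signed value. The name comprss_signed is hypothetical and not from the original answer.

```r
# Hypothetical signed variant of comprss(): the suffix is chosen from abs(x).
comprss_signed <- function(tx) {
  # Strip thousands separators, then convert to numeric
  x <- as.numeric(gsub(",", "", as.character(tx), fixed = TRUE))
  div <- findInterval(abs(x), c(0, 1e3, 1e6, 1e9, 1e12))
  paste(round(x / 10^(3 * (div - 1)), 2), c("", "K", "M", "B", "T")[div])
}

comprss_signed(c("-6,000,000", "340,000", "-75"))
# [1] "-6 M"  "340 K" "-75 "
```

As in the original, values below 1,000 get an empty suffix, so their labels end in a trailing space.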
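This borrows from the other answers and adds to them, with the main intent of producing pretty labels for ggplot2 axes. And yes, it only handles positive values (negative ones are left as is), since I usually want these suffixes only for positive quantities. It is easy to extend to negative numbers.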
# Format numbers with suffixes K, M, B, T and optional rounding. Vectorized
# Main purpose: pretty formatting axes for plots produced by ggplot2
#
# Usage in ggplot2: scale_x_continuous(labels = suffix_formatter)
suffix_formatter <- function(x, digits = NULL)
{
  intl <- c(1e3, 1e6, 1e9, 1e12)
  suffixes <- c('K', 'M', 'B', 'T')
  i <- findInterval(x, intl)
  result <- character(length(x))
  # Note: for ggplot2 the last label element of x is NA, so we need to handle it
  ind_format <- !is.na(x) & i > 0
  # Format only the elements that need to be formatted
  # with suffixes and possible rounding
  result[ind_format] <- paste0(
    formatC(x[ind_format] / intl[i[ind_format]], format = "f", digits = digits),
    suffixes[i[ind_format]]
  )
  # And leave the rest with no changes
  result[!ind_format] <- as.character(x[!ind_format])
  return(invisible(result))
}
And a usage example:
library(ggplot2)
x <- 1:10
d <- data.frame(x = x, y = 10^x)
ggplot(aes(x = x, y = y), data = d) + geom_line() + scale_y_log10()
[Plot: without suffix formatter]
ggplot(aes(x = x, y = y), data = d) + geom_line() + scale_y_log10(labels = suffix_formatter)
[Plot: with suffix formatter]
I rewrote @42-'s function to accommodate percentages, like this:
compress <- function(tx) {
  tx <- as.numeric(gsub(",", "", tx))
  int <- c(1e-2, 1, 1e3, 1e6, 1e9, 1e12)
  div <- findInterval(tx, int)
  paste(round(tx / int[div], 2), c("%", "", "K", "M", "B", "T")[div])
}
> tx
total_reads total_bases q20_rate q30_rate gc_content
3.504660e+05 1.051398e+08 6.648160e-01 4.810370e-01 5.111660e-01
> compress(tx)
[1] "350.47 K" "105.14 M" "66.48 %" "48.1 %" "51.12 %"
This may be useful for similar problems.
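One edge case worth noting: findInterval returns 0 for values below the first break (0.01 here), and indexing with 0 silently drops elements, so the results can become misaligned. The sketch below clamps the index to 1, which treats such values as percentages too. That choice, and the name compress_safe, are assumptions for illustration. Negative inputs are still not handled.

```r
# Hypothetical variant of compress(): values below 0.01 are also shown as percentages.
compress_safe <- function(tx) {
  tx <- as.numeric(gsub(",", "", tx, fixed = TRUE))
  int <- c(1e-2, 1, 1e3, 1e6, 1e9, 1e12)
  # findInterval() returns 0 below the first break; clamp to 1 so indexing stays aligned
  div <- pmax(findInterval(tx, int), 1)
  paste(round(tx / int[div], 2), c("%", "", "K", "M", "B", "T")[div])
}

compress_safe(c(350466, 0.664816, 0.001))
# [1] "350.47 K" "66.48 %"  "0.1 %"
```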
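Similar to @Alex Poklonskiy, I needed a formatter for charts, but I also needed a version that supports negative numbers. This is his function, adjusted (I am not an expert in R programming, though):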
number_format <- function(x, digits = NULL)
{
  intl <- c(1e3, 1e6, 1e9, 1e12)
  suffixes <- c(' K', ' M', ' B', ' T')
  i <- findInterval(x, intl)
  i_neg <- findInterval(-x, intl)
  result <- character(length(x))
  # Note: for ggplot2 the last label element of x is NA, so we need to handle it
  ind_format <- !is.na(x) & i > 0
  neg_format <- !is.na(x) & i_neg > 0
  # Format only the elements that need to be formatted
  # with suffixes and possible rounding
  result[ind_format] <- paste0(
    formatC(x[ind_format] / intl[i[ind_format]], format = "f", digits = digits),
    suffixes[i[ind_format]]
  )
  # Format negative numbers
  result[neg_format] <- paste0(
    formatC(x[neg_format] / intl[i_neg[neg_format]], format = "f", digits = digits),
    suffixes[i_neg[neg_format]]
  )
  # To the rest only apply rounding
  result[!ind_format & !neg_format] <- as.character(
    formatC(x[!ind_format & !neg_format], format = "f", digits = digits)
  )
  return(invisible(result))
}
I also adjusted the function so that the digits parameter is used to round values without a suffix as well (e.g. 1.23434546).
Usage example:
> print( number_format(c(1.2325353, 500, 132364584563, 5.67e+9, -2.45e+7, -1.2333, -55)) )
[1] "1.2325" "500.0000" "132.3646 B" "5.6700 B" "-24.5000 M" "-1.2333" "-55.0000"
> print( number_format(c(1.2325353, 500, 132364584563, 5.67e+9, -2.45e+7, -1.2333, -55), digits = 2) )
[1] "1.23" "500.00" "132.36 B" "5.67 B" "-24.50 M" "-1.23" "-55.00"
dplyr's case_when now offers a friendlier solution to this, for example:
library(dplyr)
format_bignum = function(n) {
  case_when(
    n >= 1e12 ~ paste(round(n / 1e12), 'Tn'),
    n >= 1e9 ~ paste(round(n / 1e9), 'Bn'),
    n >= 1e6 ~ paste(round(n / 1e6), 'M'),
    n >= 1e3 ~ paste(round(n / 1e3), 'K'),
    TRUE ~ as.character(n)
  )
}
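Alternatively, you could embed the case_when part inside a mutate call.
Recent versions of the scales package include functionality for printing readable labels. If you are using ggplot or the tidyverse, scales is probably already installed. You might have to update the package, though.
In this case, label_number_si can be used: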
> library(scales)
> inp <- c(6000000, 75000400, 743450000, 340000, 4300000)
> label_number_si(accuracy=0.1)(inp)
[1] "6.0M" "75.0M" "743.4M" "340.0K" "4.3M"
Another option with the scales package is unit_format:
inp <- c(6000000, 75000400, 743450000, 340000, 4300000)
scales::unit_format(unit = 'M', scale = 1e-6)(inp)
# "6.0 M" "75.0 M" "743.4 M" "0.3 M" "4.3 M"