Adding leading zeros once imported into R

Question

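I have a data frame with a Reference column. It is a 10-digit number that can start with a zero. When importing into R, the leading zeros disappear, and I would like to add them back in.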

I have tried sprintf and formatC, but I run into a different problem with each.

DF=data.frame(Reference=c(102030405,2567894562,235648759), Data=c(10,20,30))

This is the output I get:

> sprintf('%010d', DF$Reference)
[1] "0102030405" "        NA" "0235648759"
Warning message:
In sprintf("%010d", DF$Reference) : NAs introduced by coercion
> formatC(DF$Reference, width=10, flag="0")
[1] "001.02e+08" "02.568e+09" "02.356e+08"

The first gives NA when the number already has 10 digits, and the second returns the result in standard form (scientific notation).

What I need is:

[1]  0102030405 2567894562  0235648759

Answer 1

library(stringi)
DF = data.frame(Reference = c(102030405,2567894562,235648759), Data = c(10,20,30))
DF$Reference = stri_pad_left(DF$Reference, 10, "0")
DF
#    Reference Data
# 1 0102030405   10
# 2 2567894562   20
# 3 0235648759   30
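If adding the stringi dependency is not an option, the same left-padding can be sketched in base R. This is a minimal sketch, not part of the original answer: pad_left() is a hypothetical helper name, and it assumes the references are whole numbers (or are already character). Numbers are first turned into plain digit strings with format(scientific = FALSE), because R's default number-to-text conversion can switch to scientific notation for values such as 100000000.

```r
# Hypothetical base-R helper: left-pad values to a fixed width.
pad_left <- function(x, width = 10, pad = "0") {
  # Plain digit strings first: scientific = FALSE avoids forms like "1e+08",
  # and trim = TRUE drops the common-width spaces format() would add.
  chr <- if (is.numeric(x)) format(x, scientific = FALSE, trim = TRUE) else as.character(x)
  # Prepend however many pad characters are missing (never a negative count).
  paste0(strrep(pad, pmax(0, width - nchar(chr))), chr)
}

DF <- data.frame(Reference = c(102030405, 2567894562, 235648759), Data = c(10, 20, 30))
DF$Reference <- pad_left(DF$Reference, 10)
DF$Reference
# [1] "0102030405" "2567894562" "0235648759"
```

Values that are already 10 characters long get zero pad characters, so 2567894562 passes through unchanged.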

Alternative solutions: Adding leading zeros using R.

"When importing into R, the leading zeros disappear, which I would like to add back in."

Reading the column as character avoids this problem altogether. You can use readr::read_csv() with the col_types argument.
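A minimal sketch of reading the column as character. It swaps in base R's read.csv() and its colClasses argument instead of the readr function named above, so the example needs no extra packages. With readr, the corresponding call would be read_csv(file, col_types = cols(Reference = col_character())). The CSV text is made up for illustration and stands in for the real file.

```r
# Made-up CSV content standing in for the real file; the raw text
# still has the leading zeros.
csv_text <- paste("Reference,Data",
                  "0102030405,10",
                  "2567894562,20",
                  "0235648759,30",
                  sep = "\n")

# Declare Reference as character so it is never parsed as a number,
# which is the step that drops the leading zeros. Other columns
# are still type-detected automatically.
DF <- read.csv(text = csv_text, colClasses = c(Reference = "character"))
DF$Reference
# [1] "0102030405" "2567894562" "0235648759"
```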

Answer 2

formatC

You can use

formatC(DF$Reference, digits = 0,  width = 10, format ="f", flag="0")
# [1] "0102030405" "2567894562" "0235648759"

sprintf

Using d in sprintf means your values must be integers (or be converted with as.integer()). help(integer) explains:

"the range of representable integers is restricted to about +/-2*10^9: doubles can hold much larger integers exactly."

That is why as.integer(2567894562) returns NA.
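To make the integer limit concrete, this sketch compares the failing value with .Machine$integer.max. It also shows one more option that is not in the original answer: keep the values as doubles and use the floating-point conversion %f with zero decimal places, so nothing is coerced to integer. It assumes the references are whole numbers stored as doubles.

```r
DF <- data.frame(Reference = c(102030405, 2567894562, 235648759), Data = c(10, 20, 30))

# The largest value an R integer can hold:
.Machine$integer.max
# [1] 2147483647

# 2567894562 is larger, so coercing it to integer gives NA (with a warning).
DF$Reference[2] > .Machine$integer.max
# [1] TRUE

# "%010.0f": format as a double with no decimals, zero-padded to width 10.
sprintf("%010.0f", DF$Reference)
# [1] "0102030405" "2567894562" "0235648759"
```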

Another workaround is to use the character format s in sprintf:

sprintf('%010s',DF$Reference)
# [1] " 102030405" "2567894562" " 235648759"

But this gives spaces instead of leading zeros. gsub() can add the zeros back by replacing the spaces with zeros:

gsub(" ","0",sprintf('%010s',DF$Reference))
# [1] "0102030405" "2567894562" "0235648759"
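The sketch below is a slightly more defensive version of the gsub() approach. R's sprintf documentation notes that the 0 flag zero-pads character strings on some platforms and is ignored on others. Requesting space padding explicitly with %10s therefore makes the intermediate result predictable. The values are also converted to plain digit strings with format(scientific = FALSE) before padding, on the assumption that some references could otherwise be rendered in scientific notation.

```r
DF <- data.frame(Reference = c(102030405, 2567894562, 235648759), Data = c(10, 20, 30))

# Plain digit strings first, so a value like 100000000 cannot become "1e+08".
ref_chr <- format(DF$Reference, scientific = FALSE, trim = TRUE)

# "%10s" (no 0 flag) right-aligns with spaces...
padded <- sprintf("%10s", ref_chr)

# ...and gsub() turns those pad spaces into zeros.
gsub(" ", "0", padded)
# [1] "0102030405" "2567894562" "0235648759"
```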

Tags: r, zero, dataframe, leading-zero, read.csv, formatC, sprintf