Get all decimal places using readxlsb and cellranger::cell_limits()
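I am importing some messy data from a series of Excel binary workbooks (.xlsb) using readxlsb and cell_limits() from cellranger. I am struggling to get enough (ideally all) decimal places.
The problem can be illustrated with the data set that ships with the readxlsb package: in the example workbook TestBook.xlsb, sheet Sheet3.1.1, look at cell E5. This cell contains e^1 with many decimal places (2,71828182845905), but only six decimals are imported (2.718282).
In my real-life data, many of the top rows contain text, which converts the column to character, as with column.4 below, where E5 is located; the raw data has ~16 decimal places. Is there a way to tweak the code (below) to get all decimal places without losing cellranger::cell_limits()?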
# install.packages(c("readxlsb", "tidyverse"), dependencies = TRUE)
library(readxlsb); library(tidyverse)
as_tibble(
read_xlsb(path = system.file("extdata", "TestBook.xlsb", package = "readxlsb"),
sheet = "Sheet3.1.1",range = cellranger::cell_limits())
)
# A tibble: 5 x 7
Some column.2 column.3 column.4 column.5 column.6 column.7
<date> <chr> <chr> <chr> <chr> <chr> <dbl>
1 NA "data" "" "2.718282" "" "" 3.14
2 NA "" "in" "" "" "" NA
3 2021-05-21 "" "" "a" "" "" NA
4 NA "" "" "" "third" "" 43972
5 NA "" "" "" "" "sheet" NA
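A simple solution might be to force the column type to double on import, i.e. col_types = c("double").
First, adjust the number of significant digits displayed in the tibble:
options(pillar.sigfig = 20)
Now cell E5 is returned with all the digits stored in Excel: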
library(tidyverse); library(readxlsb)
as_tibble(
read_xlsb(path = system.file("extdata", "TestBook.xlsb", package = "readxlsb"),
sheet = "Sheet3.1.1",col_types = c("double"),
range = cellranger::cell_limits())
)
# A tibble: 5 x 7
Some column.2 column.3 column.4 column.5 column.6 column.7
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 NA NA NA 2.718281828459045e0 NA NA 3.141592653589793e0
2 NA NA NA NA NA NA NA
3 44337 NA NA NA NA NA NA
4 NA NA NA NA NA NA 4.3972 e4
5 NA NA NA NA
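The point that printing and storage are separate can be checked without any workbook. The following is a minimal base R sketch, assuming `x` stands in for a numeric cell such as E5; the variable names are illustrative and not part of readxlsb. It shows that the stored double keeps its digits and only the formatting decides how many are shown.

```r
# Base R only; x stands in for a numeric cell value such as E5 (e^1).
x <- exp(1)

# Formatting controls how many digits are shown; the stored value is unchanged.
shown_short <- format(x, digits = 7)   # R's default print precision
shown_full  <- format(x, digits = 15)  # 15 significant digits
shown_fixed <- sprintf("%.15f", x)     # 15 digits after the decimal point

print(shown_short)  # "2.718282"
print(shown_full)   # "2.71828182845905"
print(shown_fixed)  # "2.718281828459045"
```

A double carries roughly 15 to 17 significant decimal digits, so asking for more (as pillar.sigfig = 20 does) cannot reveal extra information beyond that.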
According to the read_xlsb vignette, the default is to guess the column types from the underlying data:
When implying types from the underlying spreadsheet data, the
resultant type is the regarded as the ‘least fragile’.
Effectively the order is logical – datetime – integer – double –
string
If 99 rows are of type ‘integer’ and 1 row is of type ‘double’, then all cells are regarded as ‘double’ in that column.
If 99 rows are of type ‘date’ and 1 row is of type ‘string’, then all cells are promoted to ‘string’
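The promotion rule quoted above can be sketched in a few lines of R. This is only an illustration of the stated ordering, not readxlsb's actual implementation; `type_order` and `promote_type()` are hypothetical names introduced here.

```r
# Hypothetical helper illustrating the quoted rule; not readxlsb code.
# Types are ordered from most fragile to least fragile.
type_order <- c("logical", "datetime", "integer", "double", "string")

promote_type <- function(cell_types) {
  stopifnot(length(cell_types) > 0)
  ranks <- match(cell_types, type_order)
  if (anyNA(ranks)) stop("unknown cell type")
  # The column takes the least fragile type found among its cells.
  type_order[max(ranks)]
}

mostly_integer <- c(rep("integer", 99), "double")
mostly_date    <- c(rep("datetime", 99), "string")

print(promote_type(mostly_integer))  # "double"
print(promote_type(mostly_date))     # "string"
```

Under this reading, a single text cell in the top rows is enough to turn a whole numeric column into character, which is consistent with what happens to column.4 in the question.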
A possible tweak (bypassing the automatically guessed column types); caveat: everything ends up as character.
library(tidyverse)
library(readxlsb)
# read everything as character
test.char <- as_tibble(
  read_xlsb(path = system.file("extdata", "TestBook.xlsb", package = "readxlsb"),
            sheet = "Sheet3.1.1", col_types = c("character"),
            range = cellranger::cell_limits())  # pass range by name, not position
)
# read everything as double
test.dbl <- as_tibble(
  read_xlsb(path = system.file("extdata", "TestBook.xlsb", package = "readxlsb"),
            sheet = "Sheet3.1.1", col_types = c("double"),
            range = cellranger::cell_limits())
)
# helper that checks whether an object is of class Date
is.date <- function(x) inherits(x, 'Date')
# combine character and double, has to be adjusted according to your real data
cbind(test.char %>%
gather(key.character,value=character),
test.dbl %>%
gather(key=key.numeric,value=numeric)) %>%
tibble() %>%
rowwise() %>%
mutate(numeric=case_when(is.date(try(as.Date(character),silent=TRUE))==TRUE ~ NA_real_, TRUE ~ numeric)) %>% #set double to NA if character is date
mutate(character=case_when(!is.na(numeric)~as.character(numeric), TRUE ~ character)) %>% #keep all remaining double
select(key.character,character) %>%
pivot_wider(names_from = key.character, values_from = character) %>%
unnest(cols = c(Some, column.2, column.3, column.4, column.5, column.6, column.7))
#> # A tibble: 5 x 7
#> Some column.2 column.3 column.4 column.5 column.6 column.7
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 "" "data" "" "2.71828182845~ "" "" "3.14159265358~
#> 2 "" "" "in" "" "" "" ""
#> 3 "2019-08-~ "" "" "a" "" "" ""
#> 4 "" "" "" "" "third" "" "2018-08-25"
#> 5 "" "" "" "" "" "sheet" ""
Created on 2021-05-24 by the reprex package (v2.0.0)
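The core of the workaround, preferring the full-precision double wherever one exists and falling back to the character value otherwise, can also be expressed on plain vectors. The sketch below uses base R only; `merge_cells()` and the sample vectors are hypothetical and merely mimic one column read twice. It deliberately ignores the date handling of the pipeline above, so date cells would come out as serial numbers here.

```r
# Hypothetical helper: chr and dbl are the same cells read as character and as double.
merge_cells <- function(chr, dbl) {
  stopifnot(length(chr) == length(dbl))
  # Where a number was read, format it with 15 significant digits;
  # otherwise keep the text as imported.
  ifelse(is.na(dbl), chr, sprintf("%.15g", dbl))
}

chr <- c("data", "2.718282", "", "a")  # character import, numbers already shortened
dbl <- c(NA, exp(1), NA, NA)           # double import, text becomes NA

merged <- merge_cells(chr, dbl)
print(merged)  # "data" "2.71828182845905" "" "a"
```

Using sprintf() with an explicit format keeps the number of digits under your control rather than relying on the default conversion of as.character().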