Separate a column into new columns based on the number of leading spaces
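These reports come from QuickBooks and are downloaded as Excel files. Note that the left column is a nested hierarchy expressed through left indentation.
I need to split the Description column into separate columns based on the number of leading spaces.
I have been working with financial reports a lot lately, and this layout is very common and very hard to deal with. Is there a package or function for importing this kind of data?
Here is a sample data frame as reproducible input: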
df1 <- structure(list(Description = c("asset", " current asset", " bank acc",
                                      " banner", " clearing",
                                      " total bank accounts",
                                      " total current assets"),
                      Total = c(NA, NA, NA, 10L, 20L, 30L, 30L)),
                 .Names = c("Description", "Total"),
                 class = "data.frame",
                 row.names = c(NA, -7L))
You can try tidyxl and unpivotr for these Excel wrangling tasks. See their documentation for details.
Here is a good tutorial: https://blog.davisvaughan.com/2018/02/16/tidying-excel-cash-flow-spreadsheets-in-r/
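I think the real question is:
- "How do I treat number of leading spaces to indicate nth column?"
If so, try this example. The code could be improved, but the idea is that each leading space shifts the value one column to the right, so a value with n leading spaces lands in column n + 1.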
# example input, we will have similar input after reading in
# the Excel sheet into R.
df1 <- data.frame(x = c("x1", " x2", " x2", "  x3", "x1", " x2"),
                  y = c(NA, 22, 33, 44, 55, 66),
                  stringsAsFactors = FALSE)

library(dplyr)

cbind(
  bind_rows(
    lapply(df1$x, function(i){
      # splitting on " " yields one empty string per leading space,
      # so the label ends up in the column matching its depth
      x <- data.frame(t(strsplit(i, split = " ")[[1]]), stringsAsFactors = FALSE)
      colnames(x) <- paste0("col", seq_len(ncol(x)))
      x
    })
  ),
  df1[, "y", drop = FALSE])
#   col1 col2 col3  y
# 1   x1 <NA> <NA> NA
# 2        x2 <NA> 22
# 3        x2 <NA> 33
# 4             x3 44
# 5   x1 <NA> <NA> 55
# 6        x2 <NA> 66
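The split-on-space approach above breaks once a label itself contains spaces, as in "current asset". A possible variation, sketched here in base R only, counts the leading spaces explicitly and keeps the rest of the label intact. The helper name `split_by_indent`, its `width` and `prefix` arguments, and the `report` data are all made up for illustration, and the sketch assumes every nesting level adds the same number of leading spaces.

```r
# Hypothetical helper (not from any package): spread an indented label column
# into one column per nesting level.
# Assumption: each nesting level adds exactly `width` leading spaces.
split_by_indent <- function(labels, width = 1L, prefix = "level") {
  # Number of leading spaces in each label (0 when there are none).
  n_spaces <- attr(regexpr("^ *", labels), "match.length")
  level <- n_spaces %/% width + 1L
  out <- matrix(NA_character_, nrow = length(labels), ncol = max(level))
  # Indexing by (row, column) pairs puts each label in its level's column.
  out[cbind(seq_along(labels), level)] <- trimws(labels)
  colnames(out) <- paste0(prefix, seq_len(ncol(out)))
  as.data.frame(out, stringsAsFactors = FALSE)
}

# Made-up data shaped like the report in the question, one space per level.
report <- data.frame(
  Description = c("asset", " current asset", "  bank acc", "   banner",
                  "   clearing", "  total bank accounts",
                  " total current assets"),
  Total = c(NA, NA, NA, 10L, 20L, 30L, 30L),
  stringsAsFactors = FALSE
)

result <- cbind(split_by_indent(report$Description), report["Total"])
result
```

If the real export indents by more than one space per level, passing that number as `width` should give the same layout. Inspect a few rows of the imported sheet first to confirm the indentation is consistent.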