Separate a column into new columns based on the number of leading spaces

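These reports come from QuickBooks and were downloaded as Excel files. Note that the left column is a nested hierarchy expressed through left indentation.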

I need to split the Description column into separate columns based on the number of leading spaces.

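I have been working with financial reports a lot lately, and reports like these are very common and very hard to work with. Is there a package or function for importing this kind of data?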

Here is a sample data frame as reproducible input:

df1 <- structure(list(Description = c("asset", " current asset", "   bank acc", 
                                      "    banner", "    clearing",
                                      "   total bank accounts",
                                      " total current assets"),
                 Total = c(NA, NA, NA, 10L, 20L, 30L, 30L)),
            .Names = c("Description", "Total"), 
            class = "data.frame", 
            row.names = c(NA, -7L))

You could try the tidyxl and unpivotr packages for these Excel wrangling tasks; see their documentation.

Here is a good tutorial: https://blog.davisvaughan.com/2018/02/16/tidying-excel-cash-flow-spreadsheets-in-r/

I think the real question is:

  • "How do I treat number of leading spaces to indicate nth column?"

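If so, try this example. The code could be improved, but the idea is that the number of leading spaces determines which column a value goes into: n leading spaces put it in column n + 1.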

# example input, we will have similar input after reading in
# the Excel sheet into R.
df1 <- data.frame(x = c("x1", " x2", " x2", "  x3", "x1", " x2"),
                  y = c(NA,      22,    33,      44,   55,   66),
                  stringsAsFactors = FALSE)

library(dplyr)

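# Split each label on single spaces: every leading space yields an empty
# string, so the label itself ends up in column (number of spaces + 1).
# Note: this assumes the labels contain no inner spaces.
cbind(
  bind_rows(
    lapply(df1$x, function(i) {
      x <- data.frame(t(strsplit(i, split = " ")[[1]]), stringsAsFactors = FALSE)
      colnames(x) <- paste0("col", seq_len(ncol(x)))
      x
    })
  ),
  df1[, "y", drop = FALSE])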

#   col1 col2 col3  y
# 1   x1 <NA> <NA> NA
# 2        x2 <NA> 22
# 3        x2 <NA> 33
# 4             x3 44
# 5   x1 <NA> <NA> 55
# 6        x2 <NA> 66
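
One caveat with splitting on spaces: labels that contain inner spaces, such as "current asset" in the question's data, would be broken into several pieces. A minimal base R sketch that avoids this is to count the leading spaces and place the trimmed label in the matching column. The helper name `split_by_indent` and the `col` prefix are made up for illustration and do not come from any package. It also assumes that each distinct indent width (0, 1, 3 and 4 in the sample) is its own hierarchy level.

```r
# Sample data from the question; labels may contain inner spaces.
df1 <- data.frame(
  Description = c("asset", " current asset", "   bank acc",
                  "    banner", "    clearing",
                  "   total bank accounts",
                  " total current assets"),
  Total = c(NA, NA, NA, 10L, 20L, 30L, 30L),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not from any package): spread the labels of one
# column across new columns according to their indent depth.
split_by_indent <- function(df, col, prefix = "col") {
  label  <- sub("^ +", "", df[[col]])         # label without leading spaces
  indent <- nchar(df[[col]]) - nchar(label)   # number of leading spaces
  # Assumption: map the distinct indent widths to consecutive levels 1..k.
  level  <- match(indent, sort(unique(indent)))
  out <- matrix(NA_character_, nrow = nrow(df), ncol = max(level),
                dimnames = list(NULL, paste0(prefix, seq_len(max(level)))))
  # Fill exactly one cell per row: (row number, level of that row).
  out[cbind(seq_len(nrow(df)), level)] <- label
  cbind(as.data.frame(out, stringsAsFactors = FALSE),
        df[, setdiff(names(df), col), drop = FALSE])
}

res <- split_by_indent(df1, "Description")
res
```

With the sample data this puts "asset" in col1, "current asset" in col2, "bank acc" in col3 and "banner" in col4, with Total kept alongside. Unlike the example above, unused cells are NA rather than empty strings.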