R - How to preserve data types and titles when converting list of lists to data frame

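I am working with a list called receipts. Each entry in receipts is itself a list that represents one receipt. The structure of the receipts is consistent and looks like this: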

> str(receipts[[1]])
List of 6
 $ receipt_type  : chr "SALESPERSON_ACTIVITY"
 $ timestamp     : POSIXct[1:1], format: "2020-01-01 09:29:00"
 $ receipt_number: int 1195
 $ POS           : int 1
 $ KNo           : int 12
 $ shift_number  : int 9

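receipt_number may also contain NA values.

I want to convert this list into a data frame with the corresponding columns (receipt_type, timestamp, receipt_number, etc.). Currently I am using this: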

receipts_as_df <- as.data.frame(matrix(unlist(receipts), byrow=TRUE, ncol=length(receipts[[1]])))

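This does get the data into a data frame. Sadly, unlist drops all information about the data types (I think everything is coerced to character). The column names are lost as well. So I end up with a data frame that contains all the data, but the types and column names are gone.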
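
To make the coercion visible, here is a minimal sketch on a single made-up receipt (the values are hypothetical, not taken from the real data). It suggests that unlist flattens everything to the most general type, character, and that the element names are still there at this point, so it is the matrix() step that drops them.

```r
# One hypothetical receipt with mixed types (illustrative values only)
receipt <- list(
  receipt_type   = "SALESPERSON_ACTIVITY",
  timestamp      = as.POSIXct("2020-01-01 09:29:00", tz = "UTC"),
  receipt_number = 1195L
)

# unlist() returns one atomic vector, so all elements share a single type
flat <- unlist(receipt)
class(flat)   # character: the POSIXct class and the integer type are gone
names(flat)   # the names survive unlist() ...

# ... but matrix() only takes the values, so the names are dropped here
m <- matrix(flat, nrow = 1)
dimnames(m)   # NULL, hence the default V1, V2, ... after as.data.frame()
```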

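I know that I could rename the columns and convert the data types manually, but I was wondering whether there is a more convenient way to handle this.

Example: the current data frame looks like this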

> head(receipts_as_df)
                        V1         V2   V3 V4 V5 V6
1     SALESPERSON_ACTIVITY 1577867340 1195  1 12  9
2 CASH_REGISTER_MONITORING 1577867340 <NA>  1 12  9
3      PAYOUT_NOTIFICATION 1577867340 1196  1 12  9
4             TSE_ACTIVITY 1577869080 <NA>  1 12  9
5   BUSINESS_MODE_ACTIVITY 1577869080 <NA>  1 12  9
6             ZERO_RECEIPT 1577869140 1197  1 12  9

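Base R, which needs a little more work (the column names are kept, but the timestamp comes back as a plain number and has to be converted again):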

receipts <- replicate(3, list(
  receipt_type   = "SALESPERSON_ACTIVITY",
  timestamp      = as.POSIXct("2020-01-01 09:29:00", tz = "UTC"),
  receipt_number = 1195,
  POS            = 1,
  KNo            = 12,
  shift_number   = 9
), simplify = FALSE)

out <- do.call(rbind.data.frame, c(receipts, list(stringsAsFactors = FALSE)))
out
#            receipt_type  timestamp receipt_number POS KNo shift_number
# 2  SALESPERSON_ACTIVITY 1577870940           1195   1  12            9
# 21 SALESPERSON_ACTIVITY 1577870940           1195   1  12            9
# 3  SALESPERSON_ACTIVITY 1577870940           1195   1  12            9
str(out)
# 'data.frame': 3 obs. of  6 variables:
#  $ receipt_type  : chr  "SALESPERSON_ACTIVITY" "SALESPERSON_ACTIVITY" "SALESPERSON_ACTIVITY"
#  $ timestamp     : num  1.58e+09 1.58e+09 1.58e+09
#  $ receipt_number: num  1195 1195 1195
#  $ POS           : num  1 1 1
#  $ KNo           : num  12 12 12
#  $ shift_number  : num  9 9 9
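# restore the POSIXct class; without a tz argument the result prints in the
# session's local time zone, pass tz = "UTC" to display the original 09:29:00
out$timestamp <- as.POSIXct(out$timestamp, origin = "1970-01-01")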
out
#            receipt_type           timestamp receipt_number POS KNo shift_number
# 2  SALESPERSON_ACTIVITY 2020-01-01 01:29:00           1195   1  12            9
# 21 SALESPERSON_ACTIVITY 2020-01-01 01:29:00           1195   1  12            9
# 3  SALESPERSON_ACTIVITY 2020-01-01 01:29:00           1195   1  12            9
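
As a possible base R variant (a sketch, not part of the approach above), each receipt can first be turned into a one-row data frame and the rows bound afterwards. In this form the POSIXct class and the integer columns appear to survive without a manual fix. The two receipts below are made up for illustration, with one NA receipt_number to mirror the case mentioned in the question.

```r
# Two hypothetical receipts, one with a missing receipt_number
receipts <- list(
  list(receipt_type   = "SALESPERSON_ACTIVITY",
       timestamp      = as.POSIXct("2020-01-01 09:29:00", tz = "UTC"),
       receipt_number = 1195L,
       POS            = 1L),
  list(receipt_type   = "TSE_ACTIVITY",
       timestamp      = as.POSIXct("2020-01-01 09:58:00", tz = "UTC"),
       receipt_number = NA_integer_,
       POS            = 1L)
)

# Convert every receipt to a one-row data frame, then bind the rows;
# do.call(rbind, ...) dispatches to rbind.data.frame here
rows <- lapply(receipts, as.data.frame, stringsAsFactors = FALSE)
out  <- do.call(rbind, rows)

str(out)
```

For long lists this row-by-row binding can be slow, so it is probably best suited to small or medium inputs.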

dplyr and data.table need no extra work:

dplyr::bind_rows(receipts)
# # A tibble: 3 x 6
#   receipt_type         timestamp           receipt_number   POS   KNo shift_number
#   <chr>                <dttm>                       <dbl> <dbl> <dbl>        <dbl>
# 1 SALESPERSON_ACTIVITY 2020-01-01 09:29:00           1195     1    12            9
# 2 SALESPERSON_ACTIVITY 2020-01-01 09:29:00           1195     1    12            9
# 3 SALESPERSON_ACTIVITY 2020-01-01 09:29:00           1195     1    12            9
data.table::rbindlist(receipts)
#            receipt_type           timestamp receipt_number POS KNo shift_number
# 1: SALESPERSON_ACTIVITY 2020-01-01 09:29:00           1195   1  12            9
# 2: SALESPERSON_ACTIVITY 2020-01-01 09:29:00           1195   1  12            9
# 3: SALESPERSON_ACTIVITY 2020-01-01 09:29:00           1195   1  12            9
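
If neither package is available, a column-wise construction in base R is another option. The sketch below assumes every receipt has the same fields in the same order, as described in the question. The receipts and the helper names fields and columns are made up for illustration, and this is not meant to describe how dplyr or data.table work internally.

```r
# Hypothetical receipts with a consistent structure
receipts <- list(
  list(receipt_type   = "SALESPERSON_ACTIVITY",
       timestamp      = as.POSIXct("2020-01-01 09:29:00", tz = "UTC"),
       receipt_number = 1195L),
  list(receipt_type   = "TSE_ACTIVITY",
       timestamp      = as.POSIXct("2020-01-01 09:58:00", tz = "UTC"),
       receipt_number = NA_integer_)
)

# Take the field names from the first receipt (assumes all receipts match)
fields <- names(receipts[[1]])

# For each field, pull that element out of every receipt and combine with c();
# c() dispatches on the class, so POSIXct values stay POSIXct
columns <- lapply(fields, function(f) do.call(c, lapply(receipts, `[[`, f)))
names(columns) <- fields

out <- as.data.frame(columns, stringsAsFactors = FALSE)

# First class of each column, to check that the types were kept
vapply(out, function(x) class(x)[1], character(1))
```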