R - How to preserve data types and column names when converting a list of lists to a data frame
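I am working with a list called receipts. Each entry in receipts is itself a list representing one receipt. The structure of the receipts is consistent and looks like this: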
> str(receipts[[1]])
List of 6
$ receipt_type : chr "SALESPERSON_ACTIVITY"
$ timestamp : POSIXct[1:1], format: "2020-01-01 09:29:00"
$ receipt_number: int 1195
$ POS : int 1
$ KNo : int 12
$ shift_number : int 9
receipt_number may also contain NA values.
I want to convert this list into a data frame with the corresponding columns (receipt_type, timestamp, receipt_number, etc.). Currently I am using this:
receipts_as_df <- as.data.frame(matrix(unlist(receipts), byrow=TRUE, ncol=length(receipts[[1]])))
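This gets the data into a data frame. Unfortunately, unlist drops all information about the data types (I think everything is coerced to character). The column names are lost as well. So I end up with a data frame that contains all the data, but without the types and column names.
I know I can rename the columns and reset the data types manually, but I am wondering whether there is a more convenient way to handle this.
Example: the current data frame looks like this: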
> head(receipts_as_df)
V1 V2 V3 V4 V5 V6
1 SALESPERSON_ACTIVITY 1577867340 1195 1 12 9
2 CASH_REGISTER_MONITORING 1577867340 <NA> 1 12 9
3 PAYOUT_NOTIFICATION 1577867340 1196 1 12 9
4 TSE_ACTIVITY 1577869080 <NA> 1 12 9
5 BUSINESS_MODE_ACTIVITY 1577869080 <NA> 1 12 9
6 ZERO_RECEIPT 1577869140 1197 1 12 9
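As a small illustration of why the types disappear: unlist() has to return a single atomic vector, so mixed elements are coerced to the most general type present, which here is character. The sketch below uses only base R, and the three-field receipt is made up for illustration.

```r
# A made-up receipt with mixed types (illustration only)
receipt <- list(
  receipt_type   = "SALESPERSON_ACTIVITY",
  timestamp      = as.POSIXct("2020-01-01 09:29:00", tz = "UTC"),
  receipt_number = 1195L
)

flat <- unlist(receipt)

# unlist() returns one atomic vector, so every element ends up with the same type
class(flat)
# "character"

# The POSIXct class is dropped; only the underlying epoch seconds survive, as text
flat[["timestamp"]]
# "1577870940"

# Wrapping the result in matrix() additionally discards the element names
dimnames(matrix(flat, nrow = 1))
# NULL
```

This matches the symptoms in the question: character columns, timestamps shown as large numbers, and default column names V1 to V6.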
Base R, which takes a bit more work:
receipts <- replicate(3, list(
receipt_type = "SALESPERSON_ACTIVITY",
timestamp = as.POSIXct("2020-01-01 09:29:00", tz = "UTC"),
receipt_number = 1195,
POS = 1,
KNo = 12,
shift_number = 9
), simplify = FALSE)
out <- do.call(rbind.data.frame, c(receipts, list(stringsAsFactors = FALSE)))
out
# receipt_type timestamp receipt_number POS KNo shift_number
# 2 SALESPERSON_ACTIVITY 1577870940 1195 1 12 9
# 21 SALESPERSON_ACTIVITY 1577870940 1195 1 12 9
# 3 SALESPERSON_ACTIVITY 1577870940 1195 1 12 9
str(out)
# 'data.frame': 3 obs. of 6 variables:
# $ receipt_type : chr "SALESPERSON_ACTIVITY" "SALESPERSON_ACTIVITY" "SALESPERSON_ACTIVITY"
# $ timestamp : num 1.58e+09 1.58e+09 1.58e+09
# $ receipt_number: num 1195 1195 1195
# $ POS : num 1 1 1
# $ KNo : num 12 12 12
# $ shift_number : num 9 9 9
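# rbind.data.frame dropped the POSIXct class, so restore it from the epoch seconds;
# without a tz argument the result prints in the session's local time zone
out$timestamp <- as.POSIXct(out$timestamp, origin = "1970-01-01")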
out
# receipt_type timestamp receipt_number POS KNo shift_number
# 2 SALESPERSON_ACTIVITY 2020-01-01 01:29:00 1195 1 12 9
# 21 SALESPERSON_ACTIVITY 2020-01-01 01:29:00 1195 1 12 9
# 3 SALESPERSON_ACTIVITY 2020-01-01 01:29:00 1195 1 12 9
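A possible base R variation, sketched under the assumption that every receipt has the same fields in the same order: turn each receipt into a one-row data frame first, then row-bind those. rbind then works column by column on data frames, so the POSIXct class should survive without a manual conversion step. The sample data below (sample_receipts, including the NA receipt number) is made up for illustration.

```r
# Made-up sample data: two receipts, the second with a missing receipt number
sample_receipts <- list(
  list(receipt_type = "SALESPERSON_ACTIVITY",
       timestamp = as.POSIXct("2020-01-01 09:29:00", tz = "UTC"),
       receipt_number = 1195L, POS = 1L, KNo = 12L, shift_number = 9L),
  list(receipt_type = "CASH_REGISTER_MONITORING",
       timestamp = as.POSIXct("2020-01-01 09:29:00", tz = "UTC"),
       receipt_number = NA_integer_, POS = 1L, KNo = 12L, shift_number = 9L)
)

# Convert each receipt to a one-row data frame, keeping strings as character
rows <- lapply(sample_receipts, as.data.frame, stringsAsFactors = FALSE)

# Row-bind the one-row data frames; column names come from the list names
out2 <- do.call(rbind, rows)
str(out2)
```

This is likely slower than dplyr::bind_rows or data.table::rbindlist for a large number of receipts, since it creates one data frame per receipt, but it needs no extra packages.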
dplyr and data.table need no extra work:
dplyr::bind_rows(receipts)
# # A tibble: 3 x 6
# receipt_type timestamp receipt_number POS KNo shift_number
# <chr> <dttm> <dbl> <dbl> <dbl> <dbl>
# 1 SALESPERSON_ACTIVITY 2020-01-01 09:29:00 1195 1 12 9
# 2 SALESPERSON_ACTIVITY 2020-01-01 09:29:00 1195 1 12 9
# 3 SALESPERSON_ACTIVITY 2020-01-01 09:29:00 1195 1 12 9
data.table::rbindlist(receipts)
# receipt_type timestamp receipt_number POS KNo shift_number
# 1: SALESPERSON_ACTIVITY 2020-01-01 09:29:00 1195 1 12 9
# 2: SALESPERSON_ACTIVITY 2020-01-01 09:29:00 1195 1 12 9
# 3: SALESPERSON_ACTIVITY 2020-01-01 09:29:00 1195 1 12 9