Convert hashtable/dictionary/array-format data into a regular column-based data.frame

I'm a beginner in R and have never dealt with these types of data before. I have the following two sample datasets (df1 and df2):

df1 <- c("{\"\"Wednesday\"\":4,\"\"Monday\"\":5,\"\"Saturday\"\":4,\"\"Thursday\"\":4,\"\"Tuesday\"\":5,\"\"Friday\"\":1,\"\"Sunday\"\":5,\"\"Missing day\"\":2}",
                "{\"\"Wednesday\"\":6,\"\"Monday\"\":5,\"\"Saturday\"\":2,\"\"Thursday\"\":6,\"\"Tuesday\"\":0,\"\"Friday\"\":2,\"\"Sunday\"\":4,\"\"Missing day\"\":1}",
                "{\"\"Wednesday\"\":5,\"\"Monday\"\":5,\"\"Saturday\"\":3,\"\"Thursday\"\":8,\"\"Tuesday\"\":4,\"\"Friday\"\":3,\"\"Sunday\"\":6,\"\"Missing day\"\":4}",
                "{\"\"Wednesday\"\":3,\"\"Monday\"\":5,\"\"Saturday\"\":4,\"\"Thursday\"\":1,\"\"Tuesday\"\":5,\"\"Friday\"\":4,\"\"Sunday\"\":4,\"\"Missing day\"\":6}")

df2 <- c("[373,357,382,411,310,315,330,385,367,396,402,348,354,343,392,395,392,401,376,448,341,373,369,304,298,332,366,287,334,222]",
         "[319,347,284,313,300,292,228,322,291,275,278,289,323,342,272,242,295,347,290,343,337,309,268,251,256,266,346,260,232,160]",
         "[165,154,161,152,164,152,156,150,137,170,147,210,235,190,176,175,191,186,209,157,210,199,162,149,162,165,174,171,178,126]",
         "[253,274,240,258,264,231,296,233,230,252,210,233,233,295,235,229,270,275,278,297,255,253,250,252,299,305,310,308,263,141]")

Now I need to convert df1 into df1_final and df2 into df2_final. The final datasets should look like this:

df1_final <- data.frame("Day"=c("Wednesday","Monday", "Saturday", "Thursday", "Tuesday", "Friday", "Sunday", "Missing day"),
                "Count1"=c(4,5,4,4,5,1,5,2),
                "Count2"=c(6,5,2,6,0,2,4,1),
                "Count3"=c(5,5,3,8,4,3,6,4),
                "Count4"=c(3,5,4,1,5,4,4,6))

df2_final <- data.frame("group1"=c(373,357,382,411,310,315,330,385,367,396,402,348,354,343,392,395,392,401,376,448,341,373,369,304,298,332,366,287,334,222),
                        "group2"=c(319,347,284,313,300,292,228,322,291,275,278,289,323,342,272,242,295,347,290,343,337,309,268,251,256,266,346,260,232,160),
                        "group3"=c(165,154,161,152,164,152,156,150,137,170,147,210,235,190,176,175,191,186,209,157,210,199,162,149,162,165,174,171,178,126),
                        "group4"=c(253,274,240,258,264,231,296,233,230,252,210,233,233,295,235,229,270,275,278,297,255,253,250,252,299,305,310,308,263,141))

Can anyone help me with this? Any help is appreciated. Thanks!

You can use either reticulate or jsonlite. I'll use jsonlite as follows:

For df1:

df1_f <- jsonlite::fromJSON(gsub('"+','"',sprintf("[%s]", paste0(df1, collapse = ","))))

data.frame(Day = names(df1_f), `colnames<-`(t(df1_f), paste0("count",1:4)), row.names = NULL)

          Day count1 count2 count3 count4
1   Wednesday      4      6      5      3
2      Monday      5      5      5      5
3    Saturday      4      2      3      4
4    Thursday      4      6      8      1
5     Tuesday      5      0      4      5
6      Friday      1      2      3      4
7      Sunday      5      4      6      4
8 Missing day      2      1      4      6
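
To make the df1 one-liner easier to follow, here is a minimal sketch that splits it into separate steps. The vector `x` is a hypothetical two-element stand-in shortened from df1, and the intermediate names (`joined`, `clean`, `parsed`, `counts`, `result`) are introduced only for illustration; the calls themselves are the same ones used above.

```r
library(jsonlite)

# Hypothetical, shortened stand-in for df1 with the same doubled-quote escaping
x <- c("{\"\"Monday\"\":5,\"\"Missing day\"\":2}",
       "{\"\"Monday\"\":3,\"\"Missing day\"\":6}")

# Step 1: join the objects with commas and wrap them in [] to form one JSON array
joined <- sprintf("[%s]", paste0(x, collapse = ","))

# Step 2: collapse every run of double quotes into a single double quote
clean <- gsub('"+', '"', joined)

# Step 3: parse; an array of JSON objects is simplified to a data frame
# with one row per input string and one column per key
parsed <- fromJSON(clean)

# Step 4: transpose so that each key is a row and each input string a column
counts <- t(parsed)
colnames(counts) <- paste0("count", seq_len(ncol(counts)))
result <- data.frame(Day = names(parsed), counts, row.names = NULL)
print(result)
```

Note that the `gsub('"+', '"', ...)` step is safe here only because all values are numbers. If a value were an empty string (`""`), the same substitution would turn it into a single quote and the JSON would no longer parse.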

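For df2, each element is a JSON array rather than an object in {}, so fromJSON returns a matrix instead of a data frame, and we have to convert it to a data frame manually: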

df2_fin <- jsonlite::fromJSON(sprintf("[%s]",paste0(df2, collapse = ",")))
(df2_final <- setNames(data.frame(t(df2_fin)), paste0("group",1:4)))

   group1 group2 group3 group4
1     373    319    165    253
2     357    347    154    274
3     382    284    161    240
4     411    313    152    258
5     310    300    164    264
6     315    292    152    231
7     330    228    156    296
8     385    322    150    233
9     367    291    137    230
10    396    275    170    252
11    402    278    147    210
12    348    289    210    233
13    354    323    235    233
:
:
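
As a small check of the df2 approach, the sketch below runs the same calls on a hypothetical stand-in `y` (the first three values of the first two df2 strings) and inspects the intermediate object. With jsonlite's default simplification, an array of equal-length numeric arrays comes back as a matrix with one row per input string, which is why the transpose is needed.

```r
library(jsonlite)

# Hypothetical stand-in: first three values of the first two df2 strings
y <- c("[373,357,382]", "[319,347,284]")

# Join the arrays and wrap them in [] to get one nested JSON array
m <- fromJSON(sprintf("[%s]", paste0(y, collapse = ",")))

# m is a matrix with one row per input string
print(is.matrix(m))
print(dim(m))

# Transpose so each input string becomes a column, then name the columns
out <- setNames(data.frame(t(m)), paste0("group", seq_along(y)))
print(out)
```

If the input arrays had different lengths, fromJSON would return a list rather than a matrix, and this transpose-based approach would need to be adapted.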