Extract a string from each cell individually and create new data frames

Question

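I am currently running an experiment that checks the odds offered by certain bookmakers. The problem is that each cell contains an entire data frame stored as a string. The first column is the fixture ID and the second column holds all the odds from the various bookmakers for that particular fixture. I want to convert this data frame into several data frames, each showing the data for one bookmaker. I was thinking of writing some kind of for loop that uses the extract command to do it all in one go, but R doesn't like it because of the way the strings are formatted in the cells.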

Not all cells contain the same amount of data. Here is an example of the data in one cell:

[{'bookmakerId': 22, 'updated': '2021-10-01T14:46:39.890Z', 'homePrice': 1.952, 'drawPrice': 3.2, 'awayPrice': 3.9}, {'bookmakerId': 83, 'updated': '2021-10-01T03:00:51.760Z', 'homePrice': 2.01, 'drawPrice': 3.3, 'awayPrice': 4.15}]
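Since the question mentions trying a regex-based extract, here is a minimal base-R sketch of that idea for a single cell. It assumes every record is a flat dict whose values are either numbers or single-quoted strings with no quotes or braces inside; the helper name parse_odds_regex is hypothetical.

```r
# Hypothetical helper: parse one cell (a Python-style list of flat dicts)
# into a data frame using base-R regular expressions only.
# Assumes values are numbers or single-quoted strings without quotes/braces inside.
parse_odds_regex <- function(cell) {
  # Each record is the text between a "{" and the next "}"
  records <- regmatches(cell, gregexpr("[{][^}]*[}]", cell))[[1]]
  rows <- lapply(records, function(rec) {
    # Key/value pairs look like 'key': 12.3 or 'key': 'text'
    pairs <- regmatches(rec, gregexpr("'[A-Za-z]+': *('[^']*'|[-0-9.]+)", rec))[[1]]
    keys <- sub("^'([A-Za-z]+)'.*$", "\\1", pairs)
    vals <- gsub("'", "", sub("^'[A-Za-z]+': *", "", pairs))
    names(vals) <- keys
    vals
  })
  # Align every record on the keys of the first one, then stack them
  keys <- names(rows[[1]])
  out <- as.data.frame(do.call(rbind, lapply(rows, function(r) r[keys])),
                       stringsAsFactors = FALSE)
  # Convert the numeric columns from character
  out$bookmakerId <- as.integer(out$bookmakerId)
  price_cols <- c("homePrice", "drawPrice", "awayPrice")
  out[price_cols] <- lapply(out[price_cols], as.numeric)
  out
}

cell <- "[{'bookmakerId': 22, 'updated': '2021-10-01T14:46:39.890Z', 'homePrice': 1.952, 'drawPrice': 3.2, 'awayPrice': 3.9}, {'bookmakerId': 83, 'updated': '2021-10-01T03:00:51.760Z', 'homePrice': 2.01, 'drawPrice': 3.3, 'awayPrice': 4.15}]"
res <- parse_odds_regex(cell)
res
```

Because the number of records varies from cell to cell, the helper simply returns one row per {...} block. It would need more care if the real strings ever contain nested structures or quoted text with braces in it.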

My current data frame looks like this:

Fixture	Data
1	Data1
2	Data2

The format I want is this:

For bookmaker ID K, where K is an element of [1..n]:

Fixture	Home	draw	away
1	a	b	c
2	d	e	f

Thanks very much, everyone.

Edit: the data (dput output):

structure(list(fixtureId = c("runningball-adaptor-1510023", "runningball-adaptor-1510018", 
"runningball-adaptor-1510019", "isd-adaptor-8191632", "runningball-adaptor-1510026", 
"runningball-adaptor-1510020"), oneXTwoBookmakers = c("[{'bookmakerId': 22, 'updated': '2021-10-01T14:46:39.890Z', 'homePrice': 1.952, 'drawPrice': 3.2, 'awayPrice': 3.9}, {'bookmakerId': 83, 'updated': '2021-10-01T03:00:51.760Z', 'homePrice': 2.01, 'drawPrice': 3.3, 'awayPrice': 4.15}, {'bookmakerId': 37, 'updated': '2021-10-01T18:08:53.723Z', 'homePrice': 1.97, 'drawPrice': 3.3, 'awayPrice': 4.1}, {'bookmakerId': 17, 'updated': '2021-09-30T19:59:19.163Z', 'homePrice': 1.91, 'drawPrice': 3.0, 'awayPrice': 3.75}, {'bookmakerId': 340, 'updated': '2021-09-30T20:20:33.470Z', 'homePrice': 1.95, 'drawPrice': 3.1, 'awayPrice': 4.2}]", "[{'bookmakerId': 22, 'updated': '2021-10-02T09:09:55.190Z', 'homePrice': 2.625, 'drawPrice': 3.0, 'awayPrice': 2.75}, {'bookmakerId': 83, 'updated': '2021-10-02T09:02:45.117Z', 'homePrice': 2.74, 'drawPrice': 3.11, 'awayPrice': 2.84}, {'bookmakerId': 37, 'updated': '2021-10-02T09:21:07.150Z', 'homePrice': 2.7, 'drawPrice': 3.1, 'awayPrice': 2.8}, {'bookmakerId': 17, 'updated': '2021-10-02T09:05:07.353Z', 'homePrice': 2.55, 'drawPrice': 2.9, 'awayPrice': 2.62}, {'bookmakerId': 340, 'updated': '2021-10-02T09:47:39.697Z', 'homePrice': 2.62, 'drawPrice': 3.0, 'awayPrice': 2.8}]", 
"[{'bookmakerId': 22, 'updated': '2021-10-01T23:32:46.563Z', 'homePrice': 3.3, 'drawPrice': 3.1, 'awayPrice': 2.2}, {'bookmakerId': 83, 'updated': '2021-10-01T14:05:38.270Z', 'homePrice': 3.56, 'drawPrice': 3.02,  'awayPrice': 2.25}, {'bookmakerId': 37, 'updated': '2021-10-01T18:09:33.740Z', 'homePrice': 3.55, 'drawPrice': 3.1, 'awayPrice': 2.25}, {'bookmakerId': 17, 'updated': '2021-10-01T14:11:34.050Z', 'homePrice': 3.2, 'drawPrice': 3.0, 'awayPrice': 2.15}, {'bookmakerId': 340, 'updated': '2021-10-01T15:50:45.820Z', 'homePrice': 3.4, 'drawPrice': 3.0, 'awayPrice': 2.25}]", 
"[{'bookmakerId': 17, 'updated': '2021-10-02T13:42:23.827Z', 'homePrice': 3.1, 'drawPrice': 3.2, 'awayPrice': 2.3}]", "[{'bookmakerId': 22, 'updated': '2021-10-02T16:05:45.170Z', 'homePrice': 1.727, 'drawPrice': 3.4, 'awayPrice': 4.75}, {'bookmakerId': 83, 'updated': '2021-10-01T16:45:18.623Z', 'homePrice': 1.757, 'drawPrice': 3.49, 'awayPrice': 4.91}, {'bookmakerId': 37, 'updated': '2021-10-02T15:25:06.367Z', 'homePrice': 1.75, 'drawPrice': 3.55, 'awayPrice': 5.0}, {'bookmakerId': 17, 'updated': '2021-10-01T17:31:02.897Z', 'homePrice': 1.7, 'drawPrice': 3.25, 'awayPrice': 4.4}, {'bookmakerId': 340, 'updated': '2021-10-01T19:57:13.193Z', 'homePrice': 1.77, 'drawPrice': 3.4, 'awayPrice': 4.75}]", "[{'bookmakerId': 385, 'updated': '2021-10-02T18:55:06.670Z', 'homePrice': 2.59, 'drawPrice': 2.98, 'awayPrice': 2.64}, {'bookmakerId': 22, 'updated': '2021-10-02T17:50:13.473Z', 'homePrice': 2.6, 'drawPrice': 3.0, 'awayPrice': 2.75}, {'bookmakerId': 37, 'updated': '2021-10-02T15:25:06.477Z', 'homePrice': 2.6, 'drawPrice': 3.1, 'awayPrice': 2.9}, {'bookmakerId': 17, 'updated': '2021-10-01T19:28:28.587Z', 'homePrice': 2.45, 'drawPrice': 2.9, 'awayPrice': 2.7}, {'bookmakerId': 327, 'updated': '2021-10-02T18:49:56.213Z', 'homePrice': 2.47, 'drawPrice': 3.05, 'awayPrice': 2.57}, {'bookmakerId': 83, 'updated': '2021-10-01T19:26:07.253Z', 'homePrice': 2.65, 'drawPrice': 3.02, 'awayPrice': 2.89}, {'bookmakerId': 42, 'updated': '2021-10-02T17:49:44.437Z', 'homePrice': 2.6, 'drawPrice': 3.1, 'awayPrice': 2.8}, {'bookmakerId': 340, 'updated': '2021-10-01T23:40:53.500Z', 'homePrice': 2.62, 'drawPrice': 3.0, 'awayPrice': 2.8}, {'bookmakerId': 285, 'updated': '2021-10-02T18:55:03.520Z', 'homePrice': 2.59, 'drawPrice': 2.98, 'awayPrice': 2.64}]"
)), row.names = c(NA, 6L), class = c("tbl_df", "tbl", "data.frame"
))

The code suggested by akrun worked very well:

df <- One_X_Two_Data %>% 
    mutate(oneXTwoBookmakers = map(oneXTwoBookmakers, ~ py_eval(.x) %>% 
                                       bind_rows))  %>% 
    unnest(oneXTwoBookmakers)

However, when I came back to it later and tried to re-run the code, I got this error:

Error in `mutate()`:
! Problem while computing `oneXTwoBookmakers = map(oneXTwoBookmakers,
  ~py_eval(.x) %>% bind_rows)`.
Caused by error:
! invalid version specification ‘'\.\tm1174\w2k'’, ‘CMD.EXE was started with the above path as the current directory.’, ‘UNC paths are not supported.  Defaulting to Windows directory.’
Run `rlang::last_error()` to see where the error occurred.
Warning message:
Problem while computing `oneXTwoBookmakers = map(oneXTwoBookmakers, ~py_eval(.x) %>% bind_rows)`.
i the condition has length > 1 and only the first element will be used

Answer 1

Here is one approach: evaluate each string with py_eval and then unnest the resulting list column.

library(dplyr)
library(reticulate)
library(tidyr)
library(purrr)
df %>% 
  mutate(oneXTwoBookmakers = map(oneXTwoBookmakers, ~ py_eval(.x) %>% 
        bind_rows))  %>% 
  unnest(oneXTwoBookmakers)
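The same evaluate-then-bind idea can be sketched in base R without reticulate, which may help if py_eval() keeps failing as in the question's edit. This rough sketch assumes the cells only contain flat dicts with identifier-like keys, numbers, and single-quoted strings that contain no [ ] { } characters. It rewrites each Python literal as an R list() expression and evaluates it, so, like py_eval(), it should only be run on trusted data. The helper name parse_cell and the two-row sample df are hypothetical (values copied from the question's data).

```r
# Hypothetical helper: rewrite a Python-style list of dicts as an R list()
# expression and evaluate it.
# Assumes keys are plain identifiers and no string value contains [ ] { }.
# Like py_eval(), this executes the text as code: use on trusted data only.
parse_cell <- function(cell) {
  x <- gsub("'([A-Za-z]+)': *", "\\1 = ", cell)   # 'key': -> key =
  x <- gsub("[", "list(", x, fixed = TRUE)
  x <- gsub("{", "list(", x, fixed = TRUE)
  x <- gsub("]", ")", x, fixed = TRUE)
  x <- gsub("}", ")", x, fixed = TRUE)
  records <- eval(parse(text = x))
  # One single-row data frame per record, stacked together (like bind_rows)
  do.call(rbind, lapply(records, as.data.frame, stringsAsFactors = FALSE))
}

# Two-row sample in the shape of the question's data
df <- data.frame(
  fixtureId = c("runningball-adaptor-1510023", "isd-adaptor-8191632"),
  oneXTwoBookmakers = c(
    "[{'bookmakerId': 22, 'updated': '2021-10-01T14:46:39.890Z', 'homePrice': 1.952, 'drawPrice': 3.2, 'awayPrice': 3.9}, {'bookmakerId': 83, 'updated': '2021-10-01T03:00:51.760Z', 'homePrice': 2.01, 'drawPrice': 3.3, 'awayPrice': 4.15}]",
    "[{'bookmakerId': 17, 'updated': '2021-10-02T13:42:23.827Z', 'homePrice': 3.1, 'drawPrice': 3.2, 'awayPrice': 2.3}]"
  ),
  stringsAsFactors = FALSE
)

# Equivalent of map() + unnest(): parse each cell and repeat its fixtureId
long <- do.call(rbind, lapply(seq_len(nrow(df)), function(i) {
  data.frame(fixtureId = df$fixtureId[i], parse_cell(df$oneXTwoBookmakers[i]),
             stringsAsFactors = FALSE)
}))
long
```

Repeating fixtureId once per parsed record is what unnest() does for the list column in the dplyr version.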

-output

# A tibble: 30 × 6
   fixtureId                   bookmakerId updated                  homePrice drawPrice awayPrice
   <chr>                             <int> <chr>                        <dbl>     <dbl>     <dbl>
 1 runningball-adaptor-1510023          22 2021-10-01T14:46:39.890Z      1.95      3.2       3.9 
 2 runningball-adaptor-1510023          83 2021-10-01T03:00:51.760Z      2.01      3.3       4.15
 3 runningball-adaptor-1510023          37 2021-10-01T18:08:53.723Z      1.97      3.3       4.1 
 4 runningball-adaptor-1510023          17 2021-09-30T19:59:19.163Z      1.91      3         3.75
 5 runningball-adaptor-1510023         340 2021-09-30T20:20:33.470Z      1.95      3.1       4.2 
 6 runningball-adaptor-1510018          22 2021-10-02T09:09:55.190Z      2.62      3         2.75
 7 runningball-adaptor-1510018          83 2021-10-02T09:02:45.117Z      2.74      3.11      2.84
 8 runningball-adaptor-1510018          37 2021-10-02T09:21:07.150Z      2.7       3.1       2.8 
 9 runningball-adaptor-1510018          17 2021-10-02T09:05:07.353Z      2.55      2.9       2.62
10 runningball-adaptor-1510018         340 2021-10-02T09:47:39.697Z      2.62      3         2.8 
# … with 20 more rows

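Another option is jsonlite, after replacing the single quotes with double quotes so that each string becomes valid JSON: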

library(jsonlite)
df %>% 
  mutate(oneXTwoBookmakers = map(oneXTwoBookmakers, 
     ~ fromJSON(chartr("'", '"', .x) ))) %>% 
   unnest(oneXTwoBookmakers)
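The chartr() call is what makes the jsonlite route work: it swaps every single quote for a double quote, turning the Python literal into JSON text that fromJSON() can read. Below is a quick base-R check of that conversion on one cell from the question's data. The guard is an added assumption about the data, not part of the answer: Python-only literals such as None, True or False would still not be valid JSON after the swap, and a value containing an apostrophe would also break it.

```r
# One cell copied from the question's data
cell <- "[{'bookmakerId': 17, 'updated': '2021-10-02T13:42:23.827Z', 'homePrice': 3.1, 'drawPrice': 3.2, 'awayPrice': 2.3}]"

# Guard (assumption about the data): Python-only literals are not valid JSON
if (grepl(": *(None|True|False)\\b", cell, perl = TRUE)) {
  stop("cell contains None/True/False, which the quote swap does not handle")
}

# Swap every ' for " so the Python literal becomes JSON text
json <- chartr("'", '"', cell)
cat(json, sep = "\n")
# [{"bookmakerId": 17, "updated": "2021-10-02T13:42:23.827Z", "homePrice": 3.1, "drawPrice": 3.2, "awayPrice": 2.3}]
```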

-output

# A tibble: 30 × 6
   fixtureId                   bookmakerId updated                  homePrice drawPrice awayPrice
   <chr>                             <int> <chr>                        <dbl>     <dbl>     <dbl>
 1 runningball-adaptor-1510023          22 2021-10-01T14:46:39.890Z      1.95      3.2       3.9 
 2 runningball-adaptor-1510023          83 2021-10-01T03:00:51.760Z      2.01      3.3       4.15
 3 runningball-adaptor-1510023          37 2021-10-01T18:08:53.723Z      1.97      3.3       4.1 
 4 runningball-adaptor-1510023          17 2021-09-30T19:59:19.163Z      1.91      3         3.75
 5 runningball-adaptor-1510023         340 2021-09-30T20:20:33.470Z      1.95      3.1       4.2 
 6 runningball-adaptor-1510018          22 2021-10-02T09:09:55.190Z      2.62      3         2.75
 7 runningball-adaptor-1510018          83 2021-10-02T09:02:45.117Z      2.74      3.11      2.84
 8 runningball-adaptor-1510018          37 2021-10-02T09:21:07.150Z      2.7       3.1       2.8 
 9 runningball-adaptor-1510018          17 2021-10-02T09:05:07.353Z      2.55      2.9       2.62
10 runningball-adaptor-1510018         340 2021-10-02T09:47:39.697Z      2.62      3         2.8 
# … with 20 more rows
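Both approaches return one long table with one row per fixture and bookmaker. To get the layout asked for in the question (one data frame per bookmaker with Fixture/Home/Draw/Away columns), that table can be split on bookmakerId. Here is a base-R sketch that uses a small hand-built stand-in for the unnested result (values taken from the question's data; the column names Fixture/Home/Draw/Away follow the question's layout):

```r
# Small stand-in for the unnested long table (values from the question's data)
long <- data.frame(
  fixtureId   = rep(c("runningball-adaptor-1510023", "runningball-adaptor-1510018"), each = 2),
  bookmakerId = c(22L, 83L, 22L, 83L),
  homePrice   = c(1.952, 2.01, 2.625, 2.74),
  drawPrice   = c(3.2, 3.3, 3.0, 3.11),
  awayPrice   = c(3.9, 4.15, 2.75, 2.84),
  stringsAsFactors = FALSE
)

# One data frame per bookmaker, named by bookmakerId
per_bookmaker <- split(long[c("fixtureId", "homePrice", "drawPrice", "awayPrice")],
                       long$bookmakerId)

# Rename to the Fixture/Home/Draw/Away layout and reset row names
per_bookmaker <- lapply(per_bookmaker, function(d) {
  names(d) <- c("Fixture", "Home", "Draw", "Away")
  rownames(d) <- NULL
  d
})

names(per_bookmaker)   # "22" "83"
per_bookmaker[["22"]]
```

split() returns a named list, so each bookmaker's table can be looked up by its ID. dplyr::group_split() is a tidyverse alternative, but it does not name the list elements.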


Tags: string, r, dataframe, data-extraction