Extract a string from each cell individually and create new data frames
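I'm currently running an experiment that checks the odds offered by certain bookmakers. The problem is that an entire data frame has been inserted into each cell as a string. One column is the fixture ID, and the second column holds all the odds from the various bookmakers for that particular fixture, but I'd like to turn this data frame into many data frames, each showing the data for a single bookmaker. I was thinking of writing some kind of for loop that uses the "extract" command to do it all in one go, but R doesn't like it because of how the strings are formatted in the cells.
Not all cells contain the same amount of data. An example of the data in one cell is shown below: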
[{'bookmakerId': 22, 'updated': '2021-10-01T14:46:39.890Z', 'homePrice': 1.952, 'drawPrice': 3.2, 'awayPrice': 3.9}, {'bookmakerId': 83, 'updated': '2021-10-01T03:00:51.760Z', 'homePrice': 2.01, 'drawPrice': 3.3, 'awayPrice': 4.15}]
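My current data frame looks like this:
Fixture | Data |
---|---|
1 | Data1 |
2 | Data2 |
The format I want is this:
For bookmaker ID K, where K is an element of [1..n]:
Fixture | Home | draw | away |
---|---|---|---|
1 | a | b | c |
2 | d | e | f |
Thanks a lot, guys.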
Edit: dput output of the data:
structure(list(fixtureId = c("runningball-adaptor-1510023", "runningball-adaptor-1510018",
"runningball-adaptor-1510019", "isd-adaptor-8191632", "runningball-adaptor-1510026",
"runningball-adaptor-1510020"), oneXTwoBookmakers = c("[{'bookmakerId': 22, 'updated': '2021-10-01T14:46:39.890Z', 'homePrice': 1.952, 'drawPrice': 3.2, 'awayPrice': 3.9}, {'bookmakerId': 83, 'updated': '2021-10-01T03:00:51.760Z', 'homePrice': 2.01, 'drawPrice': 3.3, 'awayPrice': 4.15}, {'bookmakerId': 37, 'updated': '2021-10-01T18:08:53.723Z', 'homePrice': 1.97, 'drawPrice': 3.3, 'awayPrice': 4.1}, {'bookmakerId': 17, 'updated': '2021-09-30T19:59:19.163Z', 'homePrice': 1.91, 'drawPrice': 3.0, 'awayPrice': 3.75}, {'bookmakerId': 340, 'updated': '2021-09-30T20:20:33.470Z', 'homePrice': 1.95, 'drawPrice': 3.1, 'awayPrice': 4.2}]", "[{'bookmakerId': 22, 'updated': '2021-10-02T09:09:55.190Z', 'homePrice': 2.625, 'drawPrice': 3.0, 'awayPrice': 2.75}, {'bookmakerId': 83, 'updated': '2021-10-02T09:02:45.117Z', 'homePrice': 2.74, 'drawPrice': 3.11, 'awayPrice': 2.84}, {'bookmakerId': 37, 'updated': '2021-10-02T09:21:07.150Z', 'homePrice': 2.7, 'drawPrice': 3.1, 'awayPrice': 2.8}, {'bookmakerId': 17, 'updated': '2021-10-02T09:05:07.353Z', 'homePrice': 2.55, 'drawPrice': 2.9, 'awayPrice': 2.62}, {'bookmakerId': 340, 'updated': '2021-10-02T09:47:39.697Z', 'homePrice': 2.62, 'drawPrice': 3.0, 'awayPrice': 2.8}]",
"[{'bookmakerId': 22, 'updated': '2021-10-01T23:32:46.563Z', 'homePrice': 3.3, 'drawPrice': 3.1, 'awayPrice': 2.2}, {'bookmakerId': 83, 'updated': '2021-10-01T14:05:38.270Z', 'homePrice': 3.56, 'drawPrice': 3.02, 'awayPrice': 2.25}, {'bookmakerId': 37, 'updated': '2021-10-01T18:09:33.740Z', 'homePrice': 3.55, 'drawPrice': 3.1, 'awayPrice': 2.25}, {'bookmakerId': 17, 'updated': '2021-10-01T14:11:34.050Z', 'homePrice': 3.2, 'drawPrice': 3.0, 'awayPrice': 2.15}, {'bookmakerId': 340, 'updated': '2021-10-01T15:50:45.820Z', 'homePrice': 3.4, 'drawPrice': 3.0, 'awayPrice': 2.25}]",
"[{'bookmakerId': 17, 'updated': '2021-10-02T13:42:23.827Z', 'homePrice': 3.1, 'drawPrice': 3.2, 'awayPrice': 2.3}]", "[{'bookmakerId': 22, 'updated': '2021-10-02T16:05:45.170Z', 'homePrice': 1.727, 'drawPrice': 3.4, 'awayPrice': 4.75}, {'bookmakerId': 83, 'updated': '2021-10-01T16:45:18.623Z', 'homePrice': 1.757, 'drawPrice': 3.49, 'awayPrice': 4.91}, {'bookmakerId': 37, 'updated': '2021-10-02T15:25:06.367Z', 'homePrice': 1.75, 'drawPrice': 3.55, 'awayPrice': 5.0}, {'bookmakerId': 17, 'updated': '2021-10-01T17:31:02.897Z', 'homePrice': 1.7, 'drawPrice': 3.25, 'awayPrice': 4.4}, {'bookmakerId': 340, 'updated': '2021-10-01T19:57:13.193Z', 'homePrice': 1.77, 'drawPrice': 3.4, 'awayPrice': 4.75}]", "[{'bookmakerId': 385, 'updated': '2021-10-02T18:55:06.670Z', 'homePrice': 2.59, 'drawPrice': 2.98, 'awayPrice': 2.64}, {'bookmakerId': 22, 'updated': '2021-10-02T17:50:13.473Z', 'homePrice': 2.6, 'drawPrice': 3.0, 'awayPrice': 2.75}, {'bookmakerId': 37, 'updated': '2021-10-02T15:25:06.477Z', 'homePrice': 2.6, 'drawPrice': 3.1, 'awayPrice': 2.9}, {'bookmakerId': 17, 'updated': '2021-10-01T19:28:28.587Z', 'homePrice': 2.45, 'drawPrice': 2.9, 'awayPrice': 2.7}, {'bookmakerId': 327, 'updated': '2021-10-02T18:49:56.213Z', 'homePrice': 2.47, 'drawPrice': 3.05, 'awayPrice': 2.57}, {'bookmakerId': 83, 'updated': '2021-10-01T19:26:07.253Z', 'homePrice': 2.65, 'drawPrice': 3.02, 'awayPrice': 2.89}, {'bookmakerId': 42, 'updated': '2021-10-02T17:49:44.437Z', 'homePrice': 2.6, 'drawPrice': 3.1, 'awayPrice': 2.8}, {'bookmakerId': 340, 'updated': '2021-10-01T23:40:53.500Z', 'homePrice': 2.62, 'drawPrice': 3.0, 'awayPrice': 2.8}, {'bookmakerId': 285, 'updated': '2021-10-02T18:55:03.520Z', 'homePrice': 2.59, 'drawPrice': 2.98, 'awayPrice': 2.64}]"
)), row.names = c(NA, 6L), class = c("tbl_df", "tbl", "data.frame"
))
The code suggested by akrun worked very well:
df <- One_X_Two_Data %>%
  mutate(oneXTwoBookmakers = map(oneXTwoBookmakers, ~ py_eval(.x) %>%
    bind_rows)) %>%
  unnest(oneXTwoBookmakers)
However, when I came back later and tried to re-run the code, I got this error:
Error in `mutate()`:
! Problem while computing `oneXTwoBookmakers = map(oneXTwoBookmakers,
~py_eval(.x) %>% bind_rows)`.
Caused by error:
! invalid version specification ‘'\.\tm1174\w2k'’, ‘..EXE was started with the above path as the current directory.’, ‘. paths are not supported. Defaulting to Windows directory.’
Run `rlang::last_error()` to see where the error occurred.
Warning message:
Problem while computing `oneXTwoBookmakers = map(oneXTwoBookmakers, ~py_eval(.x) %>% bind_rows)`.
i the condition has length > 1 and only the first element will be used
Here is one way: evaluate the string (py_eval) and then unnest the list column.
library(dplyr)
library(reticulate)
library(tidyr)
library(purrr)
df %>%
  mutate(oneXTwoBookmakers = map(oneXTwoBookmakers, ~ py_eval(.x) %>%
    bind_rows)) %>%
  unnest(oneXTwoBookmakers)
-output
# A tibble: 30 × 6
fixtureId bookmakerId updated homePrice drawPrice awayPrice
<chr> <int> <chr> <dbl> <dbl> <dbl>
1 runningball-adaptor-1510023 22 2021-10-01T14:46:39.890Z 1.95 3.2 3.9
2 runningball-adaptor-1510023 83 2021-10-01T03:00:51.760Z 2.01 3.3 4.15
3 runningball-adaptor-1510023 37 2021-10-01T18:08:53.723Z 1.97 3.3 4.1
4 runningball-adaptor-1510023 17 2021-09-30T19:59:19.163Z 1.91 3 3.75
5 runningball-adaptor-1510023 340 2021-09-30T20:20:33.470Z 1.95 3.1 4.2
6 runningball-adaptor-1510018 22 2021-10-02T09:09:55.190Z 2.62 3 2.75
7 runningball-adaptor-1510018 83 2021-10-02T09:02:45.117Z 2.74 3.11 2.84
8 runningball-adaptor-1510018 37 2021-10-02T09:21:07.150Z 2.7 3.1 2.8
9 runningball-adaptor-1510018 17 2021-10-02T09:05:07.353Z 2.55 2.9 2.62
10 runningball-adaptor-1510018 340 2021-10-02T09:47:39.697Z 2.62 3 2.8
# … with 20 more rows
Or another option is jsonlite:
library(jsonlite)
df %>%
  mutate(oneXTwoBookmakers = map(oneXTwoBookmakers,
    ~ fromJSON(chartr("'", '"', .x)))) %>%
  unnest(oneXTwoBookmakers)
-output
# A tibble: 30 × 6
fixtureId bookmakerId updated homePrice drawPrice awayPrice
<chr> <int> <chr> <dbl> <dbl> <dbl>
1 runningball-adaptor-1510023 22 2021-10-01T14:46:39.890Z 1.95 3.2 3.9
2 runningball-adaptor-1510023 83 2021-10-01T03:00:51.760Z 2.01 3.3 4.15
3 runningball-adaptor-1510023 37 2021-10-01T18:08:53.723Z 1.97 3.3 4.1
4 runningball-adaptor-1510023 17 2021-09-30T19:59:19.163Z 1.91 3 3.75
5 runningball-adaptor-1510023 340 2021-09-30T20:20:33.470Z 1.95 3.1 4.2
6 runningball-adaptor-1510018 22 2021-10-02T09:09:55.190Z 2.62 3 2.75
7 runningball-adaptor-1510018 83 2021-10-02T09:02:45.117Z 2.74 3.11 2.84
8 runningball-adaptor-1510018 37 2021-10-02T09:21:07.150Z 2.7 3.1 2.8
9 runningball-adaptor-1510018 17 2021-10-02T09:05:07.353Z 2.55 2.9 2.62
10 runningball-adaptor-1510018 340 2021-10-02T09:47:39.697Z 2.62 3 2.8
# … with 20 more rows
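Both approaches above return one long tibble, while the question asks for a separate data frame per bookmaker. A minimal sketch of that last step follows, assuming the jsonlite route and base R's split(). The names `long` and `by_bookmaker` are made up for illustration, and the small `df` here is only a two-fixture stand-in built from values in the dput output.

```r
library(dplyr)
library(purrr)
library(tidyr)
library(jsonlite)

# Stand-in for the question's data: two fixtures, values copied from the dput output
df <- tibble(
  fixtureId = c("runningball-adaptor-1510023", "isd-adaptor-8191632"),
  oneXTwoBookmakers = c(
    "[{'bookmakerId': 22, 'updated': '2021-10-01T14:46:39.890Z', 'homePrice': 1.952, 'drawPrice': 3.2, 'awayPrice': 3.9}, {'bookmakerId': 17, 'updated': '2021-09-30T19:59:19.163Z', 'homePrice': 1.91, 'drawPrice': 3.0, 'awayPrice': 3.75}]",
    "[{'bookmakerId': 17, 'updated': '2021-10-02T13:42:23.827Z', 'homePrice': 3.1, 'drawPrice': 3.2, 'awayPrice': 2.3}]"
  )
)

# Same parsing step as the jsonlite answer: swap single quotes for double
# quotes so each cell is valid JSON, parse it, then unnest the list column
long <- df %>%
  mutate(oneXTwoBookmakers = map(oneXTwoBookmakers,
    ~ fromJSON(chartr("'", '"', .x)))) %>%
  unnest(oneXTwoBookmakers)

# One data frame per bookmaker, in the Fixture / Home / draw / away layout
# from the question; the list is named by bookmakerId
by_bookmaker <- long %>%
  select(bookmakerId, Fixture = fixtureId, Home = homePrice,
         draw = drawPrice, away = awayPrice) %>%
  split(.$bookmakerId) %>%
  map(~ select(.x, -bookmakerId))

# Look up a single bookmaker by its ID (as a character key)
by_bookmaker[["17"]]
```

Keeping the result as a named list rather than creating many separate objects means a bookmaker that appears in only some fixtures simply gets a shorter data frame, and the whole set can still be processed with map().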