How to parse a Python list in R?
Consider this simple example:
tibble(mylist = c("['this is some text from Python!', 'and this is another one!']",
"['this is also some cool stuff', 'and this is awesome!']"))
# A tibble: 2 x 1
mylist
<chr>
1 ['this is some text from Python!', 'and this is another one!']
2 ['this is also some cool stuff', 'and this is awesome!']
I would like to parse the Python-style list so that dplyr understands it is a list of sentences (character values). That is, something like:
> tibble(mylist = list(list('this is some text from Python!', 'and this is another one!'),
+ list('this is also some cool stuff', 'and this is awesome!')))
# A tibble: 2 x 1
mylist
<list>
1 <list [2]>
2 <list [2]>
How can I do that?
Thanks!
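Does
library(tibble)
tb <- tibble(mylist = c("['this is some text from Python!', 'and this is another one!']",
"['this is also some cool stuff', 'and this is awesome!']"))
# Strip the square brackets, then split each string on ", "
lapply(tb$mylist, function(pyString) strsplit(gsub("\\]", "", gsub("\\[", "", pyString)), ", ")[[1]])
solve your problem?
The idea is as follows: we want to delete the square brackets, which is done by replacing them with the empty string. Note that "\\[" and "\\]" are required because square brackets have a special meaning in regular expressions (keyword: escape characters), and the backslash itself has to be doubled inside an R string literal. We then split the string at comma + space, since that appears to be the separator of the Python list. Important: this solution does not work if comma + space can also occur inside the Python strings you want to extract. Also note that the single quotes around each element remain part of the resulting strings.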
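If comma + space can occur inside an element, one way around the caveat is to extract the quoted chunks directly instead of splitting on the separator. The following is only a minimal base R sketch: parse_py_list is a hypothetical helper name (not from any package), and it assumes every element is wrapped in single quotes and contains no escaped or nested single quotes.

```r
# Hypothetical helper, not part of any package.
# Assumes each element is single-quoted and contains no single quote itself.
parse_py_list <- function(x) {
  # Find every single-quoted chunk, e.g. 'some text'
  matches <- regmatches(x, gregexpr("'[^']*'", x))
  # Drop the surrounding quotes and return one list per input string
  lapply(matches, function(s) as.list(gsub("^'|'$", "", s)))
}

x <- c("['a, b', 'c']", "['d', 'e']")
parsed <- parse_py_list(x)
str(parsed)
```

Because the result is a list of lists, it can be assigned directly to a list column, for example tb$mylist <- parse_py_list(tb$mylist).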
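After removing the brackets, you can read each row with read.table().
library(dplyr)
library(purrr)
library(stringr)
# df is the tibble from the question, with the character column mylist
df %>%
  # Remove the leading "[" and the trailing "]"
  mutate(mylist = str_remove(mylist, "^\\["),
         mylist = str_remove(mylist, "\\]$")) %>%
  # Read each string as one comma-separated row with single-quoted fields
  mutate(mylist = map(mylist, ~ as.list(read.table(textConnection(.x),
                                                   sep = ",", quote = "'",
                                                   stringsAsFactors = FALSE))))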
# # A tibble: 2 x 1
# mylist
# <list>
# 1 <named list [2]>
# 2 <named list [2]>
For valid Python values, you can use the reticulate package:
res<- tb %>%
rowwise() %>%
mutate(mylist=list(reticulate::py_eval(mylist)))
mylist
<list>
1 <chr [2]>
2 <chr [2]>
Output:
res$mylist
[[1]]
[1] "this is some text from Python!" "and this is another one!"
[[2]]
[1] "this is also some cool stuff" "and this is awesome!"
If you want exactly the same structure as your expected output, include as.list:
res1<- tb %>%
rowwise() %>%
mutate(mylist=list(as.list(reticulate::py_eval(mylist))))%>%
ungroup()
mylist
<list>
1 <list [2]>
2 <list [2]>
all.equal(tb_res, res1)  # tb_res is not defined above; presumably the expected tibble from the question
[1] TRUE
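The difference between the <chr [2]> and <list [2]> cells above comes down to what as.list does to a character vector. A small base R illustration, using one of the sentences from the question as sample data (the variable names v and l are only for this sketch):

```r
# A character vector of length 2, as shown in the <chr [2]> cells
v <- c("this is also some cool stuff", "and this is awesome!")

# as.list() turns it into a list of two length-1 character vectors,
# which matches the <list [2]> cells
l <- as.list(v)

is.character(v)  # TRUE
is.list(l)       # TRUE
identical(l, list("this is also some cool stuff", "and this is awesome!"))  # TRUE
```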