How to parse a Python list in R?

Question

Consider this simple example:

tibble(mylist = c("['this is some text from Python!', 'and this is another one!']",
                  "['this is also some cool stuff', 'and this is awesome!']"))
# A tibble: 2 x 1
  mylist                                                        
  <chr>                                                         
1 ['this is some text from Python!', 'and this is another one!']
2 ['this is also some cool stuff', 'and this is awesome!']

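I would like to parse these Python-list-like strings so that dplyr understands each one is a list of sentences (character values). That is, something like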

> tibble(mylist = list(list('this is some text from Python!', 'and this is another one!'),
+                      list('this is also some cool stuff', 'and this is awesome!'))) 
# A tibble: 2 x 1
  mylist    
  <list>    
1 <list [2]>
2 <list [2]>

How can I do that? Thanks!

Answer 1

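Does the following solve your problem?

tb <- tibble(mylist = c("['this is some text from Python!', 'and this is another one!']",
                        "['this is also some cool stuff', 'and this is awesome!']"))

# Remove the square brackets, then split each string on comma + space
lapply(tb$mylist, function(pyString) strsplit(gsub("\\]", "", gsub("\\[", "", pyString)), ", ")[[1]])

The idea is as follows: we want to remove the square brackets, which can be done by replacing them with the empty string. Note that "\\[" and "\\]" are required because square brackets have a special meaning in regular expressions (keyword: escape characters), and inside an R string literal the backslash itself has to be doubled. We then split the string at comma + space, since that appears to be the separator Python uses for lists. Note that the single quotes around each item are still part of the result; they can be stripped afterwards with gsub("^'|'$", "", ...). Important: this solution does not work if comma + space can also occur inside the Python strings you want to extract.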
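
To get around the comma + space caveat, one option is to pull out the quoted items directly instead of splitting on the separator. The following is only a sketch: parse_py_list is a made-up helper name, and it assumes every item is wrapped in single quotes and contains no apostrophes or escaped quotes.

```r
# Hypothetical helper: extract every single-quoted item from a
# Python-style list string, so separators inside an item are harmless.
parse_py_list <- function(py_string) {
  # Match a quote, any run of non-quote characters, then a closing quote
  quoted <- regmatches(py_string, gregexpr("'[^']*'", py_string))[[1]]
  # Drop the surrounding quotes from each match
  as.list(substr(quoted, 2, nchar(quoted) - 1))
}

x <- c("['this is some text, with a comma', 'and this is another one!']",
       "['this is also some cool stuff', 'and this is awesome!']")
parsed <- lapply(x, parse_py_list)
str(parsed)
```

Because each element of parsed is itself a list, the result can be assigned straight to a list column, e.g. tb$mylist <- lapply(tb$mylist, parse_py_list).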

Answer 2

After removing the brackets, you can read each row with read.table().

library(dplyr)
library(purrr)
library(stringr)

tb %>%  # tb is the tibble from the question (defined in Answer 1)
  mutate(mylist = str_remove(mylist, "^\\["),
         mylist = str_remove(mylist, "]$")) %>% 
  mutate(mylist = map(mylist, ~ as.list(read.table(textConnection(.x),
                                                   sep = ",", quote = "'",
                                                   stringsAsFactors = FALSE))))

# # A tibble: 2 x 1
#   mylist          
#   <list>          
# 1 <named list [2]>
# 2 <named list [2]>
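
The <named list [2]> in the output comes from read.table(), which names unnamed columns V1, V2, and so on. If plain unnamed elements are preferred, a small post-processing step could look like the sketch below. The parsed value is a hand-written stand-in for one row's result, and the leading space in V2 is an assumption about what may be left behind after the comma.

```r
# Hand-written stand-in for one parsed row (hypothetical values)
parsed <- list(V1 = "this is also some cool stuff",
               V2 = " and this is awesome!")

# Drop the V1, V2 names and trim any stray whitespace around each item
clean <- lapply(unname(parsed), trimws)
str(clean)
```

In the pipeline above, the same step could be applied to every row with map(mylist, ~ lapply(unname(.x), trimws)).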

Answer 3

For valid Python values, you can use the reticulate package:

res<- tb %>%
  rowwise() %>%
  mutate(mylist=list(reticulate::py_eval(mylist)))

  mylist   
  <list>   
1 <chr [2]>
2 <chr [2]>

Output:

res$mylist
[[1]]
[1] "this is some text from Python!" "and this is another one!"      

[[2]]
[1] "this is also some cool stuff" "and this is awesome!"

If you want the result to match your desired output exactly, wrap it in as.list:

res1<- tb %>%
      rowwise() %>%
      mutate(mylist=list(as.list(reticulate::py_eval(mylist))))%>%
      ungroup()
  mylist    
  <list>    
1 <list [2]>
2 <list [2]>

all.equal(tb_res, res1)  # tb_res: presumably the desired tibble from the question
[1] TRUE
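
The difference between <chr [2]> and <list [2]> above is just the as.list() step: the Python list of strings comes back as an R character vector, and as.list() turns each element into its own list entry. A minimal base R sketch of that step, using a hand-written vector in place of the py_eval() result:

```r
# Stand-in for what py_eval() returned for the first row above
chr_vec <- c("this is some text from Python!", "and this is another one!")

# as.list() splits the vector into one list element per string
as_list <- as.list(chr_vec)

str(chr_vec)  # a character vector of length 2
str(as_list)  # a list of 2, each element a length-1 character vector
```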
