How to split up a Python list in R

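I created a list in Python and embedded it in a cell of a csv. I am trying to coerce the elements into a data table in R, but I am stuck on one particular vector that contains text. The reason is that while strsplit() works well for numeric values when splitting on ",", any comma embedded in the text makes that vector longer than the others. I have attached a reproducible example below. Thanks for any help you can provide!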

x <- c("['SPOSORSHIP FOR CONVENTION']", "['GENERAL CONTRIBUTION', 'GENERAL CONTRIBUTION']", 
"['WOMEN & POPULATION']", "['PROGRAM SUPPORT', 'PROGRAM SUPPORT']", 
"['MULTIPLE GRANTS FOR MULTIPLE PURPOSES']", "['IMPROVING NATIONAL PARKS']", 
"['general operating support']", "['Civic Engagement', 'Animal Welfare', 'Religion']", 
"['RESEARCH SUBAWARD']", "['OPERATIONAL SUPPORT', 'OPERATIONAL SUPPORT']", 
"['PROMOTE FILM INDUSTRY']", "['TO SUPPORT PUBLIC AFFAIRS PROGRAMS', 'TO SUPPORT PUBLIC AFFAIRS PROGRAMS', 'TO SUPPORT PUBLIC AFFAIRS PROGRAMS', 'TO SUPPORT PUBLIC AFFAIRS PROGRAMS', 'TO SUPPORT PUBLIC AFFAIRS PROGRAMS', 'TO SUPPORT PUBLIC AFFAIRS PROGRAMS']", 
"['10TH ANNUAL GREAT LAKES RESTORATION CONFERENCE AND PETER WEGE TRIBUTE LUNCHEON']", 
"['Conservation', 'Conservation']", "['FOR GENERAL OPERATING SUPPORT']"
)

Maybe this will help. I first remove the leading [' and the trailing '], then split on ', '

cleaned_text <- gsub("^\\['|'\\]$", "", x)  # remove the leading [' and the trailing ']
unlist(strsplit(cleaned_text, "', '"))  # split on ', '
 [1] "SPOSORSHIP FOR CONVENTION"                                                     
 [2] "GENERAL CONTRIBUTION"                                                          
 [3] "GENERAL CONTRIBUTION"                                                          
 [4] "WOMEN & POPULATION"                                                            
 [5] "PROGRAM SUPPORT"                                                               
 [6] "PROGRAM SUPPORT"                                                               
 [7] "MULTIPLE GRANTS FOR MULTIPLE PURPOSES"                                         
 [8] "IMPROVING NATIONAL PARKS"                                                      
 [9] "general operating support"                                                     
[10] "Civic Engagement"                                                              
[11] "Animal Welfare"                                                                
[12] "Religion"                                                                      
[13] "RESEARCH SUBAWARD"                                                             
[14] "OPERATIONAL SUPPORT"                                                           
[15] "OPERATIONAL SUPPORT"                                                           
[16] "PROMOTE FILM INDUSTRY"                                                         
[17] "TO SUPPORT PUBLIC AFFAIRS PROGRAMS"                                            
[18] "TO SUPPORT PUBLIC AFFAIRS PROGRAMS"                                            
[19] "TO SUPPORT PUBLIC AFFAIRS PROGRAMS"                                            
[20] "TO SUPPORT PUBLIC AFFAIRS PROGRAMS"                                            
[21] "TO SUPPORT PUBLIC AFFAIRS PROGRAMS"                                            
[22] "TO SUPPORT PUBLIC AFFAIRS PROGRAMS"                                            
[23] "10TH ANNUAL GREAT LAKES RESTORATION CONFERENCE AND PETER WEGE TRIBUTE LUNCHEON"
[24] "Conservation"                                                                  
[25] "Conservation"                                                                  
[26] "FOR GENERAL OPERATING SUPPORT"  
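
If the goal is a table rather than one flat vector, it may help to record which input element each value came from, since unlist() discards that grouping. Below is a minimal sketch of one way to do it, assuming a small hypothetical sample y in the same format as x; the column names row and purpose are made up for illustration.

```r
# Hypothetical sample in the same format as x
y <- c("['GENERAL CONTRIBUTION', 'GENERAL CONTRIBUTION']",
       "['WOMEN & POPULATION']",
       "['Civic Engagement', 'Animal Welfare', 'Religion']")

# Trim the leading [' and the trailing '], then split each element on ', '
parts <- strsplit(gsub("^\\['|'\\]$", "", y), "', '", fixed = TRUE)

# One row per value; 'row' records the index of the source element
long <- data.frame(
  row = rep(seq_along(parts), lengths(parts)),
  purpose = unlist(parts),
  stringsAsFactors = FALSE
)
long
```

Keeping parts as a list until the last step is what preserves the link between each value and its source cell.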

Two solutions:

# with stringr
library(stringr)
a <- str_replace_all(x, "\\['|'\\]", "") %>%
  str_split("', '") %>%
  unlist()

# with base
b <- unlist(strsplit(gsub("\\['|'\\]", "", x), "', '"))

identical(a, b)

Result (abbreviated, shown grouped by input element):

[1] "SPOSORSHIP FOR CONVENTION"
[2] "GENERAL CONTRIBUTION" "GENERAL CONTRIBUTION"
[3] "WOMEN & POPULATION"
...

The trick is to trim the strings first, then split on ', ' rather than on the comma alone.
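
To see why the delimiter matters, here is a small sketch that compares the two splits on a hypothetical input whose second element has a comma inside the text (the sample values are invented for illustration and are not from the question's data):

```r
# Hypothetical input: the second element contains a comma inside the text
y <- c("['PROGRAM SUPPORT', 'PROGRAM SUPPORT']",
       "['ARTS, CULTURE AND HUMANITIES']")

# Trim the leading [' and the trailing ']
trimmed <- gsub("^\\['|'\\]$", "", y)

# Splitting on a bare comma cuts the second element in two
by_comma <- strsplit(trimmed, ",", fixed = TRUE)
lengths(by_comma)   # 2 2

# Splitting on the full delimiter ', ' keeps it whole
parts <- strsplit(trimmed, "', '", fixed = TRUE)
lengths(parts)      # 2 1
```

One caveat: this relies on every item being wrapped in single quotes. Python writes a string that contains an apostrophe with double quotes instead, so such entries would need extra handling.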