How to split up a Python list in R
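I created a list in Python and embedded it in a cell of a csv. I am trying to coerce the elements into a data table in R, but I am stuck on one particular vector that contains text. The reason is that, while strsplit() works fine for numeric values when splitting on ",", any embedded commas in the text make one vector longer than the others. I have attached a reproducible example below. Thanks for any help you can provide!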
x <- c("['SPOSORSHIP FOR CONVENTION']", "['GENERAL CONTRIBUTION', 'GENERAL CONTRIBUTION']",
"['WOMEN & POPULATION']", "['PROGRAM SUPPORT', 'PROGRAM SUPPORT']",
"['MULTIPLE GRANTS FOR MULTIPLE PURPOSES']", "['IMPROVING NATIONAL PARKS']",
"['general operating support']", "['Civic Engagement', 'Animal Welfare', 'Religion']",
"['RESEARCH SUBAWARD']", "['OPERATIONAL SUPPORT', 'OPERATIONAL SUPPORT']",
"['PROMOTE FILM INDUSTRY']", "['TO SUPPORT PUBLIC AFFAIRS PROGRAMS', 'TO SUPPORT PUBLIC AFFAIRS PROGRAMS', 'TO SUPPORT PUBLIC AFFAIRS PROGRAMS', 'TO SUPPORT PUBLIC AFFAIRS PROGRAMS', 'TO SUPPORT PUBLIC AFFAIRS PROGRAMS', 'TO SUPPORT PUBLIC AFFAIRS PROGRAMS']",
"['10TH ANNUAL GREAT LAKES RESTORATION CONFERENCE AND PETER WEGE TRIBUTE LUNCHEON']",
"['Conservation', 'Conservation']", "['FOR GENERAL OPERATING SUPPORT']"
)
Maybe this will help. I first remove the leading [' and the trailing '] and then split on ', '
cleaned_text <- gsub("^\\['|'\\]$", "", x)  # remove the leading [' and the trailing ']
unlist(strsplit(cleaned_text, "', '"))  # split on ', '
[1] "SPOSORSHIP FOR CONVENTION"
[2] "GENERAL CONTRIBUTION"
[3] "GENERAL CONTRIBUTION"
[4] "WOMEN & POPULATION"
[5] "PROGRAM SUPPORT"
[6] "PROGRAM SUPPORT"
[7] "MULTIPLE GRANTS FOR MULTIPLE PURPOSES"
[8] "IMPROVING NATIONAL PARKS"
[9] "general operating support"
[10] "Civic Engagement"
[11] "Animal Welfare"
[12] "Religion"
[13] "RESEARCH SUBAWARD"
[14] "OPERATIONAL SUPPORT"
[15] "OPERATIONAL SUPPORT"
[16] "PROMOTE FILM INDUSTRY"
[17] "TO SUPPORT PUBLIC AFFAIRS PROGRAMS"
[18] "TO SUPPORT PUBLIC AFFAIRS PROGRAMS"
[19] "TO SUPPORT PUBLIC AFFAIRS PROGRAMS"
[20] "TO SUPPORT PUBLIC AFFAIRS PROGRAMS"
[21] "TO SUPPORT PUBLIC AFFAIRS PROGRAMS"
[22] "TO SUPPORT PUBLIC AFFAIRS PROGRAMS"
[23] "10TH ANNUAL GREAT LAKES RESTORATION CONFERENCE AND PETER WEGE TRIBUTE LUNCHEON"
[24] "Conservation"
[25] "Conservation"
[26] "FOR GENERAL OPERATING SUPPORT"
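The flattened vector above no longer records which input string each element came from, which matters if the goal is a data table. A minimal sketch of one way to keep that link, assuming a long format with one row per element is acceptable; the sample `x_small` and the column names `row` and `purpose` are made up for illustration:

```r
# Small hypothetical sample in the same format as x.
x_small <- c("['RESEARCH SUBAWARD']",
             "['Civic Engagement', 'Animal Welfare', 'Religion']")

# Trim the outer [' and '], then split each string on the literal delimiter ', '.
parts <- strsplit(gsub("^\\['|'\\]$", "", x_small), "', '", fixed = TRUE)

# One row per element; `row` is the index of the source string in x_small.
long <- data.frame(
  row     = rep(seq_along(parts), lengths(parts)),
  purpose = unlist(parts),
  stringsAsFactors = FALSE
)
long
```

Skipping unlist() and keeping `parts` as a list is another option if a list-column suits the downstream code better.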
Two solutions:
# with stringr (which re-exports the %>% pipe)
library(stringr)
a <- str_replace_all(x, "\\['|'\\]", "") %>%
  str_split("', '") %>%
  unlist()
# with base
b <- unlist(strsplit(gsub("\\['|'\\]", "", x), "', '"))
identical(a, b)
Result:
[1] "SPOSORSHIP FOR CONVENTION"
[2] "GENERAL CONTRIBUTION" "GENERAL CONTRIBUTION"
[3] "WOMEN & POPULATION"
...
The trick is to trim the string first, and then split on ', ' rather than on just a comma.
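To see why the full delimiter matters, here is a small sketch with a made-up input `y` whose second string contains an embedded comma (the sample data does not include such a case, so this value is an assumption for illustration):

```r
# Hypothetical input: the second string has a comma inside the text.
y <- c("['PROGRAM SUPPORT', 'PROGRAM SUPPORT']",
       "['ARTS, CULTURE AND HUMANITIES']")

# Strip the leading [' and the trailing '] from each string.
trimmed <- gsub("^\\['|'\\]$", "", y)

# Splitting on a bare comma breaks the second string in two.
naive <- strsplit(trimmed, ",", fixed = TRUE)

# Splitting on the full delimiter ', ' leaves the embedded comma alone.
parts <- strsplit(trimmed, "', '", fixed = TRUE)

lengths(naive)  # 2 2
lengths(parts)  # 2 1
```

This relies on every element being wrapped in single quotes. Python writes a string that contains an apostrophe with double quotes instead, so such elements would need separate handling.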