R - Splitting character vector so that every unique element is added to a new character vector

Question

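I have a character vector in which individual elements contain multiple comma-separated strings. I obtained it by extracting a column from a data frame, and it looks like this: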

 [1] "Acworth, Crescent Lake, East Acworth, Lynn, South Acworth"                                                                              
 [2] "Ferncroft, Passaconaway, Paugus Mill"                                                                                                   
 [3] "Alexandria, South Alexandria"                                                                                                           
 [4] "Allenstown, Blodgett, Kenison Corner, Suncook (part)"                                                                                   
 [5] "Alstead, Alstead Center, East Alstead, Forristalls Corner, Mill Hollow"                                                                 
 [6] "Alton, Alton Bay, Brookhurst, East Alton, Loon Cove, Mount Major, South Alton, Spring Haven, Stockbridge Corners, West Alton, Woodlands"
 [7] "Amherst, Baboosic Lake, Cricket Corner, Ponemah"                                                                                        
 [8] "Andover, Cilleyville, East Andover, Halcyon Station, Potter Place, West Andover"                                                        
 [9] "Antrim, Antrim Center, Clinton Village, Loverens Mill, North Branch"                                                                    
[10] "Ashland"

I would like to obtain a new character vector in which each of those strings is its own element, i.e.:

 [1] "Acworth", "Crescent Lake", "East Acworth", "Lynn", "South Acworth"                                                                              
 [6] "Ferncroft", "Passaconaway", "Paugus Mill", "Alexandria", "South Alexandria"

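I used the strsplit() function, but it returns a list. When I try to convert that list to a character vector, it reverts to its original state.

I'm sure this is a very simple problem - any help would be greatly appreciated! Thanks!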

Answer 1

The title of your post suggests that you want unique strings, so

unique(unlist(strsplit(myvec, split=",")))

or

unique(unlist(strsplit(myvec, split=", ")))

if every comma is always followed by a space.
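As a minimal sketch of the difference between the two calls (the two-element myvec below is just a shortened copy of the question's data, used for illustration): strsplit() returns a list with one character vector per input element, and unlist() flattens that list into a single character vector, which is the step the question was missing.

```r
# Shortened, illustrative copy of the data from the question.
myvec <- c("Acworth, Crescent Lake, East Acworth",
           "Alexandria, South Alexandria")

parts <- strsplit(myvec, split = ", ")  # list: one character vector per input element
flat  <- unlist(parts)                  # flattened into a single character vector
uniq  <- unique(flat)                   # duplicates (none here) would be dropped

# Splitting on "," alone keeps the space that followed each comma.
with_space <- unlist(strsplit(myvec, split = ","))

print(uniq)
print(with_space)
```

With split = "," the later pieces keep their leading space (for example " Crescent Lake"), so two spellings of the same name would not be recognised as duplicates by unique().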

Answer 2

You can strip the spaces by splitting the character vector with the regular expression \s*,\s* and then unlist the result:

v <- c("Acworth, Crescent Lake, East Acworth, Lynn, South Acworth", "Ferncroft, Passaconaway, Paugus Mill", "Alexandria, South Alexandria",  "Allenstown, Blodgett, Kenison Corner, Suncook (part)", "Alstead, Alstead Center, East Alstead, Forristalls Corner, Mill Hollow", "Alton, Alton Bay, Brookhurst, East Alton, Loon Cove, Mount Major, South Alton, Spring Haven, Stockbridge Corners, West Alton, Woodlands", "Amherst, Baboosic Lake, Cricket Corner, Ponemah",  "Andover, Cilleyville, East Andover, Halcyon Station, Potter Place, West Andover",  "Antrim, Antrim Center, Clinton Village, Loverens Mill, North Branch",  "Ashland" )
# The backslashes must be doubled inside an R string literal.
s <- unlist(strsplit(v, "\\s*,\\s*"))

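See the IDEONE demo

The regular expression matches zero or more whitespace characters (\s*) on both sides of the comma, thereby trimming the values. This also handles the case where a "wild" space appears before a comma in the original character vector.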
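A small sketch of that behaviour, using a made-up messy vector (not from the question) with stray spaces around the commas. Note that the pattern only consumes whitespace next to a comma; the trimws() call is an addition here to remove whitespace at the very start or end of each element, which the split itself would leave in place.

```r
# Hypothetical input with irregular spacing, for illustration only.
messy <- c("Acworth , Crescent Lake,East Acworth", "  Ashland ")

# trimws() removes outer whitespace; the regex removes whitespace around commas.
s <- unlist(strsplit(trimws(messy), "\\s*,\\s*"))

print(s)
```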

Answer 3

As an alternative, you can also use scan, like this:

unique(scan(what = "", text = v, sep = ",", strip.white = TRUE))

The strip.white = TRUE part takes care of any leading or trailing whitespace you may have.

Note: "v" is the vector defined in Answer 2.
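A self-contained sketch of the scan approach, using a short made-up vector that contains a duplicate and a stray space. The quiet = TRUE argument is an addition here; it only suppresses the "Read N items" message that scan otherwise prints.

```r
# Hypothetical input, for illustration only: "Acworth" appears twice.
v <- c("Acworth, Crescent Lake, East Acworth", "Acworth , Lynn")

# what = "" reads character fields; sep = "," splits on commas;
# strip.white = TRUE trims whitespace around each field.
res <- unique(scan(text = v, what = "", sep = ",",
                   strip.white = TRUE, quiet = TRUE))

print(res)
```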
