Merging contents of columns using apply or other vectorized approach
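I have a data.frame that is almost entirely blank, but every row has exactly one value. How can I use a vectorized or otherwise idiomatic R approach to merge the contents of each row into a single vector?
Sample data: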
raw_data <- structure(
list(
col1 = c("", "", "", "", ""),
col2 = c("", "", "", "", ""),
col3 = c("", "", "", "", ""),
col4 = c("", "", "", "Millburn - Union", ""),
col5 = c("", "", "Cranston (aka Garden City Center)", "",""),
col6 = c("", "", "", "", ""),
col7 = c("", "", "", "", ""),
col8 = c("", "", "", "", "Colorado Blvd"),
col9 = c("", "", "", "", ""),
col10 = c("", "", "", "", ""),
col11 = c("Palo Alto", "Castro (aka Market St)", "", "", "")
),
.Names = c("col1", "col2", "col3", "col4", "col5", "col6", "col7", "col8", "col9", "col10", "col11"),
row.names = c(5L, 4L, 3L, 2L, 1L),
class = "data.frame"
)
Here is what I tried. It fails because it returns a two-dimensional matrix rather than the desired vector:
raw_data$test <- apply(raw_data, MAR=1, FUN=paste0)
Your instinct about apply was correct. You only need to pass the collapse argument to paste:
apply( raw_data, 1, paste0, collapse = "" )
5 4 3
"Palo Alto" "Castro (aka Market St)" "Cranston (aka Garden City Center)"
2 1
"Millburn - Union" "Colorado Blvd"
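To see why collapse makes the difference, here is a minimal sketch on a small hypothetical data frame (df and its columns a, b, c are made up for illustration and are not part of the question's data). Without collapse, paste0 returns one string per column for each row, so apply binds those results into a matrix; with collapse, each row is reduced to a single string.

```r
# Hypothetical 2-row, 3-column data frame with one non-empty value per row
df <- data.frame(a = c("x", ""), b = c("", "y"), c = c("", ""),
                 stringsAsFactors = FALSE)

# Without collapse: each call returns a length-3 vector (one per column),
# so apply stacks them as columns of a 3 x 2 matrix
no_collapse <- apply(df, 1, paste0)
dim(no_collapse)

# With collapse: each call returns a single string,
# so apply simplifies the result to a character vector of length 2
collapsed <- apply(df, 1, paste0, collapse = "")
unname(collapsed)
```

The matrix has one column per input row, which is why the original attempt could not be assigned back as a single column.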
You can do this very simply with a single indexing operation:
raw_data[raw_data!='']
Demo:
R> raw_data[raw_data!=''];
[1] "Millburn - Union" "Cranston (aka Garden City Center)" "Colorado Blvd" "Palo Alto" "Castro (aka Market St)"
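If you need the vector in row order, top to bottom (rather than column-major order, i.e. down each column and then across the columns from left to right, which is what the operation above produces), you can transpose the input data.frame: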
R> t(raw_data)[t(raw_data)!=''];
[1] "Palo Alto" "Castro (aka Market St)" "Cranston (aka Garden City Center)" "Millburn - Union" "Colorado Blvd"
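The ordering difference is easier to see on a tiny case. The sketch below uses a hypothetical 2 x 2 data frame (df, a, and b are illustrative names only) in which row 1 holds its value in the second column and row 2 holds its value in the first.

```r
# Hypothetical data: row 1's value sits in column b, row 2's in column a
df <- data.frame(a = c("", "r2"), b = c("r1", ""),
                 stringsAsFactors = FALSE)

# Logical-matrix indexing walks the data column by column,
# so the value from column a (row 2) comes first
by_column <- df[df != ""]

# After transposing, the columns of t(df) are the rows of df,
# so the values come out in row order
by_row <- t(df)[t(df) != ""]

by_column  # "r2" "r1"
by_row     # "r1" "r2"
```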
In this example, each row has only one element that is not ''. Here is another approach that uses paste together with do.call:
do.call(paste, c(raw_data, sep=''))
#[1] "Palo Alto" "Castro (aka Market St)"
#[3] "Cranston (aka Garden City Center)" "Millburn - Union"
#[5] "Colorado Blvd"
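As a rough illustration of what that call does: c() on a data frame yields a plain list of its columns, so c(raw_data, sep='') is the column list with an extra named sep element, and do.call passes every element to paste as a separate argument. The sketch below checks this on a hypothetical two-column data frame (df, a, and b are illustrative names).

```r
# Hypothetical data frame standing in for raw_data
df <- data.frame(a = c("x", ""), b = c("", "y"),
                 stringsAsFactors = FALSE)

# The argument list: one element per column, plus sep
args <- c(df, sep = "")
names(args)  # "a" "b" "sep"

# do.call(paste, args) is equivalent to spelling the columns out by hand
via_do_call <- do.call(paste, args)
direct <- paste(df$a, df$b, sep = "")
identical(via_do_call, direct)
```

Because paste is vectorized over its arguments, this works element-wise across rows without an explicit loop over rows.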
Suppose instead that some rows in 'raw_data' have more than one element that is not ''. In that case, it may be better to use sep=';' or a similar visible separator:
raw_data[1,1] <- 'Millburn'
raw_data[1,3] <- 'Something'
gsub('^;+|;+$|(;);+', '\\1', do.call(paste, c(raw_data, sep=';')))
#[1] "Millburn;Something;Palo Alto" "Castro (aka Market St)"
#[3] "Cranston (aka Garden City Center)" "Millburn - Union"
#[5] "Colorado Blvd"
Or, using apply, we get the same result as above:
unname(apply(raw_data, 1, FUN=function(x) paste(x[x!=''],collapse=';')))
#[1] "Millburn;Something;Palo Alto" "Castro (aka Market St)"
#[3] "Cranston (aka Garden City Center)" "Millburn - Union"
#[5] "Colorado Blvd"
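A small sketch comparing the two multi-value approaches side by side, on a hypothetical data frame (df and its columns are made up for illustration). The regular expression drops leading and trailing separators and squeezes runs of separators into one; the apply version filters out the empty strings before pasting.

```r
# Hypothetical data: rows 1 and 3 have two values, row 2 has one
df <- data.frame(a = c("p", "", ""), b = c("", "q", "s"), c = c("r", "", "t"),
                 stringsAsFactors = FALSE)

# Step 1: paste everything with a visible separator
pasted <- do.call(paste, c(df, sep = ";"))  # "p;;r" ";q;" ";s;t"

# Step 2: strip leading/trailing ';' and collapse repeated ';' to one
via_gsub <- gsub("^;+|;+$|(;);+", "\\1", pasted)

# Alternative: drop the empty strings first, then paste what remains
via_apply <- unname(apply(df, 1, function(x) paste(x[x != ""], collapse = ";")))

via_gsub   # "p;r" "q" "s;t"
identical(via_gsub, via_apply)
```

Note that both versions assume the separator itself never occurs inside a cell value.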