Substituting a substring with another string for each row of a data frame in R
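I have a data frame and, for each row, I want to replace the generic placeholder in column A with the value in column B.
I can do this with a loop, but I can't work out how to do it faster with lapply.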
column A            column B
hotels in {d}       London
{d} city breaks     Bangkok
cheap hotels {d}    New York
The result I want is:
Column A
hotels in London
Bangkok city breaks
cheap hotels New York
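I can do it with a loop like this:
for (i in 1:nrow(df)) {
  # backticks are needed because the column names contain a space
  df$`Column A`[i] <- gsub("\\{d\\}", df$`Column B`[i], df$`Column A`[i])
}
But with millions of rows, this will be slow.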
Here is a version using apply:
a<-c("{d} rises in the east", "{d} has come")
b <- c("the sun", "morning")
df <- data.frame(a = a, b = b)
df
#>                       a       b
#> 1 {d} rises in the east the sun
#> 2          {d} has come morning
# apply() passes each row as a character vector: row[1] is a, row[2] is b
df$a <- apply(df, 1, function(row) {gsub("\\{d\\}", row[2], row[1])})
df
#>                           a       b
#> 1 the sun rises in the east the sun
#> 2          morning has come morning
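As a variation on the row-wise idea above, here is a minimal sketch that walks the two columns in parallel with mapply instead of passing whole rows through apply. The sample data is hypothetical and simply mirrors the question's layout; sub with fixed = TRUE is used on the assumption that {d} should be matched as literal text, which also removes the need for regex escaping.

```r
# Hypothetical sample data mirroring the layout in the question
df <- data.frame(
  a = c("hotels in {d}", "{d} city breaks", "cheap hotels {d}"),
  b = c("London", "Bangkok", "New York"),
  stringsAsFactors = FALSE
)

# Replace the first literal "{d}" in each element of a with the matching b.
# fixed = TRUE treats the pattern as plain text rather than a regex.
df$a <- mapply(
  function(template, value) sub("{d}", value, template, fixed = TRUE),
  df$a, df$b,
  USE.NAMES = FALSE
)

df$a
```

This is still one function call per row, so it is unlikely to be dramatically faster than the loop; it mainly avoids the conversion of the data frame to a character matrix that apply performs.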
You can do it in one line with stringr, which is vectorised:
library(stringr)
# str_replace() is vectorised over the string and the replacement
df$columnA <- str_replace(df$columnA, "\\{d\\}", df$columnB)
df
                columnA  columnB
1      hotels in London   London
2   Bangkok city breaks  Bangkok
3 cheap hotels New York New York
Here is a base R approach without a loop.
First, read in the data. Note that I changed the column names slightly.
df <- read.table(text = "
column.A column.B
'hotels in {d}' 'London'
'{d} city breaks' 'Bangkok'
'cheap hotels {d}' 'New York'
", header = TRUE, stringsAsFactors = FALSE)
df2 <- df # make a copy for results comparison
# your code
for (i in 1:nrow(df)){
  df$column.A[i] <- gsub("\\{d\\}", df$column.B[i], df$column.A[i])
}
# vectorised alternative: replace the first match in each row via regmatches<-
regmatches(df2$column.A, regexpr("\\{d\\}", df2$column.A)) <- df2$column.B
df2
#               column.A column.B
#1      hotels in London   London
#2   Bangkok city breaks  Bangkok
#3 cheap hotels New York New York
identical(df, df2)
#[1] TRUE
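One caveat with the regmatches approach: the right-hand side is expected to supply one value per matched element, so passing the whole of column.B only lines up when every row contains the placeholder. The sketch below is one way to guard against that, assuming some rows may lack {d}; the sample data and the has_match name are hypothetical.

```r
# Hypothetical data in which the second row has no placeholder
df <- data.frame(
  column.A = c("hotels in {d}", "no placeholder here", "cheap hotels {d}"),
  column.B = c("London", "Bangkok", "New York"),
  stringsAsFactors = FALSE
)

# Locate the first literal "{d}" in each row; regexpr() returns -1 where absent
m <- regexpr("{d}", df$column.A, fixed = TRUE)
has_match <- m != -1

# Supply replacement values only for the rows that actually matched
regmatches(df$column.A, m) <- df$column.B[has_match]

df$column.A
```

Rows without the placeholder are left unchanged, and the remaining rows receive the value from their own row of column.B.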