如何让 R 创建新列(从旧列中字符串的左侧部分命名),然后将旧列中的字符串右侧部分放入新列中
How to get R to create new column (named from left part of string in old column), and then put right part of string from old column into new column
给定包含字符列的现有数据框,如下所示 (oldColumn1),我想让 R 在同一数据框中自动创建一个新列,从字符串的左侧部分命名(例如 COLOR ).
然后对于每一行,将出现在“:”之后的字符串内容的右侧部分(例如 RED、BLUE、ETC)放入名为“COLOR”的新列中。
有许多旧列(oldColumn1、oldColumn2 等)需要像这样拆分出来,因此手动执行此操作是不切实际的。提前感谢您提供的任何帮助。
# Here is an example of 3 oldColumns that already exist in dataframe.
# There are thousands of these columns, need to auto create a new
# column for each one as described.
# Maybe hoping to have the oldColumn names in a vector, to then pass
# to a function that creates a new column for each oldColumn.
oldColumn1 <- c('COLOR: RED', 'COLOR: RED', 'COLOR: BLUE', 'COLOR: GREEN', 'COLOR: BLUE')
oldColumn2 <- c('SIZE: LARGE', 'SIZE: MEDIUM','SIZE: XLARGE','SIZE: MEDIUM','SIZE: SMALL')
oldColumn3 <- c('DESIGNSTYLE: STYLED', 'DESIGNSTYLE: ORIGINAL MAKER', 'DESIGNSTYLE: COUTURE','DESIGNSTYLE: COUTURE','DESIGNSTYLE: STYLED')
COLOR <- c('RED', 'RED', 'BLUE', 'GREEN', 'BLUE')
SIZE <- c('LARGE', 'MEDIUM', 'XLARGE', 'MEDIUM', 'SMALL')
DESIGNSTYLE <- c('STYLED', 'ORIGINAL MAKER', 'COUTURE', 'COUTURE', 'STYLED')
dat <- data.frame(oldColumn1, oldColumn2, oldColumn3, COLOR, SIZE, DESIGNSTYLE)
dat
您可以使用 $
创建一个新列,然后使用 gsub()
从目标列中删除 COLOR:
。
yourdf$COLOR <- gsub("COLOR: ", "", yourdf$oldColumn1)
如果您还想删除旧列:
yourdf$oldColumn1 <- NULL
编辑
如果您有很多列,您可以将 gsub
函数应用于所有目标列。如果您的目标列具有通用名称模式,例如示例中的 oldColumn
,您可以通过使用 grep
识别该模式来对数据框进行子集化。
之后,您可以将编辑的列重命名为COLOR1
、COLOR2
等
完整的步骤如下:
# Remove "COLOR: " from the targeted columns
colname_pattern <- grep("oldColumn", colnames(yourdf))
yourdf[, colname_pattern] <- apply(yourdf[, colname_pattern], 2,
gsub, pattern = "COLOR: ",
replacement = "")
# Rename the edited columns
index <- seq_along(colname_pattern)
newnames <- paste0("COLOR", index)
colnames(yourdf[, colname_pattern]) <- newnames
开始于
quux <- structure(list(oldColumn1 = c("COLOR: RED", "COLOR: RED", "COLOR: BLUE", "COLOR: GREEN", "COLOR: BLUE")), class = "data.frame", row.names = c(NA, -5L))
天真的方法是
data.frame(COLOR = trimws(sub("COLOR:", "", quux$oldColumn1)))
# COLOR
# 1 RED
# 2 RED
# 3 BLUE
# 4 GREEN
# 5 BLUE
但我假设您有更一般的需求。假设您还有更多内容需要解析,例如
quux <- structure(list(oldColumn1 = c("COLOR: RED", "COLOR: RED", "COLOR: BLUE", "COLOR: GREEN", "COLOR: BLUE", "SIZE: 1", "SIZE: 3", "SIZE: 5")), class = "data.frame", row.names = c(NA, -8L))
quux
# oldColumn1
# 1 COLOR: RED
# 2 COLOR: RED
# 3 COLOR: BLUE
# 4 COLOR: GREEN
# 5 COLOR: BLUE
# 6 SIZE: 1
# 7 SIZE: 3
# 8 SIZE: 5
然后我们可以将其概括为
tmp <- strcapture("(.*)\s*:\s*(.*)", quux$oldColumn1, list(k="", v=""))
tmp$ign <- ave(rep(1L, nrow(tmp)), tmp$k, FUN = seq_along)
reshape2::dcast(tmp, ign ~ k, value.var = "v")[,-1,drop=FALSE]
# COLOR SIZE
# 1 RED 1
# 2 RED 3
# 3 BLUE 5
# 4 GREEN <NA>
# 5 BLUE <NA>
--
编辑:更新数据的备选方案:
do.call(cbind, lapply(dat, function(X) {
nm <- sub(":.*", "", X[1])
out <- data.frame(trimws(sub(".*:", "", X)))
names(out) <- nm
out
}))
# COLOR SIZE DESIGNSTYLE
# 1 RED LARGE STYLED
# 2 RED MEDIUM ORIGINAL MAKER
# 3 BLUE XLARGE COUTURE
# 4 GREEN MEDIUM COUTURE
# 5 BLUE SMALL STYLED
给定包含字符列的现有数据框,如下所示 (oldColumn1),我想让 R 在同一数据框中自动创建一个新列,从字符串的左侧部分命名(例如 COLOR ).
然后对于每一行,将出现在“:”之后的字符串内容的右侧部分(例如 RED、BLUE、ETC)放入名为“COLOR”的新列中。
有许多旧列(oldColumn1、oldColumn2 等)需要像这样拆分出来,因此手动执行此操作是不切实际的。提前感谢您提供的任何帮助。
# Here is an example of 3 oldColumns that already exist in dataframe.
# There are thousands of these columns, need to auto create a new
# column for each one as described.
# Maybe hoping to have the oldColumn names in a vector, to then pass
# to a function that creates a new column for each oldColumn.
oldColumn1 <- c('COLOR: RED', 'COLOR: RED', 'COLOR: BLUE', 'COLOR: GREEN', 'COLOR: BLUE')
oldColumn2 <- c('SIZE: LARGE', 'SIZE: MEDIUM','SIZE: XLARGE','SIZE: MEDIUM','SIZE: SMALL')
oldColumn3 <- c('DESIGNSTYLE: STYLED', 'DESIGNSTYLE: ORIGINAL MAKER', 'DESIGNSTYLE: COUTURE','DESIGNSTYLE: COUTURE','DESIGNSTYLE: STYLED')
COLOR <- c('RED', 'RED', 'BLUE', 'GREEN', 'BLUE')
SIZE <- c('LARGE', 'MEDIUM', 'XLARGE', 'MEDIUM', 'SMALL')
DESIGNSTYLE <- c('STYLED', 'ORIGINAL MAKER', 'COUTURE', 'COUTURE', 'STYLED')
dat <- data.frame(oldColumn1, oldColumn2, oldColumn3, COLOR, SIZE, DESIGNSTYLE)
dat
您可以使用 $
创建一个新列,然后使用 gsub()
从目标列中删除 COLOR:
。
yourdf$COLOR <- gsub("COLOR: ", "", yourdf$oldColumn1)
如果您还想删除旧列:
yourdf$oldColumn1 <- NULL
编辑
如果您有很多列,您可以将 gsub
函数应用于所有目标列。如果您的目标列具有通用名称模式,例如示例中的 oldColumn
,您可以通过使用 grep
识别该模式来对数据框进行子集化。
之后,您可以将编辑的列重命名为COLOR1
、COLOR2
等
完整的步骤如下:
# Remove "COLOR: " from the targeted columns
colname_pattern <- grep("oldColumn", colnames(yourdf))
yourdf[, colname_pattern] <- apply(yourdf[, colname_pattern], 2,
gsub, pattern = "COLOR: ",
replacement = "")
# Rename the edited columns
index <- seq_along(colname_pattern)
newnames <- paste0("COLOR", index)
colnames(yourdf[, colname_pattern]) <- newnames
开始于
quux <- structure(list(oldColumn1 = c("COLOR: RED", "COLOR: RED", "COLOR: BLUE", "COLOR: GREEN", "COLOR: BLUE")), class = "data.frame", row.names = c(NA, -5L))
天真的方法是
data.frame(COLOR = trimws(sub("COLOR:", "", quux$oldColumn1)))
# COLOR
# 1 RED
# 2 RED
# 3 BLUE
# 4 GREEN
# 5 BLUE
但我假设您有更一般的需求。假设您还有更多内容需要解析,例如
quux <- structure(list(oldColumn1 = c("COLOR: RED", "COLOR: RED", "COLOR: BLUE", "COLOR: GREEN", "COLOR: BLUE", "SIZE: 1", "SIZE: 3", "SIZE: 5")), class = "data.frame", row.names = c(NA, -8L))
quux
# oldColumn1
# 1 COLOR: RED
# 2 COLOR: RED
# 3 COLOR: BLUE
# 4 COLOR: GREEN
# 5 COLOR: BLUE
# 6 SIZE: 1
# 7 SIZE: 3
# 8 SIZE: 5
然后我们可以将其概括为
tmp <- strcapture("(.*)\s*:\s*(.*)", quux$oldColumn1, list(k="", v=""))
tmp$ign <- ave(rep(1L, nrow(tmp)), tmp$k, FUN = seq_along)
reshape2::dcast(tmp, ign ~ k, value.var = "v")[,-1,drop=FALSE]
# COLOR SIZE
# 1 RED 1
# 2 RED 3
# 3 BLUE 5
# 4 GREEN <NA>
# 5 BLUE <NA>
--
编辑:更新数据的备选方案:
do.call(cbind, lapply(dat, function(X) {
nm <- sub(":.*", "", X[1])
out <- data.frame(trimws(sub(".*:", "", X)))
names(out) <- nm
out
}))
# COLOR SIZE DESIGNSTYLE
# 1 RED LARGE STYLED
# 2 RED MEDIUM ORIGINAL MAKER
# 3 BLUE XLARGE COUTURE
# 4 GREEN MEDIUM COUTURE
# 5 BLUE SMALL STYLED