In R, split a column in a data frame into pieces of different lengths

Question

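I am trying to split the Awards column in a data frame, but the split returns a different number of results for each row. How can I bind the result back to the original data frame?

Sample DF: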

        Name   Value     Awards
1       A1      NA      3 wins.
2       A2      1000    NA
3       A3      NA      2 wins.
4       A4      1999    1 win
5       A5      8178569 5 wins & 4 nominations.

Expected result:

        Name   Value     Awards                 AwardsNum  Cat
1       A1      NA      3 wins.                 3          A
2       A2      1000    NA                      NA         NA
3       A3      NA      2 wins.                 2          A
4       A4      1999    1 win                   1          A
5       A5      8178569 5 wins & 4 nominations. 9          C

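So basically I need to split Awards to get each number that comes before "wins" and "nominations", add a function that sums those numbers, and then assign a category (Cat) value based on that sum and the range it falls into.

I have the following: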

  strsplit(DF$Awards," ")
  cbind(DF,strsplit(DF$Awards," "))

Error in data.frame(c("3", "wins."), "N/A", c("2", "wins."), c("1", "win." : 
arguments imply differing number of rows: 2, 1, 5

Update: category A is for NA (no awards or nominations), category B is for totals between 1 and 5, and category C is for everything else.

I need to experiment with the boundary between B and C, since I need to make sure the ratio between B and C is no more than 5:1.

Answer 1

The solution is to use a regular expression to match all the numbers. You can then sum them and assign a category.

library(stringr)

df_new <- sapply(DF$Awards, function(x){
    # get all numbers
    nums <- unlist(str_match_all(x, "[0-9]+"))
    # calculate sum
    AwardsNum <- sum(as.numeric(nums))
    # assign category based on the sum
    if (is.na(AwardsNum)){
        Cat <- NA
    }else if(AwardsNum == 0){
        Cat <- "A"
    }else if(AwardsNum < 5){
        Cat <- "B"
    }else{
        Cat <- "C"
    }
    return(c(AwardsNum, Cat))
})

# create new columns in DF
DF$AwardsNum <- as.numeric(df_new[1, ])
DF$Cat <- df_new[2, ]
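
For comparison, the same "match all numbers, then sum" idea can be sketched in base R without stringr. This is only a sketch under assumptions: the sample `DF` below is rebuilt by hand from the question, and `awards_txt`, `digit_runs`, and `AwardsNum` are illustrative names, not part of the answer above.

```r
# Sample data rebuilt from the question (an assumption for illustration)
DF <- data.frame(
    Name = c("A1", "A2", "A3", "A4", "A5"),
    Value = c(NA, 1000, NA, 1999, 8178569),
    Awards = c("3 wins.", NA, "2 wins.", "1 win", "5 wins & 4 nominations."),
    stringsAsFactors = FALSE
)

# Replace NA with an empty string so the regex step sees plain text only
awards_txt <- DF$Awards
awards_txt[is.na(awards_txt)] <- ""

# One list element per row, holding every run of digits found in that row
digit_runs <- regmatches(awards_txt, gregexpr("[0-9]+", awards_txt))

# Sum each row's numbers; vapply always returns one value per row,
# so the result lines up with the data frame whatever the split lengths were
AwardsNum <- vapply(digit_runs, function(d) sum(as.numeric(d)), numeric(1))

# Restore NA for rows that had no Awards entry at all
AwardsNum[is.na(DF$Awards)] <- NA

DF$AwardsNum <- AwardsNum
```

Because each row is reduced to a single number before anything is attached to `DF`, the "differing number of rows" problem from `cbind` never arises.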

Answer 2

I just realized that @Istrel had already posted an answer while I was working on this. I'll post mine anyway, since it is slightly different.

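# rebuild the sample data from the question
df <- data.frame(
    Name = c("A1", "A2", "A3", "A4", "A5"),
    Value = c(NA, 1000, NA, 1999, 8178569),
    Awards = c("3 wins", NA, "2 wins", "1 win", "5 wins & 4 nominations")
)

library(magrittr)
# total number of awards per row; NA counts as 0
n.awards <- sapply(df$Awards, function(x){
    ifelse(is.na(x), 0,{
        x %>% as.character %>%
            strsplit("[^0-9]+") %>%   # split on non-digits, keeping the numbers
            unlist %>%
            as.numeric %>%
            sum
    })
})
# bins: (-0.1, 0.9] -> A, (0.9, 4.9] -> B, (4.9, 100] -> C
brks <- c(-0.1,0.9,4.9, 100)
cc <- cut(n.awards,brks)
cat <- c("A", "B", "C")
# index the labels by the integer codes of the factor returned by cut
df.final <- cbind(df, AwardsNum = n.awards, Cat = cat[cc])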

With cut, you can bin a vector into groups without using multiple if statements.
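
As a small variation, `cut` can also attach the labels directly and use open-ended breaks, so totals above 100 are not turned into NA. The following is a sketch only: the `totals` vector and the break points (0 for A, 1 to 5 for B, above 5 for C) are assumptions based on the update in the question, and they would need adjusting to hit the desired B-to-C ratio.

```r
# Hypothetical award totals, one per row
totals <- c(0, 3, 2, 1, 9)

# Intervals are right-closed by default: (-Inf, 0], (0, 5], (5, Inf]
Cat <- cut(totals, breaks = c(-Inf, 0, 5, Inf), labels = c("A", "B", "C"))

# Count rows per category, e.g. to inspect the B-to-C ratio
counts <- table(Cat)
ratio_B_to_C <- counts[["B"]] / counts[["C"]]
```

Moving the middle break (5 here) up or down is the simplest way to shift rows between B and C and then re-check `ratio_B_to_C`.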


r

apply

strsplit

sapply