Replacing a dataframe entry with the last value in a listed entry in R

Question

I have a data frame that looks like this:

BaseRating    contRating Participant
5,4,6,3,2,4       5        01       
4                 4        01

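I first want to run some code to check whether there are any commas in the data frame, and have it return the column number. I have tried some of the solutions from the questions below, but they don't seem to work when searching for a comma rather than a string or whole value. I may be missing something simple here, but I appreciate any help!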

Selecting data frame rows based on partial string match in a column

Filter rows which contain a certain string

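Once I know whether my data contains commas, I want to extract only the last number from the comma-separated list in that entry and replace the entry with that value. For example, I want the first row of the BaseRating column to become "4", because that is the last value in the list.

Is there a way to do this in R without changing the numbers by hand?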

Answer 1

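Here is a possible solution.

Explanation

Next, at @milsandhills's request, I will explain the regex used in the str_extract function:

The | symbol in the middle is the logical OR operator.
We use it because BaseRating can contain several numbers or just one, so | lets us handle each case separately.
The left-hand side of | matches a number made of one or more digits (\d+) that starts (^) and ends ($) the string.
The right-hand side of | matches a number made of one or more digits (\d+) that ends the string ($), while (?<=\,) ensures the number is preceded by a comma.

You can find more details in the stringr cheat sheet.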
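To make the two branches of the pattern concrete, here is a minimal sketch that applies it to a few strings. The input vector `x` is made up for illustration (the "abc" entry is not in the original data), and the backslashes are doubled because the pattern is written as an R string literal.

```r
library(stringr)

# Same pattern as in the answer, with backslashes doubled for an R string
pattern <- "^\\d+$|(?<=\\,)\\d+$"

# Hypothetical inputs: a comma-separated list, a single number, a non-numeric entry
x <- c("5,4,6,3,2,4", "4", "abc")

# The list hits the right-hand branch, the single number hits the left-hand
# branch, and the non-numeric entry matches neither, so it becomes NA
res <- str_extract(x, pattern)
res
#> [1] "4" "4" NA
```

Entries that match neither branch come back as NA, which may be a convenient way to spot malformed values before converting to integer.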

library(tidyverse)

df <- data.frame(
  BaseRating = c("5,4,6,3,2,4", "4"),
  contRating = c(5L, 4L),
  Participant = c(1L, 1L)
)

# Backslashes must be doubled inside an R string literal
df %>% 
  mutate(BaseRating = sapply(BaseRating, 
         function(x) str_extract(x, "^\\d+$|(?<=\\,)\\d+$") %>% as.integer))

#>   BaseRating contRating Participant
#> 1          4          5           1
#> 2          4          4           1

Alternatively:

library(tidyverse)

df %>% 
  separate_rows(BaseRating, sep = ",", convert = TRUE) %>% 
  group_by(contRating, Participant) %>% 
  summarise(BaseRating = last(BaseRating), .groups = "drop") %>% 
  relocate(BaseRating, .before = 1)

#> # A tibble: 2 × 3
#>   BaseRating contRating Participant
#>        <int>      <int>       <int>
#> 1          4          4           1
#> 2          4          5           1
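The question also asks how to find out which columns contain commas, and the snippets above do not cover that step. A minimal base R sketch, assuming the same example `df` as above; the names `has_comma` and `comma_cols` are only illustrative:

```r
# Same example data as in the answer
df <- data.frame(
  BaseRating = c("5,4,6,3,2,4", "4"),
  contRating = c(5L, 4L),
  Participant = c(1L, 1L)
)

# TRUE for each column that has at least one entry containing a literal comma
# (fixed = TRUE treats "," as plain text rather than as a regex)
has_comma <- sapply(df, function(col) any(grepl(",", col, fixed = TRUE)))

# Column numbers, named by column, of the columns that contain commas
comma_cols <- which(has_comma)
comma_cols
#> BaseRating 
#>          1
```

Using `grepl` with `fixed = TRUE` searches for the comma as a substring, which is likely the difference from approaches that compare whole values.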

Answer 2

If we want a quick option, we can use trimws from base R

df$BaseRating <- as.numeric(trimws(df$BaseRating, whitespace = ".*,"))

-output

> df
  BaseRating contRating Participant
1          4          5           1
2          4          4           1
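The trimws call works because its `whitespace` argument is treated as a regular expression, so ".*," strips everything up to and including the last comma. As a rough sketch of the same idea, assuming R 3.6.0 or later for the `whitespace` argument, a plain `sub` call appears to give the same result on this data; the vector `x` below just copies the BaseRating values:

```r
# BaseRating values from the example data
x <- c("5,4,6,3,2,4", "4")

# Greedy ".*," matches everything up to and including the last comma;
# entries without a comma are left unchanged
last_sub <- sub(".*,", "", x)

# trimws() approach from the answer, for comparison
last_trimws <- trimws(x, whitespace = ".*,")

as.numeric(last_sub)
#> [1] 4 4
identical(last_sub, last_trimws)
#> [1] TRUE
```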

Or another option is stri_extract_last_regex from stringi

library(stringi)
df$BaseRating <- as.numeric(stri_extract_last_regex(df$BaseRating, "\\d+"))

data

df <- structure(list(BaseRating = c("5,4,6,3,2,4", "4"), contRating = 5:4, 
    Participant = c(1L, 1L)), class = "data.frame", row.names = c(NA, 
-2L))
