How to use separate in tidyverse to split a column?

Question

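I am using the separate() function to split a column, Enterdateofexam2, which is in character format with values such as "25.07", "13.09", "16.06"... My goal is to split it into day (25) and month (07), then use convert = TRUE to turn them into numbers for filtering in the next step.

My code is: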

jimma3n <- jimma3 %>%
        select(Enterdateofexam2, Enterdayofexam, UniqueKey,MEDICALRECORD)%>%
        separate(Enterdateofexam2,into=c("day", "month"), sep=".", convert = TRUE)
view (jimma3n)

But R keeps warning me:

Expected 2 pieces. Additional pieces discarded in 4088 rows [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].

Can anyone help me find out which part of my code is wrong? Thanks!

Answer 1

We can use the extra argument. Also, by default, sep is in regex mode - according to the ?separate documentation

sep - If character, sep is interpreted as a regular expression. The default value is a regular expression that matches any sequence of non-alphanumeric values.

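and . is a metacharacter that matches any character. Thus, we need to either escape it ("\\." in an R string) or place it inside square brackets ("[.]"). Also, based on the dput, the column is a list and should be unnested before doing the separate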

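library(dplyr)
library(tidyr)
jimma3 %>%
      select(Enterdateofexam2, Enterdayofexam, UniqueKey, MEDICALRECORD) %>%
      # the column is a list, so flatten it to one value per row first
      unnest(Enterdateofexam2) %>%
      # "\\." matches a literal dot; extra = "merge" keeps surplus pieces in the last column
      separate(Enterdateofexam2, into = c("day", "month"),
              sep = "\\.", convert = TRUE, extra = "merge") %>%
      na.omit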

-output

# A tibble: 6 x 5
    day month Enterdayofexam UniqueKey MEDICALRECORD
  <int> <int> <chr>          <chr>     <chr>        
1     7     6 1              530       577207       
2     8     6 2              530       577207       
3     9     6 3              530       577207       
4     2    12 1              531       575333       
5     3    12 2              531       575333       
6     4    12 3              531       575333

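Basically, with sep = ".", the regex matches every single character, so the string is split at each character and the warning pops up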
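The difference between the regex dot and a literal dot can be checked on a small example. This is a minimal sketch on hypothetical toy data (the dates data frame and its column d are made up for illustration and are not the asker's data); it shows that the escaped form and the bracketed form give the same result.

```r
library(tidyr)

# Hypothetical toy data mirroring the question's "day.month" strings
dates <- data.frame(d = c("25.07", "13.09", "16.06"),
                    stringsAsFactors = FALSE)

# Escaped dot: the regex "\\." matches only a literal "."
escaped <- separate(dates, d, into = c("day", "month"),
                    sep = "\\.", convert = TRUE)

# Character class: "[.]" is an equivalent way to match a literal "."
bracketed <- separate(dates, d, into = c("day", "month"),
                      sep = "[.]", convert = TRUE)

# Both approaches yield integer day and month columns
print(escaped)
print(identical(escaped, bracketed))
```

Either spelling works; the bracketed form avoids having to remember the double backslash that R string literals require.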

data

jimma3 <- structure(list(Enterdateofexam2 = list(c("", "7.06"), c("", "8.06"
), c("", "9.06"), c("", "2.12"), c("", "3.12"), c("", "4.12")), 
    Enterdayofexam = c("1", "2", "3", "1", "2", "3"), UniqueKey = c("530", 
    "530", "530", "531", "531", "531"), MEDICALRECORD = c("577207", 
    "577207", "577207", "575333", "575333", "575333")), row.names = c(NA, 
-6L), class = c("tbl_df", "tbl", "data.frame"))
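As a variation on the pipeline above, the empty strings inside each list element can be filtered out before separating, which avoids relying on na.omit afterwards. This is only a sketch on a hypothetical two-row tibble (toy, id and d are illustrative names), assuming the list column has the same shape as in the dput.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy tibble with a list column shaped like Enterdateofexam2
toy <- tibble(
  id = c("a", "b"),
  d  = list(c("", "7.06"), c("", "2.12"))
)

# Confirm that the column really is a list before unnesting
print(is.list(toy$d))

out <- toy %>%
  unnest(d) %>%                 # one row per element of each list entry
  filter(d != "") %>%           # drop the empty strings
  separate(d, into = c("day", "month"), sep = "\\.", convert = TRUE)

print(out)
```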

Answer 2

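The main issue is that you have to define the number of columns to separate into. If you define 2 columns, say a and b, and you have 3 elements to separate, say x y z, then z will be discarded.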

When using separate you have to define the new columns, which can be difficult if you do not know how many columns you will need after the separate

Consider this example: in row 3 you have 3 elements:

df <- data.frame(x = c("x", "x y", "x y z", NA))
      x
1     x
2   x y
3 x y z
4  <NA>

With this code you define 2 columns to separate into

df %>% separate(x, c("a", "b"))

     a    b
1    x <NA>
2    x    y
3    x    y
4 <NA> <NA>

In row 3, z is discarded because we defined only 2 columns, a and b

If we define 3 columns like

df %>% separate(x, c("a", "b", "c"))

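the discarding warning will go away.

On the other hand, rows where x has fewer than 3 elements will now trigger a warning, and the missing pieces will be filled with NA.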
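Both warnings can also be handled explicitly through the extra and fill arguments of separate. The following is a minimal sketch using the same df as above; the variable names merged and dropped are just for illustration.

```r
library(tidyr)

df <- data.frame(x = c("x", "x y", "x y z", NA),
                 stringsAsFactors = FALSE)

# extra = "merge": split at most once, so surplus pieces stay in the last column
# fill = "right": pad short rows with NA on the right without a warning
merged <- separate(df, x, into = c("a", "b"),
                   extra = "merge", fill = "right")

# extra = "drop": discard surplus pieces without a warning
dropped <- separate(df, x, into = c("a", "b"),
                    extra = "drop", fill = "right")

print(merged)
print(dropped)
```

With extra = "merge", row 3 keeps "y z" in column b instead of losing z; with extra = "drop", z is still discarded but no warning is raised.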
