How to extract non-empty values from each column of a dataframe and make a list?
I have a dataset like the one below, and I want to extract the non-empty cells from each column while keeping the Date information.
df <- structure(list(Date = as.Date(c("6/25/2020", "6/26/2020", "6/27/2020"),
                                    format = "%m/%d/%Y"),
                     A = c("", 2L, 1L), B = c(3L, "", ""), C = c(3L, 2L, "")),
                class = "data.frame", row.names = c("1", "2", "3"))
This is the result I am looking for:
Date Company Number
2020-06-26 A 2
2020-06-27 A 1
2020-06-25 B 3
2020-06-25 C 3
2020-06-26 C 2
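You can use pivot_longer with values_drop_na = TRUE, after turning the empty strings into NA:
library(tidyverse)
df %>%
  # na_if() is applied per column; passing a whole data frame errors in dplyr >= 1.1.0
  mutate(across(-Date, ~ na_if(.x, ""))) %>%
  pivot_longer(-Date, values_drop_na = TRUE, names_to = "Company", values_to = "Number")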
Date Company Number
<date> <chr> <chr>
1 2020-06-25 B 3
2 2020-06-25 C 3
3 2020-06-26 A 2
4 2020-06-26 C 2
5 2020-06-27 A 1
You can also use pivot_longer and handle the empty cells with filter:
df %>%
  pivot_longer(-Date, names_to = "Company", values_to = "Number") %>%
  filter(Number != "")
Another possible solution, which also converts Number to numeric:
library(tidyverse)
df %>%
  pivot_longer(A:C, names_to = "Company", values_to = "Number",
               values_transform = list(Number = \(x) ifelse(x == "", NA, as.numeric(x))),
               values_drop_na = TRUE)
#> # A tibble: 5 × 3
#> Date Company Number
#> <date> <chr> <dbl>
#> 1 2020-06-25 B 3
#> 2 2020-06-25 C 3
#> 3 2020-06-26 A 2
#> 4 2020-06-26 C 2
#> 5 2020-06-27 A 1
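To see what the function passed to values_transform does on its own, here is a small base R sketch. The name to_number is only illustrative and does not appear in the answer; the input is column A from the question.

```r
# Hypothetical helper name; same logic as the lambda given to values_transform
to_number <- function(x) ifelse(x == "", NA, as.numeric(x))

# Column A from the question, stored as character because of the empty string
a <- c("", "2", "1")

# Empty strings become NA, everything else becomes numeric
a_num <- to_number(a)
print(a_num)
```

Because the empty strings end up as NA, values_drop_na can then drop those rows in the same pivot_longer call.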
Using base R with reshape:
out <- transform(na.omit(reshape(type.convert(df, as.is = TRUE),
idvar = 'Date', varying = list(2:4), v.names = 'Number',
direction = "long", timevar = "Company")), Company = names(df)[-1][Company])
row.names(out) <- NULL
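If reshape feels heavy, a further base R variant is possible with stack(). This is a minimal sketch and not one of the original answers; it assumes every column other than Date holds the values to be stacked, and it rebuilds the question's data so that it runs on its own.

```r
# Sample data from the question; the value columns are character because of ""
df <- data.frame(
  Date = as.Date(c("6/25/2020", "6/26/2020", "6/27/2020"), format = "%m/%d/%Y"),
  A = c("", "2", "1"),
  B = c("3", "", ""),
  C = c("3", "2", ""),
  stringsAsFactors = FALSE
)

# stack() puts all value columns into `values`, with the column name in `ind`
long <- stack(df[-1])

out <- data.frame(
  Date    = rep(df$Date, times = ncol(df) - 1),  # one copy of Date per stacked column
  Company = as.character(long$ind),
  Number  = long$values,
  stringsAsFactors = FALSE
)

# Drop the empty cells, then convert the remaining values to integer
out <- out[out$Number != "", ]
out$Number <- as.integer(out$Number)
row.names(out) <- NULL
print(out)
```

The rows come out grouped by Company (A, then B, then C), which is the order shown in the desired result.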