Multiple rows to one row in R
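In R, I have a data frame whose values are spread across several rows. I want to collapse it into a data frame with a single row that contains all the values. I have a data frame like this: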
df <- data.frame(A = c("time", "time", "time"),
B = c("place", "place", "place"),
C = c(NA, 1, NA),
D = c(NA, NA, 2),
E = c(3, NA, NA),
`F` = c(4,4, NA),
G = c(NA, 5, NA))
A B C D E F G
1 time place NA NA 3 4 NA
2 time place 1 NA NA 4 5
3 time place NA 2 NA NA NA
I want a data frame like this:
A B C D E F G
1 time place 1 2 3 4 5
I tried using the reshape function as described here:
and I tried the unique function, but I lose a lot of data: Selecting unique rows in matrix using R
You can use na.omit() with summarise(), à la:
library(tidyverse)
df %>% group_by(A, B) %>%
summarise(C = mean(na.omit(C)),
D = mean(na.omit(D)),
E = mean(na.omit(E)),
F = mean(na.omit(F)),
G = mean(na.omit(G)))
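Your example data has only one unique value in each of the columns C-G, so, based on your comment, I use mean() to get the mean of the non-NA observations.
Data: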
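The same idea can be written without naming each column by hand. The following is a minimal sketch, assuming dplyr 1.0 or later (for across()) and the df from the question; the name result is used only for illustration.

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  A = c("time", "time", "time"),
  B = c("place", "place", "place"),
  C = c(NA, 1, NA),
  D = c(NA, NA, 2),
  E = c(3, NA, NA),
  F = c(4, 4, NA),
  G = c(NA, 5, NA)
)

# Average the non-NA values of every column from C to G within each A/B group
result <- df %>%
  group_by(A, B) %>%
  summarise(across(C:G, ~ mean(.x, na.rm = TRUE)), .groups = "drop")

result
```

Two caveats apply to any mean()-based version: a column that is entirely NA within a group comes back as NaN, and a column with more than one distinct value is silently averaged.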
df <- structure(list(A = c("time", "time", "time"), B = c("place",
"place", "place"), C = c(NA, 1L, NA), D = c(NA, NA, 2L), E = c(3L,
NA, NA), F = c(4L, 4L, NA), G = c(NA, 5L, NA)), class = "data.frame", row.names = c(NA,
-3L))
A solution using colMeans:
1. Create test data
df <- structure(list(
A = c("time", "time", "time"),
B = c("place", "place", "place"),
C = c(NA, 1L, NA),
D = c(NA, NA, 2L),
E= c(3L, NA, NA),
F = c(4L, 4L, NA),
G = c(NA, 5L, NA)),
row.names = c(NA, -3L),
class = c("data.table", "data.frame"))
2. Use unique for the first two columns and colMeans for the rest, then convert to a data.frame:
cbind(unique(df[, 1:2]), as.data.frame.list(colMeans(df[,3:7], na.rm = TRUE)))
Returns:
A B C D E F G
1: time place 1 2 3 4 5
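The cbind(unique(...), colMeans(...)) call relies on there being exactly one A/B combination in the data. As a rough sketch of how the same averaging could be done per group in base R, aggregate() can be used instead. The fourth row (B = "other") is hypothetical and was added only to show a second group; it is not part of the original data. The names value_cols and res are likewise just for illustration.

```r
# Original three rows plus one hypothetical row (B = "other") for illustration
df <- data.frame(
  A = c("time", "time", "time", "time"),
  B = c("place", "place", "place", "other"),
  C = c(NA, 1, NA, 7),
  D = c(NA, NA, 2, NA),
  E = c(3, NA, NA, NA),
  F = c(4, 4, NA, NA),
  G = c(NA, 5, NA, NA)
)

value_cols <- c("C", "D", "E", "F", "G")

# One output row per A/B combination; na.rm = TRUE is passed on to mean()
res <- aggregate(df[value_cols], by = df[c("A", "B")], FUN = mean, na.rm = TRUE)

res
```

Columns that are all NA within a group (D to G for the hypothetical row) come back as NaN, which is worth checking for before using the result.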
We can group by 'A' and 'B' and select the first non-NA element across the other columns:
library(dplyr)
df1 %>%
group_by(A, B) %>%
summarise(across(everything(), ~ .[order(is.na(.))][1]), .groups = 'drop')
-output
# A tibble: 1 x 8
# A B C D E F G H
# <chr> <chr> <int> <int> <int> <int> <int> <lgl>
#1 time place 1 2 3 4 5 NA
Or with coalesce:
library(purrr)
df1 %>%
group_by(A, B) %>%
summarise(across(everything(), ~ reduce(., coalesce)), .groups = 'drop')
data
df1 <- structure(list(A = c("time", "time", "time"), B = c("place",
"place", "place"), C = c(NA, 1L, NA), D = c(NA, NA, 2L), E = c(3L,
NA, NA), F = c(4L, NA, NA), G = c(NA, 5L, NA), H = c(NA, NA,
NA)), class = "data.frame", row.names = c(NA, -3L))
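To see what the two lambdas in this answer do, it may help to run them on a single column outside of summarise(). This is a small base R sketch; first_non_na and coalesce_vec are hypothetical helper names, and the Reduce() version only imitates what reduce(., coalesce) does for a plain vector.

```r
x <- c(NA, 1L, NA)

# order(is.na(v)) sorts FALSE before TRUE, so non-NA positions come first;
# taking element 1 then gives the first non-NA value (or NA if there is none)
first_non_na <- function(v) v[order(is.na(v))][1]

# Base R stand-in for purrr::reduce(v, dplyr::coalesce): keep the running
# value unless it is NA, in which case take the next element
coalesce_vec <- function(v) Reduce(function(a, b) if (is.na(a)) b else a, v)

first_non_na(x)
coalesce_vec(x)

# An all-NA column, like H in df1, stays NA with both helpers
first_non_na(c(NA, NA, NA))
coalesce_vec(c(NA, NA, NA))
```

Unlike the mean()-based answers, both variants keep the original column type and return the first value seen rather than an average when a group holds several distinct values.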