Cannot accurately convert from long format to wide in R
I am trying to convert from long format to wide using the following code.
data_ige <- read.csv("serology.csv", header = TRUE, na.strings = 0)

library(tidyverse)
library(magrittr)

data_new <- data_ige %>% spread(test, value)
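I have the following dataset:
[Image: existing dataset]
After running the code, the data is transformed, but not the way I want. In the image below, the items highlighted in yellow are values that appear on several rows; they should be on the first row rather than on a new row. Each patient has data for either 1 visit or 2 visits. I want all test results for visit 1 on one row, and the test results for visit 2 on a second row.
[Image: after transformation]
This screenshot shows the expected result.
[Image: desired outcome]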
We need to create a sequence column because there are duplicates:
library(dplyr)
library(tidyr)
data_ige %>%
group_by(ID, date, test) %>%
mutate(rn = row_number()) %>%
ungroup %>%
spread(test, value) %>%
# or use pivot_wider(), since spread() has been superseded
# pivot_wider(names_from = test, values_from = value) %>%
select(-rn)
# A tibble: 8 x 9
# ID date `1` `3` `4` `5` `6` `7` `8`
# <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 A 2008 0.035 NA NA NA NA NA NA
#2 A 2011 2.75 NA NA NA NA NA NA
#3 B 2011 9.99 3.65 0.68 0.02 0.17 0.5 NA
#4 C 2008 0 NA NA NA NA NA NA
#5 C 2011 NA NA NA NA NA NA 0.09
#6 D 2008 0 0 0 0 0 0.59 0
#7 D 2011 0 0.49 0.2 0.08 0.16 0.5 0.13
#8 D 2011 9.99 NA NA NA NA NA NA
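The commented-out `pivot_wider()` alternative can be written out in full. The following is only a minimal sketch: `toy` is a made-up four-row data frame (not the original data) in which patient D has two results for test 1 on the same date.

```r
library(dplyr)
library(tidyr)

# Hypothetical miniature of the long-format data: test 1 is recorded
# twice for the same ID and date.
toy <- data.frame(
  ID    = c("D", "D", "D", "D"),
  date  = c(2011, 2011, 2011, 2011),
  test  = c(1, 3, 4, 1),
  value = c(0, 0.49, 0.2, 9.99)
)

wide <- toy %>%
  group_by(ID, date, test) %>%
  mutate(rn = row_number()) %>%   # 1 for the first result, 2 for a repeat
  ungroup() %>%
  pivot_wider(names_from = test, values_from = value) %>%
  select(-rn)                     # drop the helper column once reshaped

wide
```

Because `rn` is one of the identifying columns during the reshape, the repeated test 1 result lands on a second row for D instead of colliding with the first one.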
Data
data_ige <- structure(list(ID = structure(c(1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L,
3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L), .Label = c("A", "B", "C", "D"), class = "factor"), date = c(2008,
2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2008, 2011, 2008,
2008, 2008, 2008, 2008, 2008, 2008, 2011, 2011, 2011, 2011, 2011,
2011, 2011), test = c(1, 1, 1, 3, 4, 5, 6, 7, 8, 1, 1, 1, 3,
4, 5, 6, 7, 8, 1, 3, 4, 5, 6, 7, 8), value = c(0.035, 2.75, 9.99,
3.65, 0.68, 0.02, 0.17, 0.5, 0.09, 0, 0, 0, 0, 0, 0, 0, 0.59,
0, 9.99, 0.49, 0.2, 0.08, 0.16, 0.5, 0.13)),
class = "data.frame", row.names = c(NA,
-25L))
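To see which ID/date/test combinations are duplicated in a data frame like this, one option is `dplyr::count()`. This is a sketch on a made-up subset (`toy` is hypothetical and only mimics the shape of the data above):

```r
library(dplyr)

# Hypothetical subset: patient D has two results for test 1 in 2011.
toy <- data.frame(
  ID    = c("D", "D", "D", "D"),
  date  = c(2011, 2011, 2011, 2011),
  test  = c(1, 3, 4, 1),
  value = c(0, 0.49, 0.2, 9.99)
)

# Count rows per ID/date/test and keep only combinations seen more than once.
dupes <- toy %>%
  count(ID, date, test) %>%
  filter(n > 1)

dupes
```

Any row returned here is a combination that cannot fit into a single cell of the wide table without a helper column such as `rn`.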
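When I use the table you linked to, it actually works fine for me. There may be a problem with your data, e.g. you may have imported strings as factors or something similar. Try using the data I provide below: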
data_ige %>% spread(test, value)
#### OUTPUT ####
ID date 1 3 4 5 6 7 8
1 A 2008 0.035 NA NA NA NA NA NA
2 A 2011 2.750 NA NA NA NA NA NA
3 B 2011 9.990 3.65 0.68 0.02 0.17 0.50 0.09
4 C 2008 0.000 NA NA NA NA NA NA
5 C 2011 0.000 NA NA NA NA NA NA
6 D 2008 0.000 0.00 0.00 0.00 0.00 0.59 0.00
7 D 2011 9.990 0.49 0.20 0.08 0.16 0.50 0.13
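If you want to check whether column types are involved, inspecting the classes is a quick first step. The sketch below uses a made-up two-row data frame (`toy` is hypothetical); whether factors are actually the cause in your case is only a guess. Note that since R 4.0.0 `read.csv()` defaults to `stringsAsFactors = FALSE`, so factors from import mostly show up on older versions.

```r
# Hypothetical stand-in for an imported table in which ID arrived as a factor.
toy <- data.frame(
  ID    = factor(c("A", "B")),
  date  = c(2008L, 2011L),
  test  = c(1L, 1L),
  value = c(0.035, 9.99)
)

# Report the class of every column.
col_classes <- sapply(toy, class)
col_classes

# Convert the factor column to plain character strings.
toy$ID <- as.character(toy$ID)
```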
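One thing you might want to do is add a row for test == 2, which is not in your data. That way you get a column 2 containing only NA, as in the image of the data frame you linked to: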
data_ige %>%
add_row(ID = "A", date = 2008, test = 2) %>%
spread(test, value)
#### OUTPUT ####
ID date 1 2 3 4 5 6 7 8
1 A 2008 0.035 NA NA NA NA NA NA NA
2 A 2011 2.750 NA NA NA NA NA NA NA
3 B 2011 9.990 NA 3.65 0.68 0.02 0.17 0.50 0.09
4 C 2008 0.000 NA NA NA NA NA NA NA
5 C 2011 0.000 NA NA NA NA NA NA NA
6 D 2008 0.000 NA 0.00 0.00 0.00 0.00 0.59 0.00
7 D 2011 9.990 NA 0.49 0.20 0.08 0.16 0.50 0.13
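An alternative to adding a dummy row, sketched here under the assumption that tidyr 1.2.0 or later is installed, is to turn `test` into a factor with every expected level and let `pivot_wider()` expand the unused ones via `names_expand = TRUE`. The data frame `toy` and the level range `1:3` are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical miniature data: test 2 never occurs.
toy <- data.frame(
  ID    = c("A", "B", "B"),
  date  = c(2008, 2011, 2011),
  test  = c(1, 1, 3),
  value = c(0.035, 9.99, 3.65)
)

wide <- toy %>%
  # Declare all expected tests as factor levels, including the missing one.
  mutate(test = factor(test, levels = 1:3)) %>%
  # names_expand = TRUE creates a column for every level, used or not.
  pivot_wider(names_from = test, values_from = value, names_expand = TRUE)

wide
```

This keeps the long data untouched, at the cost of having to state the full set of test levels up front.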
This is the data frame I used:
data_ige <- structure(list(ID = c("A", "A", "B", "B", "B", "B", "B", "B",
"B", "C", "C", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D",
"D", "D", "D", "D"), date = c(2008L, 2011L, 2011L, 2011L, 2011L,
2011L, 2011L, 2011L, 2011L, 2008L, 2011L, 2008L, 2008L, 2008L,
2008L, 2008L, 2008L, 2008L, 2011L, 2011L, 2011L, 2011L, 2011L,
2011L, 2011L), test = c(1L, 1L, 1L, 3L, 4L, 5L, 6L, 7L, 8L, 1L,
1L, 1L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 3L, 4L, 5L, 6L, 7L, 8L),
value = c(0.035, 2.75, 9.99, 3.65, 0.68, 0.02, 0.17, 0.5,
0.09, 0, 0, 0, 0, 0, 0, 0, 0.59, 0, 9.99, 0.49, 0.2, 0.08,
0.16, 0.5, 0.13)), class = "data.frame", row.names = c(NA,
-25L))