R: Pivoting using 'spread' function
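Continuing from my previous question, I now have one more column of ID values, and I need to use both columns when converting rows into columns.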
NUM <- c(1,2,3,1,2,3,1,2,3,1)
ID <- c("DJ45","DJ45","DJ45","DJ46","DJ46","DJ46","DJ47","DJ47","DJ47","DJ48")
Type <- c("A", "F", "C", "B", "D", "A", "E", "C", "F", "D")
Points <- c(9.2,60.8,22.9,1012.7,18.7,11.1,67.2,63.1,16.7,58.4)
df1 <- data.frame(ID,NUM,Type,Points)
df1:
+------+-----+------+--------+
| ID | Num | Type | Points |
+------+-----+------+--------+
| DJ45 | 1 | A | 9.2 |
| DJ45 | 2 | F | 60.8 |
| DJ45 | 3 | C | 22.9 |
| DJ46 | 1 | B | 1012.7 |
| DJ46 | 2 | D | 18.7 |
| DJ46 | 3 | A | 11.1 |
| DJ47 | 1 | E | 67.2 |
| DJ47 | 2 | C | 63.1 |
| DJ47 | 3 | F | 16.7 |
| DJ48 | 1 | D | 58.4 |
+------+-----+------+--------+
My desired output is:
+------+-----+------+--------+------+------+------+------+
| ID | Num | A | B | C | D | E | F |
+------+-----+------+--------+------+------+------+------+
| DJ45 | 1 | 9.2 | N/A | N/A | N/A | N/A | N/A |
| DJ45 | 2 | N/A | N/A | N/A | N/A | N/A | 60.8 |
| DJ45 | 3 | N/A | N/A | 22.9 | N/A | N/A | N/A |
| DJ46 | 1 | N/A | 1012.7 | N/A | N/A | N/A | N/A |
| DJ46 | 2 | N/A | N/A | N/A | 18.7 | N/A | N/A |
| DJ46 | 3 | 11.1 | N/A | N/A | N/A | N/A | N/A |
| DJ47 | 1 | N/A | N/A | N/A | N/A | 67.2 | N/A |
| DJ47 | 2 | N/A | N/A | 63.1 | N/A | N/A | N/A |
| DJ47 | 3 | N/A | N/A | N/A | N/A | N/A | 16.7 |
| DJ48 | 1 | N/A | N/A | N/A | 58.4 | N/A | N/A |
+------+-----+------+--------+------+------+------+------+
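I am using the spread function in R, but I get an error about duplicate identifiers. I believe this is because I now have 2 columns (ID and NUM) instead of the single column (NUM) I had before. Please let me know what I should do.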
Without knowing what you've tried, I would suggest:
library(tidyr)
spread(df1, Type, Points)
# ID NUM A B C D E F
# 1 DJ45 1 9.2 NA NA NA NA NA
# 2 DJ45 2 NA NA NA NA NA 60.8
# 3 DJ45 3 NA NA 22.9 NA NA NA
# 4 DJ46 1 NA 1012.7 NA NA NA NA
# 5 DJ46 2 NA NA NA 18.7 NA NA
# 6 DJ46 3 11.1 NA NA NA NA NA
# 7 DJ47 1 NA NA NA NA 67.2 NA
# 8 DJ47 2 NA NA 63.1 NA NA NA
# 9 DJ47 3 NA NA NA NA NA 16.7
# 10 DJ48 1 NA NA NA 58.4 NA NA
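If you are getting an error about duplicate identifiers, it is because your actual data has one or more duplicated entries for the combination of "ID" and "Num" (in your sample data, there are none).
If that is the case, you will need to add another column to make the rows unique.
Adding dplyr to the chain, it might look something like this: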
df1 %>%
group_by(ID, NUM) %>%
mutate(id2 = sequence(n())) %>%
spread(Type, Points)
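Before adding the extra column, it may help to confirm which rows actually collide. The following is only a sketch, assuming the sample data from the question with its first three rows duplicated; the names df_dup and dups are hypothetical and used just for illustration. It counts over ID, NUM and Type, since spread appears to complain only when the same ID/NUM pair also repeats a Type.

```r
library(dplyr)

# Sample data from the question
df1 <- data.frame(
  ID     = c("DJ45","DJ45","DJ45","DJ46","DJ46","DJ46","DJ47","DJ47","DJ47","DJ48"),
  NUM    = c(1,2,3,1,2,3,1,2,3,1),
  Type   = c("A", "F", "C", "B", "D", "A", "E", "C", "F", "D"),
  Points = c(9.2,60.8,22.9,1012.7,18.7,11.1,67.2,63.1,16.7,58.4)
)

# Hypothetical data with duplicates: repeat the first three rows
df_dup <- rbind(df1, df1[1:3, ])

# Count rows per ID/NUM/Type combination and keep those occurring more than once
dups <- df_dup %>%
  count(ID, NUM, Type) %>%
  filter(n > 1)

dups
```

If dups comes back empty, the identifiers are already unique and the plain spread call should work without the extra id2 column.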
A demonstration of the assumed error:
df2 <- rbind(df1, df1[1:3, ]) ## Duplicate the first three rows
spread(df2, Type, Points)
# Error: Duplicate identifiers for rows (1, 11), (3, 13), (2, 12)
library(dplyr)
df2 %>%
group_by(ID, NUM) %>%
mutate(id2 = sequence(n())) %>%
spread(Type, Points)
# Source: local data frame [13 x 9]
#
# ID NUM id2 A B C D E F
# 1 DJ45 1 1 9.2 NA NA NA NA NA
# 2 DJ45 1 2 9.2 NA NA NA NA NA
# 3 DJ45 2 1 NA NA NA NA NA 60.8
# 4 DJ45 2 2 NA NA NA NA NA 60.8
# 5 DJ45 3 1 NA NA 22.9 NA NA NA
# 6 DJ45 3 2 NA NA 22.9 NA NA NA
# 7 DJ46 1 1 NA 1012.7 NA NA NA NA
# 8 DJ46 2 1 NA NA NA 18.7 NA NA
# 9 DJ46 3 1 11.1 NA NA NA NA NA
# 10 DJ47 1 1 NA NA NA NA 67.2 NA
# 11 DJ47 2 1 NA NA 63.1 NA NA NA
# 12 DJ47 3 1 NA NA NA NA NA 16.7
# 13 DJ48 1 1 NA NA NA 58.4 NA NA
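In newer versions of tidyr (1.0.0 and later), spread is superseded by pivot_wider, so the same idea can be sketched with that function. This is an assumption about your setup rather than part of the approach above: it requires a tidyr version that provides pivot_wider (and names_sort, added in 1.1.0), and the name wide is hypothetical.

```r
library(dplyr)
library(tidyr)

# Sample data from the question
df1 <- data.frame(
  ID     = c("DJ45","DJ45","DJ45","DJ46","DJ46","DJ46","DJ47","DJ47","DJ47","DJ48"),
  NUM    = c(1,2,3,1,2,3,1,2,3,1),
  Type   = c("A", "F", "C", "B", "D", "A", "E", "C", "F", "D"),
  Points = c(9.2,60.8,22.9,1012.7,18.7,11.1,67.2,63.1,16.7,58.4)
)
df2 <- rbind(df1, df1[1:3, ])  # duplicate the first three rows

wide <- df2 %>%
  group_by(ID, NUM) %>%
  mutate(id2 = row_number()) %>%   # running index within each ID/NUM pair
  ungroup() %>%
  pivot_wider(names_from = Type, values_from = Points,
              names_sort = TRUE)   # sort the new columns A..F

wide
```

The row order may differ from what spread returns, since pivot_wider does not sort the identifier columns; add arrange(ID, NUM, id2) if the order matters.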