How to use spread() to get desired output
Suppose I have a data frame like the one below.
df1:
+------+------+--------+
| ID   | Type | Points |
+------+------+--------+
| DJ45 | A    | 69.2   |
| DJ45 | F    | 60.8   |
| DJ45 | C    | 2.9    |
| DJ46 | B    | 22.7   |
| DJ46 | D    | 18.7   |
| DJ46 | A    | 16.1   |
| DJ47 | E    | 67.2   |
| DJ47 | C    | 63.1   |
| DJ47 | F    | 16.7   |
| DJ48 | D    | 8.4    |
+------+------+--------+
I want a result that gives the top 2 Type values for each ID (ranked by Points) in the following format:
Output:
+------+-------+-------+
| ID   | Type1 | Type2 |
+------+-------+-------+
| DJ45 | A     | F     |
| DJ46 | B     | D     |
| DJ47 | E     | C     |
| DJ48 | D     | NA    |
+------+-------+-------+
I tried:
df1 %>%
group_by(Id) %>%
top_n(2,wt=Points) %>%
mutate(val = paste("Type", row_number())) %>%
filter(row_number()<=2) %>%
select(-Points) %>%
spread(val, Type)
But I get the following result:
Output:
+------+--------+-------+-------+
| ID   | Points | Type1 | Type2 |
+------+--------+-------+-------+
| DJ45 | 69.2   | A     | NA    |
| DJ45 | 60.8   | NA    | F     |
| DJ46 | 22.7   | B     | NA    |
| DJ46 | 18.7   | NA    | D     |
| DJ47 | 67.2   | E     | NA    |
| DJ47 | 63.1   | NA    | C     |
| DJ48 | 8.4    | D     | NA    |
+------+--------+-------+-------+
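A likely cause of this staggered shape, offered as a sketch rather than a diagnosis of the exact session: spread() treats every column other than the key and the value as a row identifier, so if Points is still present when spread() runs, each distinct Points value keeps its own row. The snippet below rebuilds df1 by hand and reproduces that shape. The data.frame() construction and the name `staggered` are assumptions made for illustration.

```r
library(dplyr)
library(tidyr)

# Hand-built copy of df1 from the question (assumed construction)
df1 <- data.frame(
  ID     = c("DJ45", "DJ45", "DJ45", "DJ46", "DJ46", "DJ46",
             "DJ47", "DJ47", "DJ47", "DJ48"),
  Type   = c("A", "F", "C", "B", "D", "A", "E", "C", "F", "D"),
  Points = c(69.2, 60.8, 2.9, 22.7, 18.7, 16.1, 67.2, 63.1, 16.7, 8.4),
  stringsAsFactors = FALSE
)

staggered <- df1 %>%
  group_by(ID) %>%
  top_n(2, wt = Points) %>%                      # two highest Points per ID
  mutate(val = paste0("Type", row_number())) %>% # Type1 / Type2 labels
  ungroup() %>%
  spread(val, Type)                              # Points is still a column here

# Each (ID, Points) pair is unique, so no rows are merged
nrow(staggered)
```

Dropping or overwriting Points before spread() leaves ID as the only identifier, which lets the two labels for one ID land on the same row.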
df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
ID Type Points
DJ45 A 69.2
DJ45 F 60.8
DJ45 C 2.9
DJ46 B 22.7
DJ46 D 18.7
DJ46 A 16.1
DJ47 E 67.2
DJ47 C 63.1
DJ47 F 16.7
DJ48 D 8.4
")
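library(dplyr)
library(tidyr)

df %>%
  group_by(ID) %>%
  top_n(2, wt = Points) %>%                          # keep the top two rows per ID by Points
  arrange(-Points) %>%                               # sort descending so row numbers follow rank
  mutate(Points = paste0('Type', row_number())) %>%  # relabel Points as Type1 / Type2 within each ID
  spread(Points, Type)                               # one column per label, filled with Type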
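top_n(2, wt = Points) keeps the two rows with the highest Points within each ID group.
arrange(-Points) sorts the rows in descending order of Points.
mutate(Points = paste0('Type', row_number())) overwrites Points with 'Type' followed by the row number within the ID group (1 to 2).
spread(Points, Type) creates a column for each unique value in Points and fills it with the corresponding Type value.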
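The same steps can also be sketched with the newer verbs slice_max() and pivot_wider(), which dplyr (1.0.0 or later) and tidyr (1.0.0 or later) provide as successors to top_n() and spread(). This is an alternative under those version assumptions, not the method used above. The data frame is rebuilt by hand so the block runs on its own, and the names `rank` and `result` are illustrative.

```r
library(dplyr)
library(tidyr)

# Hand-built copy of the example data (assumed construction)
df <- data.frame(
  ID     = c("DJ45", "DJ45", "DJ45", "DJ46", "DJ46", "DJ46",
             "DJ47", "DJ47", "DJ47", "DJ48"),
  Type   = c("A", "F", "C", "B", "D", "A", "E", "C", "F", "D"),
  Points = c(69.2, 60.8, 2.9, 22.7, 18.7, 16.1, 67.2, 63.1, 16.7, 8.4),
  stringsAsFactors = FALSE
)

result <- df %>%
  group_by(ID) %>%
  # Top two rows per ID, returned in descending order of Points
  slice_max(Points, n = 2, with_ties = FALSE) %>%
  # Label each row by its rank within the group: Type1, Type2
  mutate(rank = paste0("Type", row_number())) %>%
  ungroup() %>%
  # Drop Points so that ID is the only identifier column
  select(-Points) %>%
  pivot_wider(names_from = rank, values_from = Type)

result
```

Setting with_ties = FALSE caps each group at two rows even when Points values tie, so no third label appears. IDs with a single row, such as DJ48, get NA in Type2.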