Add column for distinct values and count to same tibble in R
I want to combine two tibbles. They are grouped by the same variable, but I would like to see them in the same table. The first one is:
df %>%
filter(Cancelled == FALSE) %>%
count(School)
This gives me the row count for each School:
School | count |
---|---|
Comm | 42 |
IR | 52 |
Business | 34 |
Nursing | 23 |
The next one is:
df%>%
filter(Cancelled == FALSE) %>%
group_by(School) %>%
summarise(n_distinct(ID))
This gives me the number of unique ID values within each School:
School | unique |
---|---|
Comm | 17 |
IR | 18 |
Business | 14 |
Nursing | 12 |
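Basically, I would like the count as one column and the unique count as a second column:
School | count | unique |
---|---|---|
Comm | 42 | 17 |
IR | 52 | 18 |
Business | 34 | 14 |
Nursing | 23 | 12 |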
Thanks in advance!
*Edit: adding a better description of the original data
dput(data)
structure(list(ID = c(1986, 3707, 2467, 3087, 2155, 3133, 2531,
3112, 2042, 2912, 1305, 1519, 2411, 3630, 2015, 2943, 2873, 1591,
3127, 3733, 3492, 3156, 3907, 3877, 2050, 2956, 1280, 3544, 1465,
1410, 3946, 2868, 2288, 3722, 1611, 3188, 3609, 2847, 1803, 2580,
1928, 1775, 2774, 1259, 3851, 2135, 3046, 1480, 2480, 2240, 3279,
3983, 2042, 3754, 1851, 3528, 3161, 2547, 3068, 2739, 3936, 3290,
2465, 2839, 2139, 2635, 1655, 3903, 2333, 1787, 2913, 2764, 2791,
1501, 2101, 3312, 3428, 3502, 1826, 3823, 3064, 2705, 1917, 1427,
1627, 1519, 3811, 3661, 3034, 1977, 2502, 3240, 1575, 2882, 3651,
2065, 2366, 2016, 2991, 1996), School = c("Nursing", "Business",
"Comm", "Nursing", "Business", "Nursing", "Nursing", "Nursing",
"Nursing", "Nursing", "IR", "Comm", "Nursing", "IR", "Nursing",
"Comm", "Business", "Business", "Business", "Nursing", "Nursing",
"Nursing", "Comm", "Nursing", "Business", "Nursing", "Comm",
"Business", "IR", "IR", "Nursing", "Business", "Business", "IR",
"Business", "Business", "Business", "Comm", "Nursing", "Comm",
"IR", "Nursing", "Nursing", "Nursing", "Nursing", "Comm", "Nursing",
"Business", "IR", "Comm", "Comm", "Business", "IR", "Nursing",
"Nursing", "IR", "Comm", "Business", "IR", "IR", "Nursing", "IR",
"Nursing", "Nursing", "Nursing", "Business", "Comm", "Nursing",
"IR", "IR", "Business", "Comm", "IR", "Nursing", "Nursing", "Business",
"Nursing", "Comm", "Business", "Business", "Nursing", "Nursing",
"Nursing", "Nursing", "Nursing", "Nursing", "Comm", "Nursing",
"IR", "Business", "Nursing", "Comm", "Nursing", "Comm", "Nursing",
"Nursing", "IR", "Business", "Nursing", "Comm")), row.names = c(NA,
-100L), class = c("tbl_df", "tbl", "data.frame"))
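We can use left_join:
library(dplyr)
# Assumes df holds the count() result and df1 the n_distinct() result,
# i.e. the two summary tibbles from the question
left_join(df, df1, by = "School")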
School count unique
1 Comm 42 17
2 IR 52 18
3 Business 34 14
4 Nursing 23 12
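To make the join above reproducible, here is a minimal sketch on made-up data. The `toy` tibble and the names `kept`, `counts`, `uniques` and `joined` are assumptions for illustration only; they do not come from the question. It builds the two summaries separately, names the distinct-count column explicitly, and then joins them on `School`.

```r
library(dplyr)

# Hypothetical toy data standing in for the asker's data frame
toy <- tibble(
  ID        = c(1, 1, 2, 3, 3, 3),
  School    = c("Comm", "Comm", "Comm", "IR", "IR", "IR"),
  Cancelled = c(FALSE, FALSE, FALSE, FALSE, FALSE, TRUE)
)

# Apply the filter once so both summaries see the same rows
kept <- toy %>% filter(!Cancelled)

# Summary 1: number of rows per School
counts <- kept %>% count(School, name = "count")

# Summary 2: number of distinct IDs per School
uniques <- kept %>%
  group_by(School) %>%
  summarise(unique = n_distinct(ID))

# One row per School with both columns side by side
joined <- left_join(counts, uniques, by = "School")
joined
```

Naming the column inside `summarise()` avoids ending up with a column literally called `n_distinct(ID)` after the join.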
You can do everything in a single pipe, but it doesn't necessarily look any cleaner:
library(tidyverse)
data %>%
count(School, name = 'count') %>%
left_join(., data %>%
group_by(School) %>%
summarize(unique = n_distinct(ID)),
by = 'School')
Which, for your example data, gives:
# A tibble: 4 x 3
School count unique
<chr> <int> <int>
1 Business 22 22
2 Comm 18 18
3 IR 17 17
4 Nursing 43 43
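I assume it is just a coincidence of your example data that no ID is repeated within a school, which is why the count and the unique count are identical.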
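As an alternative to joining, both columns can probably be computed in one `summarise()` call, since `n()` and `n_distinct()` can sit side by side in the same grouped summary. This is only a sketch on made-up data that does contain repeated IDs; `toy2` and `result` are hypothetical names, and the `Cancelled` filter mirrors the one in the question.

```r
library(dplyr)

# Hypothetical toy data with repeated IDs inside a school
toy2 <- tibble(
  ID        = c(10, 10, 11, 20, 20, 21, 22),
  School    = c("Comm", "Comm", "Comm", "IR", "IR", "IR", "IR"),
  Cancelled = c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE)
)

result <- toy2 %>%
  filter(!Cancelled) %>%          # same as Cancelled == FALSE
  group_by(School) %>%
  summarise(
    count  = n(),                 # rows per School
    unique = n_distinct(ID)       # distinct IDs per School
  )
result
```

With this approach the filter is written only once, so the two columns cannot drift apart if the filter condition changes later.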