Calculate highest ratio
I am trying to find the highest ratio for a column with given values.
Suppose my data looks like this:
Job         company
=========================
accountant  Bank
accountant  Insurance Co
Manager     Bank
Manager     Bank
accountant  Insurance Co
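If I want to find the highest ratio of accountants to managers for a given company (for example, Bank), how do I use group by?
I am trying something like this, but it does not work: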
MyData %>%
count( MyData$Job,MyData$company) %>%
group_by(MyData$Job) %>%
mutate(prop = MyData$Job[accountant] / sum(MyData$Job[accountant])) %>%
spread(key = company[bank], value = prop)
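count() is essentially a wrapper for group_by() + tally() + ungroup(). Beyond that, based on your question, it does not sound like you need to group_by() again.
Also, you can refer to variable names directly here, without the $ notation.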
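To make the wrapper relationship concrete, here is a minimal sketch that compares the short and long forms. The data frame name `jobs` is just an illustrative stand-in built from the five rows in the question; it is not the asker's actual object.

```r
library(dplyr)

# Illustrative data: the five rows shown in the question
jobs <- data.frame(
  Job = c("accountant", "accountant", "Manager", "Manager", "accountant"),
  company = c("Bank", "Insurance Co", "Bank", "Bank", "Insurance Co"),
  stringsAsFactors = FALSE
)

# Short form: bare column names, no `$`
short_form <- jobs %>%
  count(Job, company)

# Long form that count() abbreviates
long_form <- jobs %>%
  group_by(Job, company) %>%
  tally() %>%
  ungroup()

# Compare the two results as plain data frames
same_counts <- isTRUE(all.equal(as.data.frame(short_form),
                                as.data.frame(long_form)))
same_counts
```

Both pipelines should give one row per Job/company combination with its count in a column named n, which is why a second group_by() adds nothing here.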
Sample data:
set.seed(1)
mydata <- data.frame(
Job = sample(c("Acct", "Manager"), size = 50, replace = TRUE),
Company = sample(c("Bank", "Insurance"), size = 50, replace = TRUE)
)
> head(mydata)
Job Company
1 Acct Bank
2 Acct Insurance
3 Manager Bank
4 Manager Bank
5 Acct Bank
6 Manager Bank
Code:
count() counts the number of each job within each company:
library(dplyr)
mydata %>%
count(Job, Company)
# A tibble: 4 x 3
Job Company n
<fctr> <fctr> <int>
1 Acct Bank 17
2 Acct Insurance 6
3 Manager Bank 12
4 Manager Insurance 15
spread() rearranges the data frame so that each job gets its own column. In this case, each company stays in its own row:
library(tidyr)
mydata %>%
count(Job, Company) %>%
spread(Job, n)
# A tibble: 2 x 3
Company Acct Manager
* <fctr> <int> <int>
1 Bank 17 12
2 Insurance 6 15
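As a side note, spread() has been superseded in tidyr 1.0.0 and later by pivot_wider(). A rough equivalent of the reshaping step, assuming a recent tidyr is installed, might look like the sketch below. The `jobs` data frame is again an illustrative stand-in built from the rows in the question, and `values_fill = 0L` is my own addition so that a job missing from a company shows up as 0 instead of NA.

```r
library(dplyr)
library(tidyr)

# Illustrative data: the five rows shown in the question
jobs <- data.frame(
  Job = c("accountant", "accountant", "Manager", "Manager", "accountant"),
  company = c("Bank", "Insurance Co", "Bank", "Bank", "Insurance Co"),
  stringsAsFactors = FALSE
)

wide <- jobs %>%
  count(Job, company) %>%
  # One column per job, one row per company; absent combinations become 0
  pivot_wider(names_from = Job, values_from = n, values_fill = 0L)

wide
```

Note that in this tiny example "Insurance Co" has no managers, so any accountant/manager ratio for that row would be a division by zero (Inf in R); that is worth checking for in real data.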
If you want to calculate the accountant/manager ratio, you can do it directly like this:
mydata %>%
count(Job, Company) %>%
spread(Job, n) %>%
mutate(p = Acct / Manager) %>%
arrange(desc(p))
# A tibble: 2 x 4
Company Acct Manager p
<fctr> <int> <int> <dbl>
1 Bank 17 12 1.42
2 Insurance 6 15 0.400
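If you only want the row with the highest ratio, or the ratio for one given company, a small extension of the same pipeline could look like this. It is a sketch under assumptions: the `jobs` data below is made up for illustration (it is not the sampled mydata above), and slice_max() requires dplyr 1.0.0 or later.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: Bank has 3 Acct and 2 Manager, Insurance has 1 Acct and 3 Manager
jobs <- data.frame(
  Job = c("Acct", "Acct", "Acct", "Manager", "Manager",
          "Acct", "Manager", "Manager", "Manager"),
  Company = c("Bank", "Bank", "Bank", "Bank", "Bank",
              "Insurance", "Insurance", "Insurance", "Insurance"),
  stringsAsFactors = FALSE
)

ratios <- jobs %>%
  count(Job, Company) %>%
  spread(Job, n) %>%
  mutate(p = Acct / Manager)   # Bank: 3 / 2, Insurance: 1 / 3

# Company with the highest accountant/manager ratio
top <- ratios %>%
  slice_max(p, n = 1)

# Ratio for one given company
bank <- ratios %>%
  filter(Company == "Bank")

top
bank
```

slice_max() keeps ties by default, so if two companies share the top ratio both rows come back; pass with_ties = FALSE if you need exactly one.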