Multiple Modes by variables
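I can't seem to find an answer to my question.
Here is the sample data:
Credit Card Type   Bank    Year   Total Balance
MASTER CARD        BOFA    2017   0
MASTER CARD        BOFA    2017   0
MASTER CARD        BOFA    2017   0
VISA               Wells   2018
VISA               Wells   2018
VISA               Wells   2018
etc.
I want to figure out how to get the mode of Total Balance for each combination of the other variables, so that the result ends up like this.
Desired output:
Credit Card Type   Bank    Year   Mode
MASTER CARD        BOFA    2017   0
VISA               Wells   2018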
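Following Frank's suggestion and using the Mode function from whosebug.com/q/2547402 (it is not part of base R or dplyr, so it has to be defined first), this is easy to do with dplyr.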
library(dplyr)

df %>%
  group_by(CreditCardType, Bank, Year) %>%
  summarise(mode = Mode(TotalBalance))
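where df is:
# fill = TRUE pads the VISA rows, which have no TotalBalance, with NA;
# without it read.table() stops because those lines have only 3 fields.
df <- read.table(text = 'CreditCardType Bank Year TotalBalance
MASTERCARD BOFA 2017 0
MASTERCARD BOFA 2017 0
MASTERCARD BOFA 2017 0
VISA Wells 2018
VISA Wells 2018
VISA Wells 2018', header = TRUE, fill = TRUE, stringsAsFactors = FALSE)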
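For reference, Mode is not defined in the code above. Here is a minimal sketch of the helper, assuming the widely used definition from the linked question (the exact version there may differ). It uses base R only:

```r
# Assumed definition of Mode: the most frequent value of x.
# On ties, the value that appears first in x wins.
Mode <- function(x) {
  ux <- unique(x)                        # distinct values, in order of first appearance
  ux[which.max(tabulate(match(x, ux)))]  # the value with the highest count
}

Mode(c(0, 0, 0))        # 0
Mode(c(NA, NA, NA))     # NA, because match() pairs NA with NA
Mode(c(100, 100, 700))  # 100
```

Because NA is counted like any other value, a group whose balances are all missing gets NA as its mode, which matches the empty VISA row in the desired output.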
Based on this question:
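library(plyr)

# Return the most frequent value of `columnname` within each `groupby` group.
getmode <- function(origtable, groupby, columnname) {
  ddply(origtable, groupby, .fun = function(xx) {
    # Tabulate the column, sort counts in decreasing order, keep the label of the top entry
    c(m1 = paste(names(sort(table(xx[, columnname]), decreasing = TRUE)[1])))
  })
}

getmode(df, c("CreditCardType", "Bank", "Year"), "TotalBalance")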
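# fill = TRUE turns the missing TotalBalance values in the VISA rows into NA
df <- read.table(text = "CreditCardType Bank Year TotalBalance
MASTERCARD BOFA 2017 0
MASTERCARD BOFA 2017 0
MASTERCARD BOFA 2017 0
VISA Wells 2018
VISA Wells 2018
VISA Wells 2018", header = TRUE, fill = TRUE, stringsAsFactors = FALSE)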
A different dplyr solution:
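library(dplyr)

df %>%
  add_count(Credit_Card_Type, Bank, Year, Total_Balance) %>%  # n = how often each balance occurs in its group
  group_by(Credit_Card_Type, Bank, Year) %>%                  # compare counts within each group, not globally
  filter(n == max(n)) %>%
  ungroup() %>%
  distinct() %>%
  select(-n)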
To handle ties and pick the first mode value:
df %>%
  add_count(Credit_Card_Type, Bank, Year, Total_Balance) %>%
  group_by(Credit_Card_Type, Bank, Year) %>%         # max(n) is evaluated per group
  filter(n == max(n)) %>%
  summarise(Total_Balance = first(Total_Balance))    # on ties, keep the first mode in row order
Data:
df <- read.table(text = "Credit_Card_Type Bank Year Total_Balance
MASTER_CARD BOFA 2017 100
MASTER_CARD BOFA 2017 100
MASTER_CARD BOFA 2017 700
VISA Wells 2018 60
VISA Wells 2018 50
VISA Wells 2018 60", header = TRUE)
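If you want to see every tied value rather than just the first one, the tie handling can be sketched in base R. The helper name all_modes is hypothetical (it is not from dplyr or any package), and it assumes a non-empty input vector:

```r
# Hypothetical helper: return every value tied for the highest frequency,
# in order of first appearance. Assumes length(x) > 0.
all_modes <- function(x) {
  ux <- unique(x)                   # distinct values
  counts <- tabulate(match(x, ux))  # how often each distinct value occurs
  ux[counts == max(counts)]         # keep all values with the top count
}

all_modes(c(100, 100, 700))   # one mode: 100
all_modes(c(60, 50, 60, 50))  # a tie: 60 and 50
```

Taking the first element of the result gives the same "first mode" choice as the summarise(first(...)) step.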
I found a solution using the data.table and modeest packages.
library(data.table)
library(modeest)

dt <- data.table(
  Type         = c(rep("MASTERCARD", 3), rep("VISA", 3)),
  Bank         = c(rep("BOFA", 3), rep("Wells", 3)),
  Year         = c(rep(2017, 3), rep(2018, 3)),
  TotalBalance = c(100, 100, 700, 60, 50, 60)
)

# mfv() returns the most frequent value(s); [1] keeps the first one per group
dt[, mfv(TotalBalance)[1], by = c("Type", "Bank", "Year")]
         Type  Bank Year  V1
1: MASTERCARD  BOFA 2017 100
2:       VISA Wells 2018  60
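If you would rather not depend on modeest, roughly the same result can be sketched with data.table alone by counting each balance per group and keeping the most frequent one. This is only a sketch under the assumption that ties should go to the value that appears first; the name res is made up for the example:

```r
library(data.table)

dt <- data.table(
  Type         = c(rep("MASTERCARD", 3), rep("VISA", 3)),
  Bank         = c(rep("BOFA", 3), rep("Wells", 3)),
  Year         = c(rep(2017, 3), rep(2018, 3)),
  TotalBalance = c(100, 100, 700, 60, 50, 60)
)

# Step 1: count how often each balance occurs within each Type/Bank/Year group.
# Step 2: sort by descending count and keep the first row of every group.
res <- dt[, .N, by = .(Type, Bank, Year, TotalBalance)][
  order(-N), .SD[1], by = .(Type, Bank, Year)
]
res
```

Besides the mode, res also keeps N, the number of times that balance occurs, which is handy for spotting groups where the "mode" appears only once.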