Functions na.rm = TRUE, na.omit, is.finite, etc. don't work for the mean of a column
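I'm trying to compute means over a large df, with the observations split by Id and month, but none of the answers I found work as expected. Some of them empty out my sample, which is no use.
If the df is: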
permno company amihud illiq MonthYr
10026 J & J SNACK FOODS CORP 1.389026403 1.625 1990-01
10026 J & J SNACK FOODS CORP 1.028968686 NA 1990-01
10026 J & J SNACK FOODS CORP NA NA 1990-01
10026 J & J SNACK FOODS CORP NA NA 1990-01
10026 J & J SNACK FOODS CORP Inf NA 1990-01
10026 J & J SNACK FOODS CORP Inf NA 1990-02
10026 J & J SNACK FOODS CORP 0.891034483 NA 1990-02
10397 WERNER ENTERPRISES INC 0.443933917 NA 1990-01
10397 WERNER ENTERPRISES INC 0.255496848 NA 1990-01
10397 WERNER ENTERPRISES INC 0.891034483 NA 1990-02
structure(list(permno = c(10026L, 10026L, 10026L, 10026L, 10026L,
10026L, 10397L, 10397L, 10397L, 10397L), date = structure(c(5L,
6L, 1L, 2L, 3L, 4L, 7L, 8L, 9L, 10L), .Label = c("1/10/1990",
"1/11/1990", "1/12/1990", "1/15/1990", "1/2/1990", "1/3/1990",
"7/29/1998", "7/30/1998", "8/6/1998", "8/7/1998"), class = "factor"),
company = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L), .Label = c("J & J SNACK FOODS CORP", "WERNER ENTERPRISES INC"
), class = "factor"), price = c(11.75, 12.75, 13, 13, 12.375,
12.75, 12.25, 12.25, 10.75, 11.25), volume = c(36360L, 82710L,
22750L, 8574L, 40262L, 10150L, 25200L, 9000L, 333100L, 52200L
), amihud = c(1.389026403, 1.028968686, NA, Inf, Inf, 0.891034483,
0.255496848, NA, Inf, 0.891034483), illiq = c(1.625240831,
NA, NA, NA, NA, NA, NA, NA, NA, NA), MonthYr = structure(c(1L,
1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L), .Label = c("1990-01",
"1990-02"), class = "factor")), .Names = c("permno", "date",
"company", "price", "volume", "amihud", "illiq", "MonthYr"), class = "data.frame", row.names = c(NA,
-10L))
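I want to compute the Amihud measure (a measure of financial illiquidity, and hence of risk). In short: I need the mean of the variable 'amihud' for each stock (permno) and each month, which I will call 'illiq'.
I tried: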
res <- smallcap %>%
group_by(permno, MonthYr) %>%
mean(amihud, na.rm=T) %>%
group_by(permno)
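I don't know how close to correct this is, but every attempt to omit or subset out the NA and Inf values has failed.
Expected result, leaving aside whether the numbers in this example are correct, and without the amihud variable, which is no longer needed: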
permno company illiq MonthYr
10026 J & J SNACK FOODS CORP 1.65 1990-01
10026 J & J SNACK FOODS CORP 0.87 1990-02
10397 WERNER ENTERPRISES INC 0.25 1990-01
10397 WERNER ENTERPRISES INC 0.55 1990-02
Thanks for any hints.
You need to do the following:
# You don't care about the Inf values, so convert them to NA;
# mean() then drops them because na.rm = TRUE is set.
df$amihud[df$amihud == Inf] <- NA
library(dplyr)
# Use summarise() to compute the per-group means:
res <- df %>%
  select(permno, company, MonthYr, amihud) %>%
  group_by(permno, company, MonthYr) %>%
  summarise(illiq = mean(amihud, na.rm = TRUE))
Output:
> res
Source: local data frame [4 x 4]
Groups: permno, company
permno company MonthYr illiq
1 10026 J & J SNACK FOODS CORP 1990-01 1.2089975
2 10026 J & J SNACK FOODS CORP 1990-02 0.8910345
3 10397 WERNER ENTERPRISES INC 1990-01 0.2554968
4 10397 WERNER ENTERPRISES INC 1990-02 0.8910345
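As a variation on the answer above, the non-finite values can be filtered inside summarise() instead of being overwritten in df. This is only a sketch, not the answerer's code: it assumes dplyr 1.0.0 or later (for the .groups argument) and rebuilds just the three relevant columns from the dput in the question. is.finite() is FALSE for NA, NaN, Inf and -Inf, so one test covers all of them.

```r
library(dplyr)

# Stand-in for the question's data: permno, MonthYr and amihud
# copied from the dput, other columns left out for brevity.
df <- data.frame(
  permno  = c(10026L, 10026L, 10026L, 10026L, 10026L, 10026L,
              10397L, 10397L, 10397L, 10397L),
  MonthYr = c("1990-01", "1990-01", "1990-01", "1990-01", "1990-02", "1990-02",
              "1990-01", "1990-01", "1990-02", "1990-02"),
  amihud  = c(1.389026403, 1.028968686, NA, Inf, Inf, 0.891034483,
              0.255496848, NA, Inf, 0.891034483)
)

# Keep only finite values within each group before averaging;
# df itself is left unchanged.
res <- df %>%
  group_by(permno, MonthYr) %>%
  summarise(illiq = mean(amihud[is.finite(amihud)]), .groups = "drop")
```

With this approach, a group that has no finite values at all gets NaN as its mean rather than being dropped from the result.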
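P.S. The values in your expected output probably come from the full data set: 10026 J & J SNACK FOODS CORP 1990-02 has only one value once the Inf is set to NA, so that value is also the mean, i.e. 0.8910345 rather than the 0.87 shown in your output.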
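For what it's worth, the attempt in the question fails because the pipe hands the whole grouped data frame to mean() as its first argument, and mean() called on a data frame returns NA with a warning instead of averaging a column. A minimal base R sketch of that behaviour, using a made-up three-row data frame:

```r
# Hypothetical one-column data frame standing in for the piped data.
d <- data.frame(amihud = c(1.5, NA, Inf))

# What the pipe effectively does: mean() receives the data frame as `x`.
# The result is NA (with a warning, suppressed here), whatever na.rm says.
whole_df <- suppressWarnings(mean(d, na.rm = TRUE))

# What was intended: mean() applied to the numeric column itself,
# restricted to its finite values.
column <- mean(d$amihud[is.finite(d$amihud)])
```

This is why the per-group mean has to be computed inside summarise(), where amihud refers to the column values of the current group.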