How to deal with special characters in the variable names when calling feols regression?

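I am trying to write a function that returns the FE regression coefficients and standard errors, because I need to run a large number of regressions. The data might look like this. The column names contain many special characters, such as spaces, -, &, and leading digits.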

library(data.table)
library(fixest)
library(broom)
data<-data.table(Date = c("2020-01-01","2020-01-01","2020-01-01","2020-01-01","2020-02-01","2020-02-01","2020-02-01","2020-02-01"),
         Card = c(1,2,3,4,1,2,3,4),
         A = rnorm(8),
         B = rnorm(8),
         C = rnorm(8),
         D = rnorm(8)
         )
setnames(data, old = "A", new = "A-A")
setnames(data, old = "B", new = "B B")
setnames(data, old = "C", new = "C&C")
setnames(data, old = "D", new = "1-D")

Thanks to @Ronak Shah and @Laurent Bergé, who proposed the following two excellent candidates:

estimation_fun <- function(col1,col2,df) {
  regression<-feols(as.formula(sprintf('%s ~ %s | Card + Date', col1, col2)), df)
  est =tidy(regression)$estimate
  se = tidy(regression)$std.error
  output <- list(est,se)
  return(output)
}

estimation_fun <- function(lhs, rhs, df) {
  # .[lhs] and .[rhs] insert the variable names stored in lhs and rhs
  regression <- feols(.[lhs] ~ .[rhs] | Card + Date, df)
  est <- tidy(regression)$estimate
  se <- tidy(regression)$std.error
  output <- list(est, se)
  return(output)
}

Both work when the column names are simply "A", "B", "C", and so on. However, try the function on the renamed columns:

estimation_fun("A-A","B B",data)

Error in feols(as.formula(sprintf("%s ~ %s | Card + Date", col1, col2)), : 
Argument 'fml' could not be evaluated: <text>:1:9: unexpected symbol
1: A-A ~ B B
           ^
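
The error comes from R's parser rather than from the estimation itself: sprintf pastes the raw names into a string, and `A-A ~ B B` is not valid R syntax. The sketch below uses base R only to illustrate this. It shows that backquoted names parse into a formula in base R; whether feols then accepts such a formula is a separate question and is not tested here.

```r
# Raw names pasted into a formula string are parsed as R code,
# so a space inside a name triggers a syntax error.
bad <- tryCatch(
  parse(text = "A-A ~ B B"),
  error = function(e) conditionMessage(e)
)
print(bad)  # the parse error message

# Backquoted names parse cleanly in base R.
fml <- as.formula("`A-A` ~ `B B`")
print(all.vars(fml))  # the two variable names, without backquotes
```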

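I am looking for a feols formula format that can handle this case. Any other suggestions are welcome too, for example simply removing these special characters from the column names (although that would be suboptimal).

Thanks to the great community here!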

Consider changing the special characters to _:

 setnames(data, gsub("[-& ]", "_", names(data)))
 setnames(data, make.names(names(data)))

-checking the data

> data
         Date Card         A_A        B_B         C_C        X1_D
1: 2020-01-01    1  0.19083908  0.4835800 -0.08755933  1.01311944
2: 2020-01-01    2 -0.57726617  0.6421043  1.12987445 -0.52168711
3: 2020-01-01    3  2.02653159 -1.4505543 -0.43367868 -0.04474157
4: 2020-01-01    4 -0.20575821  0.4691786 -1.58562690  0.49362528
5: 2020-02-01    1 -0.03461155 -0.2913712 -0.16457341 -0.07701185
6: 2020-02-01    2 -0.50734472 -0.7545768 -0.53227356  0.46468144
7: 2020-02-01    3  0.76653913 -0.1634451  1.00350319  0.25886312
8: 2020-02-01    4  0.33414436  0.6395322  1.10383819 -1.08479631

-testing

 estimation_fun('A_A', 'B_B', data)
[[1]]
[1] -0.3915516
attr(,"type")
[1] "Clustered (Card)"

[[2]]
[1] 0.2658773
attr(,"type")
[1] "Clustered (Card)"

Usually backquotes should work, but here they fail with feols. So a safe option is to replace the special characters with _, either with clean_names from janitor or with gsub.
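
If the original names are still needed for reporting, one option is to keep a lookup from the cleaned names back to the originals. The following is a minimal sketch in base R, not something from fixest or data.table: `clean_with_map` and `name_map` are hypothetical names introduced here, and `unique = TRUE` is added on the assumption that two different raw names could otherwise collapse to the same cleaned name.

```r
# Hypothetical helper (not part of fixest or data.table): sanitize the
# names and return a lookup from each cleaned name to its original.
clean_with_map <- function(nms) {
  cleaned <- make.names(gsub("[-& ]", "_", nms), unique = TRUE)
  setNames(nms, cleaned)
}

name_map <- clean_with_map(c("Date", "Card", "A-A", "B B", "C&C", "1-D"))
print(names(name_map))     # cleaned names, usable in a formula
print(name_map[["X1_D"]])  # original label for a cleaned name
```

With this in place, `setnames(data, names(name_map))` would rename the columns, and `name_map` can translate the cleaned names back when labelling the results.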