How to create a cumulative variable that groups by PERMNO and arranges by date in R

Question

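I have a data frame of COMPUSTAT variables with data on various accounting items, including SG&A expenses, for different companies.

I want to create a new variable in the data frame that accumulates each company's SG&A expenses in chronological order. I use the PERMNO code as the unique ID for each company.

I tried this code, but it doesn't seem to work: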

crsp.comp2$cxsgaq <- crsp.comp2 %>%
  group_by(permno) %>%
  arrange(date) %>%
  mutate_at(vars(xsgaq), cumsum(xsgaq))

(xsgaq is the COMPUSTAT variable for SG&A expenses)

Thanks a lot for your help.

Answer 1

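Your example code tries to write the entire data frame crsp.comp2 into the variable crsp.comp2$cxsgaq.

The mutate_at() call also fails because its second argument must be a function (such as cumsum), not a call like cumsum(xsgaq). In your case, though, it is simpler to use the standard mutate() function and assign the cxsgaq variable there.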

crsp.comp2 <- crsp.comp2 %>%
  group_by(permno) %>%
  arrange(date) %>%
  mutate(cxsgaq = cumsum(xsgaq))

Reproducible example with the iris dataset:

library(tidyverse)
iris %>% 
  group_by(Species) %>% 
  arrange(Sepal.Length) %>% 
  mutate(C.Sepal.Width = cumsum(Sepal.Width))
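
One thing to watch for with real COMPUSTAT data is missing values: in R, cumsum() returns NA for every row after the first NA within a group. Below is a minimal sketch on a made-up data frame (the permno values, dates, and amounts are hypothetical, and treating a missing quarter as zero is an assumption you may not want) that shows both behaviours side by side:

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical toy data: two firms, rows out of date order, one missing value
toy <- tibble(
  permno = c(10001, 10001, 10001, 10002, 10002),
  date   = as.Date(c("2020-06-30", "2020-03-31", "2020-09-30",
                     "2020-03-31", "2020-06-30")),
  xsgaq  = c(2, 1, NA, 5, 7)
)

result <- toy %>%
  arrange(permno, date) %>%          # sort by firm, then chronologically
  group_by(permno) %>%               # restart the running total for each firm
  mutate(
    cxsgaq      = cumsum(xsgaq),                # NA propagates from the first NA onward
    cxsgaq_fill = cumsum(coalesce(xsgaq, 0))    # assumption: treat a missing quarter as 0
  ) %>%
  ungroup()                          # drop the grouping so later steps are not affected

print(result)
```

Whether to fill, drop, or keep the NA rows depends on what a missing xsgaq means in your sample, so treat cxsgaq_fill as one option rather than the default.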

Answer 2

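Building on @m-viking's answer: if you use the WRDS PostgreSQL server, you simply use window_order (from dbplyr) instead of arrange. (I use the Compustat firm identifier gvkey instead of permno so that this code works, but the idea is the same.)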

library(dplyr, warn.conflicts = FALSE)
library(DBI)

pg <- dbConnect(RPostgres::Postgres(), 
                bigint = "integer", sslmode='allow')

fundq <- tbl(pg, sql("SELECT * FROM comp.fundq"))

comp2 <-
  fundq %>%
  filter(indfmt == "INDL", datafmt == "STD",
         consol == "C", popsrc == "D")

comp2 <- 
  comp2 %>%
  group_by(gvkey) %>%
  dbplyr::window_order(datadate) %>%
  mutate(cxsgaq = cumsum(xsgaq))

comp2 %>%
  filter(!is.na(xsgaq)) %>%
  select(gvkey, datadate, xsgaq, cxsgaq)
#> # Source:     lazy query [?? x 4]
#> # Database:   postgres [iangow@wrds-pgdata.wharton.upenn.edu:9737/wrds]
#> # Groups:     gvkey
#> # Ordered by: datadate
#>    gvkey  datadate   xsgaq cxsgaq
#>    <chr>  <date>     <dbl>  <dbl>
#>  1 001000 1966-12-31 0.679  0.679
#>  2 001000 1967-12-31 1.02   1.70 
#>  3 001000 1968-12-31 5.86   7.55 
#>  4 001000 1969-12-31 7.18  14.7  
#>  5 001000 1970-12-31 8.25  23.0  
#>  6 001000 1971-12-31 7.96  30.9  
#>  7 001000 1972-12-31 7.55  38.5  
#>  8 001000 1973-12-31 8.53  47.0  
#>  9 001000 1974-12-31 8.86  55.9  
#> 10 001000 1975-12-31 9.59  65.5  
#> # … with more rows

Created on 2021-04-05 by the reprex package (v1.0.0)
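
If you want to see what the group_by() plus window_order() pipeline asks the database to do, without a WRDS login, the following sketch may help. It uses dbplyr's simulated PostgreSQL backend; fundq_sim is a hypothetical one-row stand-in for comp.fundq, and no query is actually run:

```r
library(dplyr, warn.conflicts = FALSE)
library(dbplyr)

# Hypothetical stand-in for comp.fundq: a lazy table with no real connection
fundq_sim <- lazy_frame(
  gvkey = "001000",
  datadate = as.Date("1966-12-31"),
  xsgaq = 0.679,
  con = simulate_postgres()
)

query <- fundq_sim %>%
  group_by(gvkey) %>%                 # becomes the PARTITION BY part of the window
  window_order(datadate) %>%          # becomes the ORDER BY part of the window
  mutate(cxsgaq = cumsum(xsgaq))      # translated to a windowed SUM

# Render the SQL that dbplyr would send to PostgreSQL
generated_sql <- sql_render(query)
print(generated_sql)
```

The running total is computed by a SQL window function on the server, which is why the row order has to be declared with window_order() rather than arrange().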
