Attempting to create panel-data from cross sectional data

Question

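I am trying to transform data from the Global Terrorism Database so that the unit of observation is no longer the terror event but "Country_Year", with one variable containing the number of terror events in that year.

I managed to create a data frame in which one column contains all Country_Year combinations as a variable. I also found that `table(GTD_94_Land$country_txt, GTD_94_Land$iyear)` displays the values I would like the new variable to have. What I can't figure out is how to store this number as a variable.

So my data looks like this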

        eventid iyear crit1 crit2 crit3 country country_txt
          <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl> <chr>      
 1 199401010008  1994     1     1     1     182 Somalia    
 2 199401010012  1994     1     1     1     209 Turkey     
 3 199401010013  1994     1     1     1     209 Turkey     
 4 199401020003  1994     1     1     1     209 Turkey     
 5 199401020007  1994     1     1     0     106 Kuwait     
 6 199401030002  1994     1     1     1     209 Turkey     
 7 199401030003  1994     1     1     1     228 Yemen      
 8 199401030006  1994     1     1     0      53 Cyprus     
 9 199401040005  1994     1     1     0     209 Turkey     
10 199401040006  1994     1     1     0     209 Turkey     
11 199401040007  1994     1     1     1     209 Turkey     
12 199401040008  1994     1     1     1     209 Turkey

I would like to transform it so that I have

   Terror attacks iyear crit1 crit2 crit3 country country_txt
            <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl> <chr>
 1              1  1994     1     1     1     182 Somalia
 2              8  1994     1     1     1     209 Turkey
 5              1  1994     1     1     0     106 Kuwait
 7              1  1994     1     1     1     228 Yemen
 8              1  1994     1     1     0      53 Cyprus

I've looked at some solutions, but most of them seem to assume that the number the new variable should hold is already in the data.

All help is appreciated!

Answer 1

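Assuming df is the original data frame:

library(dplyr)  # provides the %>% pipe and n()

df_out = df %>% 
  dplyr::select(-eventid) %>%               # drop the event-level id
  dplyr::group_by(country_txt, iyear) %>%   # one group per country-year
  dplyr::mutate(Terrorattacs = n()) %>%     # number of events in each group
  dplyr::slice(1L) %>%                      # keep only the first row per group
  dplyr::ungroup()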

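Ideally I would use summarize, but since I don't know the summarising criteria for the other columns, I have just used mutate and slice.

Note: the 'crit' column values will be those of the first occurrence of each 'country_txt' and 'iyear' combination.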
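If the 'crit' columns are not needed in the result, the counting step can be written more directly. The following is only a minimal sketch: the toy `df` is built from a few of the question's sample rows, and the output column name `terror_attacks` is a hypothetical choice. `dplyr::count()` is roughly shorthand for `group_by()` followed by `summarise(n = n())`.

```r
library(dplyr)

# Toy event-level data (assumption: a few rows copied from the question's sample)
df <- data.frame(
  eventid     = c(199401010008, 199401010012, 199401010013, 199401020007),
  iyear       = c(1994, 1994, 1994, 1994),
  country     = c(182, 209, 209, 106),
  country_txt = c("Somalia", "Turkey", "Turkey", "Kuwait"),
  stringsAsFactors = FALSE
)

# One row per country-year, with the number of events stored as a column
counts <- df %>%
  count(country, country_txt, iyear, name = "terror_attacks")

print(counts)
```

In this toy data Turkey appears twice in 1994, so its `terror_attacks` value is 2, while Somalia and Kuwait each get 1.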

Answer 2

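Here is a data.table solution. If the data set has already been filtered so that crit1 and crit2 equal 1 (which you gave as a condition in the comments), you can drop the first argument (crit1 == 1 & crit2 == 1).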

library(data.table)
set.seed(1011)

dat <- data.table(eventid = round(runif(100, 1000, 10000)),
                  iyear = sample(1994:1996, 100, replace = TRUE),
                  crit1 = rbinom(100, 1, .9),
                  crit2 = rbinom(100, 1, .9),
                  crit3 = rbinom(100, 1, .9),
                  country = sample(1:3, 100, replace = TRUE))
dat[, country_txt := LETTERS[country]]

## keep events with crit1 == 1 and crit2 == 1, then count rows per country-year
dat[crit1 == 1 & crit2 == 1, .N, .(country, country_txt, iyear)]
#>    country country_txt iyear  N
#> 1:       1           A  1994 10
#> 2:       1           A  1995  4
#> 3:       3           C  1995 10
#> 4:       1           A  1996  7
#> 5:       2           B  1996  9
#> 6:       3           C  1996  5
#> 7:       2           B  1994  8
#> 8:       3           C  1994 13
#> 9:       2           B  1995 10

Created on 2019-09-24 by the reprex package (v0.3.0)
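Since the question notes that `table()` already displays the desired counts, those counts can also be stored as a variable in base R by converting the table to a data frame. This is only a sketch under assumptions: the toy `events` data and the column name `terror_attacks` are made up for illustration. Unlike the grouped counts above, this version also keeps country-year combinations with zero events, which may be what a balanced panel needs.

```r
# Toy event-level data (assumption: stands in for the question's GTD_94_Land)
events <- data.frame(
  country_txt = c("Somalia", "Turkey", "Turkey", "Kuwait", "Turkey"),
  iyear       = c(1994, 1994, 1994, 1995, 1995),
  stringsAsFactors = FALSE
)

# Cross-tabulate; naming the arguments gives the result readable column names
tab <- table(country_txt = events$country_txt, iyear = events$iyear)

# Long format: one row per country-year, counts stored in `terror_attacks`
panel <- as.data.frame(tab, responseName = "terror_attacks",
                       stringsAsFactors = FALSE)
panel$iyear <- as.integer(panel$iyear)  # table() turned the years into labels

print(panel)
```

With three countries and two years, `panel` has six rows, including a zero for Kuwait in 1994.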

r

panel-data

dplyr