Spread valued column into binary 'time series' in R

Question

I'm trying to first spread a valued column into a set of binary columns, and then gather them back into a 'time series' format.

For example, consider locations that were conquered at particular times, with data that looks like this:

df1 <- data.frame(locationID = c(1,2,3), conquered_in = c(1931, 1932, 1929))

  locationID conquered_in
1          1         1931
2          2         1932
3          3         1929

I'm trying to reshape the data so that it looks like this:

df2 <- data.frame(locationID = c(1,1,1,1,2,2,2,2,3,3,3,3), year = c(1929,1930,1931,1932,1929,1930,1931,1932,1929,1930,1931,1932), conquered = c(0,0,1,1,0,0,0,1,1,1,1,1))

   locationID year conquered
1           1 1929         0
2           1 1930         0
3           1 1931         1
4           1 1932         1
5           2 1929         0
6           2 1930         0
7           2 1931         0
8           2 1932         1
9           3 1929         1
10          3 1930         1
11          3 1931         1
12          3 1932         1

My initial strategy was to spread conquered and then gather. It seems close, but I can't get fill to work correctly, because I'm also trying to fill all later years with 1.
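For reference, the spread-then-gather route can be made to work. This is a minimal sketch, assuming tidyr's superseded spread()/gather() are still installed; the helper name years is illustrative. Converting the key to a factor with every year as a level and passing drop = FALSE lets spread() create a column even for a year with no conquest (1930 here). cummax() then carries the 1 forward: fill() only replaces missing values, and after spreading with fill = 0 there are 0s rather than NAs to fill.

```r
library(dplyr)
library(tidyr)

df1 <- data.frame(locationID = c(1, 2, 3), conquered_in = c(1931, 1932, 1929))

# Full span of years; assumed here to run from the earliest to the latest conquest
years <- seq(min(df1$conquered_in), max(df1$conquered_in))

out <- df1 %>%
  mutate(
    conquered = 1,
    # Factor with every year as a level, so years without a conquest still get a column
    conquered_in = factor(conquered_in, levels = years)
  ) %>%
  # Wide: one binary column per year (1 in the conquest year, 0 elsewhere)
  spread(conquered_in, conquered, fill = 0, drop = FALSE) %>%
  # Long again: convert = TRUE turns the "1929", "1930", ... keys back into numbers
  gather(year, conquered, -locationID, convert = TRUE) %>%
  group_by(locationID) %>%
  arrange(year, .by_group = TRUE) %>%
  # Carry the 1 forward to every later year of each location
  mutate(conquered = cummax(conquered)) %>%
  ungroup()

out
```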

Answer 1

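You can use complete() to expand the data frame, then, within each locationID group, apply cumsum() to conquered == 1 to fill the 1 down to all later years: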

library(tidyr)
library(dplyr)

df1 %>% 
  mutate(conquered = 1) %>%
  complete(locationID, conquered_in = seq(min(conquered_in), max(conquered_in)), fill = list(conquered = 0)) %>%
  group_by(locationID) %>%
  mutate(conquered = cumsum(conquered == 1))

# A tibble: 12 x 3
# Groups:   locationID [3]
   locationID conquered_in conquered
        <dbl>        <dbl>     <int>
 1          1         1929         0
 2          1         1930         0
 3          1         1931         1
 4          1         1932         1
 5          2         1929         0
 6          2         1930         0
 7          2         1931         0
 8          2         1932         1
 9          3         1929         1
10          3         1930         1
11          3         1931         1
12          3         1932         1
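To see why cumsum(conquered == 1) yields a binary flag, it helps to look at a single location. After complete(), conquered is 1 only in the conquest year, so the running count of 1s is 0 before that year and 1 from then on. The vector below is location 1 (conquered in 1931) over 1929 to 1932. cummax() is shown as an alternative that would stay binary even if a location appeared more than once in df1, a case the question does not cover.

```r
# conquered for locationID 1 after complete(): 1 only in 1931
flags <- c(0, 0, 1, 0)

cumsum(flags == 1)  # 0 0 1 1 (integer, matching the <int> column above)
cummax(flags)       # 0 0 1 1 (double); a second 1 later on would still give 1, not 2
```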

Answer 2

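Using tidyr's complete() would be a better option. Be aware, though, that the conquest years themselves may not cover every year from the start to the end of the period, so the full range of years has to be supplied separately.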

library(dplyr)
library(tidyr)
library(magrittr)

df1 <- data.frame(locationID = c(1,2,3), conquered_in = c(1931, 1932, 1929))

# A data frame holding every year you want to cover
df2 <- data.frame(year = seq(1929, 1940, by = 1))

# Combine years and locations: conquered = 1 only in each location's conquest year
df3 <- full_join(df2, df1, by = c("year" = "conquered_in")) %>%
  mutate(conquered = if_else(!is.na(locationID), 1, 0)) %>%
  complete(year, locationID) %>%
  arrange(locationID) %>%
  filter(!is.na(locationID))

# For each location, set conquered = 1 from its first conquest year onwards
df3 %<>%
  group_by(locationID) %>%
  # Inf in min() handles locations that were never conquered: no year is >= Inf, so they stay 0
  mutate(conquered = if_else(year >= min(Inf, year[conquered == 1], na.rm = TRUE), 1, 0)) %>%
  ungroup()

df3 %>% filter(year<=1932)
# A tibble: 12 x 3
    year locationID conquered
   <dbl>      <dbl>     <dbl>
 1  1929          1         0
 2  1930          1         0
 3  1931          1         1
 4  1932          1         1
 5  1929          2         0
 6  1930          2         0
 7  1931          2         0
 8  1932          2         1
 9  1929          3         1
10  1930          3         1
11  1931          3         1
12  1932          3         1
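Because the year range is supplied explicitly anyway, the join-and-complete steps can also be replaced by a plain cross join followed by a comparison. This is a minimal sketch of that alternative, not part of the answer above; it assumes tidyr 1.0 or later for expand_grid(), and the 1929:1932 range is chosen to match the example. Locations that were never conquered (conquered_in is NA) get 0 in every year.

```r
library(dplyr)
library(tidyr)

df1 <- data.frame(locationID = c(1, 2, 3), conquered_in = c(1931, 1932, 1929))

# Every location x year combination; expand_grid() varies locationID slowest
out <- expand_grid(locationID = df1$locationID, year = 1929:1932) %>%
  # Attach each location's conquest year to every one of its rows
  left_join(df1, by = "locationID") %>%
  # 1 from the conquest year onwards; an NA conquest year (never conquered) gives 0
  mutate(conquered = as.integer(!is.na(conquered_in) & year >= conquered_in)) %>%
  select(-conquered_in)

out
```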
