R: create or delete rows given a range of values

I have the following database with country, year, and GDP:

What I have

Country Year GDP
Afghanistan 1950 3
Afghanistan 1951 3
Afghanistan 2019 3
Australia 1945 3
Australia 2021 3

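I need to create or delete rows so that every country has one row for each year from 1948 to 2021. For example, for Afghanistan I need to create rows with an empty GDP for 1948 to 1949, for 2021, and for the missing years in between. For Australia I need to delete the 1945 row and create everything in between.

This is not my exact database: I have more than 200 countries, each covering different years. Is there an easy way to do this?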

What I need

Country Year GDP
Afghanistan 1948 NA
... ... ...
Afghanistan 2021 NA
Australia 1948 3
... ... ...
Australia 2021 3

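We can use complete to create the missing combinations, passing GDP = 0 through its fill argument, and then filter out the years outside 1948 to 2021:

library(dplyr)
library(tidyr)
complete(df1, Country, Year = 1948:2021, fill = list(GDP = 0)) %>%   # df1 is your data frame
    filter(between(Year, 1948, 2021)) %>%   # drop out-of-range years such as 1945
    arrange(Country, Year)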
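If you want the NA values shown in the question rather than zeros, a minimal sketch is to leave out the fill argument. The df1 below rebuilds the question's sample data with a numeric GDP column, which is an assumption:

```r
library(dplyr)
library(tidyr)

# Sample data from the question; numeric GDP is assumed
df1 <- tibble(
  Country = c("Afghanistan", "Afghanistan", "Afghanistan", "Australia", "Australia"),
  Year    = c(1950L, 1951L, 2019L, 1945L, 2021L),
  GDP     = c(3, 3, 3, 3, 3)
)

# No fill argument: the added rows keep GDP = NA, as in the desired output
out <- df1 %>%
  complete(Country, Year = 1948:2021) %>%
  filter(between(Year, 1948, 2021)) %>%  # removes the 1945 row that complete() keeps
  arrange(Country, Year)
```

complete() joins the original rows back onto the expanded grid, so the 1945 row survives until filter() removes it; that is why the filter step is needed.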
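
Alternatively, filter to 1948 to 2021 first and then complete; the new rows keep GDP as NA:

library(tidyr)
library(dplyr)

df <-
  tibble::tribble(
         ~Country, ~Year,   ~GDP,
    "Afghanistan", 1950L, "3",
    "Afghanistan", 1951L, "3",
    "Afghanistan", 2019L, "3",
      "Australia", 1945L, "3",
      "Australia", 2021L, "3"
    )

df %>% 
  filter(Year >= 1948 & Year <= 2021) %>%   # drop out-of-range years such as 1945
  complete(Year = 1948:2021, Country) %>%   # add every Year x Country pair; new GDP is NA
  arrange(Country)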

# A tibble: 148 x 3
    Year Country     GDP  
   <int> <chr>       <chr>
 1  1948 Afghanistan NA   
 2  1949 Afghanistan NA   
 3  1950 Afghanistan 3 
 4  1951 Afghanistan 3 
 5  1952 Afghanistan NA   
 6  1953 Afghanistan NA   
 7  1954 Afghanistan NA   
 8  1955 Afghanistan NA   
 9  1956 Afghanistan NA   
10  1957 Afghanistan NA   
# ... with 138 more rows
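One thing to watch when filtering before complete(): a country whose rows all fall outside 1948 to 2021 is removed before complete() can expand it. The sketch below uses a made-up country, "Testland", purely for illustration:

```r
library(dplyr)
library(tidyr)

# Hypothetical data: "Testland" is invented and only has a year outside 1948:2021
df <- tibble(
  Country = c("Afghanistan", "Testland"),
  Year    = c(1950L, 1945L),
  GDP     = c(3, 3)
)

# Filtering first removes Testland completely, so complete() never sees it
filter_first <- df %>%
  filter(between(Year, 1948, 2021)) %>%
  complete(Year = 1948:2021, Country)

# Completing first keeps Testland and gives it 74 rows with GDP = NA
complete_first <- df %>%
  complete(Year = 1948:2021, Country) %>%
  filter(between(Year, 1948, 2021))

sort(unique(filter_first$Country))    # "Afghanistan"
sort(unique(complete_first$Country))  # "Afghanistan" "Testland"
```

With 200+ countries, completing first and filtering afterwards is the safer order if any country might only have out-of-range years.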

We can use complete, then filter, and finally replace_na:

library(dplyr)
library(tidyr)   # provides complete() and replace_na()

df <- read.table(header = TRUE, text = "Country  Year    GDP
Afghanistan 1950    3
Afghanistan 1951    3
Afghanistan 2019    3
Australia   1945    3
Australia   2021    3")

df <- df %>% 
  complete(Year = 1948:2021, Country) %>%   # every Year x Country pair
  filter(between(Year, 1948, 2021)) %>%     # drop the 1945 row
  replace_na(list(GDP = 0)) %>%             # new rows get GDP = 0 instead of NA
  arrange(Country)

head(df)
tail(df)

> print(head(df))
# A tibble: 6 x 3
   Year Country     GDP  
  <int> <chr>       <chr>
1  1948 Afghanistan 0    
2  1949 Afghanistan 0    
3  1950 Afghanistan 3 
4  1951 Afghanistan 3 
5  1952 Afghanistan 0    
6  1953 Afghanistan 0    
> print(tail(df))
# A tibble: 6 x 3
   Year Country   GDP  
  <int> <chr>     <chr>
1  2016 Australia 0    
2  2017 Australia 0    
3  2018 Australia 0    
4  2019 Australia 0    
5  2020 Australia 0    
6  2021 Australia 3 

Created on 2021-09-26 by the reprex package (v2.0.1)
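The replacement passed to replace_na() should match the column type. In recent tidyr versions the replacement is cast to the column's type, so a numeric 0 suits an integer or double GDP column (as read.table() produces here), while a character GDP column, like the one in the tribble used in the other answers, needs "0". A minimal sketch with toy columns:

```r
library(tidyr)

# Toy columns: integer GDP (like read.table() output) and character GDP (like the tribble)
num <- tibble::tibble(GDP = c(3L, NA, 3L))
chr <- tibble::tibble(GDP = c("3", NA, "3"))

# Use a replacement of the same type as the column
num_filled <- replace_na(num, list(GDP = 0L))
chr_filled <- replace_na(chr, list(GDP = "0"))
```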

Here is a solution with complete and coalesce:

library(dplyr)
library(tidyr)
df %>% 
  complete(Year = 1948:2021, Country) %>% 
  filter(between(Year, 1948, 2021)) %>%    # drop the 1945 row
  arrange(Country, Year) %>% 
  mutate(GDP = coalesce(GDP, "0"))         # GDP is character here, so the fill is "0"
# A tibble: 148 x 3
    Year Country     GDP  
   <int> <chr>       <chr>
 1  1948 Afghanistan 0    
 2  1949 Afghanistan 0    
 3  1950 Afghanistan 3 
 4  1951 Afghanistan 3 
 5  1952 Afghanistan 0    
 6  1953 Afghanistan 0    
 7  1954 Afghanistan 0    
 8  1955 Afghanistan 0    
 9  1956 Afghanistan 0    
10  1957 Afghanistan 0    
# … with 138 more rows
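For comparison, the same result can be built without tidyr. This is a base R swap-in, not what the answers above use: expand.grid() creates every Country and Year pair in 1948:2021, and merge() with all.x = TRUE acts as a left join onto that grid, so out-of-range years such as 1945 are dropped and missing years get NA. The sample data is the question's, with numeric GDP assumed:

```r
# Sample data from the question; numeric GDP is assumed
df <- data.frame(
  Country = c("Afghanistan", "Afghanistan", "Afghanistan", "Australia", "Australia"),
  Year    = c(1950L, 1951L, 2019L, 1945L, 2021L),
  GDP     = c(3, 3, 3, 3, 3),
  stringsAsFactors = FALSE
)

# Every country paired with every year in the target range
grid <- expand.grid(Country = unique(df$Country), Year = 1948:2021,
                    stringsAsFactors = FALSE)

# Left join onto the grid: rows outside the grid (1945) are dropped,
# grid rows without data get GDP = NA
full <- merge(grid, df, by = c("Country", "Year"), all.x = TRUE)
full <- full[order(full$Country, full$Year), ]
rownames(full) <- NULL
```

If zeros are preferred, full$GDP[is.na(full$GDP)] <- 0 afterwards does the same job as replace_na().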