R: create or delete rows given a range of values
I have the following dataset with country, year, and GDP:
What I have
Country | Year | GDP |
---|---|---|
Afghanistan | 1950 | 3 |
Afghanistan | 1951 | 3 |
Afghanistan | 2019 | 3 |
Australia | 1945 | 3 |
Australia | 2021 | 3 |
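I need to create or delete rows so that every country has one row per year from 1948 to 2021. For example, for Afghanistan I need to create rows for 1948 to 1949 and 2021 with an empty GDP, and for Australia I need to delete the 1945 row and create everything in between.
This is not my exact dataset. I have more than 200 countries, and the years available differ for each one. Is there an easy way to do this?
What I need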
Country | Year | GDP |
---|---|---|
Afghanistan | 1948 | NA |
... | ... | ... |
Afghanistan | 2021 | NA |
Australia | 1948 | 3 |
... | ... | ... |
Australia | 2021 | 3 |
We can use `complete` to create the missing combinations and set `GDP` to 0 for them.
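library(dplyr)   # provides %>%, filter(), between(), arrange()
library(tidyr)   # provides complete()
df1 %>%
  filter(between(Year, 1948, 2021)) %>%   # drop out-of-range rows such as Australia 1945
  complete(Country, Year = 1948:2021, fill = list(GDP = 0)) %>%   # `fill` must be passed by name
  arrange(Country, Year)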
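The answer above refers to `df1` without defining it. Here is a minimal self-contained sketch of the same idea, assuming `df1` looks like the sample rows in the question with a numeric `GDP` column (that data frame is a stand-in, not the asker's real data). It leaves the new rows as `NA`, which is what the question asks for, instead of filling them with 0.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for df1, built from the sample rows in the question
df1 <- tibble(
  Country = c("Afghanistan", "Afghanistan", "Afghanistan", "Australia", "Australia"),
  Year    = c(1950L, 1951L, 2019L, 1945L, 2021L),
  GDP     = c(3, 3, 3, 3, 3)
)

result <- df1 %>%
  filter(between(Year, 1948L, 2021L)) %>%   # removes rows outside the range (Australia 1945)
  complete(Country, Year = 1948:2021) %>%   # adds the missing Country/Year pairs with GDP = NA
  arrange(Country, Year)
```

Filtering before `complete` matters because `complete` keeps every row that is already in the data, including years outside `1948:2021`.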
library(tidyr)
library(dplyr)

df <-
  tibble::tribble(
    ~Country, ~Year, ~GDP,
    "Afghanistan", 1950L, "3",
    "Afghanistan", 1951L, "3",
    "Afghanistan", 2019L, "3",
    "Australia", 1945L, "3",
    "Australia", 2021L, "3"
  )

df %>%
  filter(Year >= 1948 & Year <= 2021) %>%
  complete(Year = 1948:2021, Country) %>%
  arrange(Country)
# A tibble: 148 x 3
Year Country GDP
<int> <chr> <chr>
1 1948 Afghanistan NA
2 1949 Afghanistan NA
3 1950 Afghanistan 3
4 1951 Afghanistan 3
5 1952 Afghanistan NA
6 1953 Afghanistan NA
7 1954 Afghanistan NA
8 1955 Afghanistan NA
9 1956 Afghanistan NA
10 1957 Afghanistan NA
# ... with 138 more rows
We can use `complete`, then `filter`, and finally `replace_na`.
library(dplyr)
library(tidyr)   # complete() and replace_na() come from tidyr

df <- read.table(header = TRUE, text = "Country Year GDP
Afghanistan 1950 3
Afghanistan 1951 3
Afghanistan 2019 3
Australia 1945 3
Australia 2021 3")

df <- df %>%
  complete(Year = 1948:2021, Country) %>%
  filter(between(Year, 1948, 2021)) %>%
  replace_na(list(GDP = 0)) %>%
  arrange(Country)

head(df)
tail(df)
> print(head(df))
# A tibble: 6 x 3
Year Country GDP
<int> <chr> <chr>
1 1948 Afghanistan 0
2 1949 Afghanistan 0
3 1950 Afghanistan 3
4 1951 Afghanistan 3
5 1952 Afghanistan 0
6 1953 Afghanistan 0
> print(tail(df))
# A tibble: 6 x 3
Year Country GDP
<int> <chr> <chr>
1 2016 Australia 0
2 2017 Australia 0
3 2018 Australia 0
4 2019 Australia 0
5 2020 Australia 0
6 2021 Australia 3
Created on 2021-09-26 by the reprex package (v2.0.1)
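With more than 200 countries it is hard to check the result by eye. The sketch below is one possible sanity check. `incomplete_countries` is a hypothetical helper name (not part of dplyr or tidyr), and it assumes the columns are called `Country` and `Year` as in the question.

```r
library(dplyr)

# Hypothetical helper: returns the countries whose rows do NOT cover
# every year from `first` to `last` exactly once.
incomplete_countries <- function(data, first = 1948L, last = 2021L) {
  expected <- seq(first, last)
  data %>%
    group_by(Country) %>%
    summarise(
      ok = n() == length(expected) && setequal(Year, expected),
      .groups = "drop"
    ) %>%
    filter(!ok) %>%
    pull(Country)
}

# Small demonstration: "A" covers 2000 to 2002, "B" is missing 2001
demo <- tibble(
  Country = c("A", "A", "A", "B", "B"),
  Year    = c(2000L, 2001L, 2002L, 2000L, 2002L)
)
incomplete_countries(demo, 2000L, 2002L)
```

Calling `incomplete_countries(df)` on a completed and filtered result should return an empty character vector.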
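Here is a solution using `complete` and `coalesce`. There is no `filter` step, so the out-of-range Australia 1945 row is kept, which is why the result has 149 rows rather than 148.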
library(dplyr)
library(tidyr)

df %>%
  complete(Year = 1948:2021, Country) %>%
  arrange(Country, Year) %>%
  mutate(GDP = coalesce(GDP, "0"))
# A tibble: 149 x 3
Year Country GDP
<int> <chr> <chr>
1 1948 Afghanistan 0
2 1949 Afghanistan 0
3 1950 Afghanistan 3
4 1951 Afghanistan 3
5 1952 Afghanistan 0
6 1953 Afghanistan 0
7 1954 Afghanistan 0
8 1955 Afghanistan 0
9 1956 Afghanistan 0
10 1957 Afghanistan 0
# … with 139 more rows
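If the tidyverse is not available, the same result can be sketched in base R. This is a different technique from the answers above: it uses `expand.grid` and `merge` rather than `complete`. The sample data frame is rebuilt from the question, with `GDP` assumed to be numeric.

```r
# Sample rows from the question (GDP assumed numeric here)
df <- data.frame(
  Country = c("Afghanistan", "Afghanistan", "Afghanistan", "Australia", "Australia"),
  Year    = c(1950L, 1951L, 2019L, 1945L, 2021L),
  GDP     = c(3, 3, 3, 3, 3)
)

# Every Country x Year combination in the target range
grid <- expand.grid(
  Country = unique(df$Country),
  Year    = 1948:2021,
  stringsAsFactors = FALSE
)

# Left join onto the grid: years outside the range never enter the result,
# and years missing from df get GDP = NA
full <- merge(grid, df, by = c("Country", "Year"), all.x = TRUE)
full <- full[order(full$Country, full$Year), ]
```

Because the join starts from the grid, no separate filtering step is needed to drop the 1945 row.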