Add rows based on numerical sequence
I am trying to add rows to a data frame based on the values missing from a numerical sequence.
Here is a reprex. I want to go from this:
> df[-c(1,3,9),]
id year V1 V2 V3
2 1 2019 0.84788413 0.10418523 0.2249371
4 2 2018 0.73183889 0.66380165 0.7681833
5 2 2019 0.38263072 -0.66741116 -0.1803099
6 2 2020 -0.05915745 2.09814096 0.8558323
7 3 2018 1.42148474 -1.65590355 -0.0879526
8 3 2019 1.46178632 1.96796970 -0.3489630
10 4 2018 0.12511779 -0.91978526 -2.3880951
11 4 2019 0.93936831 -0.24440871 0.3249178
12 4 2020 -1.57864369 -0.05853787 0.6078194
To this:
id year V1 V2 V3
1 1 2018 NA NA NA
2 1 2019 0.84788413 0.10418523 0.2249371
3 1 2020 NA NA NA
4 2 2018 0.73183889 0.66380165 0.7681833
5 2 2019 0.38263072 -0.66741116 -0.1803099
6 2 2020 -0.05915745 2.09814096 0.8558323
7 3 2018 1.42148474 -1.65590355 -0.0879526
8 3 2019 1.46178632 1.96796970 -0.3489630
9 3 2020 NA NA NA
10 4 2018 0.12511779 -0.91978526 -2.3880951
11 4 2019 0.93936831 -0.24440871 0.3249178
12 4 2020 -1.57864369 -0.05853787 0.6078194
The logic is to add a row for each missing year within every id and fill the remaining columns with NA.
Data:
df <- structure(list(id = c(1L, 2L, 2L, 2L, 3L, 3L, 4L, 4L, 4L), year = c(2019L,
2018L, 2019L, 2020L, 2018L, 2019L, 2018L, 2019L, 2020L), V1 = c(0.847884128902485,
0.731838887436047, 0.382630718058478, -0.0591574520333011, 1.42148473746568,
1.46178631522088, 0.125117791300285, 0.939368308197552, -1.57864368576782
), V2 = c(0.104185228129027, 0.663801650973095, -0.667411160654917,
2.09814095835567, -1.65590354896798, 1.96796970263568, -0.919785264321656,
-0.244408708889214, -0.0585378742959754), V3 = c(0.224937129454626,
0.7681832776488, -0.180309905647701, 0.855832252932298, -0.0879525996394009,
-0.34896299605019, -2.38809514212219, 0.324917787941616, 0.607819444746004
)), row.names = c(2L, 4L, 5L, 6L, 7L, 8L, 10L, 11L, 12L), class = "data.frame")
You can use complete from tidyr:
tidyr::complete(df, id, year)
# id year V1 V2 V3
# <int> <int> <dbl> <dbl> <dbl>
# 1 1 2018 NA NA NA
# 2 1 2019 0.848 0.104 0.225
# 3 1 2020 NA NA NA
# 4 2 2018 0.732 0.664 0.768
# 5 2 2019 0.383 -0.667 -0.180
# 6 2 2020 -0.0592 2.10 0.856
# 7 3 2018 1.42 -1.66 -0.0880
# 8 3 2019 1.46 1.97 -0.349
# 9 3 2020 NA NA NA
#10 4 2018 0.125 -0.920 -2.39
#11 4 2019 0.939 -0.244 0.325
#12 4 2020 -1.58 -0.0585 0.608
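Note that complete(df, id, year) only crosses the id and year values that already occur somewhere in the data, so a year that is absent for every id would not be created. A minimal sketch of one way to cover that case, assuming yearly steps and using tidyr::full_seq() to span the whole range; the data frame toy is hypothetical and not part of the question:

```r
library(tidyr)

# Hypothetical toy data: 2019 is missing for every id
toy <- data.frame(
  id   = c(1L, 1L, 2L),
  year = c(2018L, 2020L, 2018L),
  V1   = c(0.1, 0.2, 0.3)
)

# full_seq() generates min(year)..max(year) in steps of 1,
# so complete() also creates the 2019 rows that no id has
out <- complete(toy, id, year = full_seq(year, period = 1))
out
```

With the data in the question this makes no difference, because 2018, 2019 and 2020 each appear for at least one id.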
In base R, you can use expand.grid() + merge():
merge(df, expand.grid(id = unique(df$id), year = unique(df$year)), all = TRUE)
# id year V1 V2 V3
# 1 1 2018 NA NA NA
# 2 1 2019 0.84788413 0.10418523 0.2249371
# 3 1 2020 NA NA NA
# 4 2 2018 0.73183889 0.66380165 0.7681833
# 5 2 2019 0.38263072 -0.66741116 -0.1803099
# 6 2 2020 -0.05915745 2.09814096 0.8558323
# 7 3 2018 1.42148474 -1.65590355 -0.0879526
# 8 3 2019 1.46178632 1.96796970 -0.3489630
# 9 3 2020 NA NA NA
# 10 4 2018 0.12511779 -0.91978526 -2.3880951
# 11 4 2019 0.93936831 -0.24440871 0.3249178
# 12 4 2020 -1.57864369 -0.05853787 0.6078194
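The one-liner works because expand.grid() builds every id/year pair and merge() joins on the columns the two data frames share, with all = TRUE keeping the pairs that have no match. A small sketch that spells out those steps on a hypothetical data frame toy (not from the question) and lists which rows were added:

```r
# Hypothetical toy data: id 1 has no row for 2018
toy <- data.frame(
  id   = c(1L, 2L, 2L),
  year = c(2019L, 2018L, 2019L),
  V1   = c(0.5, 0.7, 0.9)
)

# Every combination of the observed ids and years (2 x 2 = 4 rows)
grid <- expand.grid(id = unique(toy$id), year = unique(toy$year))

# Full outer join on the key columns; unmatched grid rows get NA in V1
out <- merge(toy, grid, by = c("id", "year"), all = TRUE)

# Rows that were added by the join
# (assumes V1 had no NA values before the merge)
added <- out[is.na(out$V1), c("id", "year")]
added
```

Naming the by columns explicitly is optional here, since merge() defaults to the shared column names, but it guards against accidental joins on other columns that happen to have the same name.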