Rowsum on a nested table in R
I have a complex data frame; a minimal example follows:
df <- structure(list(District = c("Adilabad", "Adilabad", "Adilabad",
"Adilabad", "Adilabad", "Adilabad", "Adilabad", "Adilabad", "Adilabad",
"Adilabad"), Subdistt = c("Adilabad", "Adilabad", "Adilabad",
"Tamsi", "Tamsi", "Tamsi", "Tamsi", "Tamsi", "Tamsi", "Tamsi"
), TRU = c("Total", "Rural", "Urban", "Total", "Rural", "Urban",
"Rural", "Rural", "Urban", "Urban"), Level = c("District", "District",
"District", "Sub-District", "Sub-District", "Sub-District", "Village",
"Village", "Town", "Town"), No_HH = c(1277, 364, 913,
1277, 364, 913, 117, 247, 614, 299)), .Names = c("District",
"Subdistt", "TRU", "Level", "No_HH"), row.names = c(NA, 10L), class = "data.frame")
It looks like this:
   District Subdistt   TRU        Level No_HH
1  Adilabad Adilabad Total     District  1277
2  Adilabad Adilabad Rural     District   364
3  Adilabad Adilabad Urban     District   913
4  Adilabad    Tamsi Total Sub-District  1277
5  Adilabad    Tamsi Rural Sub-District   364
6  Adilabad    Tamsi Urban Sub-District   913
7  Adilabad    Tamsi Rural      Village   117
8  Adilabad    Tamsi Rural      Village   247
9  Adilabad    Tamsi Urban         Town   614
10 Adilabad    Tamsi Urban         Town   299
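In a way, each subsequent column is a kind of subset of the previous one. I have to verify the sums for Sub-Districts and Districts at the Rural, Urban and Total levels.
For example, rows 7 and 8 sum to the value in row 5, which is the Rural Sub-District row. In the full df there are many Rural Sub-Districts, and the sum of all Rural Sub-Districts is given in the Rural District row, i.e. row 2.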
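The check described above can be sketched directly in base R. This is only an illustration: it rebuilds the same ten rows with data.frame(), relies on the row positions of the minimal example, and the names rural_ok, urban_ok and total_ok are made up here.

```r
# Same data as the minimal example, rebuilt compactly.
df <- data.frame(
  District = rep("Adilabad", 10),
  Subdistt = c(rep("Adilabad", 3), rep("Tamsi", 7)),
  TRU      = c("Total", "Rural", "Urban", "Total", "Rural", "Urban",
               "Rural", "Rural", "Urban", "Urban"),
  Level    = c(rep("District", 3), rep("Sub-District", 3),
               "Village", "Village", "Town", "Town"),
  No_HH    = c(1277, 364, 913, 1277, 364, 913, 117, 247, 614, 299),
  stringsAsFactors = FALSE
)

# Rows 7 and 8 are the rural villages of Tamsi; row 5 is its Rural Sub-District row.
rural_ok <- sum(df$No_HH[7:8]) == df$No_HH[5]    # 117 + 247 vs 364
# Rows 9 and 10 are the towns of Tamsi; row 6 is its Urban Sub-District row.
urban_ok <- sum(df$No_HH[9:10]) == df$No_HH[6]   # 614 + 299 vs 913
# Rural + Urban should reproduce the Total Sub-District row (row 4).
total_ok <- sum(df$No_HH[5:6]) == df$No_HH[4]    # 364 + 913 vs 1277

print(c(rural = rural_ok, urban = urban_ok, total = total_ok))
```

Hard-coded row positions only work for a toy example; with many Sub-Districts the comparison has to be done per group rather than per row.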
The minimal expected output is:
  District Subdistt   TRU        Level No_HH
1 Adilabad    Tamsi Rural Sub-District   364
2 Adilabad    Tamsi Urban Sub-District   913
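Here 364 is the sum 117 + 247 from the minimal example above, and 913 is the sum 614 + 299.
At the moment I can subset to specific values, but I don't know how to sum based on these more complex selections. Can anyone help?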
We can try
library(dplyr)
df %>%
  filter(Level == 'Sub-District' & TRU != 'Total')
#   District Subdistt   TRU        Level No_HH
# 1 Adilabad    Tamsi Rural Sub-District   364
# 2 Adilabad    Tamsi Urban Sub-District   913
If we need to get the same output by summing,
df %>%
  # keep only the lowest-level rows (Village, Town)
  filter(!grepl('District', Level)) %>%
  group_by(District, Subdistt, TRU) %>%
  summarise(No_HH = sum(No_HH)) %>%
  ungroup() %>%
  # label the totals so they match the 'Sub-District' rows in df
  mutate(Level = 'Sub-District')
#   District Subdistt   TRU No_HH        Level
#      (chr)    (chr) (chr) (dbl)        (chr)
# 1 Adilabad    Tamsi Rural   364 Sub-District
# 2 Adilabad    Tamsi Urban   913 Sub-District
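Since the stated goal is to verify the reported figures rather than just recompute them, the computed sums can be placed next to the reported Sub-District rows and compared. The following is a minimal sketch in base R (aggregate() and merge() instead of dplyr), assuming that every row whose Level does not contain "District" is a lowest-level unit; the column names computed_HH, reported_HH and matches are illustrative and not part of the original data.

```r
# Same data as the minimal example, rebuilt compactly.
df <- data.frame(
  District = rep("Adilabad", 10),
  Subdistt = c(rep("Adilabad", 3), rep("Tamsi", 7)),
  TRU      = c("Total", "Rural", "Urban", "Total", "Rural", "Urban",
               "Rural", "Rural", "Urban", "Urban"),
  Level    = c(rep("District", 3), rep("Sub-District", 3),
               "Village", "Village", "Town", "Town"),
  No_HH    = c(1277, 364, 913, 1277, 364, 913, 117, 247, 614, 299),
  stringsAsFactors = FALSE
)

# Sum the lowest-level rows (assumed: Village and Town) per sub-district and TRU.
leaf <- df[!grepl("District", df$Level), ]
computed <- aggregate(No_HH ~ District + Subdistt + TRU, data = leaf, FUN = sum)
names(computed)[names(computed) == "No_HH"] <- "computed_HH"   # illustrative name

# The figures reported in the data for the same groups.
reported <- df[df$Level == "Sub-District" & df$TRU != "Total",
               c("District", "Subdistt", "TRU", "No_HH")]
names(reported)[names(reported) == "No_HH"] <- "reported_HH"   # illustrative name

# Put both side by side and flag any group where they disagree
# (a group with no lowest-level rows is flagged as a mismatch).
check <- merge(reported, computed,
               by = c("District", "Subdistt", "TRU"), all.x = TRUE)
check$matches <- !is.na(check$computed_HH) &
  check$reported_HH == check$computed_HH

print(check)
```

Rows where matches is FALSE are the Sub-Districts whose reported total differs from the sum of their villages or towns. The same pattern could be repeated one level up, summing the Sub-District rows per District and TRU and comparing against the District rows.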