How to calculate new value based on sum of duplicate nominal values in R
I have two columns of data: one for the variable, and one for the area in which the variable occurs.
veg_dominant Shape_Area
Hm1.1 28216.344
Bp1.2molcae 6509.464
Bp1.2molcae 43518.162
Hm1.1 21348.608
Hm1.1 14529.108
Hm1.1 18050.676
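I would like to sum Shape_Area over all rows that share the same veg_dominant value.
For example, next to each Bp1.2molcae I would want the sum of all Shape_Area values for Bp1.2molcae, which is 50027.626. So I would want this number to appear next to both of the rows containing Bp1.2molcae. The same goes for Hm1.1. What I would ultimately like is a new data frame with only each unique variable and the sum of its Shape_Area.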
The expected output for the example above is:
veg_dominant Shape_Area
Hm1.1 82144.736
Bp1.2molcae 50027.626
I have a lot of rows, but below is the code for the head of the data shown in the example above.
structure(list(veg_dominant = structure(c(59L, 14L, 14L, 59L,
59L, 59L), .Label = c("", "Bb1.1.1", "bebouwing", "Beuk", "bos",
"Bp", "Bp1.1", "Bp1.1.1", "Bp1.1.3", "Bp1.1.3loof", "Bp1.1Calluna",
"Bp1.2", "Bp1.2desflex", "Bp1.2molcae", "Bp1.3", "Bq11.1", "Bq3.2.5",
"Bq4.1betpin", "Bq4.1querob", "Bq5.2pinbet", "Bq6.1desflex",
"Bq6.1molcae", "Bq6.2", "Bq6.2molcae", "Bq9", "Bq99.1", "Bq99.2",
"Bq99.3", "Dd1", "Dd2.1", "Dd2.2", "Dd3", "Dd5.1", "E00", "G01a",
"G02", "G03", "G04", "G05", "G06", "G07", "Gc04", "Gc1", "Gc2",
"Gc3", "Gc4", "grasland", "Grasland ", "H01", "H03", "Hc1", "Hc2",
"Hc3", "Hc3_0", "Hc3_3", "Hc3Cp", "Hc3t", "hm1.1", "Hm1.1", "Hm1.1_3",
"Hm1.2", "Hp1.1", "Hpc", "Hv", "jeneverbesstruweel", "Oefendorp",
"open zand", "Open zand", "opslag", "Opslag", "P02", "Sj1.1",
"weg", "x", "x00", "X00"), class = "factor"), Shape_Area = c(28216.3437,
6509.46415, 43518.16186, 21348.60848, 14529.10796, 18050.6759
)), row.names = c(NA, 6L), class = "data.frame")
Here, we can group by veg_dominant and take the sum:
library(dplyr)
df1 %>%
group_by(veg_dominant) %>%
summarise(Shape_Area = sum(Shape_Area))
Or in base R:
aggregate(Shape_Area ~ veg_dominant, df1, sum)
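
The question also asks for the group total to appear next to every row, which the summaries above do not do. A minimal base R sketch of one way to get both results follows. It assumes the data frame is named df1 (the name used in the answer, not in the question), rebuilds the six example rows with areas rounded to three decimals, and uses a hypothetical column name Area_Total for the per-row total.

```r
# Rebuild the six example rows; df1 is the name the answer assumes.
df1 <- data.frame(
  veg_dominant = c("Hm1.1", "Bp1.2molcae", "Bp1.2molcae",
                   "Hm1.1", "Hm1.1", "Hm1.1"),
  Shape_Area = c(28216.344, 6509.464, 43518.162,
                 21348.608, 14529.108, 18050.676),
  stringsAsFactors = TRUE
)

# ave() applies FUN within each group and returns a vector as long as
# the input, so every row receives the total for its own group.
# Area_Total is a hypothetical column name chosen for this sketch.
df1$Area_Total <- ave(df1$Shape_Area, df1$veg_dominant, FUN = sum)

# One row per unique veg_dominant, as in the base R line above.
totals <- aggregate(Shape_Area ~ veg_dominant, data = df1, FUN = sum)

print(df1)
print(totals)
```

Note that aggregate() orders its result by the grouping variable, so Bp1.2molcae is listed before Hm1.1 rather than in order of first appearance.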