Get percentage in a dataframe with ddply function
I have this data frame, mydf:
structure(list(Driver = c("Crop agriculture", "Crop agriculture",
"Infrastructure", "Infrastructure", "Mining", "Mining", "Mixed Agriculture",
"Mixed Agriculture", "Other land use", "Other land use", "Pasture",
"Pasture", "Tree crops", "Tree crops", "Water", "Water"), Period = c("1990-2000",
"1990-2005", "1990-2000", "1990-2005", "1990-2000", "1990-2005",
"1990-2000", "1990-2005", "1990-2000", "1990-2005", "1990-2000",
"1990-2005", "1990-2000", "1990-2005", "1990-2000", "1990-2005"
), Total = c(120328.157829121, 301821.02190182, 12829.2774726025,
10727.4383383233, 1087.58971425679, 639.851573022215, 27213.5917382956,
19832.3424927037, 72326.7471322223, 64524.3243532213, 1064383.44273723,
1347648.2335736, 7814.32273630087, 7672.0730281537, 20332.6943805768,
17504.7712037337), n = c("n = 1669", "n = 783", "n = 298", "n = 151",
"n = 20", "n = 7", "n = 1355", "n = 925", "n = 1623", "n = 851",
"n = 10986", "n = 6039", "n = 316", "n = 211", "n = 466", "n = 244"
)), .Names = c("Driver", "Period", "Total", "n"), class = "data.frame", row.names = c(NA,
-16L))
The idea is to get the percentage of each Driver in the period. I have tried the ddply function and came up with this line of code:
library(plyr)
Percentage <- ddply(mydf, c("Driver", "Period"), summarise,
                    percent = (Total / sum(Total)) * 100)
However, I only get a value of 100% in every cell. Does anyone know what I'm doing wrong?
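In your call, when you do sum(Total) you are using the total of the group, not of the whole data frame. Because every Driver/Period group in this data contains exactly one row, Total/sum(Total) is simply 1 (that is, 100%) for this data/grouping. You can compute the sum over the entire data set instead by using mydf$Total in the sum() call. Using ddply, this would be: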
library(plyr)
ddply(mydf, .(Driver, Period), summarise, Pct = Total/sum(mydf$Total) * 100)
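A minimal base-R sketch (no plyr needed) that reproduces both results on the question's data: the first part confirms that each Driver/Period combination appears exactly once, so dividing by the group sum always yields 100; the second part divides by the grand total instead, as the answer suggests. The compact rebuild of mydf is an assumption made only to keep the snippet self-contained; it holds the same Driver, Period and Total values as the dput output above, with the n column omitted.

```r
# Compact rebuild of mydf (same Driver/Period/Total values as the dput; n omitted)
mydf <- data.frame(
  Driver = rep(c("Crop agriculture", "Infrastructure", "Mining",
                 "Mixed Agriculture", "Other land use", "Pasture",
                 "Tree crops", "Water"), each = 2),
  Period = rep(c("1990-2000", "1990-2005"), times = 8),
  Total  = c(120328.157829121, 301821.02190182, 12829.2774726025,
             10727.4383383233, 1087.58971425679, 639.851573022215,
             27213.5917382956, 19832.3424927037, 72326.7471322223,
             64524.3243532213, 1064383.44273723, 1347648.2335736,
             7814.32273630087, 7672.0730281537, 20332.6943805768,
             17504.7712037337)
)

# Count rows per Driver/Period group: every cell of this table is 1
group_sizes <- table(mydf$Driver, mydf$Period)
print(group_sizes)

# Dividing by the group sum (what the question's ddply call does):
# with one row per group, x / sum(x) is always 1, so every value is 100
pct_within_group <- ave(mydf$Total, mydf$Driver, mydf$Period,
                        FUN = function(x) x / sum(x) * 100)

# Dividing by the grand total of the whole data frame instead
pct_of_total <- mydf$Total / sum(mydf$Total) * 100
print(round(pct_of_total, 2))
```

Because pct_of_total is relative to the sum over both periods, these percentages add up to 100 across all 16 rows, not within each period.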
Here is the dplyr equivalent of the ddply call:
library(dplyr)
group_by(mydf, Driver, Period) %>% summarise(Pct = Total/sum(mydf$Total) * 100)
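If "the percentage of each Driver in the period" is meant as each Driver's share of that Period's total (rather than of the grand total across both periods), grouping by Period alone should give that. A base-R sketch under that assumption, again rebuilding mydf from the dput values (n column omitted) so it runs on its own:

```r
# Compact rebuild of mydf (same Driver/Period/Total values as the dput; n omitted)
mydf <- data.frame(
  Driver = rep(c("Crop agriculture", "Infrastructure", "Mining",
                 "Mixed Agriculture", "Other land use", "Pasture",
                 "Tree crops", "Water"), each = 2),
  Period = rep(c("1990-2000", "1990-2005"), times = 8),
  Total  = c(120328.157829121, 301821.02190182, 12829.2774726025,
             10727.4383383233, 1087.58971425679, 639.851573022215,
             27213.5917382956, 19832.3424927037, 72326.7471322223,
             64524.3243532213, 1064383.44273723, 1347648.2335736,
             7814.32273630087, 7672.0730281537, 20332.6943805768,
             17504.7712037337)
)

# ASSUMPTION: percentage relative to each Period's own total,
# so the grouping variable is Period only (not Driver and Period)
mydf$PctOfPeriod <- ave(mydf$Total, mydf$Period,
                        FUN = function(x) x / sum(x) * 100)

# Sanity check: within each period the shares should sum to 100
period_sums <- tapply(mydf$PctOfPeriod, mydf$Period, sum)
print(period_sums)
print(mydf[, c("Driver", "Period", "PctOfPeriod")])
```

With plyr, the analogous call would likewise group by Period only, e.g. ddply(mydf, .(Period), transform, Pct = Total/sum(Total) * 100); in dplyr, group_by(mydf, Period) %>% mutate(Pct = Total/sum(Total) * 100).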