R: conditional rowSums to replace rows with sums based on percentage
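I'm looking for a conditional rowSums: if rows represent < 1% of the data, collapse them into a single row and replace the original values with their sums. Bonus if the table can include the number of rows that were summed in the name column (e.g., "Other(n=2)"). This is a small part of a larger function. See the example below: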
Example data:
name | Year1 | Year2 | Year3 | Total | Percent |
---|---|---|---|---|---|
John | 1 | 2 | 1 | 4 | 0.7029877 |
Paul | 230 | 100 | 150 | 480 | 84.358524 |
George | 41 | 30 | 10 | 81 | 14.235501 |
Ringo | 2 | 1 | 1 | 4 | 0.7029877 |
# Code for example data
library(dplyr) # select() below comes from dplyr
name <- c("John", "Paul", "George", "Ringo")
Year1 <- c(1, 230, 41, 2)
Year2 <- c(2, 100, 30, 1)
Year3 <- c(1, 150, 10, 1)
df <- data.frame(name, Year1, Year2, Year3)
df$Total <- rowSums(select(df,Year1:Year3))
df$Percent <- df$Total/sum(df$Total)*100
In the solution, John and Ringo are combined into a single 'Other' row because each has a percentage < 1.
# Code for example solution
name <- c("Paul", "George", "Other(n=2)")
Year1 <- c(230, 41, 3)
Year2 <- c(100, 30, 3)
Year3 <- c(150, 10, 2)
df2 <- data.frame(name, Year1, Year2, Year3)
df2$Total <- rowSums(select(df2,Year1:Year3))
df2$Percent <- df2$Total/sum(df2$Total)*100
Example solution:
name | Year1 | Year2 | Year3 | Total | Percent |
---|---|---|---|---|---|
Paul | 230 | 100 | 150 | 480 | 84.358524 |
George | 41 | 30 | 10 | 81 | 14.235501 |
Other(n=2) | 3 | 3 | 2 | 8 | 1.405975 |
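For comparison, the expected table above could also be produced without any packages. The following is only a minimal base-R sketch under a few assumptions: the cutoff is 1 percent, the names `year_cols`, `num_cols`, `threshold`, `small`, and `collapsed` are made up for illustration, and the collapsed row is simply appended at the bottom.

```r
# Rebuild the example data with base R only
df <- data.frame(
  name  = c("John", "Paul", "George", "Ringo"),
  Year1 = c(1, 230, 41, 2),
  Year2 = c(2, 100, 30, 1),
  Year3 = c(1, 150, 10, 1),
  stringsAsFactors = FALSE
)
year_cols  <- c("Year1", "Year2", "Year3")
df$Total   <- rowSums(df[year_cols])
df$Percent <- df$Total / sum(df$Total) * 100

threshold <- 1                    # assumed cutoff, in percent
small <- df$Percent < threshold   # TRUE for rows that should be collapsed

if (any(small)) {
  num_cols <- c(year_cols, "Total", "Percent")
  # Column sums of the small rows become one new row;
  # the label records how many rows were merged
  other <- data.frame(
    name = sprintf("Other(n=%d)", sum(small)),
    t(colSums(df[small, num_cols])),
    stringsAsFactors = FALSE
  )
  collapsed <- rbind(df[!small, ], other)
} else {
  collapsed <- df
}
rownames(collapsed) <- NULL
collapsed
```

Because the grand total does not change when rows are merged, summing the `Percent` values of the merged rows gives the same result as recomputing the percentage from the new `Total`.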
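library(tidyverse) # or load only dplyr and forcats
df %>%
  # lump names whose weighted share (w = Percent) is below 1% into "Other"
  mutate(name_lumped = fct_lump(name, w = Percent, prop = 0.01)) %>%
  group_by(name_lumped) %>%
  # sum every column from Year1 through Percent within each group
  summarize(across(Year1:Percent, sum))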
# A tibble: 3 x 6
  name_lumped Year1 Year2 Year3 Total Percent
  <fct>       <dbl> <dbl> <dbl> <dbl>   <dbl>
1 George         41    30    10    81   14.2
2 Paul          230   100   150   480   84.4
3 Other           3     3     2     8    1.41
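The output above labels the merged row plainly as `Other`, so the bonus request for "Other(n=2)" is not covered. One possible extension, offered as a sketch rather than as the answerer's method, is to count the rows in each group while summarizing and build the label afterwards. It assumes forcats 0.5.0 or later for `fct_lump_prop()`, and that no real name in the data is literally "Other"; `n_rows` and `result` are illustrative names.

```r
library(dplyr)
library(forcats)

# Example data from the question
df <- data.frame(
  name  = c("John", "Paul", "George", "Ringo"),
  Year1 = c(1, 230, 41, 2),
  Year2 = c(2, 100, 30, 1),
  Year3 = c(1, 150, 10, 1)
) %>%
  mutate(Total = Year1 + Year2 + Year3,
         Percent = Total / sum(Total) * 100)

result <- df %>%
  # Lump names whose share of Total is below 1% into "Other"
  mutate(name_lumped = fct_lump_prop(name, prop = 0.01, w = Total)) %>%
  group_by(name_lumped) %>%
  # Sum the numeric columns and record how many rows went into each group
  summarise(across(Year1:Percent, sum), n_rows = n(), .groups = "drop") %>%
  # Attach the row count to the label of the lumped group only
  mutate(name = if_else(name_lumped == "Other",
                        paste0("Other(n=", n_rows, ")"),
                        as.character(name_lumped))) %>%
  select(name, Year1:Percent)

result
```

Rows come out in factor-level order, with the lumped group last. Add an `arrange()` step if a different order is needed.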