How to sum over subsets of rows in R
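I'm working in R with the MIT Election Data and Science Lab's county-level US voting data. I want to know the total number of votes each candidate received in each county. For some states, such as Wisconsin, this is easy, because each candidate has a single TOTAL row:

```
"state", "county_name", "county_fips", "candidate", "party", "candidatevotes", "totalvotes", "mode"
"WISCONSIN", "WINNEBAGO", "55139", "JO JORGENSEN", "LIBERTARIAN", 1629, 94032, "TOTAL"
```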
Other states, such as Utah, are manageable, since there is a TOTAL row alongside the per-mode rows:

```
"UTAH", "WEBER", "49057", "DONALD J TRUMP", "REPUBLICAN", 0, 111403, "EARLY"
"UTAH", "WEBER", "49057", "DONALD J TRUMP", "REPUBLICAN", 0, 111403, "ELECTION DAY"
"UTAH", "WEBER", "49057", "DONALD J TRUMP", "REPUBLICAN", 0, 111403, "MAIL"
"UTAH", "WEBER", "49057", "DONALD J TRUMP", "REPUBLICAN", 65949, 111403, "TOTAL"
```
South Carolina, however, is a problem: the votes are split across voting modes and there is no TOTAL row:

```
"SOUTH CAROLINA", "YORK", "45091", "JOSEPH R BIDEN JR", "DEMOCRAT", 13656, 144050, "ABSENTEE BY MAIL"
"SOUTH CAROLINA", "YORK", "45091", "JOSEPH R BIDEN JR", "DEMOCRAT", 22075, 144050, "ELECTION DAY"
"SOUTH CAROLINA", "YORK", "45091", "JOSEPH R BIDEN JR", "DEMOCRAT", 18, 144050, "FAILSAFE"
"SOUTH CAROLINA", "YORK", "45091", "JOSEPH R BIDEN JR", "DEMOCRAT", 176, 144050, "FAILSAFE PROVISIONAL"
"SOUTH CAROLINA", "YORK", "45091", "JOSEPH R BIDEN JR", "DEMOCRAT", 22950, 144050, "IN-PERSON ABSENTEE"
"SOUTH CAROLINA", "YORK", "45091", "JOSEPH R BIDEN JR", "DEMOCRAT", 133, 144050, "PROVISIONAL"
```
It seems to me there should be some way to iterate over the FIPS codes and party names to produce a total for each county, but I'm stumped.
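The "loop" described here is really a grouped sum, which R can do without an explicit loop. What follows is a minimal base-R sketch. It assumes a small data frame built from the rows quoted above and trimmed to the three columns that matter; `votes` and `totals` are hypothetical names. `aggregate()` splits the rows by every `county_fips`/`party` pair and sums `candidatevotes` within each pair. That collapses Wisconsin's single TOTAL row, Utah's mode rows plus TOTAL, and South Carolina's mode-only rows to one number per county and party.

```r
# Hypothetical sample built from the rows quoted in the question:
# Wisconsin (TOTAL only), Utah (modes + TOTAL), South Carolina (modes only)
votes <- data.frame(
  county_fips    = c("55139", rep("49057", 4), rep("45091", 6)),
  party          = c("LIBERTARIAN", rep("REPUBLICAN", 4), rep("DEMOCRAT", 6)),
  candidatevotes = c(1629, 0, 0, 0, 65949,
                     13656, 22075, 18, 176, 22950, 133)
)

# No explicit loop: aggregate() groups the rows by each
# (county_fips, party) combination and applies sum() to each group
totals <- aggregate(candidatevotes ~ county_fips + party, data = votes, FUN = sum)
totals
```

This adds every row blindly. Utah comes out right here only because its per-mode rows are 0 in the quoted data.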
Does this solve your problem?
```r
library(tidyverse)
df <- read_csv("~/Desktop/countypres_2000-2020.csv")
#> Rows: 72617 Columns: 12
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (8): state, state_po, county_name, county_fips, office, candidate, party...
#> dbl (4): year, candidatevotes, totalvotes, version
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
df %>%
  filter(year == 2020) %>%
  group_by(candidate, county_fips) %>%
  summarise(
    county_name,
    total_votes_per_candidate_per_county = sum(candidatevotes)
  ) %>%
  relocate(candidate, .before = 4) %>%
  distinct() %>%
  arrange(county_fips)
#> `summarise()` has grouped output by 'candidate', 'county_fips'. You can override using the `.groups` argument.
#> # A tibble: 11,902 × 4
#> # Groups:   candidate, county_fips [11,898]
#>    county_fips county_name candidate         total_votes_per_candidate_per_coun…
#>    <chr>       <chr>       <chr>                                           <dbl>
#>  1 01001       AUTAUGA     DONALD J TRUMP                                  19838
#>  2 01001       AUTAUGA     JOSEPH R BIDEN JR                                7503
#>  3 01001       AUTAUGA     OTHER                                             429
#>  4 01003       BALDWIN     DONALD J TRUMP                                  83544
#>  5 01003       BALDWIN     JOSEPH R BIDEN JR                               24578
#>  6 01003       BALDWIN     OTHER                                            1557
#>  7 01005       BARBOUR     DONALD J TRUMP                                   5622
#>  8 01005       BARBOUR     JOSEPH R BIDEN JR                                4816
#>  9 01005       BARBOUR     OTHER                                              80
#> 10 01007       BIBB        DONALD J TRUMP                                   7525
#> # … with 11,892 more rows
```
Created on 2022-01-20 by the reprex package (v2.0.1)
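A note on the pipeline above. Calling `summarise()` with a bare column such as `county_name` returns one row per input row in each group, which is why `distinct()` is needed afterwards. dplyr 1.1.0 deprecated that kind of multi-row `summarise()` in favor of `reframe()`.

The sketch below is a possible tidier variant, run on a few rows copied from the question (`votes` and `county_totals` are hypothetical names). It puts `county_name` in `group_by()` instead and drops the grouping with `.groups = "drop"`. It also adds a safeguard that is an assumption, not something the dataset documents. If a county and candidate have a TOTAL row, only that row is used. Otherwise all per-mode rows are summed, so mode rows and a TOTAL row are never counted twice.

```r
library(dplyr)

# Hypothetical sample: 2020 rows copied from the question, columns trimmed
votes <- data.frame(
  county_fips    = c("55139", rep("49057", 4), rep("45091", 6)),
  county_name    = c("WINNEBAGO", rep("WEBER", 4), rep("YORK", 6)),
  candidate      = c("JO JORGENSEN", rep("DONALD J TRUMP", 4),
                     rep("JOSEPH R BIDEN JR", 6)),
  candidatevotes = c(1629, 0, 0, 0, 65949,
                     13656, 22075, 18, 176, 22950, 133),
  mode           = c("TOTAL", "EARLY", "ELECTION DAY", "MAIL", "TOTAL",
                     "ABSENTEE BY MAIL", "ELECTION DAY", "FAILSAFE",
                     "FAILSAFE PROVISIONAL", "IN-PERSON ABSENTEE", "PROVISIONAL")
)

county_totals <- votes %>%
  # county_name is a grouping column, so it is carried into the result
  # without the summarise(county_name, ...) + distinct() detour
  group_by(county_fips, county_name, candidate) %>%
  # Safeguard (assumption): keep the TOTAL row if the group has one,
  # otherwise keep every per-mode row so they are added up
  filter(mode == "TOTAL" | !any(mode == "TOTAL")) %>%
  summarise(total_votes = sum(candidatevotes), .groups = "drop") %>%
  arrange(county_fips)

county_totals
```

On the full file, the same steps would follow `df %>% filter(year == 2020)` from the answer above.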