weighted join or match in R
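I'm working with election data from the California Statewide Database (https://statewidedatabase.org/election.html). I'm trying to convert their precinct-level election results into 2010 census block-level results. I have the precinct-level election results: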
> sov_results
# A tibble: 20,744 x 136
COUNTY FIPS SRPREC_KEY SRPREC ADDIST CDDIST SDDIST BEDIST TOTREG DEMREG REPREG AIPREG GRNREG LIBREG NLPREG REFREG DCLREG MSCREG TOTVOTE
<dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 49 06097 060971002 1002 2 5 2 2 29 0 0 0 0 0 0 0 0 0 18
2 49 06097 060971003 1003 2 2 2 2 1 0 0 0 0 0 0 0 0 0 0
3 49 06097 060971005 1005 2 2 2 2 106 0 0 0 0 0 0 0 0 0 67
4 49 06097 060971006 1006 2 5 2 2 2 0 0 0 0 0 0 0 0 0 2
5 49 06097 060971007 1007 2 2 2 2 56 0 0 0 0 0 0 0 0 0 42
6 49 06097 060971008 1008 2 5 2 2 148 0 0 0 0 0 0 0 0 0 109
7 49 06097 060971009 1009 2 5 2 2 137 0 0 0 0 0 0 0 0 0 97
8 49 06097 060971012 1012 2 5 2 2 21 0 0 0 0 0 0 0 0 0 16
9 49 06097 060971017 1017 4 5 2 2 723 0 0 0 0 0 0 0 0 0 591
10 49 06097 060971018 1018 2 2 2 2 14 0 0 0 0 0 0 0 0 0 10
# ... with 20,734 more rows, and 117 more variables: DEMVOTE <dbl>, REPVOTE <dbl>, AIPVOTE <dbl>, GRNVOTE <dbl>, LIBVOTE <dbl>,
# NLPVOTE <dbl>, REFVOTE <dbl>, DCLVOTE <dbl>, MSCVOTE <dbl>, PRCVOTE <dbl>, ABSVOTE <dbl>, ASSDEM01 <dbl>, ASSDEM02 <dbl>,
# ASSDEM03 <dbl>, ASSDEM04 <dbl>, ASSDEM05 <dbl>, ASSDEM06 <dbl>, ASSDEM07 <dbl>, ASSDEM08 <dbl>, ASSGRN01 <dbl>, ASSIND01 <dbl>,
# ASSLIB01 <dbl>, ASSPAF01 <dbl>, ASSREP01 <dbl>, ASSREP02 <dbl>, ASSREP03 <dbl>, ASSREP04 <dbl>, CNGAIP01 <dbl>, CNGDEM01 <dbl>,
# CNGDEM02 <dbl>, CNGDEM03 <dbl>, CNGDEM04 <dbl>, CNGDEM05 <dbl>, CNGDEM06 <dbl>, CNGDEM07 <dbl>, CNGDEM08 <dbl>, CNGDEM09 <dbl>,
and a conversion key with weights:
> conversion
# A tibble: 398,299 x 13
SRPREC FIPS ELECTION TYPE SRPREC_KEY BLOCK_KEY TRACT BLOCK BLKREG SRTOTREG PCTSRPREC BLKTOTREG PCTBLK
<dbl> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 NA 06097 p20 sr_blk 06097nan 060970000000000 0 0 1 NA NA 1 100
2 1002 06097 p20 sr_blk 060971002 060971525011014 152501 1014 26 29 89.7 26 100
3 1002 06097 p20 sr_blk 060971002 060971525013008 152501 3008 3 29 10.3 3 100
4 1003 06097 p20 sr_blk 060971003 060971526005068 152600 5068 1 1 100 1 100
5 1005 06097 p20 sr_blk 060971005 060971526005000 152600 5000 14 106 13.2 43 32.6
6 1005 06097 p20 sr_blk 060971005 060971526005003 152600 5003 12 106 11.3 12 100
7 1005 06097 p20 sr_blk 060971005 060971526005004 152600 5004 12 106 11.3 20 60
8 1005 06097 p20 sr_blk 060971005 060971526005006 152600 5006 5 106 4.72 5 100
9 1005 06097 p20 sr_blk 060971005 060971526005008 152600 5008 24 106 22.6 24 100
10 1005 06097 p20 sr_blk 060971005 060971526005020 152600 5020 28 106 26.4 28 100
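I'd like to know how to match these precinct results to census blocks so that each block gets the correct number of votes from its precinct. The allocation should be based on the PCTSRPREC column, which gives the percentage of the precinct that falls within that census block.
For example, I want the join to assign 13.2% of SRPREC_KEY 060971005 to BLOCK 5000: 13.2% of TOTVOTE (rounded to a whole number), 13.2% of DEMVOTE, 13.2% of the ASSDEM03 votes, and so on.
Is there a function or method in R that does this?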
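The allocation in this example comes down to one multiplication per column. As a quick check in base R, here are the TOTREG (106) and TOTVOTE (67) values printed above for precinct 060971005, combined with the 13.2% share for block 5000:

```r
# Values taken from the printed sov_results / conversion rows for
# precinct 060971005 and block 5000
pct_block_5000 <- 13.2

totreg_share  <- 106 * pct_block_5000 / 100   # about 13.992 registered voters
totvote_share <- 67  * pct_block_5000 / 100   # about 8.844 votes

# Rounded to whole numbers, as the question asks
round(totreg_share)
round(totvote_share)
```

The unrounded values match the row for block 5000 in the answer's output below (13.9920 and 8.8440). Rounded, they come out to 14 and 9.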
I think you're looking for a join/merge followed by a simple multiplication.
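library(dplyr)

# Keep only the join key, the block ID and the weight, attach each
# precinct's results to every block it overlaps, then scale every count
# column from TOTREG to the last column (DEMVOTE, ASSDEM03, ...) by the
# block's share of the precinct
select(conversion, SRPREC_KEY, BLOCK, PCTSRPREC) %>%
  left_join(sov_results, by = "SRPREC_KEY") %>%
  mutate(across(TOTREG:last_col(), ~ .x * PCTSRPREC / 100))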
# SRPREC_KEY BLOCK PCTSRPREC COUNTY FIPS SRPREC ADDIST CDDIST SDDIST BEDIST TOTREG DEMREG REPREG AIPREG GRNREG LIBREG NLPREG REFREG DCLREG MSCREG TOTVOTE
# 1 06097nan 0 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
# 2 060971002 1014 89.70 49 6097 1002 2 5 2 2 26.0130 0 0 0 0 0 0 0 0 0 16.1460
# 3 060971002 3008 10.30 49 6097 1002 2 5 2 2 2.9870 0 0 0 0 0 0 0 0 0 1.8540
# 4 060971003 5068 100.00 49 6097 1003 2 2 2 2 1.0000 0 0 0 0 0 0 0 0 0 0.0000
# 5 060971005 5000 13.20 49 6097 1005 2 2 2 2 13.9920 0 0 0 0 0 0 0 0 0 8.8440
# 6 060971005 5003 11.30 49 6097 1005 2 2 2 2 11.9780 0 0 0 0 0 0 0 0 0 7.5710
# 7 060971005 5004 11.30 49 6097 1005 2 2 2 2 11.9780 0 0 0 0 0 0 0 0 0 7.5710
# 8 060971005 5006 4.72 49 6097 1005 2 2 2 2 5.0032 0 0 0 0 0 0 0 0 0 3.1624
# 9 060971005 5008 22.60 49 6097 1005 2 2 2 2 23.9560 0 0 0 0 0 0 0 0 0 15.1420
# 10 060971005 5020 26.40 49 6097 1005 2 2 2 2 27.9840 0 0 0 0 0 0 0 0 0 17.6880
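If you would rather use base R, here is a minimal sketch of the same weighted join built on `merge()` instead of dplyr. It also applies the rounding the question asks for. The two data frames are hypothetical miniatures built from a few rows printed in the question, keeping only TOTREG and TOTVOTE. With the real tables you would apportion every count column.

```r
# Hypothetical miniature tables built from rows printed in the question
sov_results <- data.frame(
  SRPREC_KEY = c("060971002", "060971003", "060971005"),
  TOTREG     = c(29, 1, 106),
  TOTVOTE    = c(18, 0, 67),
  stringsAsFactors = FALSE
)

conversion <- data.frame(
  SRPREC_KEY = c("06097nan", "060971002", "060971002", "060971003", "060971005"),
  BLOCK      = c(0, 1014, 3008, 5068, 5000),
  PCTSRPREC  = c(NA, 89.7, 10.3, 100, 13.2),
  stringsAsFactors = FALSE
)

# Left join: keep every block row and attach its precinct's results
# (the unmatched "06097nan" key gets NA counts)
blocks <- merge(conversion, sov_results, by = "SRPREC_KEY", all.x = TRUE)

# Columns to apportion: every column of sov_results except the key
count_cols <- setdiff(names(sov_results), "SRPREC_KEY")

# Scale each count by the block's share of its precinct, then round
blocks[count_cols] <- lapply(blocks[count_cols], function(x) {
  round(x * blocks$PCTSRPREC / 100)
})

blocks[order(blocks$SRPREC_KEY, blocks$BLOCK), ]
```

Each block is rounded independently, so the rounded block values are not guaranteed to add back up to the precinct total. For precinct 060971002 in this sample they do: 26 + 3 = 29 registered voters and 16 + 2 = 18 votes.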