Weighted join or match in R

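I'm working with election data from the California Statewide Database (https://statewidedatabase.org/election.html). I'm trying to convert their precinct-level election results into 2010 Census block-level results. I have the precinct-level election results: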

> sov_results
# A tibble: 20,744 x 136
   COUNTY FIPS  SRPREC_KEY SRPREC ADDIST CDDIST SDDIST BEDIST TOTREG DEMREG REPREG AIPREG GRNREG LIBREG NLPREG REFREG DCLREG MSCREG TOTVOTE
    <dbl> <chr> <chr>       <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>   <dbl>
 1     49 06097 060971002    1002      2      5      2      2     29      0      0      0      0      0      0      0      0      0      18
 2     49 06097 060971003    1003      2      2      2      2      1      0      0      0      0      0      0      0      0      0       0
 3     49 06097 060971005    1005      2      2      2      2    106      0      0      0      0      0      0      0      0      0      67
 4     49 06097 060971006    1006      2      5      2      2      2      0      0      0      0      0      0      0      0      0       2
 5     49 06097 060971007    1007      2      2      2      2     56      0      0      0      0      0      0      0      0      0      42
 6     49 06097 060971008    1008      2      5      2      2    148      0      0      0      0      0      0      0      0      0     109
 7     49 06097 060971009    1009      2      5      2      2    137      0      0      0      0      0      0      0      0      0      97
 8     49 06097 060971012    1012      2      5      2      2     21      0      0      0      0      0      0      0      0      0      16
 9     49 06097 060971017    1017      4      5      2      2    723      0      0      0      0      0      0      0      0      0     591
10     49 06097 060971018    1018      2      2      2      2     14      0      0      0      0      0      0      0      0      0      10
# ... with 20,734 more rows, and 117 more variables: DEMVOTE <dbl>, REPVOTE <dbl>, AIPVOTE <dbl>, GRNVOTE <dbl>, LIBVOTE <dbl>,
#   NLPVOTE <dbl>, REFVOTE <dbl>, DCLVOTE <dbl>, MSCVOTE <dbl>, PRCVOTE <dbl>, ABSVOTE <dbl>, ASSDEM01 <dbl>, ASSDEM02 <dbl>,
#   ASSDEM03 <dbl>, ASSDEM04 <dbl>, ASSDEM05 <dbl>, ASSDEM06 <dbl>, ASSDEM07 <dbl>, ASSDEM08 <dbl>, ASSGRN01 <dbl>, ASSIND01 <dbl>,
#   ASSLIB01 <dbl>, ASSPAF01 <dbl>, ASSREP01 <dbl>, ASSREP02 <dbl>, ASSREP03 <dbl>, ASSREP04 <dbl>, CNGAIP01 <dbl>, CNGDEM01 <dbl>,
#   CNGDEM02 <dbl>, CNGDEM03 <dbl>, CNGDEM04 <dbl>, CNGDEM05 <dbl>, CNGDEM06 <dbl>, CNGDEM07 <dbl>, CNGDEM08 <dbl>, CNGDEM09 <dbl>,

and a conversion key with weights:

> conversion
# A tibble: 398,299 x 13
   SRPREC FIPS  ELECTION TYPE   SRPREC_KEY BLOCK_KEY        TRACT BLOCK BLKREG SRTOTREG PCTSRPREC BLKTOTREG PCTBLK
    <dbl> <chr> <chr>    <chr>  <chr>      <chr>            <dbl> <dbl>  <dbl>    <dbl>     <dbl>     <dbl>  <dbl>
 1     NA 06097 p20      sr_blk 06097nan   060970000000000      0     0      1       NA     NA            1  100  
 2   1002 06097 p20      sr_blk 060971002  060971525011014 152501  1014     26       29     89.7         26  100  
 3   1002 06097 p20      sr_blk 060971002  060971525013008 152501  3008      3       29     10.3          3  100  
 4   1003 06097 p20      sr_blk 060971003  060971526005068 152600  5068      1        1    100            1  100  
 5   1005 06097 p20      sr_blk 060971005  060971526005000 152600  5000     14      106     13.2         43   32.6
 6   1005 06097 p20      sr_blk 060971005  060971526005003 152600  5003     12      106     11.3         12  100  
 7   1005 06097 p20      sr_blk 060971005  060971526005004 152600  5004     12      106     11.3         20   60  
 8   1005 06097 p20      sr_blk 060971005  060971526005006 152600  5006      5      106      4.72         5  100  
 9   1005 06097 p20      sr_blk 060971005  060971526005008 152600  5008     24      106     22.6         24  100  
10   1005 06097 p20      sr_blk 060971005  060971526005020 152600  5020     28      106     26.4         28  100 

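I'd like to know how to match these precinct results to Census blocks so that each block receives the correct number of votes from the precinct results, based on the PCTSRPREC column, which gives the percentage of the precinct that falls in the Census block.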

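For example, I want to join so that 13.2% of SRPREC_KEY 060971005 is assigned to BLOCK 5000. That would be 13.2% of TOTVOTE (rounded to a whole number), 13.2% of DEMVOTE, 13.2% of the ASSDEM03 votes, and so on. Is there a function or method in R that does this?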

I think you're looking for a join/merge operation followed by a simple multiplication.

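library(dplyr)
# Attach the precinct results to each precinct-block pair, then scale the
# counts by the share of the precinct that falls in that block.
# Widen TOTREG:TOTVOTE to cover the remaining vote columns as needed.
conversion %>%
  select(SRPREC_KEY, BLOCK, PCTSRPREC) %>%
  left_join(sov_results, by = "SRPREC_KEY") %>%
  mutate(across(TOTREG:TOTVOTE, ~ . * PCTSRPREC / 100))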
#    SRPREC_KEY BLOCK PCTSRPREC COUNTY FIPS SRPREC ADDIST CDDIST SDDIST BEDIST  TOTREG DEMREG REPREG AIPREG GRNREG LIBREG NLPREG REFREG DCLREG MSCREG TOTVOTE
# 1    06097nan     0        NA     NA   NA     NA     NA     NA     NA     NA      NA     NA     NA     NA     NA     NA     NA     NA     NA     NA      NA
# 2   060971002  1014     89.70     49 6097   1002      2      5      2      2 26.0130      0      0      0      0      0      0      0      0      0 16.1460
# 3   060971002  3008     10.30     49 6097   1002      2      5      2      2  2.9870      0      0      0      0      0      0      0      0      0  1.8540
# 4   060971003  5068    100.00     49 6097   1003      2      2      2      2  1.0000      0      0      0      0      0      0      0      0      0  0.0000
# 5   060971005  5000     13.20     49 6097   1005      2      2      2      2 13.9920      0      0      0      0      0      0      0      0      0  8.8440
# 6   060971005  5003     11.30     49 6097   1005      2      2      2      2 11.9780      0      0      0      0      0      0      0      0      0  7.5710
# 7   060971005  5004     11.30     49 6097   1005      2      2      2      2 11.9780      0      0      0      0      0      0      0      0      0  7.5710
# 8   060971005  5006      4.72     49 6097   1005      2      2      2      2  5.0032      0      0      0      0      0      0      0      0      0  3.1624
# 9   060971005  5008     22.60     49 6097   1005      2      2      2      2 23.9560      0      0      0      0      0      0      0      0      0 15.1420
# 10  060971005  5020     26.40     49 6097   1005      2      2      2      2 27.9840      0      0      0      0      0      0      0      0      0 17.6880
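
Since a block can draw from more than one precinct (PCTBLK is below 100 for some rows), getting to true block-level results probably needs one more step: summing the weighted counts per block. The following is a minimal sketch of that idea on toy data, not the real tables. The values, the block IDs "blk1"/"blk2", and the helper vector vote_cols are all made up for illustration, and it assumes BLOCK_KEY (rather than BLOCK, which repeats across tracts) is the identifier you want to group on.

```r
library(dplyr)

# Toy stand-ins for the real tables; every value here is illustrative.
sov_results <- tibble(
  SRPREC_KEY = c("A", "B"),
  TOTVOTE    = c(100, 50),
  DEMVOTE    = c(60, 20),
  ASSDEM03   = c(40, 10)
)
conversion <- tibble(
  SRPREC_KEY = c("A", "A", "B"),
  BLOCK_KEY  = c("blk1", "blk2", "blk2"),
  PCTSRPREC  = c(25, 75, 100)
)

# Hypothetical helper: the columns that hold counts to be apportioned.
vote_cols <- c("TOTVOTE", "DEMVOTE", "ASSDEM03")

block_results <- conversion %>%
  select(SRPREC_KEY, BLOCK_KEY, PCTSRPREC) %>%
  filter(!is.na(PCTSRPREC)) %>%                 # drop rows with no weight
  left_join(sov_results, by = "SRPREC_KEY") %>%
  # Give each precinct-block pair its share of the precinct's counts.
  mutate(across(all_of(vote_cols), ~ .x * PCTSRPREC / 100)) %>%
  # Add up the shares that land in the same block, rounding only at the end.
  group_by(BLOCK_KEY) %>%
  summarise(across(all_of(vote_cols), ~ round(sum(.x))), .groups = "drop")

print(block_results)
```

Rounding after the sum rather than before keeps the rounding error per block small, but the rounded block totals may still not add back up exactly to the precinct totals.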