Divide Value from one Dataframe by Value of another Dataframe

Question

These are my two dataframes:

    structure(list(Full.Name = c("A. Patrick Beharelle", "A. Patrick Beharelle", 
"Aaron P. Graft", "Aaron P. Graft", "Aaron P. Jagdfeld"), year = c(2019, 
2020, 2019, 2020, 2019), counter = c(5541L, 3269L, 165L, 200L, 
4L)), class = c("grouped_df", "tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-5L), groups = structure(list(Full.Name = c("A. Patrick Beharelle", 
"Aaron P. Graft", "Aaron P. Jagdfeld"), .rows = structure(list(
    1:2, 3:4, 5L), ptype = integer(0), class = c("vctrs_list_of", 
"vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -3L), .drop = TRUE))

and

structure(list(authority_dic = c("accomplished", "accomplished", 
"accomplished", "accomplished", "accomplished"), Full.Name = c("A. Patrick Beharelle", "A. Patrick Beharelle", 
"Aaron P. Graft", "Aaron P. Graft", "Aaron P. Jagdfeld"), Entity = c("WERNER ENTERPRISES INC", "MONDELEZ INTERNATIONAL INC", 
"AEROJET ROCKETDYNE HOLDINGS", "T-MOBILE US INC", "SOUTHWEST AIRLINES"
), `2019` = c(1L, 0L, 1L, 0L, 0L), `2020` = c(0L, 1L, 0L, 3L, 
1L)), class = c("grouped_df", "tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-5L), groups = structure(list(authority_dic = c("accomplished", 
"accomplished", "accomplished", "accomplished", "accomplished"
), Full.Name = c("Derek J. Leathers", "Dirk Van de Put", "Eileen P. Drake", 
"G. Michael Sievert", "Gary C. Kelly"), .rows = structure(list(
    1L, 2L, 3L, 4L, 5L), ptype = integer(0), class = c("vctrs_list_of", 
"vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -5L), .drop = TRUE))

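Now I want to divide each value of the column "2019" by the "counter" value of the other dataframe and add the result as another column. The complication is that I only want to divide by the "counter" value that belongs to the year 2019 and to the matching person (for example, Aaron P. Graft). I want to do this for every row of the dataframe that contains the name "Aaron P. Graft", taking the value of "counter" from the row of the other dataframe that also contains "Aaron P. Graft".

I can't figure it out myself. Maybe I need to transpose the year and counter columns of the first dataframe, but I don't know.

This is what I want to achieve: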

authority_dic	Full.name	2019	2020	2019_freq	2020_freq
example word	Aaron P. Jagdfeld	10	20	10/counter(of 2019)	20/counter(of 2020)

If anything is unclear, please don't hesitate to ask. Thanks in advance!

Answer 1

Let the two structures be s1 and s2; then this should work:

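library(dplyr)  # mutate(), full_join(), summarise(), group_by(), %>%
library(tidyr)  # spread()
mutate(
      full_join(
         # total the yearly counts per person across all entities
         summarise(
            group_by(s2, authority_dic, Full.Name),
            `2019`=sum(`2019`),
            `2020`=sum(`2020`)),
         # reshape s1 to wide: one row per person, one counter column per year
         s1 %>% spread(year,counter),
         by=c("Full.Name")),
      # .x columns come from s2 (counts), .y columns from s1 (counters)
      `2019_freq`=`2019.x`/`2019.y`,
      `2020_freq`=`2020.x`/`2020.y`)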
# A tibble: 3 × 8
# Groups:   authority_dic [1]
  authority_dic Full.Name            `2019.x` `2020.x` `2019.y` `2020.y` `2019_freq` `2020_freq`
  <chr>         <chr>                   <int>    <int>    <int>    <int>       <dbl>       <dbl>
1 accomplished  A. Patrick Beharelle        1        1     5541     3269    0.000180    0.000306
2 accomplished  Aaron P. Graft              1        3      165      200    0.00606     0.015   
3 accomplished  Aaron P. Jagdfeld           0        1        4       NA    0          NA
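
The reshaping step is what makes the division possible: spread() turns s1 from one row per person and year into one row per person with one column per year. Below is a minimal sketch of that step on its own, assuming an ungrouped copy of s1 and using pivot_wider(), which tidyr documents as the newer replacement for spread(). The counter_ prefix and the name s1_wide are my own choices for illustration, so that a later join would not need the .x/.y suffixes.

```r
library(dplyr)
library(tidyr)

# Ungrouped stand-in for s1, with the same values as in the question
s1 <- tibble(
  Full.Name = c("A. Patrick Beharelle", "A. Patrick Beharelle",
                "Aaron P. Graft", "Aaron P. Graft", "Aaron P. Jagdfeld"),
  year      = c(2019, 2020, 2019, 2020, 2019),
  counter   = c(5541L, 3269L, 165L, 200L, 4L)
)

# Long -> wide: one row per person, one counter column per year
s1_wide <- s1 %>%
  pivot_wider(names_from = year, values_from = counter,
              names_prefix = "counter_")

s1_wide
```

Aaron P. Jagdfeld has no 2020 row in s1, so counter_2020 is NA for him, which is why 2020_freq is NA in the output above.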

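It is good practice to avoid naming columns after values such as 2019; use a year column instead. Your data model should be refactored into normal form (see the topic of database normalization for details).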
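
To illustrate the long-format suggestion, here is a rough sketch (not the code from the answer above): it reshapes s2 so that year becomes a column, then joins on both Full.Name and year. It assumes ungrouped copies of s1 and s2; the names s2_long, n, freq and result are mine. Note that it keeps one row per entity and year instead of summing per person.

```r
library(dplyr)
library(tidyr)

# Ungrouped stand-ins for the two data frames in the question
s1 <- tibble(
  Full.Name = c("A. Patrick Beharelle", "A. Patrick Beharelle",
                "Aaron P. Graft", "Aaron P. Graft", "Aaron P. Jagdfeld"),
  year      = c(2019, 2020, 2019, 2020, 2019),
  counter   = c(5541L, 3269L, 165L, 200L, 4L)
)
s2 <- tibble(
  authority_dic = rep("accomplished", 5),
  Full.Name = c("A. Patrick Beharelle", "A. Patrick Beharelle",
                "Aaron P. Graft", "Aaron P. Graft", "Aaron P. Jagdfeld"),
  Entity = c("WERNER ENTERPRISES INC", "MONDELEZ INTERNATIONAL INC",
             "AEROJET ROCKETDYNE HOLDINGS", "T-MOBILE US INC",
             "SOUTHWEST AIRLINES"),
  `2019` = c(1L, 0L, 1L, 0L, 0L),
  `2020` = c(0L, 1L, 0L, 3L, 1L)
)

# Wide -> long: the year moves from the column names into a year column
s2_long <- s2 %>%
  pivot_longer(c(`2019`, `2020`), names_to = "year", values_to = "n") %>%
  mutate(year = as.numeric(year))  # match the numeric year in s1

# Join on person AND year, then divide row by row
result <- s2_long %>%
  left_join(s1, by = c("Full.Name", "year")) %>%
  mutate(freq = n / counter)

result
```

For Aaron P. Graft, T-MOBILE US INC, 2020 this gives 3/200 = 0.015. Aaron P. Jagdfeld has no 2020 counter in s1, so his 2020 freq is NA.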

Tags: grouping, r, dataframe, dplyr