Divide Value from one Dataframe by Value of another Dataframe
Here are my two data frames:
structure(list(Full.Name = c("A. Patrick Beharelle", "A. Patrick Beharelle",
"Aaron P. Graft", "Aaron P. Graft", "Aaron P. Jagdfeld"), year = c(2019,
2020, 2019, 2020, 2019), counter = c(5541L, 3269L, 165L, 200L,
4L)), class = c("grouped_df", "tbl_df", "tbl", "data.frame"), row.names = c(NA,
-5L), groups = structure(list(Full.Name = c("A. Patrick Beharelle",
"Aaron P. Graft", "Aaron P. Jagdfeld"), .rows = structure(list(
1:2, 3:4, 5L), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -3L), .drop = TRUE))
and
structure(list(authority_dic = c("accomplished", "accomplished",
"accomplished", "accomplished", "accomplished"), Full.Name = c("A. Patrick Beharelle", "A. Patrick Beharelle",
"Aaron P. Graft", "Aaron P. Graft", "Aaron P. Jagdfeld"), Entity = c("WERNER ENTERPRISES INC", "MONDELEZ INTERNATIONAL INC",
"AEROJET ROCKETDYNE HOLDINGS", "T-MOBILE US INC", "SOUTHWEST AIRLINES"
), `2019` = c(1L, 0L, 1L, 0L, 0L), `2020` = c(0L, 1L, 0L, 3L,
1L)), class = c("grouped_df", "tbl_df", "tbl", "data.frame"), row.names = c(NA,
-5L), groups = structure(list(authority_dic = c("accomplished",
"accomplished", "accomplished", "accomplished", "accomplished"
), Full.Name = c("Derek J. Leathers", "Dirk Van de Put", "Eileen P. Drake",
"G. Michael Sievert", "Gary C. Kelly"), .rows = structure(list(
1L, 2L, 3L, 4L, 5L), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -5L), .drop = TRUE))
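Now I want to divide each value in the "2019" column by the "counter" value from the other data frame and add the result as a new column. The complication is that I only want to divide by the "counter" value that belongs to the year 2019 and to (for example) Aaron P. Graft.
I want to do this for every row of the data frame that contains the name "Aaron P. Graft", taking the "counter" value from the row of the other data frame that also contains "Aaron P. Graft".
I can't figure this out on my own. Maybe I need to transpose the year and counter columns of the first data frame, but I'm not sure.
This is what I want to achieve: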
authority_dic | Full.name | 2019 | 2020 | 2019_freq | 2020_freq |
---|---|---|---|---|---|
example word | Aaron P. Jagdfeld | 10 | 20 | 10/counter(of 2019) | 20/counter(of 2020) |
If anything is unclear, please don't hesitate to ask.
Thanks in advance!
Call the two structures s1 and s2; then this should work:
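library(dplyr)  # mutate(), full_join(), summarise(), group_by(), %>%
library(tidyr)  # spread()

mutate(
  full_join(
    # total the counts per person across all entities
    summarise(
      group_by(s2, authority_dic, Full.Name),
      `2019` = sum(`2019`),
      `2020` = sum(`2020`)),
    # reshape s1 to one row per person with one counter column per year
    s1 %>% spread(year, counter),
    by = c("Full.Name")),
  # .x columns come from s2 (counts), .y columns from s1 (counters)
  `2019_freq` = `2019.x` / `2019.y`,
  `2020_freq` = `2020.x` / `2020.y`)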
# A tibble: 3 × 8
# Groups:   authority_dic [1]
  authority_dic Full.Name            `2019.x` `2020.x` `2019.y` `2020.y` `2019_freq` `2020_freq`
  <chr>         <chr>                   <int>    <int>    <int>    <int>       <dbl>       <dbl>
1 accomplished  A. Patrick Beharelle        1        1     5541     3269    0.000180    0.000306
2 accomplished  Aaron P. Graft              1        3      165      200    0.00606     0.015
3 accomplished  Aaron P. Jagdfeld           0        1        4       NA    0          NA
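The `.x`/`.y` suffixes in that output appear because both tables end up with columns named `2019` and `2020` at join time. Here is a minimal sketch of one way to avoid the clash and get the column layout asked for in the question. It assumes tidyr 1.0.0 or later (where `pivot_wider()` supersedes `spread()`) and uses ungrouped copies of the question's data; the `counter_` prefix and the names `counters_wide` and `result_wide` are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Ungrouped copies of the data from the question (grouping attributes dropped)
s1 <- tibble(
  Full.Name = c("A. Patrick Beharelle", "A. Patrick Beharelle",
                "Aaron P. Graft", "Aaron P. Graft", "Aaron P. Jagdfeld"),
  year      = c(2019, 2020, 2019, 2020, 2019),
  counter   = c(5541L, 3269L, 165L, 200L, 4L)
)
s2 <- tibble(
  authority_dic = rep("accomplished", 5),
  Full.Name = c("A. Patrick Beharelle", "A. Patrick Beharelle",
                "Aaron P. Graft", "Aaron P. Graft", "Aaron P. Jagdfeld"),
  Entity = c("WERNER ENTERPRISES INC", "MONDELEZ INTERNATIONAL INC",
             "AEROJET ROCKETDYNE HOLDINGS", "T-MOBILE US INC",
             "SOUTHWEST AIRLINES"),
  `2019` = c(1L, 0L, 1L, 0L, 0L),
  `2020` = c(0L, 1L, 0L, 3L, 1L)
)

# One row per person; counters land in counter_2019 / counter_2020,
# so they cannot collide with the `2019` / `2020` count columns of s2
counters_wide <- s1 %>%
  pivot_wider(names_from = year, values_from = counter,
              names_prefix = "counter_")

result_wide <- s2 %>%
  group_by(authority_dic, Full.Name) %>%
  summarise(`2019` = sum(`2019`), `2020` = sum(`2020`), .groups = "drop") %>%
  left_join(counters_wide, by = "Full.Name") %>%
  mutate(`2019_freq` = `2019` / counter_2019,
         `2020_freq` = `2020` / counter_2020) %>%
  select(-counter_2019, -counter_2020)

result_wide
```

This swaps `full_join()` for `left_join()`, which keeps exactly the people present in s2; with the sample data every name occurs in both tables, so the rows are the same either way. A person with no counter for a given year (Aaron P. Jagdfeld in 2020) still gets `NA` for that frequency.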
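It is good practice to avoid naming columns after values, such as 2019...; use a year column instead. Your model should be restructured into normal form (see the topic of database normalization for details).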
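To make that suggestion concrete, here is a rough sketch of the long-format route, again on ungrouped copies of the question's data. It assumes tidyr 1.0.0 or later for `pivot_longer()`; the column names `n` and `freq` and the object names `s2_long` and `result_long` are invented for this example. Once s2 has a `year` column, the division becomes an ordinary join on person and year, and each Entity row is kept rather than summed.

```r
library(dplyr)
library(tidyr)

# Ungrouped copies of the data from the question (grouping attributes dropped)
s1 <- tibble(
  Full.Name = c("A. Patrick Beharelle", "A. Patrick Beharelle",
                "Aaron P. Graft", "Aaron P. Graft", "Aaron P. Jagdfeld"),
  year      = c(2019, 2020, 2019, 2020, 2019),
  counter   = c(5541L, 3269L, 165L, 200L, 4L)
)
s2 <- tibble(
  authority_dic = rep("accomplished", 5),
  Full.Name = c("A. Patrick Beharelle", "A. Patrick Beharelle",
                "Aaron P. Graft", "Aaron P. Graft", "Aaron P. Jagdfeld"),
  Entity = c("WERNER ENTERPRISES INC", "MONDELEZ INTERNATIONAL INC",
             "AEROJET ROCKETDYNE HOLDINGS", "T-MOBILE US INC",
             "SOUTHWEST AIRLINES"),
  `2019` = c(1L, 0L, 1L, 0L, 0L),
  `2020` = c(0L, 1L, 0L, 3L, 1L)
)

# Move the year out of the column names and into a proper column
s2_long <- s2 %>%
  pivot_longer(c(`2019`, `2020`), names_to = "year", values_to = "n") %>%
  mutate(year = as.numeric(year))  # match the numeric year column in s1

# Each (person, year) row picks up its own counter, then divides by it
result_long <- s2_long %>%
  left_join(s1, by = c("Full.Name", "year")) %>%
  mutate(freq = n / counter)

result_long
```

The result has one row per person, entity and year. Rows without a matching counter (Aaron P. Jagdfeld in 2020) get `NA` in `counter` and `freq`, and adding another year to the data needs no change to the code.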