How do I add the results of applying a function to an existing data frame?
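I am trying to calculate confidence intervals for some rates.
I am using tidyverse and epitools to calculate the CI with Byar's method.
I am almost certainly doing something wrong.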
library (tidyverse)
library (epitools)
# here's my made up data
DISEASE = c("Marco Polio","Marco Polio","Marco Polio","Marco Polio","Marco Polio",
"Mumps","Mumps","Mumps","Mumps","Mumps",
"Chicky Pox","Chicky Pox","Chicky Pox","Chicky Pox","Chicky Pox")
YEAR = c(2011, 2012, 2013, 2014, 2015,
2011, 2012, 2013, 2014, 2015,
2011, 2012, 2013, 2014, 2015)
VALUE = c(82,89,79,51,51,
79,91,69,89,78,
71,69,95,61,87)
AREA =c("A", "B","C")
DATA = data.frame(DISEASE, YEAR, VALUE,AREA)
# this is a simplification, I have the population values in another table, which I've merged
# to give me the dataframe I then apply pois.byar to.
DATA$POPN = ifelse(DATA$AREA == "A",2.5,
ifelse(DATA$AREA == "B",3,
ifelse(DATA$AREA == "C",7,0)))
# this bit calculates the number of things per area
rates<-DATA%>%group_by(DISEASE,AREA,POPN)%>%
count(AREA)
Then, to calculate the CI, I thought this would work:
rates<-DATA%>%group_by(DISEASE,AREA,POPN)%>%
count(AREA) %>%
mutate(pois.byar(rates$n,rates$POPN))
But I get:
Error in mutate_impl(.data, dots) :
Evaluation error: arguments imply differing number of rows: 0, 1.
However, this works:
pois.byar(rates$n,rates$POPN)
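Having to "turn the results of the pois.byar function into a dataframe and then merge back to the original" seems silly. I may have tried that just to get some data out... but I don't want to do it that way. It isn't the right way to do things.
Any advice gratefully received.
I think this is a fairly basic problem. It shows that I'm learning by doing rather than sitting down and studying.
This is what I want:
DISEASE YEAR n AREA POPN x pt rate lower upper conf.level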
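It's not clear to me what your expected output should be. Your comment doesn't really help. It's best to explicitly include the expected output for the sample data you provide.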
The problem here is that pois.byar returns a data.frame. So for mutate to be able to use the output of pois.byar, we need to store the data.frame in a list.
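A minimal sketch of the list-column idea, assuming a hypothetical stand-in function toy_rate() (not part of epitools) that, like pois.byar, returns a one-row data.frame. Note that it passes the bare column names n and POPN rather than rates$n and rates$POPN: inside a grouped mutate, bare names refer to the current group's rows, whereas rates$n pulls the whole column from the previously saved object, which is likely why the row counts in the error disagree.

```r
library(dplyr)

# Hypothetical stand-in for pois.byar(): returns a data.frame, not a vector
toy_rate <- function(x, pt) {
  data.frame(x = x, pt = pt, rate = x / pt)
}

# Small made-up table of counts, one row per group
counts <- tibble(
  AREA = c("A", "B", "C"),
  POPN = c(2.5, 3, 7),
  n    = c(1L, 2L, 2L)
)

nested <- counts %>%
  group_by(AREA) %>%
  # list() wraps each group's data.frame so it fits in a single cell
  mutate(res = list(toy_rate(n, POPN))) %>%
  ungroup()

# Each element of res is a one-row data.frame
nested$res[[1]]
```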
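Here is a more concise version of your code:
library(tidyverse)
library(epitools)  # provides pois.byar
DATA %>%
  mutate(POPN = case_when(
    AREA == "A" ~ 2.5,
    AREA == "B" ~ 3,
    AREA == "C" ~ 7,
    TRUE ~ 0)) %>%
  group_by(DISEASE, AREA, POPN) %>%
  count(AREA) %>%
  # wrap the data.frame in a list so it fits in one cell per group
  mutate(res = list(pois.byar(n, POPN)))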
This creates a column res that contains the data.frame output of pois.byar.
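If only one value is needed from each nested data.frame, it can be pulled out without unnesting. The following is a sketch on made-up data (the nested table is hand-built for illustration, not the output of pois.byar), using map_dbl from purrr:

```r
library(dplyr)
library(purrr)

# Made-up nested table: each cell of res holds a one-row data.frame
nested <- tibble(
  AREA = c("A", "B"),
  res  = list(
    data.frame(x = 1, pt = 2.5, rate = 0.4),
    data.frame(x = 2, pt = 4,   rate = 0.5)
  )
)

# map_dbl() extracts the "rate" entry of each data.frame as a plain number
with_rate <- nested %>%
  mutate(rate = map_dbl(res, "rate"))

with_rate$rate
```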
Or perhaps you want to unnest the list column to expand the entries into separate columns?
library(tidyverse)
library(epitools)  # provides pois.byar
DATA %>%
  mutate(POPN = case_when(
    AREA == "A" ~ 2.5,
    AREA == "B" ~ 3,
    AREA == "C" ~ 7,
    TRUE ~ 0)) %>%
  group_by(DISEASE, AREA, POPN) %>%
  count(AREA) %>%
  mutate(res = list(pois.byar(n, POPN))) %>%
  # name the list column; a bare unnest() warns in tidyr >= 1.0
  unnest(res)
## A tibble: 9 x 10
## Groups: DISEASE, AREA, POPN [9]
# DISEASE AREA POPN n x pt rate lower upper conf.level
# <fct> <fct> <dbl> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 Chicky Pox A 2.5 1 1 2.5 0.4 0.0363 1.86 0.95
#2 Chicky Pox B 3 2 2 3 0.667 0.133 2.14 0.95
#3 Chicky Pox C 7 2 2 7 0.286 0.0570 0.916 0.95
#4 Marco Polio A 2.5 2 2 2.5 0.8 0.160 2.56 0.95
#5 Marco Polio B 3 2 2 3 0.667 0.133 2.14 0.95
#6 Marco Polio C 7 1 1 7 0.143 0.0130 0.666 0.95
#7 Mumps A 2.5 2 2 2.5 0.8 0.160 2.56 0.95
#8 Mumps B 3 1 1 3 0.333 0.0302 1.55 0.95
#9 Mumps C 7 2 2 7 0.286 0.0570 0.916 0.95
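With dplyr 1.0.0 or later, the list wrapper and the unnest step can probably be skipped: when an expression inside mutate is left unnamed and returns a data.frame, its columns are added directly. A sketch under that version assumption, again with a hypothetical stand-in toy_rate() in place of pois.byar and a small made-up table:

```r
library(dplyr)  # assumes dplyr >= 1.0.0

# Hypothetical stand-in for pois.byar(): returns a one-row data.frame
toy_rate <- function(x, pt) {
  data.frame(x = x, pt = pt, rate = x / pt)
}

# Small made-up table of counts, one row per group
counts <- tibble(
  AREA = c("A", "B", "C"),
  POPN = c(2.5, 3, 7),
  n    = c(1L, 2L, 2L)
)

flat <- counts %>%
  group_by(AREA) %>%
  # Unnamed data.frame result: its columns become columns of the output
  mutate(toy_rate(n, POPN)) %>%
  ungroup()

names(flat)
```

On older dplyr versions this form is not available, so the list column plus unnest approach remains the safer choice there.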