Write a function that imports data, calculates summary statistics by variable conditions, and writes output files
Initially I have the following object:
> head(gs)
year disturbance lek_id complex tot_male
1 2006 N 3T Diamond 3
2 2007 N 3T Diamond 17
3 1981 N bare 3corners 4
4 1982 N bare 3corners 7
5 1983 N bare 3corners 2
6 1985 N bare 3corners 5
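For anyone who wants to run the code in this thread without the original gsg_leks.csv, here is a minimal sketch that retypes the six printed rows as a data frame. The column types are assumptions (year and tot_male as integers, the rest as character); running dput(head(gs)) on the real object would generate equivalent code automatically.

```r
# Retyped copy of the six rows printed by head(gs).
# Column types are assumed: integer year/tot_male, character for the rest.
gs <- data.frame(
  year        = c(2006L, 2007L, 1981L, 1982L, 1983L, 1985L),
  disturbance = c("N", "N", "N", "N", "N", "N"),
  lek_id      = c("3T", "3T", "bare", "bare", "bare", "bare"),
  complex     = c("Diamond", "Diamond", "3corners", "3corners", "3corners", "3corners"),
  tot_male    = c(3L, 17L, 4L, 7L, 2L, 5L),
  stringsAsFactors = FALSE
)
str(gs)
```

With only these rows every (year, complex) group has a single lek, so the standard deviations come out as NA; the full file is needed for meaningful statistics.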
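From this I calculated general statistics for tot_male: n, minimum, maximum, mean, and standard deviation. I then combined these into one dataset, grouped by year and complex, using: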
library(dplyr)
gsnew <- gs %>% group_by(year, complex) %>%
  summarise(n = length(tot_male), male_min = min(tot_male), male_max = max(tot_male),
            male_mean = mean(tot_male), male_sd = sd(tot_male))
which gives:
> gsnew
Source: local data frame [119 x 7]
Groups: year [?]
year complex n male_min male_max male_mean male_sd
(int) (fctr) (int) (int) (int) (dbl) (dbl)
1 1967 Diamond 2 33 101 67.000000 48.083261
2 1969 Diamond 2 29 69 49.000000 28.284271
3 1970 3corners 1 26 26 26.000000 NA
4 1970 Diamond 4 3 51 26.250000 21.093048
5 1971 3corners 3 6 22 12.333333 8.504901
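The NA in row 3 is expected. When n = 2, the printed min and max are the only two values in the group, so rows like 1967 Diamond can be checked by hand; when n = 1 (1970 3corners), sd() returns NA because the sample standard deviation divides by n - 1 = 0. A quick check in base R:

```r
# 1967 Diamond has n = 2, so its two counts are the printed min and max
x1967 <- c(33, 101)
mean(x1967)     # 67
sd(x1967)       # 48.08326, matching male_sd

# Same check for 1969 Diamond (min 29, max 69)
sd(c(29, 69))   # 28.28427

# 1970 3corners has n = 1: the sample SD divides by n - 1 = 0, so sd() returns NA
sd(26)          # NA
```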
How would I write a general function of the form
FunctionName=function(Argument1,...,ArgumentN) {Statement1,...,StatementN}
• Argument1-N are any variables from the object(s)
• Statement1-N are any valid R statements
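As a concrete instance of that template, here is a small sketch with two arguments and two statements. Note that in R the statements inside { } are separated by newlines or semicolons, not commas, and the value of the last statement is what the function returns. The name mean_for_years() is hypothetical.

```r
# Hypothetical example of the template: two arguments, two statements.
# Statements inside { } are separated by newlines (or ";"), not commas,
# and the value of the last statement is the function's return value.
mean_for_years <- function(data, years) {
  rows <- data[data$year %in% years, ]   # Statement1: keep the requested years
  mean(rows$tot_male)                    # Statement2: returned value
}

# year and tot_male columns of the six rows from head(gs)
gs_rows <- data.frame(year     = c(2006, 2007, 1981, 1982, 1983, 1985),
                      tot_male = c(3, 17, 4, 7, 2, 5))
mean_for_years(gs_rows, 1981:1985)
# [1] 4.5
```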
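that would let me:
• import the data;
• select the year(s) from the data for which statistics are needed;
• calculate the mean, 2SD, n, and 90% confidence interval within each lek complex for the specified year;
• write each year's output to a separate *.csv file, in this format: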
year complex mean st.dev2 n lo90ci hi90ci
2007 3corners 26.28571 52.04760 7 -393.50827 446.07970
2007 Blue 18.87500 20.15476 8 -40.00856 77.75856
2007 book_cliffs 4.50000 13.19091 6 -24.62443 33.62443
2007 Diamond 13.25000 48.83431 20 -205.38461 231.88461
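Steps 2 and 3 can also be sketched without dplyr. The base-R helper below, summarise_year() (a hypothetical name), returns columns named like the requested output. The question does not say how lo90ci and hi90ci were computed, so the interval here is an assumption: a t-based 90% confidence interval for the mean, mean ± qt(0.95, n - 1) * sd / sqrt(n). It is not meant to reproduce the numbers in the table, and the toy counts are made up.

```r
# Hypothetical helper for steps 2-3: per-complex summary for one year.
summarise_year <- function(data, this_year) {
  d <- data[data$year == this_year & !is.na(data$tot_male), ]
  groups <- split(d$tot_male, as.character(d$complex))
  rows <- lapply(names(groups), function(cx) {
    x <- groups[[cx]]
    n <- length(x)
    m <- mean(x)
    s <- sd(x)                                   # NA when n == 1
    # Assumed definition of lo90ci/hi90ci: t-based 90% CI for the mean
    half <- if (n > 1) qt(0.95, df = n - 1) * s / sqrt(n) else NA_real_
    data.frame(year = this_year, complex = cx, mean = m, st.dev2 = 2 * s,
               n = n, lo90ci = m - half, hi90ci = m + half)
  })
  do.call(rbind, rows)
}

# Made-up counts, for illustration only
toy <- data.frame(
  year     = c(2007, 2007, 2007, 2007, 2006),
  complex  = c("Blue", "Blue", "Blue", "Diamond", "Blue"),
  tot_male = c(10, 20, 30, 17, 99)
)
summarise_year(toy, 2007)
```

Rows from other years (the 2006 Blue count here) are ignored, and a complex with a single lek gets NA for the SD and both interval bounds.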
Well, I think you're close. It might look something like this:
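library(dplyr)

read_write = function(file_name, this_year) {
  file_name %>%
    read.csv %>%
    filter(year == this_year) %>%
    group_by(complex) %>%                       # one row per lek complex
    summarise(n = length(tot_male),
              male_min = min(tot_male),
              male_max = max(tot_male),
              male_mean = mean(tot_male),
              male_sd = sd(tot_male),
              male_2sd = 2*male_sd,
              # mean ± 1.645 SD: range holding ~90% of individual counts if normal
              male_upper_bound = male_mean + 1.645*male_sd,
              male_lower_bound = male_mean - 1.645*male_sd) %>%
    # the year goes into the name so each year gets its own file
    write.csv(paste0("out_", this_year, "_", basename(file_name)), row.names = FALSE)
}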
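The bounds above use 1.645, which is qnorm(0.95), so male_mean ± 1.645 * male_sd marks the range expected to hold about 90% of individual lek counts if the counts were normally distributed. That is different from a 90% confidence interval for the mean, which the question asks for: that interval uses the standard error sd / sqrt(n) and, for small samples, a t quantile. A comparison on made-up counts:

```r
x <- c(12, 15, 9, 20, 18, 11, 14, 16)   # made-up counts, n = 8
m <- mean(x); s <- sd(x); n <- length(x)

# Range holding ~90% of individual values under normality (the answer's bounds)
spread90 <- m + c(-1, 1) * qnorm(0.95) * s

# 90% confidence interval for the mean: t quantile times the standard error
ci90 <- m + c(-1, 1) * qt(0.95, df = n - 1) * s / sqrt(n)

rbind(spread90, ci90)
```

The confidence interval narrows roughly like 1 / sqrt(n) as more leks are counted, while the ±1.645 SD range does not. Inside summarise(), one option would be male_lo90ci = male_mean - qt(0.95, n - 1) * male_sd / sqrt(n), with the upper bound defined symmetrically.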
Thanks @bramtayl
Here is the final code:
library(dplyr)

annualleksummary = function(x1) {
  x1 %>%
    read.csv %>%
    filter(year == 2007) %>%                    # keep only the 2007 counts
    group_by(year, complex) %>%
    summarise(n = length(tot_male),
              male_min = min(tot_male),
              male_max = max(tot_male),
              male_mean = mean(tot_male),
              male_sd = sd(tot_male),
              male_2sd = 2*male_sd,
              male_upper_bound = male_mean + 1.645*male_sd,
              male_lower_bound = male_mean - 1.645*male_sd) %>%
    write.csv(paste0("2007_", x1), row.names = FALSE)
}

annualleksummary("gsg_leks.csv")
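The final function hard-codes 2007. One way to cover the last requirement, a separate *.csv per year, is to loop over the requested years. This is a base-R sketch using a hypothetical write_year_files() helper that writes to tempdir() so it can run anywhere; it only produces n, mean and sd, and the remaining columns could be added the same way.

```r
# Six rows from head(gs), retyped so the sketch is self-contained
gs <- data.frame(
  year     = c(2006L, 2007L, 1981L, 1982L, 1983L, 1985L),
  complex  = c("Diamond", "Diamond", "3corners", "3corners", "3corners", "3corners"),
  tot_male = c(3L, 17L, 4L, 7L, 2L, 5L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one summary CSV per year, named "<year>_<file_name>"
write_year_files <- function(data, years, file_name, out_dir = tempdir()) {
  paths <- character(0)
  for (yr in years) {
    d <- data[data$year == yr & !is.na(data$tot_male), ]
    if (nrow(d) == 0) next                      # no counts for this year
    groups <- split(d$tot_male, as.character(d$complex))
    out <- data.frame(
      year    = yr,
      complex = names(groups),
      n       = vapply(groups, length, integer(1)),
      mean    = vapply(groups, mean, numeric(1)),
      sd      = vapply(groups, sd, numeric(1))  # NA for a single lek
    )
    path <- file.path(out_dir, paste0(yr, "_", basename(file_name)))
    write.csv(out, path, row.names = FALSE)
    paths <- c(paths, path)
  }
  invisible(paths)
}

paths <- write_year_files(gs, years = c(2006, 2007), file_name = "gsg_leks.csv")
read.csv(paths[2])
```

Years with no rows are skipped rather than producing empty files; whether that is the right behaviour depends on how the output is used downstream.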