Write a function that imports data and calculates summary statistics by variable conditions and writes output files

Question

So initially I have the following object:

> head(gs)
  year disturbance lek_id  complex tot_male
1 2006           N     3T  Diamond        3
2 2007           N     3T  Diamond       17
3 1981           N   bare 3corners        4
4 1982           N   bare 3corners        7
5 1983           N   bare 3corners        2
6 1985           N   bare 3corners        5

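From this I calculated general statistics for tot_male by year: n, minimum, maximum, mean, and standard deviation. I then combined these into one data set by year using the following: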

gsnew <- gs %>%
  group_by(year, complex) %>%
  summarise(n = length(tot_male),
            male_min = min(tot_male),
            male_max = max(tot_male),
            male_mean = mean(tot_male),
            male_sd = sd(tot_male))

which results in:

> gsnew
Source: local data frame [119 x 7]
Groups: year [?]

   year  complex     n male_min male_max male_mean   male_sd
  (int)   (fctr) (int)    (int)    (int)     (dbl)     (dbl)
1  1967  Diamond     2       33      101 67.000000 48.083261
2  1969  Diamond     2       29       69 49.000000 28.284271
3  1970 3corners     1       26       26 26.000000        NA
4  1970  Diamond     4        3       51 26.250000 21.093048
5  1971 3corners     3        6       22 12.333333  8.504901

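How would I write a generic function of the following form

FunctionName = function(Argument1, ..., ArgumentN) {Statement1, ..., StatementN}
• Argument1-N are any variables from object(s)
• Statement1-N are any valid R statements

that lets me:
• import the data
• select from the data the specified year for which statistics are needed
• calculate the mean, 2SD, n, and 90% confidence interval for the specified year within each lek complex
• write the year-based output as a separate *.csv file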

year     complex     mean  st.dev2  n     lo90ci    hi90ci
2007    3corners 26.28571 52.04760  7 -393.50827 446.07970
2007        Blue 18.87500 20.15476  8  -40.00856  77.75856
2007 book_cliffs  4.50000 13.19091  6  -24.62443  33.62443
2007     Diamond 13.25000 48.83431 20 -205.38461 231.88461

Answer 1

Well, I think you're pretty close. It might look something like this:

library(dplyr)

# Read file_name, keep the rows for this_year, summarise tot_male,
# and write the result to "out_<file_name>"
read_write = function(file_name, this_year) {
  file_name %>%
  read.csv %>%
  filter(year == this_year) %>%
  summarise(n = length(tot_male), 
            male_min = min(tot_male), 
            male_max = max(tot_male), 
            male_mean = mean(tot_male), 
            male_sd = sd(tot_male),
            male_2sd = 2*male_sd,
            # mean +/- 1.645 sd: approximate 90% range under a normal assumption
            male_upper_bound = male_mean + 1.645*male_sd,
            male_lower_bound = male_mean - 1.645*male_sd) %>%
  write.csv("out_" %>% paste0(file_name), row.names = FALSE)
  }
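
A note on the bounds: mean ± 1.645*sd covers roughly 90% of individual counts if they are normally distributed; it is not a confidence interval for the mean. If an interval for the mean is what's wanted, one common alternative (my assumption, not part of the answer above) is a t-interval, mean ± t × sd/√n. A minimal sketch, where `mean_ci90` and the toy counts are hypothetical:

```r
# Hypothetical helper: two-sided t-interval for the mean of x (90% by default).
mean_ci90 <- function(x, level = 0.90) {
  n <- length(x)
  m <- mean(x)
  # Standard error of the mean; NA when n < 2 because sd() is undefined there
  se <- sd(x) / sqrt(n)
  # Critical value from the t distribution with n - 1 degrees of freedom
  t_crit <- if (n > 1) qt(1 - (1 - level) / 2, df = n - 1) else NA_real_
  c(mean = m, lower = m - t_crit * se, upper = m + t_crit * se)
}

# Toy counts, made up for illustration
mean_ci90(c(3, 17, 4, 7, 2, 5))
```

The three values could be pulled into `summarise()` one at a time, e.g. `lo90ci = mean_ci90(tot_male)[["lower"]]`. With a single observation in a group the bounds come back as NA, matching what `sd()` does.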

Answer 2

Thanks, @bramtayl.

Here is the final code:

> library(dplyr)
> annualleksummary = function(x1) {
+   x1 %>%
+   read.csv %>% 
+   # keep only the 2007 records, then summarise within each lek complex
+   filter(year == 2007) %>% group_by(year, complex) %>%
+   summarise(n = length(tot_male), 
+             male_min = min(tot_male), 
+             male_max = max(tot_male), 
+             male_mean = mean(tot_male), 
+             male_sd = sd(tot_male),
+             male_2sd = 2*male_sd,
+             male_upper_bound = male_mean + 1.645*male_sd,
+             male_lower_bound = male_mean - 1.645*male_sd) %>%
+   write.csv("2007_" %>% paste0(x1), row.names = FALSE) 
+   }
> annualleksummary("gsg_leks.csv")
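
The function above has the year hard-coded. To get one file per year, as the question asked, the year could be passed in as an argument. A rough sketch under some assumptions: `annual_lek_summary`, its `out_dir` argument, and the toy data are hypothetical, and the `.groups` argument needs dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical generalisation: the year is an argument and the output
# file name is built from it ("<year>_<input file name>").
annual_lek_summary <- function(file_name, this_year, out_dir = dirname(file_name)) {
  out_file <- file.path(out_dir, paste0(this_year, "_", basename(file_name)))
  read.csv(file_name) %>%
    filter(year == this_year) %>%
    group_by(year, complex) %>%
    summarise(n = n(),
              male_mean = mean(tot_male),
              male_2sd = 2 * sd(tot_male),
              .groups = "drop") %>%
    write.csv(out_file, row.names = FALSE)
  invisible(out_file)  # return the path so the caller can find the file
}

# Toy data, made up for illustration (same columns as gs)
toy <- data.frame(
  year     = c(2006, 2006, 2007, 2007, 2007),
  complex  = c("Diamond", "Diamond", "Diamond", "3corners", "3corners"),
  tot_male = c(3, 5, 17, 4, 8)
)
in_file <- file.path(tempdir(), "toy_leks.csv")
write.csv(toy, in_file, row.names = FALSE)

# One CSV per requested year
out_files <- sapply(c(2006, 2007),
                    function(yr) annual_lek_summary(in_file, yr, tempdir()))
```

With the real data this would be called as `annual_lek_summary("gsg_leks.csv", 2007)`, or inside `sapply()` as shown to cover several years in one go.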

