Binning of multiple files with uneven intervals in R

Question

I have many files with differing numbers of rows.

The data in one file looks like this:

Height Temp
1014.0 22.4
992.0 23
850.0 15.2
557.0 -6.1
407.0 -17.1
314.0 -29.5
200 -51.9

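I want to take the average within the following intervals.

I have to do this for multiple files with different height values.

Any suggestions on how to do this properly in R? I would appreciate any help.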

Answer 1

You can use cut to bin the Height values, then summarise by group:

library(dplyr)

mutate(df, category = cut(Height, c(seq(0, 700, 100), 850, 925, 1000, Inf))) %>%
  group_by(category) %>%
  summarise(average_height = mean(Height, na.rm = TRUE))

# A tibble: 7 x 2
  category    average_height
  <fct>                <dbl>
1 (100,200]              200
2 (300,400]              314
3 (400,500]              407
4 (500,600]              557
5 (700,850]              850
6 (925,1e+03]            992
7 (1e+03,Inf]           1014
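
Since the question involves many files, the same breaks can be applied to each file in turn. The following is only a minimal sketch under several assumptions: the files are whitespace-separated with a Height/Temp header row, the quantity to average is Temp within each Height bin, and the helper name bin_means and the demo file names are hypothetical. The second demo file contains made-up values. It uses base R's aggregate rather than dplyr to stay dependency-free.

```r
# Breaks as in the answer above: 100-wide bins up to 700, then uneven ones.
breaks <- c(seq(0, 700, 100), 850, 925, 1000, Inf)

# Hypothetical helper: read one file and average Temp within each Height bin.
bin_means <- function(path) {
  df <- read.table(path, header = TRUE)
  df$category <- cut(df$Height, breaks)
  # aggregate() only returns bins that actually contain rows
  aggregate(Temp ~ category, data = df, FUN = mean)
}

# Two small demo files in a temporary directory (file2.txt is made up).
demo_dir <- tempfile("soundings")
dir.create(demo_dir)
writeLines(c("Height Temp", "1014.0 22.4", "992.0 23", "850.0 15.2",
             "557.0 -6.1", "407.0 -17.1", "314.0 -29.5", "200 -51.9"),
           file.path(demo_dir, "file1.txt"))
writeLines(c("Height Temp", "980 20", "950 18", "120 -40"),
           file.path(demo_dir, "file2.txt"))

# Apply the helper to every file; the result is a list of data frames.
files <- list.files(demo_dir, pattern = "\\.txt$", full.names = TRUE)
results <- lapply(files, bin_means)
names(results) <- basename(files)
```

Each element of results is a per-file summary, so files with different numbers of rows or different height values need no special handling.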

Answer 2

One option here is to use sqldf, with a calendar-table join and aggregation approach:

library(sqldf)
sql <- "SELECT b.min_height, b.max_height, AVG(t.Temp) AS temp_avg
        FROM bins b
        LEFT JOIN df t ON t.Height > b.min_height AND t.Height <= b.max_height
        GROUP BY b.min_height, b.max_height"
result <- sqldf(sql)

Data:

# this data frame stores the height ranges over which Temp is averaged
bins <- data.frame(min_height=c(0, 100, 200, 300, 400, 500, 600, 700, 850, 925),
                   max_height=c(100, 200, 300, 400, 500, 600, 700, 850, 925, 1000))
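
The LEFT JOIN keeps every range in the result, even one that no row falls into; its average is then NULL (NA in R). For comparison, here is a rough base R sketch of the same behaviour. It assumes df holds the sample data from the question and that Temp is averaged per Height range. tapply over a cut factor returns NA for empty bins instead of dropping them.

```r
# Sample data from the question.
df <- data.frame(
  Height = c(1014, 992, 850, 557, 407, 314, 200),
  Temp   = c(22.4, 23, 15.2, -6.1, -17.1, -29.5, -51.9)
)

# Same ten ranges as the lookup table; 1014 lies above the last break,
# so cut() gives it NA and it is left out of every bin.
breaks <- c(seq(0, 700, 100), 850, 925, 1000)
bins <- cut(df$Height, breaks)

# One entry per range, NA where the range is empty.
temp_avg <- tapply(df$Temp, bins, mean)
```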

Answer 3

Another option in base R: use findInterval to create the groups and aggregate to get the mean by group.

df$group <- findInterval(df$Height, c(seq(0, 700, 100), 850, 925, 1000))
aggregate(Height~group, df, mean)
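
One detail worth keeping in mind: by default findInterval treats each interval as closed on the left, [a, b), whereas cut defaults to (a, b]. A value sitting exactly on a break, such as 200 or 850 in the sample data, therefore lands in a different bin. The sketch below assumes df holds the sample data from the question and uses findInterval's left.open argument to reproduce cut's convention.

```r
# Sample data from the question.
df <- data.frame(
  Height = c(1014, 992, 850, 557, 407, 314, 200),
  Temp   = c(22.4, 23, 15.2, -6.1, -17.1, -29.5, -51.9)
)
breaks <- c(seq(0, 700, 100), 850, 925, 1000)

# Default: intervals are [a, b), so 200 falls in [200, 300).
default_group <- findInterval(df$Height, breaks)

# left.open = TRUE: intervals are (a, b], so 200 falls in (100, 200].
open_group <- findInterval(df$Height, breaks, left.open = TRUE)

# Bin index from cut() with the same breaks plus an open-ended top bin.
cut_group <- as.integer(cut(df$Height, c(breaks, Inf)))

comparison <- data.frame(Height = df$Height, default_group, open_group, cut_group)
```

With left.open = TRUE the group numbers agree with the bin indices from cut, so either approach can feed aggregate interchangeably.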
