summary statistics for a variable based on another variable

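I am trying to count, for each ID, how many X values it has (X values are repeated within an ID). Then, based on these new counts, I want the overall minimum, maximum, IQR and median: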

ID <- c("ID004", "ID004", "ID004", "ID004", "ID006", "ID006", "ID006", "ID006", "ID006", "ID006", "ID006", "ID006", "ID006", "ID006", "ID006", "ID009", "ID009", "ID009", "ID009", "ID009", "ID009", "ID020", "ID020")
D <- c("CMP-001", "CMP-001","CMP-001","CMP-001","CMP-001", "CMP-001","CMP-002", "CMP-002", "CMP-002", "CMP-003", "CMP-003", "CMP-003", "CMP-004", "CMP-004", "CMP-004", "CMP-001", "CMP-001", "CMP-001", "CMP-001", "CMP-002", "CMP-002", "CMP-001", "CMP-001")
X <- c(3,3,3,3,1,1,3,3,3,1,1,1,4,4,4,4,4,4,4,2,2,2,2)
data <- data.frame(ID, D, X)

First, we find how many X values each ID has:

ID        No. of X values
ID004          1
ID006          4
ID009          2
ID020          1

Based on this result, we should then get the following:

                          Min.    Median    Max.     IQR
Number of X per ID        1         1.5        4      3-1

I think we need to create a new variable holding the number of X values for each ID, and then compute summary statistics for that new variable.
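A minimal base-R sketch of that two-step idea, with no packages needed. It assumes "number of X values" means the number of runs of consecutive identical X values within an ID, which is the definition that reproduces the counts 1, 4, 2, 1 above (counting distinct X values would give 3 for ID006, not 4). The names `n_x` and `stats` are illustrative only.

```r
# Same data as in the question, written with rep() for brevity (D is not needed here)
ID <- c(rep("ID004", 4), rep("ID006", 11), rep("ID009", 6), rep("ID020", 2))
X  <- c(3, 3, 3, 3, 1, 1, 3, 3, 3, 1, 1, 1, 4, 4, 4, 4, 4, 4, 4, 2, 2, 2, 2)
data <- data.frame(ID, X)

# Step 1: the new per-ID variable = number of runs of consecutive identical X values.
# rle() depends on row order, so the rows must stay in their original order.
n_x <- tapply(data$X, data$ID, function(x) length(rle(x)$lengths))
print(n_x)  # ID004 = 1, ID006 = 4, ID009 = 2, ID020 = 1

# Step 2: overall summary statistics of that new variable
stats <- c(Min = min(n_x), Median = median(n_x), Max = max(n_x), IQR = IQR(n_x))
print(stats)  # Min = 1, Median = 1.5, Max = 4, IQR = 1.5
```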

Thanks for your help.

Hope this answers your question (the code uses the dplyr package, so run library(dplyr) first):

> data %>%
+   group_by(ID) %>%
+   summarise(Min = min(X), Median = median(X), Max = max(X), IQR = IQR(X),
+             No_of_X_values = length(rle(X)$lengths))
`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 4 x 6
  ID      Min Median   Max   IQR No_of_X_values
  <chr> <dbl>  <dbl> <dbl> <dbl>          <int>
1 ID004     3      3     3   0                1
2 ID006     1      3     4   2.5              4
3 ID009     2      4     4   1.5              2
4 ID020     2      2     2   0                1
> 
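`rle()` counts runs of consecutive equal values, not distinct values, which is why ID006 gets 4 (its X values run 1, 3, 1, 4) even though it contains only three distinct values. The sketch below compares the two counts side by side. On this particular data the run count also equals the number of distinct `D` codes per ID; treating `D` as the intended grouping is an assumption, not something the question states. The column names are illustrative.

```r
library(dplyr)

# Same data as in the question, written with rep() for brevity
ID <- c(rep("ID004", 4), rep("ID006", 11), rep("ID009", 6), rep("ID020", 2))
D  <- c(rep("CMP-001", 6), rep("CMP-002", 3), rep("CMP-003", 3), rep("CMP-004", 3),
        rep("CMP-001", 4), rep("CMP-002", 2), rep("CMP-001", 2))
X  <- c(3, 3, 3, 3, 1, 1, 3, 3, 3, 1, 1, 1, 4, 4, 4, 4, 4, 4, 4, 2, 2, 2, 2)
data <- data.frame(ID, D, X)

counts <- data %>%
  group_by(ID) %>%
  summarise(
    runs_of_X  = length(rle(X)$lengths),  # runs of consecutive equal X (what the answer uses)
    distinct_X = n_distinct(X),           # unique X values, order ignored
    distinct_D = n_distinct(D),           # unique D codes per ID
    .groups = "drop"                      # return an ungrouped tibble without the message
  )
print(counts)  # ID006: runs_of_X = 4, distinct_X = 3, distinct_D = 4
```

If the rows could be reordered, `n_distinct(D)` would be the more robust choice, because `rle()` only sees adjacent values.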

You can also store the IDs and their number of X values in a new data frame, then compute summary statistics on the number of X values:

> x_values <- data %>% group_by(ID) %>% summarise(No_of_X_values = length(rle(X)$lengths))
`summarise()` ungrouping output (override with `.groups` argument)
> x_values
# A tibble: 4 x 2
  ID    No_of_X_values
  <chr>          <int>
1 ID004              1
2 ID006              4
3 ID009              2
4 ID020              1
> summary(x_values$No_of_X_values)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    1.0     1.0     1.5     2.0     2.5     4.0
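To get exactly the one-row layout the question asked for (Min, Median, Max, IQR for "Number of X per ID"), the per-ID counts can be collapsed with a second `summarise()`. This is a sketch that reuses the answer's run-based count. The `Statistic` label column and the explicit `Q1`/`Q3` columns are illustrative additions that are not in the original answer.

```r
library(dplyr)

# Same data as in the question (D omitted; it is not used here)
ID <- c(rep("ID004", 4), rep("ID006", 11), rep("ID009", 6), rep("ID020", 2))
X  <- c(3, 3, 3, 3, 1, 1, 3, 3, 3, 1, 1, 1, 4, 4, 4, 4, 4, 4, 4, 2, 2, 2, 2)
data <- data.frame(ID, X)

# Per-ID counts, as in the answer
x_values <- data %>%
  group_by(ID) %>%
  summarise(No_of_X_values = length(rle(X)$lengths), .groups = "drop")

# Collapse the counts into a single summary row
overall <- x_values %>%
  summarise(
    Statistic = "Number of X per ID",
    Min    = min(No_of_X_values),
    Median = median(No_of_X_values),
    Max    = max(No_of_X_values),
    Q1     = unname(quantile(No_of_X_values, 0.25)),  # default quantile type 7
    Q3     = unname(quantile(No_of_X_values, 0.75)),
    IQR    = IQR(No_of_X_values)                       # equals Q3 - Q1
  )
print(overall)
```

With R's default quantile definition (type 7), Q1 = 1 and Q3 = 2.5 for the counts 1, 4, 2, 1, so `IQR()` returns 1.5, consistent with the `summary()` output above. The "3-1" in the question's expected table does not correspond to this default definition.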