
R: Calculate percentage of observations in a column that are below a certain value for panel data

I have panel data and I would like to get the percentage of observations in a column (Size) that are below 1 million.


structure(list(Product = c("A", "A", "A", "A", "A", "A", "B", 
"B", "B", "B", "B", "B", "C", "C", "C", "C", "C", "C"), Date = c("02.05.2018", 
"04.05.2018", "05.05.2018", "06.05.2018", "07.05.2018", "08.05.2018", 
"02.05.2018", "04.05.2018", "05.05.2018", "06.05.2018", "07.05.2018", 
"08.05.2018", "02.05.2018", "04.05.2018", "05.05.2018", "06.05.2018", 
"07.05.2018", "08.05.2018"), Size = c(100023423, 1920, 2434324342, 
2342353566, 345345345, 432, 1.35135e+11, 312332, 23434, 4622436246, 
3252243, 234525, 57457457, 56848648, 36363546, 36535636, 2345, 
2.52646e+11)), class = "data.frame", row.names = c(NA, -18L))

So, for example, for Product A it would be 33.33%, since two of the 6 observations are below 1 million.

I tried the following in R:

df <- df %>%
  group_by(Product) %>%
  dplyr:: summarise(CountDate = n(), SmallSize = count(Size<1000000))

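However, I get an error message stating "no applicable method for 'count' applied to an object of class "logical"", even though the Size column is of type double.

After the code above, I would then compute SmallSize/CountDate to get the percentage.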


Instead of count, which needs a data.frame/tibble as input, use sum on the logical vector to get the count: TRUE values are counted as 1 and FALSE as 0.

library(dplyr)

df %>%
  group_by(Product) %>%
  # n() counts rows per group; sum() counts the TRUE comparisons
  dplyr::summarise(CountDate = n(),
     SmallSize = sum(Size < 1000000, na.rm = TRUE), .groups = "drop") %>%
  dplyr::mutate(Percent = SmallSize / CountDate)
# A tibble: 3 × 4
  Product CountDate SmallSize Percent
  <chr>       <int>     <int>   <dbl>
1 A               6         2   0.333
2 B               6         3   0.5  
3 C               6         1   0.167
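
The reason sum works on the comparison is that R coerces logical values to numbers in arithmetic. A minimal sketch of that coercion in isolation, using only Product A's Size values from the question (the variable names size_a and is_small are made up for illustration):

```r
# Product A's Size values, copied from the question's data
size_a <- c(100023423, 1920, 2434324342, 2342353566, 345345345, 432)

# The comparison yields a logical vector, one TRUE/FALSE per observation
is_small <- size_a < 1000000

# sum() treats TRUE as 1 and FALSE as 0, so it counts the TRUE values
small_count <- sum(is_small)    # 2

# mean() of the same vector is the share of TRUE values
small_share <- mean(is_small)   # 0.3333333
```

This is why count fails here while sum does not: count expects a data frame, whereas sum accepts the logical vector directly.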


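Alternatively, skip the intermediate columns: the mean of a logical vector is the proportion of TRUE values.

df %>%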
    group_by(Product) %>%
    dplyr::summarise(Percent = mean(Size < 1000000, na.rm = TRUE))
# A tibble: 3 × 2
  Product Percent
  <chr>     <dbl>
1 A         0.333
2 B         0.5  
3 C         0.167
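
The question asks for a percentage such as 33.33% rather than a proportion. As a rough sketch, the same idea can be written in base R with tapply and scaled by 100. This is an alternative to the dplyr code, not part of it; the data frame is rebuilt here from the question's Product and Size values (Date is omitted since it is not needed), and the names prop and pct are made up for illustration.

```r
# Rebuild the relevant columns of the question's data
df <- data.frame(
  Product = rep(c("A", "B", "C"), each = 6),
  Size = c(100023423, 1920, 2434324342, 2342353566, 345345345, 432,
           1.35135e+11, 312332, 23434, 4622436246, 3252243, 234525,
           57457457, 56848648, 36363546, 36535636, 2345, 2.52646e+11)
)

# Proportion of observations below 1 million within each product
prop <- tapply(df$Size < 1000000, df$Product, mean)

# Scale to a percentage and round to two decimals
pct <- round(100 * prop, 2)
pct
# A: 33.33, B: 50, C: 16.67
```

If Size could contain missing values, passing na.rm = TRUE through tapply (as the dplyr versions do with mean) would drop them from both the numerator and the denominator.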