Counting the number of months, quarters and counts from a cell using R

A sample cell from the given data frame (df$desc) contains the following: Month:High=Mar_May Low=Jul_Oct | Qtr:High=Q3

How can I generate the fields below from the cell above?

Required table:

Fields                 Count
Month Count               4
Quarter Count             1
Month High Count          2
Month Low Count           2
Quarter High Count        1
Quarter Low Count         0

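Logic:

  1. Month Count - 4; because the cell contains 4 months, i.e. Mar, May, Jul and Oct.
  2. Quarter Count - 1; because the cell contains 1 quarter, i.e. Q3.
  3. Month High Count - 2; because the cell has 2 months set to High.
  4. Quarter High Count - 1; because the cell has 1 quarter set to High.
  5. Month Low Count - 2; because the cell has 2 months set to Low.
  6. Quarter Low Count - 0; because the cell has 0 quarters set to Low.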
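
To make the rules above concrete, here is a minimal base R sketch that applies them to the single sample cell. The helper name count_items is hypothetical (it is not part of the question), and the sketch assumes that items are separated by "_" and that each High=/Low= group ends at a space or at the pipe.

```r
# Hypothetical helper: count the "_"-separated items that follow `key=` in `part`.
count_items <- function(part, key) {
  m <- regmatches(part, regexpr(paste0(key, "=[^ |]+"), part))
  if (length(m) == 0) return(0L)             # key not present -> zero items
  items <- strsplit(sub(paste0(key, "="), "", m), "_", fixed = TRUE)[[1]]
  length(items)
}

cell <- "Month:High=Mar_May Low=Jul_Oct | Qtr:High=Q3"

# Split the cell into its month part and its quarter part at the pipe
parts      <- strsplit(cell, "|", fixed = TRUE)[[1]]
month_part <- parts[1]
qtr_part   <- if (length(parts) >= 2) parts[2] else ""

counts <- c(
  "Month High Count"   = count_items(month_part, "High"),
  "Month Low Count"    = count_items(month_part, "Low"),
  "Quarter High Count" = count_items(qtr_part, "High"),
  "Quarter Low Count"  = count_items(qtr_part, "Low")
)

# Totals are the sum of the High and Low counts
counts["Month Count"]   <- counts["Month High Count"] + counts["Month Low Count"]
counts["Quarter Count"] <- counts["Quarter High Count"] + counts["Quarter Low Count"]

print(counts)
```

For the sample cell this should reproduce the numbers in the required table (4 months, 1 quarter, 2 high months, 2 low months, 1 high quarter, 0 low quarters).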

Something like this seems to work. It is a lot longer than it needs to be, so that you can see each transformation.

library(dplyr)
library(tidyr)
library(stringr)

df <- data.frame(desc=c(
    'Month:High=Mar_May  Low=Jul_Oct | Qtr:High=Q3',
    'Month:High=Jan_Feb_Jun_Sep Low=Aug |',
    ' | Qtr:High=Q2',
    ' | Qtr:Low=Q2_Q3'
))

df %>% 
  # Split the month from the quarter, using the pipe
  # (the pipe is a regex metacharacter, so it is escaped as '\\|')
  separate(desc, into = c('MonthPart', 'QuarterPart'), sep = '\\|', remove = FALSE) %>%
  
  # Get the high and low parts for month and quarter
  # These will be NA where the values are missing
  mutate(HighMonth = str_extract(MonthPart, '(?<=High=)[^ ]+'),
         LowMonth = str_extract(MonthPart, '(?<=Low=)[^ ]+'),
         HighQuarter = str_extract(QuarterPart, '(?<=High=)[^ ]+'),
         LowQuarter = str_extract(QuarterPart, '(?<=Low=)[^ ]+')) %>%
  
  # Work out the number of months/quarters using the length of the string
  # e.g. each month is 3 characters plus the _ (add 1 because the last month
  # or quarter has no _)
  mutate(HighMonthCount = (nchar(HighMonth) + 1) / 4,
         LowMonthCount = (nchar(LowMonth) + 1) / 4,
         HighQuarterCount = (nchar(HighQuarter) + 1) / 3,
         LowQuarterCount = (nchar(LowQuarter) + 1) / 3) %>%
  
  # NAs to 0
  mutate_if(is.numeric, ~if_else(is.na(.), 0, .)) %>% 
  
  # Work out total month and quarter counts
  mutate(MonthCount = HighMonthCount + LowMonthCount,
         QuarterCount = HighQuarterCount + LowQuarterCount) %>% 
  
  # Just keep the columns of interest
  select(desc, contains('Count')) %>% 
  
  # Pivot to required format
  pivot_longer(!desc, names_to = 'Fields', values_to = 'Count')
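
One caveat: the length-based counting above relies on every month being exactly 3 characters and every quarter exactly 2. If that might not hold, a possible alternative is to count the "_"-separated tokens instead. The sketch below uses base R only; count_tokens is a hypothetical helper name, and it assumes missing values (NA) should count as 0, as in the pipeline above.

```r
# Hypothetical helper: number of "_"-separated tokens in each string, 0 for NA.
count_tokens <- function(x) {
  x <- as.character(x)
  n <- lengths(strsplit(x, "_", fixed = TRUE))
  ifelse(is.na(x), 0L, n)
}

# Values shaped like the HighMonth column produced by str_extract() above
high_month <- c("Mar_May", "Jan_Feb_Jun_Sep", NA, NA)
result <- count_tokens(high_month)
print(result)
# 2 4 0 0
```

Inside the pipeline this could replace the nchar() arithmetic, e.g. HighMonthCount = count_tokens(HighMonth), in which case the separate NA-to-0 step would no longer be needed for those columns.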