Counting the number of months and quarters in a cell using R
A sample cell from the given data frame (df$desc) contains the following: Month:High=Mar_May Low=Jul_Oct | Qtr:High=Q3
How can I generate the fields in the required table below from this cell?
Required table:
Fields              Count
Month Count         4
Quarter Count       1
Month High Count    2
Month Low Count     2
Quarter High Count  1
Quarter Low Count   0
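Logic:
- Month Count = 4, because the cell contains 4 months: Mar, May, Jul and Oct.
- Quarter Count = 1, because the cell contains 1 quarter: Q3.
- Month High Count = 2, because 2 months in the cell are set to High.
- Quarter High Count = 1, because 1 quarter in the cell is set to High.
- Month Low Count = 2, because 2 months in the cell are set to Low.
- Quarter Low Count = 0, because no quarters in the cell are set to Low.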
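The counting rules above can be sketched in base R for the single example cell. This is only an illustration of the logic, not a full solution: `count_items` is a hypothetical helper name, and it assumes items are always separated by `_` and that the Month and Qtr sections are separated by `|`.

```r
# Hypothetical helper: count the "_"-separated items listed after
# "<level>=" inside the "<part>:" section of a cell.
count_items <- function(cell, part, level) {
  # Split the cell into its Month and Qtr sections on the pipe
  sections <- trimws(strsplit(cell, "|", fixed = TRUE)[[1]])
  # Keep the section that starts with the requested part name
  section <- sections[startsWith(sections, paste0(part, ":"))]
  if (length(section) == 0) return(0L)
  # Pull out the text after "<level>=" up to the next space
  pattern <- paste0("(?<=", level, "=)[^ ]+")
  hit <- regmatches(section, regexpr(pattern, section, perl = TRUE))
  if (length(hit) == 0) return(0L)
  # The number of items is the number of "_"-separated tokens
  length(strsplit(hit, "_", fixed = TRUE)[[1]])
}

cell <- "Month:High=Mar_May Low=Jul_Oct | Qtr:High=Q3"

month_high <- count_items(cell, "Month", "High")
month_low  <- count_items(cell, "Month", "Low")
qtr_high   <- count_items(cell, "Qtr", "High")
qtr_low    <- count_items(cell, "Qtr", "Low")

result <- data.frame(
  Fields = c("Month Count", "Quarter Count", "Month High Count",
             "Month Low Count", "Quarter High Count", "Quarter Low Count"),
  Count  = c(month_high + month_low, qtr_high + qtr_low,
             month_high, month_low, qtr_high, qtr_low)
)
print(result)
```

For the example cell, the `Count` column comes out as 4, 1, 2, 2, 1, 0, which matches the required table.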
Something like this seems to work. It is much longer than it needs to be, so that you can see each transformation.
    library(dplyr)
    library(tidyr)
    library(stringr)

    df <- data.frame(desc = c(
      'Month:High=Mar_May Low=Jul_Oct | Qtr:High=Q3',
      'Month:High=Jan_Feb_Jun_Sep Low=Aug |',
      ' | Qtr:High=Q2',
      ' | Qtr:Low=Q2_Q3'
    ))

    df %>%
      # Split the month part from the quarter part on the pipe
      # (the pipe is a regex metacharacter, so it is escaped as '\\|')
      separate(desc, into = c('MonthPart', 'QuarterPart'), sep = '\\|', remove = FALSE) %>%
      # Get the high and low parts for month and quarter
      # These will be NA where the values are missing
      mutate(HighMonth   = str_extract(MonthPart,   '(?<=High=)[^ ]+'),
             LowMonth    = str_extract(MonthPart,   '(?<=Low=)[^ ]+'),
             HighQuarter = str_extract(QuarterPart, '(?<=High=)[^ ]+'),
             LowQuarter  = str_extract(QuarterPart, '(?<=Low=)[^ ]+')) %>%
      # Work out the number of months/quarters using the length of the string
      # e.g. each month is 3 characters plus the _ (add 1 because the last month
      # or quarter has no _); this relies on 3-letter months and 2-character quarters
      mutate(HighMonthCount   = (nchar(HighMonth) + 1) / 4,
             LowMonthCount    = (nchar(LowMonth) + 1) / 4,
             HighQuarterCount = (nchar(HighQuarter) + 1) / 3,
             LowQuarterCount  = (nchar(LowQuarter) + 1) / 3) %>%
      # NAs to 0
      mutate(across(where(is.numeric), ~ coalesce(.x, 0))) %>%
      # Work out total month and quarter counts
      mutate(MonthCount   = HighMonthCount + LowMonthCount,
             QuarterCount = HighQuarterCount + LowQuarterCount) %>%
      # Just keep the columns of interest
      select(desc, contains('Count')) %>%
      # Pivot to required format
      pivot_longer(!desc, names_to = 'Fields', values_to = 'Count')
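The length-based arithmetic above only holds while every month is exactly 3 characters and every quarter exactly 2. As a hedged alternative, the items can be counted from the number of `_` separators instead. The sketch below uses `stringr::str_count`; `count_tokens` is a hypothetical helper name, not part of the answer above.

```r
library(stringr)

# Hypothetical helper: count "_"-separated items in each extracted string,
# treating NA (nothing was extracted) as 0 items.
count_tokens <- function(x) {
  ifelse(is.na(x), 0, str_count(x, "_") + 1)
}

counts <- count_tokens(c("Mar_May", "Jan_Feb_Jun_Sep", "Q3", NA))
print(counts)
# 2 4 1 0
```

Under that assumption, `HighMonthCount = count_tokens(HighMonth)` could stand in for the `nchar()` expressions in the `mutate()` step, and it would make the separate "NAs to 0" step unnecessary. It would also handle a value such as `March_May`, where the character arithmetic gives 2.5 rather than 2.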