busy percentage for intervals in R
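I work in a hospital. Our doctors are on call during the evening and night hours. Sometimes no patients come in, so they can rest; at other times many patients show up at once.
They write down the times at which they start and stop treating a patient. With the lubridate package I can turn this data into time intervals with specific dates. These intervals vary a lot in length, because treatments can be more or less complex. Also, when a lot is going on, doctors may go back and forth between patients. A typical entry therefore looks like this: "2016-06-11 21:45:00 UTC" "2016-06-11 22:35:00 UTC"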
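A minimal base-R sketch of what such an entry contains, assuming the two timestamps are read as UTC exactly as printed above (base R is used here instead of lubridate only to keep the example self-contained):

```r
# The two timestamps of the typical entry quoted above
start <- as.POSIXct("2016-06-11 21:45:00", tz = "UTC")
end   <- as.POSIXct("2016-06-11 22:35:00", tz = "UTC")

# Treatment length in minutes
treatment_min <- as.numeric(difftime(end, start, units = "mins"))

# Weekday as a number (POSIXlt counts 0 = Sunday ... 6 = Saturday)
weekday <- as.POSIXlt(start)$wday

print(treatment_min)  # 50
print(weekday)        # 6, i.e. Saturday
```

lubridate's `interval()` and `as.duration()` express the same information as interval and duration objects.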
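I want to use this data to see which times of day are usually busy and which are quieter. This should also work for the different days of the week.
The result could be a bar chart showing the average occupancy at any time of day (e.g., 100% occupied between 8 and 9 PM, 40% between 1 and 2 AM).
My problem is that I don't know how to do this. ggplot doesn't handle intervals, and I haven't found a package that computes this kind of average or percentage over intervals.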
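One way to get such a percentage is to split the day into fixed clock bins and add up, for each bin, how many seconds of treatment fall inside it. The sketch below does this in base R; `busy_share()` is a hypothetical helper, and the second treatment (20:00 to 21:00) is made-up example data:

```r
# Hypothetical helper: share (in %) of each clock bin covered by treatment
# intervals; values above 100 mean several treatments ran in parallel
busy_share <- function(start, end, bin_start, bin_len_sec) {
  s  <- as.numeric(start)
  e  <- as.numeric(end)
  bs <- as.numeric(bin_start)
  be <- bs + bin_len_sec
  # overlap[i, j] = seconds of treatment i that fall into bin j (0 if disjoint)
  overlap <- outer(seq_along(s), seq_along(bs), function(i, j) {
    pmax(0, pmin(e[i], be[j]) - pmax(s[i], bs[j]))
  })
  colSums(overlap) / bin_len_sec * 100
}

# Typical entry from the question plus one hypothetical full-hour treatment
start <- as.POSIXct(c("2016-06-11 21:45", "2016-06-11 20:00"), tz = "UTC")
end   <- as.POSIXct(c("2016-06-11 22:35", "2016-06-11 21:00"), tz = "UTC")
bins  <- as.POSIXct(c("2016-06-11 20:00", "2016-06-11 21:00",
                      "2016-06-11 22:00"), tz = "UTC")

busy <- busy_share(start, end, bins, 3600)
print(round(busy, 1))  # 100.0 25.0 58.3
```

To average over many days, the overlaps for the same clock bin would be summed across all shifts and divided by the number of shifts.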
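I hope I've made clear what I need and what my problem is. I'm not an experienced programmer, but I'm happy to learn.
Many thanks
Valentin
Edit:
Sorry, I should have thought of that earlier. Here is how far I've got: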
library(lubridate)
Daten<-read.csv2("Dienstdatum.csv")
Beginn<-parse_date_time(Daten$Beginn,"dmy HM",tz="CET")
Ende<-parse_date_time(Daten$Ende,"dmy HM",tz="CET")
##Interval with date information
Daten$Intervall<-interval(Beginn,Ende)
##Intervals stripped of date (all moved to the same, current date)
Daten$Beg<-as.POSIXct(strftime(Beginn, format="%H:%M:%S", tz="CET"), format="%H:%M:%S", tz="CET")
Daten$dur<-as.duration(Daten$Intervall)
Daten$Interv<-as.interval(Daten$dur, Daten$Beg)
## add weekdays
Daten$Wochentage<-weekdays(Beginn)
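This gives me intervals that all point to the same date, and I have the weekdays to sort the data by. This is where I'm stuck, because I don't know how to draw some kind of histogram over intervals. I could use only the start times, but that would be heavily skewed, because the intervals range from 5 minutes to 2 hours.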
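The skew described here can be made concrete with two hypothetical treatments: counting only start times credits a 2-hour treatment entirely to the hour in which it begins, while summing overlapping minutes spreads it over every hour it covers. A base-R sketch:

```r
# Two hypothetical treatments: a 2-hour one starting at 22:00 and a 5-minute one
start <- as.POSIXct(c("2016-06-11 22:00", "2016-06-11 22:50"), tz = "UTC")
end   <- as.POSIXct(c("2016-06-12 00:00", "2016-06-11 22:55"), tz = "UTC")
bins  <- as.POSIXct(c("2016-06-11 22:00", "2016-06-11 23:00"), tz = "UTC")

# Naive histogram: count treatments by the clock hour in which they start
naive <- table(factor(as.POSIXlt(start)$hour, levels = c(22, 23)))

# Overlap approach: minutes of treatment that fall into each clock hour
minutes_in_bin <- vapply(seq_along(bins), function(j) {
  b_start <- as.numeric(bins[j])
  b_end   <- b_start + 3600
  sum(pmax(0, pmin(as.numeric(end), b_end) - pmax(as.numeric(start), b_start))) / 60
}, numeric(1))

print(as.vector(naive))  # 2 0
print(minutes_in_bin)    # 65 60
```

The naive count says nothing happens after 23:00, although one doctor is busy for the whole hour.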
I hope the code helps. Let me know if you need some sample data.
Edit (2):
This is the raw data:
https://www.dropbox.com/s/tok32wzt9wjmjih/Dienstdatum.csv?dl=0
and the output of dput:
https://www.dropbox.com/s/wgtw68rw9n0ksct/Output%20Dput.rtf?dl=0
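I'm afraid the data structure isn't ideal, but it should still work. I wasn't sure whether posting the output inline was a good idea, so I've provided the files instead.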
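So after digging around and trying different things, this is what I came up with and what works for me. The solution is somewhat convoluted and lengthy, but it apparently gives accurate results. While trying to find the answer I learned about the vonders of vectorization (excuse my German accent): vectorizing the code cut the time needed to compute the results to about 3 minutes, whereas before I had stopped the calculation after about 96 hours without it having finished.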
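The speed-up from vectorization can be illustrated with a sketch that computes the worked seconds per 15-minute bin twice: once with nested scalar loops and once with `pmin()`/`pmax()` over all treatments at once. The data are random and purely hypothetical, and this is not the author's exact code:

```r
# Hypothetical random data, only to compare the two approaches
set.seed(1)
day0  <- as.numeric(as.POSIXct("2016-06-01", tz = "UTC"))
start <- day0 + runif(100, 0, 14 * 86400)                  # 100 treatments in 14 days
end   <- start + runif(100, 5 * 60, 2 * 3600)              # 5 min to 2 h long
bin_start <- day0 + seq(0, by = 900, length.out = 14 * 96) # 15-min bins
bin_end   <- bin_start + 900

# Loop version: one scalar comparison per (bin, treatment) pair
loop_overlap <- function() {
  out <- numeric(length(bin_start))
  for (j in seq_along(bin_start)) {
    for (i in seq_along(start)) {
      ov <- min(end[i], bin_end[j]) - max(start[i], bin_start[j])
      if (ov > 0) out[j] <- out[j] + ov
    }
  }
  out
}

# Vectorized version: pmin()/pmax() handle all treatments of one bin at once
vec_overlap <- function() {
  vapply(seq_along(bin_start), function(j) {
    sum(pmax(0, pmin(end, bin_end[j]) - pmax(start, bin_start[j])))
  }, numeric(1))
}

print(all.equal(loop_overlap(), vec_overlap()))  # TRUE
```

Wrapping each call in `system.time()` shows the difference on a given machine; the gap grows with the number of bins and treatments.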
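Note that the list of documented dates (not every doctor completes the records for their shift) is a simple list of dates from an Excel sheet. The list of documented working intervals holds, in one column, the date and time at which someone started seeing a patient and, in another column, the date and time they stopped seeing that patient. The next row holds the next start and stop date and time in the same way.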
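A hypothetical miniature of the two inputs described here, assuming semicolon-separated German CSV files and timestamps written as `dd.mm.yyyy HH:MM` (the actual file layout is only described, not shown). It uses base R `as.POSIXct()` in place of lubridate's `parse_date_time()`:

```r
# Hypothetical miniature versions of the two input files (assumed format:
# ";"-separated German CSV, timestamps as "dd.mm.yyyy HH:MM")
work_csv  <- "Beginn;Ende\n11.06.2016 21:45;11.06.2016 22:35"
dates_csv <- "11.06.2016"

# Work log: one row per treatment, start and stop in separate columns
Daten <- read.csv2(text = work_csv, stringsAsFactors = FALSE)
Daten$Beginn <- as.POSIXct(Daten$Beginn, format = "%d.%m.%Y %H:%M", tz = "CET")
Daten$Ende   <- as.POSIXct(Daten$Ende,   format = "%d.%m.%Y %H:%M", tz = "CET")

# Documented shifts: a single column of plain dates, no header
doku <- read.csv2(text = dates_csv, header = FALSE, stringsAsFactors = FALSE)
doku <- as.POSIXct(doku$V1, format = "%d.%m.%Y", tz = "CET")

print(difftime(Daten$Ende, Daten$Beginn, units = "mins"))  # Time difference of 50 mins
```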
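All variables in the code are German words or abbreviations of German words, but I hope my comments are enough to follow what is going on. Also, much of the code deals with problems specific to my situation.
Special thanks to the users PhiSeu and user3507085, who helped me with various aspects of the solution.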
library(lubridate)
#read documented working intervals
Daten<-read.csv2("file.csv")
#convert start dates to POSIX
Daten$Beginn<-parse_date_time(Daten$Beginn,"dmy HM",tz="CET")
#prevent overlap by adding one second
Daten$Beginn<-Daten$Beginn+1
#convert end dates to POSIX
Daten$Ende<-parse_date_time(Daten$Ende,"dmy HM",tz="CET")
#remove empty rows
Daten<-na.omit(Daten)
#create intervals in which people worked
Daten$Intervall<-interval(Daten$Beginn,Daten$Ende)
#read dates on which people worked
doku<-read.csv2("dates.csv",header=FALSE)
doku<-parse_date_time(doku$V1,"%d.%m.%Y",tz="CET")
#create a start time of 09 A.M. for shifts (32400 s = 9 h after midnight)
doku<-data.frame(cbind(doku,doku+32400))
#add column names
names(doku)<-c("Datum","Beginn")
#convert back to POSIX (cbind() above dropped the POSIXct class)
doku$Datum<-as.POSIXct(doku$Datum,origin="1970-01-01",tz="CET")
doku$Beginn<-as.POSIXct(doku$Beginn,origin="1970-01-01",tz="CET")
#Loop to create 15-min intervals spanning 24 hours for each documented shift, against which actual working hours will be checked
begin <- as.POSIXct(doku$Beginn)
# copy begin time for loop
begin_new <- begin
# create duration object
aufl <- duration(15, "mins")
# count times for loop
times <- 24*60/15
# create dataframe with begin time
Intervall <- data.frame(begin,stringsAsFactors = FALSE)
for (i in 1:times){
cat("test",i,"\n")
# save old time for interval calculation
begin_start <- begin_new
# add 15 Minutes to original time
begin_new <- begin_new + aufl
cat(begin_new,"\n")
# create an interval object between the previous and the new time
new_dur <- interval(begin_start,begin_new)
# bind to original dataframe
Intervall <- cbind(Intervall,new_dur)
}
# Add column names
vec_names <- paste0("v",c(1:(times+1)))
colnames(Intervall) <- vec_names
#create a matrix of the number of seconds worked in each of the above 15-min intervals by checking how much each 15-min interval intersects the documented intervals of work
test<-vector()
Tabelle<-matrix(nrow=length(doku$Beginn),ncol=times)
Tabelle[is.na(Tabelle)]<-0
for (j in 1:length(doku$Beginn)){
for (k in 1:times){
test<-as.duration(intersect(Daten$Intervall,Intervall[j,k+1]))
test[is.na(test)]<-0
test<-sum(test)
Tabelle[j,k]<-test}}
#add start time to the above matrix
Ausw<-data.frame(cbind(Tabelle,begin))
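#convert to POSIX
Ausw$begin<-as.POSIXct(Ausw$begin,origin="1970-01-01",tz="CET")
#add weekday of each shift (1 = Sunday ... 7 = Saturday), used below to split weekends and weekdays
Ausw$wtag<-wday(Ausw$begin)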
##analysis of data
#common to all days of the week
#create labels for 15 min intervals
Labels<-c("09","09:15","09:30","09:45","10","10:15","10:30","10:45","11","11:15","11:30","11:45","12","12:15","12:30","12:45","13","13:15","13:30","13:45","14","14:15","14:30","14:45","15","15:15","15:30","15:45","16","16:15","16:30","16:45","17","17:15","17:30","17:45","18","18:15","18:30","18:45","19","19:15","19:30","19:45","20","20:15","20:30","20:45","21","21:15","21:30","21:45","22","22:15","22:30","22:45","23","23:15","23:30","23:45","00","00:15","00:30","00:45","01","01:15","01:30","01:45","02","02:15","02:30","02:45","03","03:15","03:30","03:45","04","04:15","04:30","04:45","05","05:15","05:30","05:45","06","06:15","06:30","06:45","07","07:15","07:30","07:45","08","08:15","08:30","08:45")
##analysis for weekends
#average percentage of each 15-min interval that was worked on Saturdays and Sundays
Wochenende<-apply(Ausw[Ausw$wtag %in% c(1,7),1:times],MARGIN=2,FUN=sum)
Prozent<-Wochenende/sum(Ausw$wtag %in% c(1,7))/as.numeric(aufl)*100
#add labels
names(Prozent)<-Labels
#plot as barplot and add axis labels
b=barplot(Prozent,axes = F,axisnames=F,main="Durchschnittliche Arbeitsbelastung am Wochenende",sub="über 100%: Übergabezeiten",xlab="Uhrzeit",ylab="Prozent")
axis(1,at=c(b[seq(1,length(Labels),4)],b[length(b)]+diff(b)[1]),labels = c(Labels[seq(1,length(Labels),4)],"09"))
axis(2,at=seq(0,160,25),las=2)
##analysis Monday to Friday
Woche<-apply(Ausw[Ausw$wtag %in% 2:6,1:times],MARGIN=2,FUN=sum)
Prozent2<-Woche/sum(Ausw$wtag %in% 2:6)/as.numeric(aufl)*100
#add labels
names(Prozent2)<-Labels
#plot as barplot and add axis labels
b2=barplot(Prozent2,axes = F,axisnames=F,main="Durchschnittliche Arbeitsbelastung Montag - Freitag",xlab="Uhrzeit",ylab="Prozent",ylim=c(0,100))
axis(1,at=c(b2[seq(1,length(Labels),4)],b2[length(b2)]+diff(b2)[1]),labels = c(Labels[seq(1,length(Labels),4)],"09"))
axis(2,at=seq(0,160,25),las=2)
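Selecting the weekend rows needs `%in%` rather than `==`: `Ausw$wtag == c(1,7)` compares element by element and recycles `c(1,7)`, so it does not pick out all Saturdays and Sundays. A small base-R demonstration with one hypothetical week, assuming `wtag` is coded with Sunday = 1 and Saturday = 7 as lubridate's `wday()` returns by default:

```r
# One hypothetical week of shifts, starting on Sunday 2016-06-05
shift_days <- as.POSIXct("2016-06-05", tz = "UTC") + (0:6) * 86400
# POSIXlt counts 0 = Sunday, so add 1 to get 1 = Sunday ... 7 = Saturday
wtag <- as.POSIXlt(shift_days)$wday + 1

# `==` recycles c(1, 7) element by element (wtag[1] == 1, wtag[2] == 7, ...)
# and warns because 7 is not a multiple of 2
wrong <- suppressWarnings(wtag == c(1, 7))

# %in% asks "is this code 1 or 7?" for every element
right <- wtag %in% c(1, 7)

print(which(wrong))  # 1
print(which(right))  # 1 7
```

The same applies to the Monday-to-Friday filter, where `%in% 2:6` selects all five weekdays.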