R: Aggregating between dates without a for loop
I would like to sum all the rent earned by leases that are active between two dates, without using a for loop.
Here is a sample of the lease data:
DataFrame1
StartDate EndDate MonthlyRental
2015-07-01 2015-09-30 500
2015-06-01 2015-10-31 600
2015-07-15 2016-01-31 400
2015-08-01 2015-12-31 800
I want to calculate the amount of rent I receive for each month, prorated if possible (not essential if that is too difficult). For example:
DataFrame2
Month RentalIncome
2015-07-31 500+600+(400*15/31)
2015-08-31 500+600+400+800
2015-09-30 500+600+400+800
2015-10-31 600+400+800
2015-11-30 600+400+800
etc.
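Does anyone know a better way to do this than simply looping through Dataframe2?
Thanks,
Mike
I'm not sure this is any better than "simply looping through the dataframe", since I actually do loop over it, but here is one way to produce the desired output.
(The output differs from the question for July 2015, because rent is due for 17 days in July, not 15.)
Expand the given intervals into days, compute the rent per day, then aggregate the daily rent by month: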
library(zoo)
df1 <- data.frame(
StartDate = as.Date(c("2015-07-01", "2015-06-01", "2015-07-15", "2015-08-01")),
EndDate = as.Date(c("2015-09-30", "2015-10-31", "2016-01-31", "2015-12-31")),
MonthlyRental = c(500, 600, 400, 800)
)
df1LongList <- apply(df1, MARGIN = 1, FUN = function(row) {
return(data.frame(
date = seq(from = as.Date(row["StartDate"]), to = as.Date(row["EndDate"]), by = "day"),
MonthlyRental = as.numeric(row["MonthlyRental"])))
})
df1Long <- do.call("rbind", df1LongList)
df1Long$yearMon <- as.yearmon(df1Long$date)
df1Long$maxDays <- as.numeric(as.Date(df1Long$yearMon, frac = 1) - as.Date(df1Long$yearMon) + 1) # Thanks:
df1Long$rental <- df1Long$MonthlyRental / df1Long$maxDays
tapply(X = df1Long$rental, INDEX = df1Long$yearMon, FUN = sum)
# Jun 2015 Jul 2015 Aug 2015 Sep 2015 Okt 2015 Nov 2015 Dez 2015 Jan 2016
# 600.000 1319.355 2300.000 2300.000 1800.000 1200.000 1200.000 400.000
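As a quick sanity check of the July figure, here is a small base R sketch (my own addition, not part of the answer above; the variable names are just for illustration). The lease starting on 2015-07-15 is active from the 15th through the 31st inclusive, which is 17 days:

```r
# Days of July 2015 covered by the lease that starts on 2015-07-15 (inclusive count).
active_days <- as.numeric(as.Date("2015-07-31") - as.Date("2015-07-15")) + 1

# Two leases cover the whole month; the third is prorated by active_days / 31.
july_income <- 500 + 600 + 400 * active_days / 31

print(active_days)            # 17
print(round(july_income, 3))  # 1319.355
```

This matches the "Jul 2015" entry in the tapply output.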
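Here is a possible data.table solution (with some help from the Hmisc package). Without the half-month rentals this would be a very simple problem, but that constraint makes it harder.
As a side note, following your example I assumed that only StartDate can fall in the middle of a month.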
library(data.table)
require(Hmisc)
# Converting to valid date classes
Dates <- names(df)[1:2]
setDT(df)[, (Dates) := lapply(.SD, as.Date), .SDcols = Dates]
# Handling half months
df[mday(StartDate) != 1, `:=`(GRP = seq_len(.N),
mDays = mday(StartDate),
StartDate = StartDate - mday(StartDate) + 1L)]
## Converting to long format
res <- df[, .(Month = seq(StartDate, EndDate, by = "month")),
by = .(MonthlyRental, GRP, mDays)]
## Dividing not full months by the number of days (that could be modified as per other post)
res[match(na.omit(df$GRP), GRP), MonthlyRental := MonthlyRental*mDays/monthDays(Month)]
res[, .(RentalIncome = sum(MonthlyRental)), keyby = .(year(Month), month(Month))]
# year month RentalIncome
# 1: 2015 6 600
# 2: 2015 7 1293
# 3: 2015 8 2300
# 4: 2015 9 2300
# 5: 2015 10 1800
# 6: 2015 11 1200
# 7: 2015 12 1200
# 8: 2016 1 400
I use outer products, 'pmin', and 'pmax' to avoid loops. The partially covered months are the difficult and interesting part:
library(lubridate)
df1 <- data.frame(
StartDate = as.Date(c("2015-07-01", "2015-06-01", "2015-07-15", "2015-08-01")),
EndDate = as.Date(c("2015-09-30", "2015-10-31", "2016-01-31", "2015-12-31")),
MonthlyRental = c(500, 600, 400, 800)
)
d <- c( as.Date("2015-07-31"),
as.Date("2015-08-31"),
as.Date("2015-09-30"),
as.Date("2015-10-31"),
as.Date("2015-11-30"),
as.Date("2015-12-31"),
as.Date("2016-01-31"),
as.Date("2016-02-29") )
RentPerDay <- outer( df1$"MonthlyRental", days_in_month(d), "/" )
countDays <- pmin( pmax( outer( d, df1$"StartDate", "-") + 1, 0 ), days_in_month(d) ) -
pmin( pmax( outer( d, df1$"EndDate" , "-"), 0 ), days_in_month(d) )
rentalIncome <- colSums( t(countDays) * RentPerDay )
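To make the pmin/pmax clamping easier to follow, here is a scalar sketch in base R for a single lease and a single month-end date. The function name days_active is hypothetical and only for illustration, and it assumes that month_end really is the last day of its month:

```r
# Number of days of the month ending on `month_end` during which a lease
# running from `start` to `end` (both inclusive) is active.
days_active <- function(month_end, start, end) {
  # month_end is the last day of the month, so its day number is the month length
  month_len <- as.integer(format(month_end, "%d"))
  # days elapsed since the lease started, clamped to [0, month_len]
  since_start <- min(max(as.numeric(month_end - start) + 1, 0), month_len)
  # days elapsed since the lease ended, clamped to [0, month_len]
  past_end <- min(max(as.numeric(month_end - end), 0), month_len)
  since_start - past_end
}

lease_start <- as.Date("2015-07-15")
lease_end   <- as.Date("2016-01-31")

print(days_active(as.Date("2015-07-31"), lease_start, lease_end))  # 17
print(days_active(as.Date("2015-08-31"), lease_start, lease_end))  # 31
print(days_active(as.Date("2016-02-29"), lease_start, lease_end))  # 0
```

The outer() calls in the answer apply the same two clamps to every combination of month and lease at once.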
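The columns of the matrix 't(countDays)' correspond to the rows of 'DataFrame_2', i.e. the months. Its rows correspond to the rows of 'DataFrame_1', i.e. the sources of rental income. The entry at (i,j) is the number of days in the j-th month on which the i-th source contributes rental income. The matrix 'RentPerDay' has the same structure: the entry at (i,j) is the amount received from the i-th source for one day of the j-th month. Summing the elementwise product of these two matrices over column j then gives the total rental income for the j-th month.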
> t(countDays)
Time differences in days
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 31 31 30 0 0 0 0 0
[2,] 31 31 30 31 0 0 0 0
[3,] 17 31 30 31 30 31 31 0
[4,] 0 31 30 31 30 31 0 0
> RentPerDay
Jul Aug Sep Oct Nov Dec Jan Feb
[1,] 16.12903 16.12903 16.66667 16.12903 16.66667 16.12903 16.12903 17.24138
[2,] 19.35484 19.35484 20.00000 19.35484 20.00000 19.35484 19.35484 20.68966
[3,] 12.90323 12.90323 13.33333 12.90323 13.33333 12.90323 12.90323 13.79310
[4,] 25.80645 25.80645 26.66667 25.80645 26.66667 25.80645 25.80645 27.58621
> rentalIncome
Jul Aug Sep Oct Nov Dec Jan Feb
1319.355 2300.000 2300.000 1800.000 1200.000 1200.000 400.000 0.000
>
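I have slightly modified my previous answer. The matrix "RentPerDay" is not needed: "colSums(t(countDays)*RentPerDay)" can be replaced by a matrix-vector product. This solution computes the same rental income as the previous one.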
library(lubridate)
ultimo_day <- function( start, end )
{
N <- 12*(year(end) - year(start)) + month(end) - month(start) + 1
d <- start
day(d) <- 1
month(d) <- month(d) + (1:N)
return( d - as.difftime(1,units="days"))
}
countDays <- function( data, d )
{
return( pmin( pmax( outer( d, data$"StartDate", "-") + 1, 0 ), day(d) ) -
pmin( pmax( outer( d, data$"EndDate" , "-"), 0 ), day(d) ) )
}
rentalIncome <- function( data,
d = ultimo_day( min(data$StartDate), max(data$EndDate) ) )
{
return ( data.frame( date = d,
income = ( countDays(data,d) / days_in_month(d) ) %*% data$"MonthlyRental" ) )
}
# -------- Example Data: --------
df1 <- data.frame(
StartDate = as.Date(c("2015-07-01", "2015-06-01", "2015-07-15", "2015-08-01", "2014-06-20")),
EndDate = as.Date(c("2015-09-30", "2015-10-31", "2016-01-31", "2015-12-31", "2015-07-31")),
MonthlyRental = c(500, 600, 400, 800, 300)
)
I added one more lease to the example, one that runs for more than a year:
> df1
StartDate EndDate MonthlyRental
1 2015-07-01 2015-09-30 500
2 2015-06-01 2015-10-31 600
3 2015-07-15 2016-01-31 400
4 2015-08-01 2015-12-31 800
5 2014-06-20 2015-07-31 300
"ultimo_day(start,end)" is the vector of days on which rent is paid between "start" and "end", i.e. the last day of each month in that range:
> d <- ultimo_day( min(df1$StartDate), max(df1$EndDate))
> d
[1] "2014-06-30" "2014-07-31" "2014-08-31" "2014-09-30" "2014-10-31" "2014-11-30" "2014-12-31" "2015-01-31" "2015-02-28" "2015-03-31" "2015-04-30"
[12] "2015-05-31" "2015-06-30" "2015-07-31" "2015-08-31" "2015-09-30" "2015-10-31" "2015-11-30" "2015-12-31" "2016-01-31"
The rows of the matrix "countDays" correspond to these month-end days, and hence to the months:
> countDays(df1,d)
Time differences in days
[,1] [,2] [,3] [,4] [,5]
[1,] 0 0 0 0 11
[2,] 0 0 0 0 31
[3,] 0 0 0 0 31
[4,] 0 0 0 0 30
[5,] 0 0 0 0 31
[6,] 0 0 0 0 30
[7,] 0 0 0 0 31
[8,] 0 0 0 0 31
[9,] 0 0 0 0 28
[10,] 0 0 0 0 31
[11,] 0 0 0 0 30
[12,] 0 0 0 0 31
[13,] 0 30 0 0 30
[14,] 31 31 17 0 31
[15,] 31 31 31 31 0
[16,] 30 30 30 30 0
[17,] 0 31 31 31 0
[18,] 0 0 30 30 0
[19,] 0 0 31 31 0
[20,] 0 0 31 0 0
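Row 1 belongs to June 2014, row 2 to July 2014, ..., row 20 to January 2016.
"countDays(df1,d) / days_in_month(d)" is again a matrix. Its (i,j) component is not the number of days on which the j-th lease is active in the i-th month, but that number as a fraction of the length of the i-th month: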
> countDays(df1,d) / days_in_month(d)
Time differences in days
[,1] [,2] [,3] [,4] [,5]
[1,] 0 0 0.0000000 0 0.3666667
[2,] 0 0 0.0000000 0 1.0000000
[3,] 0 0 0.0000000 0 1.0000000
[4,] 0 0 0.0000000 0 1.0000000
[5,] 0 0 0.0000000 0 1.0000000
[6,] 0 0 0.0000000 0 1.0000000
[7,] 0 0 0.0000000 0 1.0000000
[8,] 0 0 0.0000000 0 1.0000000
[9,] 0 0 0.0000000 0 1.0000000
[10,] 0 0 0.0000000 0 1.0000000
[11,] 0 0 0.0000000 0 1.0000000
[12,] 0 0 0.0000000 0 1.0000000
[13,] 0 1 0.0000000 0 1.0000000
[14,] 1 1 0.5483871 0 1.0000000
[15,] 1 1 1.0000000 1 0.0000000
[16,] 1 1 1.0000000 1 0.0000000
[17,] 0 1 1.0000000 1 0.0000000
[18,] 0 0 1.0000000 1 0.0000000
[19,] 0 0 1.0000000 1 0.0000000
[20,] 0 0 1.0000000 0 0.0000000
This matrix is multiplied by the vector "df1$MonthlyRental", and the resulting vector is stored as "income" in the data.frame of rental income:
> rentalIncome(df1)
date income
1 2014-06-30 110.000
2 2014-07-31 300.000
3 2014-08-31 300.000
4 2014-09-30 300.000
5 2014-10-31 300.000
6 2014-11-30 300.000
7 2014-12-31 300.000
8 2015-01-31 300.000
9 2015-02-28 300.000
10 2015-03-31 300.000
11 2015-04-30 300.000
12 2015-05-31 300.000
13 2015-06-30 900.000
14 2015-07-31 1619.355
15 2015-08-31 2300.000
16 2015-09-30 2300.000
17 2015-10-31 1800.000
18 2015-11-30 1200.000
19 2015-12-31 1200.000
20 2016-01-31 400.000
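To illustrate the matrix-vector product for a single month, here is a minimal base R sketch (my own addition; the variable names are only for illustration). It takes row 14 of the fraction matrix, which is July 2015, and multiplies it by the rental vector:

```r
# Fraction of July 2015 covered by leases 1 to 5 (row 14 of the fraction matrix).
july_fraction <- c(1, 1, 17 / 31, 0, 1)
monthly_rental <- c(500, 600, 400, 800, 300)

# One row of the matrix-vector product: a dot product of fractions and rents.
july_income <- as.numeric(july_fraction %*% monthly_rental)

print(round(july_income, 3))  # 1619.355
```

This agrees with row 14 of the rentalIncome(df1) output.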