How to add together certain rows within a column in R by unique IDs?
I'm new to this, so I apologize if my question is worded poorly.
I'm working in R, and I have a table called Rent that looks something like this:
Rent
ID Invoice Payment Paid Date
lucy 7/1/2018 100 9/1/2018
lucy 7/1/2018 150 10/1/2018
lucy 8/1/2018 100 11/1/2018
Since Lucy has two payments against the invoice dated 7/1/2018, I want to combine them into one row: sum the payments and keep the latest paid date.
What I have so far is:
#to create a row that has the sum of the sales prices
Rent[,sum_late:=sum( as.numeric(("Sales Price"))),
by= c("Id","Invoice Date")]
#take the first of the unique IDs by the max paid date
head (SD,1) by=c("ID", "Invoice Date", max("Paid Date")
But when I run the first statement, the whole sum_late column is NA. I'm not sure what I'm doing wrong. Ideally, I'd like a table like this:
Rent
ID Invoice Payment Paid Date
lucy 7/1/2018 250 10/1/2018
lucy 8/1/2018 100 11/1/2018
Sorry if this is a silly question. I appreciate any help and feedback, and thank you all for your time!
We can convert Invoice and Paid_Date to Date class, group_by ID and Invoice, sum Payment, and select the max Paid_Date.
library(dplyr)
Rent %>%
  mutate_at(vars(Invoice, Paid_Date), as.Date, '%d/%m/%Y') %>%
  group_by(ID, Invoice) %>%
  summarise(Payment = sum(Payment),
            Paid_Date = max(Paid_Date))
# ID Invoice Payment Paid_Date
# <chr> <date> <int> <date>
#1 lucy 2018-01-07 250 2018-01-10
#2 lucy 2018-01-08 100 2018-01-11
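Note that '%d/%m/%Y' reads 7/1/2018 as 7 January 2018. The question does not settle whether the dates are day-first or month-first, so if the real data is month-first the format string would need to be '%m/%d/%Y' instead. A short sketch of the difference, using one value from the sample data (the variable names are only illustrative):

```r
invoice <- "7/1/2018"

# Day-first: the first field is the day, the second the month.
day_first <- as.Date(invoice, format = "%d/%m/%Y")

# Month-first: the first field is the month, the second the day.
month_first <- as.Date(invoice, format = "%m/%d/%Y")

print(day_first)    # "2018-01-07"
print(month_first)  # "2018-07-01"
```

The grouping and the sums come out the same either way for this sample; only the displayed dates differ.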
Or, if you prefer data.table, using the same logic:
library(data.table)
setDT(Rent)[, c("Invoice", "Paid_Date") := .(as.IDate(Invoice, '%d/%m/%Y'),
                                             as.IDate(Paid_Date, '%d/%m/%Y'))]
Rent[, .(Payment = sum(Payment), Paid_Date = max(Paid_Date)), .(ID, Invoice)]
Data
Rent <- structure(list(ID = c("lucy", "lucy", "lucy"), Invoice = c("7/1/2018",
"7/1/2018", "8/1/2018"), Payment = c(100L, 150L, 100L), Paid_Date = c("9/1/2018",
"10/1/2018", "11/1/2018")), class = "data.frame", row.names = c(NA, -3L))
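As for why the original attempt produced NA: inside as.numeric(), a quoted name such as "Sales Price" is a plain character string rather than a reference to a column, so it cannot be converted to a number. R is also case sensitive, so "Id" would not match a column named ID. A minimal base R sketch of the first point, assuming the column is really called Payment as in the sample data:

```r
Rent <- data.frame(Payment = c(100L, 150L, 100L))

# A quoted name is just a string; converting it to numeric gives NA
# (R also emits a coercion warning, suppressed here).
quoted <- suppressWarnings(as.numeric("Payment"))

# Referring to the column itself gives the actual values.
actual <- as.numeric(Rent$Payment)

print(quoted)       # NA
print(sum(actual))  # 350
```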
There are several ways to do this; I will use a for loop to create the desired output. I also second @Ronak Shah's dplyr approach, which takes less processing time than for loops.
Data
Rent <- structure(list(ID = c("lucy", "lucy", "lucy"), Invoice = c("7/1/2018",
"7/1/2018", "8/1/2018"), Payment = c(100L, 150L, 100L), Paid_Date = c("9/1/2018",
"10/1/2018", "11/1/2018")), class = "data.frame", row.names = c(NA, -3L))
Converting Paid_Date to Date format
Rent$Paid_Date <- as.Date(Rent$Paid_Date, "%d/%m/%Y")
For loop
for (i in unique(Rent$ID)) {
  for (j in unique(Rent$Invoice[Rent$ID == i])) {
    Rent$Payment_[Rent$ID == i & Rent$Invoice == j] <- sum(Rent$Payment[Rent$ID == i & Rent$Invoice == j])
    Rent$Paid_dt[Rent$ID == i & Rent$Invoice == j] <- max(Rent$Paid_Date[Rent$ID == i & Rent$Invoice == j])
  }
}
Rent$Paid_dt <- as.Date(Rent$Paid_dt ,origin = "1970-01-01") # converting into date format
# unique() must wrap the subsetted data frame to drop the repeated rows;
# applied to the column names alone it removes nothing
Rent1 <- unique(Rent[, c("ID", "Invoice", "Payment_", "Paid_dt")])
print(Rent1)
ID Invoice Payment_ Paid_dt
1 lucy 7/1/2018 250 2018-01-10
3 lucy 8/1/2018 100 2018-01-11
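The same result can also be reached in base R without explicit loops. The following is only a sketch of one alternative, not the approach used above: it swaps the nested loops for ave(), which computes a per-group value and repeats it on every row of the group, and then drops the repeated rows with unique(). The helper name latest_days is made up for illustration.

```r
Rent <- data.frame(
  ID = c("lucy", "lucy", "lucy"),
  Invoice = c("7/1/2018", "7/1/2018", "8/1/2018"),
  Payment = c(100L, 150L, 100L),
  Paid_Date = c("9/1/2018", "10/1/2018", "11/1/2018"),
  stringsAsFactors = FALSE
)
Rent$Paid_Date <- as.Date(Rent$Paid_Date, "%d/%m/%Y")

# Group total of Payment, repeated on every row of each ID/Invoice group.
Rent$Payment_ <- ave(Rent$Payment, Rent$ID, Rent$Invoice, FUN = sum)

# Latest paid date per group, computed on the numeric day count
# and converted back to Date afterwards.
latest_days <- ave(as.numeric(Rent$Paid_Date), Rent$ID, Rent$Invoice, FUN = max)
Rent$Paid_dt <- as.Date(latest_days, origin = "1970-01-01")

# Keep one row per ID/Invoice combination.
Rent1 <- unique(Rent[, c("ID", "Invoice", "Payment_", "Paid_dt")])
print(Rent1)
```

Working on the numeric day count sidesteps any question of whether the Date class survives the grouped assignment.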