add partial cum sum column to data frame
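I have a df sorted by product ID `prodID` and date `Date`. I need to add a column holding a cumulative index of how many times each prodID has appeared in the df so far. For example, if a prodID appears only once, the index for that row is 1. If another prodID appears in 3 rows (which are consecutive in the df, since the df is sorted), then for that prodID the first row's index should be 1, followed by 2 and then 3 in the next rows.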
Basically I need my initial df:
initial.df <- structure(list(prodID = c("009hpOpzwl", "00An0zNeEQ", "00An0zNeEQ", "00An0zNeEQ", "00An0zNeEQ", "00DtU3Bk6O", "00DtU3Bk6O", "00FyjrH1kk", "00FyjrH1kk", "00FyjrH1kk", "00FyjrH1kk", "00FyjrH1kk"), Date = c("2012-06", "2014-09", "2014-09", "2014-09", "2014-09", "2001-11", "2001-11", "2002-11", "2002-12", "2003-01", "2003-02", "2003-03"), status = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 5L, 5L, 5L, 5L, 5L), .Label = c("rare", "occasional", "amateur", "connoisseur", "expert", "fool"), class = "factor"), rating = c(2.5, 4.7, 4.7, 4.7, 4.7, 4.4, 4.4, 3.5, 3.83, 3.36, 3.53, 3.78), over = c(68, 49, 49, 49, 49, 22, 22, 29, 38.33, 43.3, 39.53, 30.58)), class = "data.frame", row.names = c(NA, -12L), .Names = c("prodID", "Date", "status", "rating", "over"))
to become
new.df <- structure(list(prodID = c("009hpOpzwl", "00An0zNeEQ", "00An0zNeEQ", "00An0zNeEQ", "00An0zNeEQ", "00DtU3Bk6O", "00DtU3Bk6O", "00FyjrH1kk", "00FyjrH1kk", "00FyjrH1kk", "00FyjrH1kk", "00FyjrH1kk"), Date = c("2012-06", "2014-09", "2014-09", "2014-09", "2014-09", "2001-11", "2001-11", "2002-11", "2002-12", "2003-01", "2003-02", "2003-03"), status = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 5L, 5L, 5L, 5L, 5L), .Label = c("rare", "occasional", "amateur", "connoisseur", "expert", "fool"), class = "factor"), rating = c(2.5, 4.7, 4.7, 4.7, 4.7, 4.4, 4.4, 3.5, 3.83, 3.36, 3.53, 3.78), over = c(68, 49, 49, 49, 49, 22, 22, 29, 38.33, 43.3, 39.53, 30.58), index = c(1, 1, 2, 3, 4, 1, 2, 1, 2, 3, 4, 5)), .Names = c("prodID", "Date", "status", "rating", "over", "index"), row.names = c(NA, -12L), class = "data.frame")
Thanks in advance for any suggestions.
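Because the question states that each prodID's rows are consecutive, the index can also be read off the run lengths. A minimal base-R sketch, assuming the rows really are grouped by prodID (an ID that reappeared later in a separate run would restart at 1): `rle()` gives the length of each run and `sequence()` counts 1..n within each run. The `prodID` vector below is copied from the question's `initial.df`.

```r
# prodID values from the question's initial.df; each ID forms one contiguous run
prodID <- c("009hpOpzwl", rep("00An0zNeEQ", 4), rep("00DtU3Bk6O", 2),
            rep("00FyjrH1kk", 5))

runs  <- rle(prodID)            # run lengths: 1, 4, 2, 5
index <- sequence(runs$lengths) # 1..n within each run
index  # 1 1 2 3 4 1 2 1 2 3 4 5
```

On the real data frame this would be `initial.df$index <- sequence(rle(initial.df$prodID)$lengths)`, valid only under the same contiguity assumption.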
How about
new.df <- do.call(rbind, lapply(split(initial.df, initial.df$prodID),
                                function(x) cbind(x, index = seq_len(nrow(x)))))
rownames(new.df) <- NULL  # drop the "prodID.row" names created by rbind
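One caveat with the `split()`/`do.call(rbind, ...)` pattern: `split()` orders the groups by their sorted factor levels, so the combined result comes back grouped by prodID in that order rather than in the original row order. That is harmless for the already-sorted data in the question, but a sketch on a small hypothetical data frame `toy` shows the effect, plus an order-preserving alternative using `unsplit()`:

```r
# Hypothetical toy data: rows are NOT grouped by prodID
toy <- data.frame(prodID = c("b", "a", "b", "a", "a"), stringsAsFactors = FALSE)

# split() + rbind: rows come back grouped by the sorted levels "a", then "b"
by_group <- do.call(rbind, lapply(split(toy, toy$prodID),
                                  function(x) cbind(x, index = seq_len(nrow(x)))))
by_group$prodID  # "a" "a" "a" "b" "b"

# unsplit() writes each group's counter back into the original row positions
toy$index <- unsplit(lapply(split(seq_len(nrow(toy)), toy$prodID), seq_along),
                     toy$prodID)
toy$index  # 1 1 2 2 3
```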
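You can do this with the `ave` function, provided the data are guaranteed to be sorted as you say (`ave` numbers the rows of each prodID in their current order):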
# pass an integer vector as x so the index stays numeric instead of character
initial.df$index <- ave(seq_along(initial.df$prodID), initial.df$prodID, FUN = seq_along)
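Why the `x` argument matters: `ave()` writes each group's result back into a copy of `x`, so if `x` is the character column `prodID` itself, the counters are coerced to character strings. A small check on a hypothetical vector `ids`:

```r
ids <- c("009hpOpzwl", "00An0zNeEQ", "00An0zNeEQ", "00DtU3Bk6O")

# x = the character IDs: the counters are stored back as strings
idx_chr <- ave(ids, ids, FUN = seq_along)
class(idx_chr)  # "character"

# x = an integer vector: the counters stay integer
idx_int <- ave(seq_along(ids), ids, FUN = seq_along)
idx_int  # 1 1 2 1
```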
Just for completeness: with data.table this is a very simple operation, the syntax is both efficient and short, and the column is created by reference:
library(data.table)
setDT(initial.df)[, index := seq_len(.N), by = prodID]
In case this question doesn't get closed as a duplicate, and since we already have a data.table answer, here's the dplyr version:
library(dplyr)
new.df <- initial.df %>% group_by(prodID) %>% mutate(index = row_number())