Create rows of data conditionally
Sample dataset:
Price=c(6651, 7255, 25465, 35645, 2556, 3665)
NumberPurchased=c(25, 30, 156, 250, 12, 16)
Type=c("A", "A", "C", "C", "B", "B")
Source=c("GSC", "MYL", "TTC", "ZAF", "CAN", "HLT")
df1 <- data.frame(Price, NumberPurchased, Type, Source)
I would like to create a new data frame with two additional variables (ID, PurchaseDate), but with more rows of data depending on the variable Type.
The rules I want to apply:
If Type is A, PurchaseDate is "2013", "2014".
If Type is B, PurchaseDate is "2013".
If Type is C, PurchaseDate is "2013", "2014", "2015".
If Type is A, divide Price and NumberPurchased by 2 and produce 2 rows with the different PurchaseDate values listed above.
If Type is B, PurchaseDate stays 2013.
If Type is C, divide Price and NumberPurchased by 3 and produce 3 rows with the different PurchaseDate values listed above.
So I would like something like this as the new dataset:
Price=c(3325.5, 3325.5, 3627.5, 3627.5, 8488.3, 8488.3, 8488.3, 11881.6, 11881.6, 11881.6, 2556, 3665)
NumberPurchased=c(12.5, 12.5, 15, 15, 52, 52, 52, 83.3, 83.3, 83.3, 12, 16)
Type=c("A", "A", "A", "A", "C", "C", "C", "C", "C", "C","B", "B")
Source=c("GSC", "GSC", "MYL", "MYL", "TTC","TTC", "TTC", "ZAF", "ZAF","ZAF", "CAN", "HLT")
PurchaseDate=c("2013", "2014", "2013", "2014", "2013", "2014", "2015", "2013", "2014", "2015", "2013", "2013")
ID=c(1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 5, 6)
df2 <- data.frame(Price, NumberPurchased, Type, Source, PurchaseDate, ID)
Any insights?
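Here is one possible approach. First, we create an index for Type, then we expand the data accordingly, and finally we use the data.table package to compute the new variables.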
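The index step works because of the order of the factor levels: with levels = c("B", "A", "C"), the integer codes are 1 for B, 2 for A, and 3 for C, which happen to equal the number of rows each type needs. A minimal sketch of that mapping follows, alongside an equivalent named-vector lookup. The names n_rows, indx_lookup, and row_pos are illustrative assumptions and are not part of the answer's code.

```r
Type <- c("A", "A", "C", "C", "B", "B")

# Integer codes follow the position of each level: B = 1, A = 2, C = 3
indx <- as.numeric(factor(Type, levels = c("B", "A", "C")))

# Equivalent lookup with the row counts spelled out (hypothetical helper vector)
n_rows <- c(A = 2, B = 1, C = 3)
indx_lookup <- unname(n_rows[Type])

# Row positions repeated according to the index; this is what grows the data
row_pos <- rep(seq_along(Type), indx)
```

The named lookup is arguably easier to extend if a new type needs a row count that does not match its position in the level order.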
library(data.table)

# Index = number of rows each Type needs (B = 1, A = 2, C = 3)
setDT(df1)[, indx := as.numeric(factor(Type, levels = c("B", "A", "C")))]
# setDT(df1)[, indx := ifelse(Type == "C", 3, 2)] # Alternative index per your comment

# Repeat each row indx times
df2 <- df1[rep(seq_len(.N), indx)]

# Within each original row, split the totals evenly and number the years
df2[, `:=`(
  Price = Price / .N,
  PurchaseDate = 2013:(2013 + (.N - 1)),
  NumberPurchased = NumberPurchased / .N,
  ID = .GRP
), by = .(Source, Type)][]
# Price NumberPurchased Type Source indx PurchaseDate ID
# 1: 3325.500 12.50000 A GSC 2 2013 1
# 2: 3325.500 12.50000 A GSC 2 2014 1
# 3: 3627.500 15.00000 A MYL 2 2013 2
# 4: 3627.500 15.00000 A MYL 2 2014 2
# 5: 8488.333 52.00000 C TTC 3 2013 3
# 6: 8488.333 52.00000 C TTC 3 2014 3
# 7: 8488.333 52.00000 C TTC 3 2015 3
# 8: 11881.667 83.33333 C ZAF 3 2013 4
# 9: 11881.667 83.33333 C ZAF 3 2014 4
# 10: 11881.667 83.33333 C ZAF 3 2015 4
# 11: 2556.000 12.00000 B CAN 1 2013 5
# 12: 3665.000 16.00000 B HLT 1 2013 6
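For readers who prefer not to depend on data.table, the same expand-then-divide idea can be sketched in base R. This is only an illustration under stated assumptions: the lookup vector n_rows and the helper names times and idx are hypothetical, and the ID is taken to be the original row number, which matches the grouping by Source and Type only because each Source appears once in the sample data.

```r
df1 <- data.frame(
  Price = c(6651, 7255, 25465, 35645, 2556, 3665),
  NumberPurchased = c(25, 30, 156, 250, 12, 16),
  Type = c("A", "A", "C", "C", "B", "B"),
  Source = c("GSC", "MYL", "TTC", "ZAF", "CAN", "HLT")
)

# Number of output rows per Type (assumed lookup mirroring the rules in the question)
n_rows <- c(A = 2, B = 1, C = 3)
times <- unname(n_rows[as.character(df1$Type)])

# Repeat each original row `times` times
idx <- rep(seq_len(nrow(df1)), times)
df2 <- df1[idx, ]
rownames(df2) <- NULL

# Split the totals evenly across the repeated rows
df2$Price <- df2$Price / times[idx]
df2$NumberPurchased <- df2$NumberPurchased / times[idx]

# Years count up from 2013 within each original row; ID is the original row number
df2$PurchaseDate <- 2013L + sequence(times) - 1L
df2$ID <- idx
```

Because the division happens after the rows are repeated, the Price and NumberPurchased totals of df2 should equal those of df1, which is a quick sanity check on the result.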