Data classification in buckets
I have a data frame named Data that contains the following elements:
Model Garage City Unit.Price Invoice.Date Components
Hyundai A NY 500 31/12/2016 HL
Honda B NJ 700 31/12/2016 TL
Porsche A NY 800 30/12/2016 TL
BMW B NJ 800 30/12/2016 HL
BMW A NJ 700 31/12/2016 HL
Porsche B NY 800 30/12/2016 TL
Honda A NY 400 30/12/2016 TL
Honda A NY 500 30/12/2016 HL
Honda B NY 600 30/12/2016 HL
Honda A NY 200 29/12/2016 TL
Honda A NY 300 29/12/2016 HL
I want the output broken into cars, with the data sorted by Invoice.Date so that the current cost is captured first.
Ex: Honda
Components GarageA GarageB
HL 500 600
TL 400 700
This is how I started:
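library(reshape2)  # provides dcast()
Category <- as.data.frame(c("BMW","Honda","Porsche","Hyundai"))
for(i in 1:nrow(Category))
{
  m <- as.character(Category[i,1])
  X <- subset(Data, Model == m)
  # sort the subset (not the full Data), newest invoice first; dates are dd/mm/yyyy text
  X <- X[order(as.Date(X$Invoice.Date, "%d/%m/%Y"), decreasing = TRUE),]
  # length() only counts the rows in each cell, it does not return the price
  Pivot_A <- dcast(X, Components ~ Garage, value.var = "Unit.Price", fun.aggregate = length)
  write.csv(Pivot_A, file = paste(m, "Cars.csv", sep = "_"))
}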
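The only problem I am facing is mapping the correct unit price. Is there any code or function that can do this? dcast has sum and count options. What should I do if I want the exact value rather than a sum or an average?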
You can do this as follows, taking the price from the most recent invoice in each Model/Garage/Components group:
library(tidyverse) # dplyr and tidyr would be enough...
Data %>%
  mutate(Invoice.Date = as.Date(Invoice.Date, "%d/%m/%Y")) %>%
  group_by(Model, Garage, Components) %>%
  # last() with order_by picks the value with the latest Invoice.Date
  summarise(Unit.Price = last(Unit.Price, order_by = Invoice.Date)) %>%
  ungroup() %>%
  spread(Garage, Unit.Price, sep = "")
This gives you:
    Model Components GarageA GarageB
*   <chr>      <chr>   <int>   <int>
1     BMW         HL     700     800
2   Honda         HL     500     600
3   Honda         TL     400     700
4 Hyundai         HL     500      NA
5 Porsche         TL     800     800
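Now, I am not sure how to interpret "broken into cars" in your question. You can pipe (%>%) the result above into
split(.$Model)
to get a list in which each list element represents one Model, or into
nest(-Model)
to get a nested tibble...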
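As a rough sketch of the split() route, the base R code below builds one wide table per Model and writes each to its own CSV file. It assumes that "current cost" means the price on the most recent invoice, which is what the Honda example in the question shows. The helper name latest_price_table and the temporary output directory are made up for illustration and are not part of the question.

```r
# Sample data copied from the question
Data <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
Model Garage City Unit.Price Invoice.Date Components
Hyundai A NY 500 31/12/2016 HL
Honda B NJ 700 31/12/2016 TL
Porsche A NY 800 30/12/2016 TL
BMW B NJ 800 30/12/2016 HL
BMW A NJ 700 31/12/2016 HL
Porsche B NY 800 30/12/2016 TL
Honda A NY 400 30/12/2016 TL
Honda A NY 500 30/12/2016 HL
Honda B NY 600 30/12/2016 HL
Honda A NY 200 29/12/2016 TL
Honda A NY 300 29/12/2016 HL
")

# Hypothetical helper: Components x Garage table holding the newest price
latest_price_table <- function(d) {
  # newest invoice first, so the first value in every cell is the current one
  d <- d[order(as.Date(d$Invoice.Date, "%d/%m/%Y"), decreasing = TRUE), ]
  tab <- tapply(d$Unit.Price,
                list(Components = d$Components,
                     Garage = paste0("Garage", d$Garage)),
                function(x) x[1])
  data.frame(Components = rownames(tab), tab, row.names = NULL)
}

# One table per car model
per_model <- lapply(split(Data, Data$Model), latest_price_table)

# One CSV per model, written to a temporary directory (an assumption)
out_dir <- tempdir()
for (m in names(per_model)) {
  write.csv(per_model[[m]], file.path(out_dir, paste0(m, "_Cars.csv")),
            row.names = FALSE)
}

per_model$Honda
#   Components GarageA GarageB
# 1         HL     500     600
# 2         TL     400     700
```

Models that were only seen in one garage (Hyundai here) simply get a single Garage column in their table.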
We can do this with dcast from data.table. Convert the 'data.frame' to a 'data.table' (setDT(Data)), order the rows by 'Invoice.Date' with the newest first, and reshape from 'long' to 'wide' with dcast, specifying fun.aggregate to select only the first observation in each cell, which after the sort is the most recent price.
library(data.table)
library(lubridate)  # dmy() parses the dd/mm/yyyy text
dcast(setDT(Data)[order(dmy(Invoice.Date), decreasing = TRUE)], Model + Components ~
  paste0("Garage", Garage), value.var = "Unit.Price", fun.aggregate = function(x) x[1])
#     Model Components GarageA GarageB
#1:     BMW         HL     700     800
#2:   Honda         HL     500     600
#3:   Honda         TL     400     700
#4: Hyundai         HL     500      NA
#5: Porsche         TL     800     800
And, considering R's best package, base:
library(base) # COMPLETELY REDUNDANT =)
# sort by the parsed date so the newest invoice in each group comes last
df <- Data[order(as.Date(Data$Invoice.Date, "%d/%m/%Y")),]
# tail(i, 1) keeps the last (most recent) price in each group
dfagg <- aggregate(Unit.Price ~ Model + Components + Garage, df, function(i) tail(i, 1))
dfwide <- reshape(dfagg, timevar='Garage', idvar=c('Model', 'Components'), direction="wide")
names(dfwide) <- gsub("Unit.Price.", "Garage", names(dfwide), fixed = TRUE)
#     Model Components GarageA GarageB
# 1     BMW         HL     700     800
# 2   Honda         HL     500     600
# 3 Hyundai         HL     500      NA
# 4   Honda         TL     400     700
# 5 Porsche         TL     800     800
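Whichever approach is used, it is probably worth converting Invoice.Date to a real date before sorting. The column holds dd/mm/yyyy text, and text is compared character by character, so it only agrees with calendar order by luck (as it happens to here, where every invoice is from December 2016). A small sketch with made-up dates that are not in the question's data:

```r
# Illustrative values only, not taken from the question
dates <- c("31/12/2016", "01/01/2017", "15/06/2016")

# Text sort: compares the day digits first, so 2017 lands before 2016
sort(dates)
# "01/01/2017" "15/06/2016" "31/12/2016"

# Sorting on the parsed date gives calendar order
by_date <- dates[order(as.Date(dates, format = "%d/%m/%Y"))]
by_date
# "15/06/2016" "31/12/2016" "01/01/2017"
```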