Data classification in buckets

I have a data frame named Data that contains the following rows:

Model  Garage  City  Unit.Price Invoice.Date  Components    
Hyundai  A      NY     500        31/12/2016   HL   
Honda    B      NJ     700        31/12/2016   TL     
Porsche  A      NY     800        30/12/2016   TL    
BMW      B      NJ     800        30/12/2016   HL   
BMW      A      NJ     700        31/12/2016   HL   
Porsche  B      NY     800        30/12/2016   TL   
Honda    A      NY     400        30/12/2016   TL  
Honda    A      NY     500        30/12/2016   HL  
Honda    B      NY     600        30/12/2016   HL  
Honda    A      NY     200        29/12/2016   TL  
Honda    A      NY     300        29/12/2016   HL  

I want the output broken into cars, with the data sorted by Invoice.Date so that the current (most recent) cost is captured first.

Ex: Honda

Components    GarageA   GarageB    
HL             500          600    
TL             400          700 

This is how I started:

Category <- as.data.frame(c("BMW","Honda","Porsche","Hyundai"))

for(i in 1:nrow(Category))
{
  m <- Category[i,1]
  X <- subset(Data,Model==m)
  X <- Data[order(Data$Invoice.Date,decreasing = T),]
  Pivot_A<-dcast(X,Name~Garage,value.var = "Unit.Price",function(x) length((x)))
  write.csv(Pivot,file = paste(X,"Cars.csv",sep = "_"))
 }

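The only problem I am facing is mapping the correct unit price. Is there any code or function in dcast that can do this? dcast has sum and count options. What should I do if I want the exact amount rather than a sum or average?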

You can do this as follows:

library(tidyverse) # dplyr and tidyr would be enough...
dat %>% 
  mutate(Invoice.Date = as.Date(Invoice.Date, "%d/%m/%Y")) %>% 
  group_by(Model, Garage, Components) %>% 
  summarise(Unit.Price = first(Unit.Price, order_by = Invoice.Date)) %>% 
  spread(Garage, Unit.Price, sep = "")

This gives you:

    Model Components GarageA GarageB
*   <chr>      <chr>   <int>   <int>
1     BMW         HL     700     800
2   Honda         HL     300     600
3   Honda         TL     200     700
4 Hyundai         HL     500      NA
5 Porsche         TL     800     800
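
Note that first() with order_by = Invoice.Date picks the price on the earliest invoice date in each group. If "current cost" means the most recent invoice, as the Honda example in the question suggests, swapping in last() is one way to get it. A minimal sketch, assuming the question's data is loaded as dat (the read.table call below just retypes the question's rows so the snippet runs on its own):

```r
library(dplyr)
library(tidyr)

# Sample rows retyped from the question; 'dat' is the name used in the answer above.
dat <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
Model    Garage City Unit.Price Invoice.Date Components
Hyundai  A      NY   500        31/12/2016   HL
Honda    B      NJ   700        31/12/2016   TL
Porsche  A      NY   800        30/12/2016   TL
BMW      B      NJ   800        30/12/2016   HL
BMW      A      NJ   700        31/12/2016   HL
Porsche  B      NY   800        30/12/2016   TL
Honda    A      NY   400        30/12/2016   TL
Honda    A      NY   500        30/12/2016   HL
Honda    B      NY   600        30/12/2016   HL
Honda    A      NY   200        29/12/2016   TL
Honda    A      NY   300        29/12/2016   HL
")

latest <- dat %>%
  mutate(Invoice.Date = as.Date(Invoice.Date, "%d/%m/%Y")) %>%
  group_by(Model, Garage, Components) %>%
  # last() with order_by takes the price on the most recent invoice date
  summarise(Unit.Price = last(Unit.Price, order_by = Invoice.Date)) %>%
  ungroup() %>%
  spread(Garage, Unit.Price, sep = "")

latest
```

For Honda this yields HL 500/600 and TL 400/700 for GarageA/GarageB, which matches the table in the question.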

Now I am not sure how to interpret "broken into cars" in your question. You can pipe (%>%) the above into

  • split(.$Model) to get a list in which each list element represents one Model
  • nest(-Model) to get a nested tibble...
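
If "broken into cars" means one CSV file per Model, as the write.csv call in the question hints, the split() route could look roughly like this. It is only a sketch: the wide table is retyped by hand from the output above so the snippet runs on its own, the temporary output directory is an assumption, and the file naming simply follows the question's attempt.

```r
# Wide result as printed above, retyped by hand for illustration.
wide <- data.frame(
  Model      = c("BMW", "Honda", "Honda", "Hyundai", "Porsche"),
  Components = c("HL", "HL", "TL", "HL", "TL"),
  GarageA    = c(700, 300, 200, 500, 800),
  GarageB    = c(800, 600, 700, NA, 800),
  stringsAsFactors = FALSE
)

# One list element per Model (what split(.$Model) does inside a pipe)
by_model <- split(wide, wide$Model)

# Assumption: write to a temporary directory; replace with your own path
out_dir <- tempdir()

# Write each Model's table to its own file, e.g. Honda_Cars.csv
for (m in names(by_model)) {
  write.csv(by_model[[m]][, c("Components", "GarageA", "GarageB")],
            file = file.path(out_dir, paste(m, "Cars.csv", sep = "_")),
            row.names = FALSE)
}
```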

We can do this with dcast from data.table. Convert the 'data.frame' to a 'data.table' (setDT(df1)), order the rows by 'Invoice.Date', and use dcast to reshape from 'long' to 'wide', specifying fun.aggregate to select only the first observation.

library(data.table)
library(lubridate)
dcast(setDT(df1)[order(dmy(Invoice.Date))] , Model + Components ~ 
  paste0("Garage", Garage), value.var = "Unit.Price", function(x) x[1])
#     Model Components GarageA GarageB
#1:     BMW         HL     700     800
#2:   Honda         HL     300     600
#3:   Honda         TL     200     700
#4: Hyundai         HL     500      NA
#5: Porsche         TL     800     800

And consider R's best package, base:

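library(base)  # COMPLETELY REDUNDANT =)

# Parse the dd/mm/yyyy strings so rows are ordered chronologically, not alphabetically
df$Invoice.Date <- as.Date(df$Invoice.Date, "%d/%m/%Y")
df <- df[order(df$Invoice.Date), ]
# Keep the first (earliest-dated) price in each Model/Components/Garage group
dfagg <- aggregate(Unit.Price ~ Model + Components + Garage, df, function(i) head(i, 1))
# Reshape long to wide: one Unit.Price column per Garage
dfwide <- reshape(dfagg, timevar='Garage', idvar=c('Model', 'Components'), direction="wide")
# fixed = TRUE so the dots in the pattern are matched literally
names(dfwide) <- gsub("Unit.Price.", "Garage", names(dfwide), fixed = TRUE)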

#     Model Components GarageA GarageB
# 1     BMW         HL     700     800
# 2   Honda         HL     300     600
# 3 Hyundai         HL     500      NA
# 4   Honda         TL     200     700
# 5 Porsche         TL     800     800
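
One caveat on ordering by Invoice.Date: the result only reflects chronology if the column is a real Date. Sorted as dd/mm/yyyy text it happens to work for the question's data because all rows fall in the same month, but it can go wrong across month or year boundaries. A small illustration with two hypothetical dates (not from the question's data):

```r
# Two hypothetical invoice dates that straddle a year boundary
d <- c("31/12/2016", "01/01/2017")

# Ordering the raw strings compares characters left to right,
# so the string starting with "01/" sorts before the one starting with "31/"
d[order(d)]

# Ordering parsed Dates gives true chronological order
d[order(as.Date(d, "%d/%m/%Y"))]
```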