Reshaping the dataframe using dcast()

I am trying to reshape my data frame using dcast(), but I get this error:

object 'newid' not found

I don't understand the error. Here is the original data frame:

 Grade    Week     Subject    Location    Marks
   6      January   English     IND        76.50
   6      January   English     US         52.50
   7      January   English     IND        24.00
   7      January   English     US         5.00
   8      February  English     IND        63.00
   8      February  English     US         40.25
   9      February  English     IND        63.00
   9      February  English     US         32.50
   10     March     English     IND        27.00
   10     March     English     US         4.50
   11     March     English     IND        10.00



tmp <- plyr::ddply(monthTotalDataFinal, .(Subject, Grade), 
          transform,newid = paste(Subject))
d2 <- dcast(tmp, formula = Subject+newid ~ Grade+Location+Week, 
              value.var  = 'Marks')

The required data frame is as follows:

Subject 6_IND 7_IND 6_US 7_US 8_IND 9_IND 8_US 9_US 10_IND 11_IND 10_US

English  77    24    53   5    63    63    40   33   27     10     5

Please suggest a suitable solution.

Using dplyr and tidyr, we can unite the Grade and Location columns and then use spread to get the data in wide format.

library(dplyr)
library(tidyr)

df %>%
  unite(key, Grade, Location) %>%
  select(-Week) %>%
  spread(key, Marks)

#  Subject 10_IND 10_US 11_IND 6_IND 6_US 7_IND 7_US 8_IND  8_US 9_IND 9_US
#1 English     27   4.5     10  76.5 52.5    24    5    63 40.25    63 32.5
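As a side note, spread() is superseded in tidyr 1.0.0 and later by pivot_wider(). The following is a minimal sketch of the same reshape with pivot_wider(), assuming a recent tidyr is installed. The data frame is rebuilt inline so the snippet runs on its own, and the name wide is only illustrative.

```r
library(tidyr)

# Same data as in the question, rebuilt here so the snippet is self-contained.
df <- data.frame(
  Grade = c(6L, 6L, 7L, 7L, 8L, 8L, 9L, 9L, 10L, 10L, 11L),
  Week = rep(c("January", "February", "March"), times = c(4, 4, 3)),
  Subject = "English",
  Location = rep(c("IND", "US"), length.out = 11),
  Marks = c(76.5, 52.5, 24, 5, 63, 40.25, 63, 32.5, 27, 4.5, 10)
)

# id_cols = Subject keeps one row per Subject and leaves Week out;
# names_from pastes Grade and Location together with "_" (the default names_sep).
wide <- pivot_wider(
  df,
  id_cols = Subject,
  names_from = c(Grade, Location),
  values_from = Marks
)
```

Unlike spread(), which sorts the new columns alphabetically (10_IND before 6_IND), pivot_wider() keeps them in order of first appearance by default, so the result starts with 6_IND.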

Based on the comments, we may need to create an identifier column when there are multiple Subject values:

df %>%
  unite(key, Grade, Location) %>%
  select(-Week) %>%
  group_by(key, Subject) %>%
  mutate(row = row_number()) %>%
  spread(key, Marks)
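To illustrate why the identifier column helps, here is a small sketch on hypothetical data (the data frame dup and its values are made up for illustration, not taken from the question). It assumes the same Subject/key pair can occur more than once, in which case spread() alone stops with an error because the keys no longer identify a unique output row.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: "English" has two marks for the same key "6_IND".
dup <- data.frame(
  Subject = c("English", "English", "Maths"),
  key = c("6_IND", "6_IND", "6_IND"),
  Marks = c(76.5, 60, 80)
)

# Numbering the rows within each key/Subject group makes every
# Subject/row/key combination unique, so spread() can place each value.
wide <- dup %>%
  group_by(key, Subject) %>%
  mutate(row = row_number()) %>%
  ungroup() %>%
  spread(key, Marks)
```

Here the result has one line per Subject and row number, so the repeated English marks end up on separate lines instead of colliding in a single cell.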

Since this is a dcast question, we can also use

library(data.table)
dcast(setDT(df), Subject ~ Grade + Location, value.var = 'Marks')
#   Subject 6_IND 6_US 7_IND 7_US 8_IND  8_US 9_IND 9_US 10_IND 10_US 11_IND
#1: English  76.5 52.5    24    5    63 40.25    63 32.5     27   4.5     10
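The desired output in the question shows whole numbers (77, 53, 33, 5), which suggests the marks are rounded with halves going up. That is an assumption read off the expected table, not something the question states. A sketch of that variant follows; note that base round() rounds halves to the nearest even number (round(76.5) gives 76), so floor(Marks + 0.5) is used instead. The names dt and wide are only illustrative.

```r
library(data.table)

# Same data as in the question, rebuilt here so the snippet is self-contained.
df <- data.frame(
  Grade = c(6L, 6L, 7L, 7L, 8L, 8L, 9L, 9L, 10L, 10L, 11L),
  Week = rep(c("January", "February", "March"), times = c(4, 4, 3)),
  Subject = "English",
  Location = rep(c("IND", "US"), length.out = 11),
  Marks = c(76.5, 52.5, 24, 5, 63, 40.25, 63, 32.5, 27, 4.5, 10)
)

# as.data.table() makes a copy, so df itself is left unchanged
# (setDT(df) would convert df in place instead).
dt <- as.data.table(df)

# Round half up (assumed from the expected output in the question).
dt[, Marks := floor(Marks + 0.5)]

wide <- dcast(dt, Subject ~ Grade + Location, value.var = "Marks")
```

Because every Grade/Location pair occurs only once per Subject in this data, dcast() does not need an aggregation function here.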

Data

df <- structure(list(Grade = c(6L, 6L, 7L, 7L, 8L, 8L, 9L, 9L, 10L, 
10L, 11L), Week = c("January", "January", "January", "January", 
"February", "February", "February", "February", "March", "March", 
"March"), Subject = c("English", "English", "English", "English", 
"English", "English", "English", "English", "English", "English", 
"English"), Location = c("IND", "US", "IND", "US", "IND", "US", 
"IND", "US", "IND", "US", "IND"), Marks = c(76.5, 52.5, 24, 5, 
63, 40.25, 63, 32.5, 27, 4.5, 10)), class = "data.frame",
row.names = c(NA, 
-11L))