Reshaping the dataframe using dcast()
I am trying to reshape my dataframe using dcast(), but I get this error:
object 'newid' not found
I don't understand the error. This is the original dataframe:
Grade Week Subject Location Marks
6 January English IND 76.50
6 January English US 52.50
7 January English IND 24.00
7 January English US 5.00
8 February English IND 63.00
8 February English US 40.25
9 February English IND 63.00
9 February English US 32.50
10 March English IND 27.00
10 March English US 4.50
11 March English IND 10.00
tmp <- plyr::ddply(monthTotalDataFinal, .(Subject, Grade),
transform,newid = paste(Subject))
d2 <- dcast(tmp, formula = Subject+newid ~ Grade+Location+Week,
value.var = 'Marks')
The required dataframe is as follows:
Subject 6_IND 7_IND 6_US 7_US 8_IND 9_IND 8_US 9_US 10_IND 11_IND 10_US
English 77 24 53 5 63 63 40 33 27 10 5
Please suggest a suitable solution.
Using dplyr and tidyr, we can unite the Grade and Location columns and then use spread to get the data in wide format.
library(dplyr)
library(tidyr)
df %>%
unite(key, Grade, Location) %>%
select(-Week) %>%
spread(key, Marks)
# Subject 10_IND 10_US 11_IND 6_IND 6_US 7_IND 7_US 8_IND 8_US 9_IND 9_US
#1 English 27 4.5 10 76.5 52.5 24 5 63 40.25 63 32.5
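As a side note, spread() is superseded in tidyr 1.0.0 and later in favor of pivot_wider(), which can combine several columns into the new column names directly, so the unite step is not needed. A minimal sketch, assuming tidyr >= 1.0.0 is installed; the small df below is only a four-row excerpt of the question's data so the block runs on its own:

```r
library(dplyr)
library(tidyr)

# Four-row excerpt of the question's data (assumption: shortened for illustration)
df <- data.frame(
  Grade    = c(6L, 6L, 7L, 7L),
  Week     = c("January", "January", "January", "January"),
  Subject  = "English",
  Location = c("IND", "US", "IND", "US"),
  Marks    = c(76.5, 52.5, 24, 5)
)

wide <- df %>%
  select(-Week) %>%                            # Week would otherwise split the rows
  pivot_wider(names_from  = c(Grade, Location), # joined with "_" by default
              values_from = Marks)

wide
```

Unlike spread(), pivot_wider() keeps the new columns in order of first appearance, so Grade 6 comes before Grade 10 here.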
Based on the comments, we may need to create an identifier column when there are multiple Subject values:
df %>%
unite(key, Grade, Location) %>%
select(-Week) %>%
group_by(key, Subject) %>%
mutate(row = row_number()) %>%
spread(key, Marks)
Since this is a dcast question, we can also use data.table:
library(data.table)
dcast(setDT(df), Subject ~ Grade + Location, value.var = 'Marks')
# Subject 6_IND 6_US 7_IND 7_US 8_IND 8_US 9_IND 9_US 10_IND 10_US 11_IND
#1: English 76.5 52.5 24 5 63 40.25 63 32.5 27 4.5 10
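The desired output in the question shows whole numbers (77, 53, 40, 33, ...), which looks like halves being rounded up. Base R's round() rounds halves to the even digit, so round(76.5) gives 76 rather than 77. A rough sketch, assuming half-up rounding is what is wanted; round_half_up is a hypothetical helper and the df is a four-row excerpt of the question's data:

```r
library(data.table)

# Four-row excerpt of the question's data (assumption: shortened for illustration)
df <- data.frame(
  Grade    = c(6L, 6L, 7L, 7L),
  Week     = c("January", "January", "January", "January"),
  Subject  = "English",
  Location = c("IND", "US", "IND", "US"),
  Marks    = c(76.5, 52.5, 24, 5)
)

# Hypothetical helper: rounds .5 upward (for non-negative values)
round_half_up <- function(x) floor(x + 0.5)

# as.data.table() copies, so the original df is left unchanged
wide <- dcast(as.data.table(df), Subject ~ Grade + Location, value.var = "Marks")

# Round every value column in place, leaving Subject untouched
num_cols <- setdiff(names(wide), "Subject")
wide[, (num_cols) := lapply(.SD, round_half_up), .SDcols = num_cols]

wide
```

Rounding after the reshape keeps the original Marks available in df in case the exact values are needed elsewhere.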
Data
df <- structure(list(Grade = c(6L, 6L, 7L, 7L, 8L, 8L, 9L, 9L, 10L,
10L, 11L), Week = c("January", "January", "January", "January",
"February", "February", "February", "February", "March", "March",
"March"), Subject = c("English", "English", "English", "English",
"English", "English", "English", "English", "English", "English",
"English"), Location = c("IND", "US", "IND", "US", "IND", "US",
"IND", "US", "IND", "US", "IND"), Marks = c(76.5, 52.5, 24, 5,
63, 40.25, 63, 32.5, 27, 4.5, 10)), class = "data.frame",
row.names = c(NA,
-11L))