Reshape a data frame read from a text file into a matrix: value.var errors

I have a problem very similar to this one: Reshape three column data frame to matrix ("long" to "wide" format), except that I'm reading the data from a text file and trying to use dcast from the reshape2 package.

Here is my text file:

'Group','LiteracyLevel','Frequency'
'Shifting','Illerate',114
'Shifting','Primary',10
'Shifting','AtLeastMiddle',45
'Settled','Illerate',76
'Settled','Primary',2
'Settled','AtLeastMiddle',53
'Town','Illerate',93
'Town','Primary',13
'Town','AtLeastMiddle',208

It needs to be in this format, because I want to call barplot(as.matrix(data)) on it:

'Group','Illerate','Primary','AtLeastMiddle'
'Shifting',114,10,45
'Settled',76,2,53
'Town',93,13,208
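For comparison, here is a sketch of the same long-to-wide step in base R, without reshape2. It swaps in xtabs() instead of dcast(), which is not the function the question asks about, and it assumes the file is read with quote="'" (the data are inlined through text= so the snippet runs on its own). xtabs() sums Frequency for each Group and LiteracyLevel combination, and the resulting two-way table can be passed straight to barplot().

```r
# Same data as the text file above, read with quote = "'" so the
# single quotes around names and labels are stripped
df <- read.csv(quote = "'", text = "'Group','LiteracyLevel','Frequency'
'Shifting','Illerate',114
'Shifting','Primary',10
'Shifting','AtLeastMiddle',45
'Settled','Illerate',76
'Settled','Primary',2
'Settled','AtLeastMiddle',53
'Town','Illerate',93
'Town','Primary',13
'Town','AtLeastMiddle',208")

# xtabs() sums Frequency over every Group x LiteracyLevel combination;
# each combination occurs once here, so the sums are just the counts
tab <- xtabs(Frequency ~ Group + LiteracyLevel, data = df)
print(tab)

# A two-way table is a matrix, so barplot() accepts it directly;
# legend.text = TRUE labels the stacked segments with the row names (groups)
barplot(tab, legend.text = TRUE)
```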

I don't know what to put in the value.var argument of dcast. I assume it's Frequency. My current attempts at reshaping the data look like this:

> data <- read.csv("ex3-39.txt", header=TRUE)

> dcast(data, data$Group~data$LiteracyLevel, value.var="X.Frequency")
Error: value.var (X.Frequency) not found in input

> dcast(data, data$Group~data$LiteracyLevel, value.var="Frequency")
Error: value.var (Frequency) not found in input

> dcast(data, data$Group~data$LiteracyLevel, value.var="data$X.Frequency")
Error: value.var (data$X.Frequency) not found in input

> dcast(data, data$Group~data$LiteracyLevel, value.var=data$X.Frequency)
Error: value.var (1141045762539313208) not found in input
In addition: Warning message:
In if (!(value.var %in% names(data))) { :
  the condition has length > 1 and only the first element will be used

> dcast(data, data$Group~data$LiteracyLevel, value.var=Frequency)
Error in match(x, table, nomatch = 0L) : object 'Frequency' not found

# Just to make sure we're dealing with the same data...
df <- read.csv(quote="'",text="'Group','LiteracyLevel','Frequency'
'Shifting','Illerate',114
'Shifting','Primary',10
'Shifting','AtLeastMiddle',45
'Settled','Illerate',76
'Settled','Primary',2
'Settled','AtLeastMiddle',53
'Town','Illerate',93
'Town','Primary',13
'Town','AtLeastMiddle',208")
df
#      Group LiteracyLevel Frequency
# 1 Shifting      Illerate       114
# 2 Shifting       Primary        10
# 3 Shifting AtLeastMiddle        45
# 4  Settled      Illerate        76
# 5  Settled       Primary         2
# 6  Settled AtLeastMiddle        53
# 7     Town      Illerate        93
# 8     Town       Primary        13
# 9     Town AtLeastMiddle       208

library(reshape2)
dcast(df, Group~LiteracyLevel)
#      Group AtLeastMiddle Illerate Primary
# 1  Settled            53       76       2
# 2 Shifting            45      114      10
# 3     Town           208       93      13

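The problem is that in the formula you need to give the column names (which are looked up in data), not the columns themselves. When you pass the columns the way you did, e.g. df$Group, the resulting vector is unnamed: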

names(df)
# [1] "Group"         "LiteracyLevel" "Frequency"    
names(df$Group)
# NULL
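A minimal sketch of the same call with value.var spelled out, which is the part the question was unsure about. It assumes the data are read with quote="'" so the columns are named Group, LiteracyLevel and Frequency. The stopifnot() guard is an illustrative addition that mirrors the value.var %in% names(data) test visible in the warning message from the question.

```r
library(reshape2)

# Same data as above, read with quote = "'" so the single quotes are stripped
df <- read.csv(quote = "'", text = "'Group','LiteracyLevel','Frequency'
'Shifting','Illerate',114
'Shifting','Primary',10
'Shifting','AtLeastMiddle',45
'Settled','Illerate',76
'Settled','Primary',2
'Settled','AtLeastMiddle',53
'Town','Illerate',93
'Town','Primary',13
'Town','AtLeastMiddle',208")

# value.var must be a string that matches one of names(df) exactly
stopifnot("Frequency" %in% names(df))

# Bare column names in the formula, the value column named as a string
wide <- dcast(df, Group ~ LiteracyLevel, value.var = "Frequency")
print(wide)
```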

Does this help?

library(reshape2)
data<-read.csv("filename.csv",quote = "'")
dcast(data, data$Group~data$LiteracyLevel, value.var="Frequency")

This gives the output:

  data$Group AtLeastMiddle Illerate Primary
1    Settled            53       76       2
2   Shifting            45      114      10
3       Town           208       93      13

I think you left out the quote="'" argument, so your column names are of the form

"X.Group." "X.LiteracyLevel." "X.Frequency."
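Why the names look like this: by default read.csv() only treats double quotes as quote characters, so the single quotes stay part of each header field, and with check.names = TRUE each field then goes through make.names(), which prefixes an X and turns every quote into a dot. A small sketch on a two-line excerpt of the file (inlined with text= rather than read from ex3-39.txt) compares both calls:

```r
# Two-line excerpt of the file, passed through text= instead of a file path
txt <- "'Group','LiteracyLevel','Frequency'
'Shifting','Illerate',114"

# Default quote = "\"": the single quotes survive and make.names() mangles them
default_read <- read.csv(text = txt)
print(names(default_read))  # "X.Group." "X.LiteracyLevel." "X.Frequency."

# quote = "'": the single quotes are stripped, so the names come out clean
clean_read <- read.csv(text = txt, quote = "'")
print(names(clean_read))    # "Group" "LiteracyLevel" "Frequency"
```

Note the trailing dot in "X.Frequency.": it is why value.var="X.Frequency" in the question still could not be found.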

If you don't want to use the quote="'" argument, use:

dcast(data, data$X.Group.~data$X.LiteracyLevel., value.var="X.Frequency.")

This gives the output:

  data$X.Group. 'AtLeastMiddle' 'Illerate' 'Primary'
1     'Settled'              53         76         2
2    'Shifting'              45        114        10
3        'Town'             208         93        13

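This part is just for fun. To get a nice bar plot after this code, don't convert the whole reshaped data frame to a matrix; keep the first column for the legend.

final_data holds the reshaped data. Build the matrix from the remaining columns and pass the first column as the legend: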

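final_data <- dcast(data, data$Group~data$LiteracyLevel, value.var="Frequency")
barplot(as.matrix(final_data[, 2:4]), legend.text=final_data[["data$Group"]])

This gives a stacked bar chart with one bar per literacy level and the three groups in the legend.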
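A variant sketch that avoids the data$Group column name altogether, assuming the reshaped data come from the bare-name call dcast(df, Group ~ LiteracyLevel, value.var = "Frequency"). Moving the group labels into the matrix's row names lets barplot() build the legend itself, since legend.text = TRUE takes its labels from rownames() of the matrix. The names wide and counts are only illustrative.

```r
library(reshape2)

# Same data as above, read with quote = "'" so the columns are
# Group, LiteracyLevel and Frequency
df <- read.csv(quote = "'", text = "'Group','LiteracyLevel','Frequency'
'Shifting','Illerate',114
'Shifting','Primary',10
'Shifting','AtLeastMiddle',45
'Settled','Illerate',76
'Settled','Primary',2
'Settled','AtLeastMiddle',53
'Town','Illerate',93
'Town','Primary',13
'Town','AtLeastMiddle',208")

# Reshape with bare column names in the formula
wide <- dcast(df, Group ~ LiteracyLevel, value.var = "Frequency")

# Drop the label column from the matrix and keep the labels as row names
counts <- as.matrix(wide[, -1])
rownames(counts) <- wide$Group

# One stacked bar per literacy level, one segment per group;
# legend.text = TRUE builds the legend from rownames(counts)
barplot(counts, legend.text = TRUE)
```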