Error in reshape(): duplicate 'row names' are not allowed
I have longitudinal data in wide format that I want to reshape to long format. Here is a sample:
    id sex group id.1 sex.1 group.1    status1  beg1  end1 status2  beg2  end2
1 1000   1     a 1000     1       a Vocational  <NA> S2007      HE S2007 S2008
2 1001   1     a 1001     1       a Vocational  <NA> S2007      HE S2008 S2012
3 1004   1     a 1004     1       a Vocational  <NA> S2008     999  <NA>  <NA>
4 1006   2     a 1006     2       a Vocational  <NA> S2007    Army S2012  <NA>
5 1007   1     a 1007     1       a         HE  <NA> S2007     999  <NA>  <NA>
6 1008   1     a 1008     1       a Vocational S2013  <NA>     999  <NA>  <NA>
I need to get it into this shape, compatible with the SPELL format:
  id sex group index     status   beg   end
1000   1     a     1 Vocational    NA S2007
1000   1     a     2         HE S2008 S2012
...
I am using the following command:
spell <- reshape(data,
                 varying = names(data)[4:60],
                 direction = "long",
                 idvar = c("id", "sex", "group"),
                 sep = "")
I get the following error message:
Error in `row.names<-.data.frame`(`*tmp*`, value = paste(d[, idvar], times[1L], :
duplicate 'row.names' are not allowed
In addition: Warning message: non-unique value when setting 'row.names': ‘NA.1’
I tried setting the NA values to 999 like this, but it did not work:
data[is.na(data)] <- 999
Do you know what would make this work? Many thanks!
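Assuming that "id.1", "sex.1" and "group.1" are duplicate columns, we can drop them, rename the remaining columns by inserting a separator ("_") before the trailing number, and then call reshape:
# drop the duplicated id columns (positions 4 to 6)
data1 <- data[-(4:6)]
# strip the trailing digits to get the stems; note the doubled backslash in the R string
nm1 <- sub('\\d+$', '', names(data1)[-(1:3)])
# rebuild the names as stem_1, stem_2, ...
names(data1)[-(1:3)] <- paste(nm1, ave(nm1, nm1, FUN=seq_along), sep="_")
res <- reshape(data1, varying=4:ncol(data1), direction='long',
               idvar=c('id', 'sex', 'group'), sep="_")
row.names(res) <- NULL
head(res)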
# id sex group time status beg end
# 1 1000 1 a 1 Vocational <NA> S2007
# 2 1001 1 a 1 Vocational <NA> S2007
# 3 1004 1 a 1 Vocational <NA> S2008
# 4 1006 2 a 1 Vocational <NA> S2007
# 5 1007 1 a 1 HE <NA> S2007
# 6 1008 1 a 1 Vocational S2013 <NA>
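To see what the renaming step does on its own, here is a minimal sketch on a bare vector of column names. The vector nms is made up for illustration and only mirrors the sample columns:

```r
# Hypothetical column names, mirroring the sample data
nms <- c("status1", "beg1", "end1", "status2", "beg2", "end2")

# Remove the trailing digits to get the variable stems
stem <- sub("\\d+$", "", nms)

# Number each occurrence of a stem and join stem and number with "_"
new_nms <- paste(stem, ave(stem, stem, FUN = seq_along), sep = "_")
new_nms
# "status_1" "beg_1" "end_1" "status_2" "beg_2" "end_2"
```

With an explicit separator in the names, reshape can split them into variable and time parts via sep = "_" instead of guessing.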
Data
data <- structure(list(id = c(1000L, 1001L, 1004L, 1006L, 1007L, 1008L
), sex = c(1L, 1L, 1L, 2L, 1L, 1L), group = c("a", "a", "a",
"a", "a", "a"), id.1 = c(1000L, 1001L, 1004L, 1006L, 1007L, 1008L
), sex.1 = c(1L, 1L, 1L, 2L, 1L, 1L), group.1 = c("a", "a", "a",
"a", "a", "a"), status1 = c("Vocational", "Vocational", "Vocational",
"Vocational", "HE", "Vocational"), beg1 = c("<NA>", "<NA>", "<NA>",
"<NA>", "<NA>", "S2013"), end1 = c("S2007", "S2007", "S2008",
"S2007", "S2007", "<NA>"), status2 = c("HE", "HE", "999", "Army",
"999", "999"), beg2 = c("S2007", "S2008", "<NA>", "S2012", "<NA>",
"<NA>"), end2 = c("S2008", "S2012", "<NA>", "<NA>", "<NA>", "<NA>"
)), .Names = c("id", "sex", "group", "id.1", "sex.1", "group.1",
"status1", "beg1", "end1", "status2", "beg2", "end2"), class =
"data.frame", row.names = c(NA, -6L))
Another option is to pass the column positions through varying and the new column names through v.names:
x2 <- reshape(data, idvar=c("id.1", "sex.1", "group.1"), direction="long",
              varying=list(c(7, 10), c(8, 11), c(9, 12)),
              v.names=c("status","beg","end"))
head(x2)
             id sex group id.1 sex.1 group.1 time     status   beg   end
1000.1.a.1 1000   1     a 1000     1       a    1 Vocational  <NA> S2007
1001.1.a.1 1001   1     a 1001     1       a    1 Vocational  <NA> S2007
1004.1.a.1 1004   1     a 1004     1       a    1 Vocational  <NA> S2008
1006.2.a.1 1006   2     a 1006     2       a    1 Vocational  <NA> S2007
1007.1.a.1 1007   1     a 1007     1       a    1         HE  <NA> S2007
1008.1.a.1 1008   1     a 1008     1       a    1 Vocational S2013  <NA>
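The output above is grouped by time, whereas the layout asked for in the question lists each id's spells together and calls the counter index. A minimal sketch of that last step, assuming a small made-up data frame wide rather than the asker's full data:

```r
# Hypothetical two-person sample with two spells each
wide <- data.frame(
  id = c(1000L, 1001L), sex = c(1L, 1L), group = c("a", "a"),
  status1 = c("Vocational", "Vocational"),
  beg1 = c(NA_character_, NA_character_),
  end1 = c("S2007", "S2007"),
  status2 = c("HE", "HE"),
  beg2 = c("S2007", "S2008"),
  end2 = c("S2008", "S2012"),
  stringsAsFactors = FALSE
)

# timevar names the spell counter; varying lists the columns per variable
long <- reshape(wide, direction = "long",
                idvar = c("id", "sex", "group"),
                varying = list(c("status1", "status2"),
                               c("beg1", "beg2"),
                               c("end1", "end2")),
                v.names = c("status", "beg", "end"),
                timevar = "index")

# Sort so that each id's spells are consecutive, then drop the long row names
long <- long[order(long$id, long$index), ]
row.names(long) <- NULL
long
```

Naming the columns in varying instead of using positions keeps the call valid if columns are added or reordered later.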
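This error message indicates that your id variables contain duplicated rows or missing values.
First check for duplicates:
with(data, any(duplicated(cbind(id, sex, group))))
If this returns TRUE, that is your answer.
If it returns FALSE, then you probably have missing values in your id variables, perhaps even entirely empty rows, most likely at the end. This can happen when the source data contains blank rows, or when the command used to import the data reads too much, for example read_excel with a range argument that covers too many rows. Either way, check the data carefully for missing values in the id variables. Replacing everything with 999 will not help either.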
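The 'NA.1' in the warning from the question is consistent with this diagnosis, since reshape appears to build its row names by pasting the id values and the time value together. The checks can be sketched as follows, assuming a made-up data frame wide with two blank trailing rows. The names wide, id_cols, bad_na and bad_dup are illustrative only:

```r
# Hypothetical wide data with two empty trailing rows, as an over-wide import might produce
wide <- data.frame(
  id      = c(1000L, 1001L, NA, NA),
  sex     = c(1L, 1L, NA, NA),
  group   = c("a", "a", NA, NA),
  status1 = c("Vocational", "HE", NA, NA),
  status2 = c("HE", "999", NA, NA),
  stringsAsFactors = FALSE
)
id_cols <- c("id", "sex", "group")

# Rows where at least one id variable is missing
bad_na <- !complete.cases(wide[id_cols])

# Rows whose id combination already occurred in an earlier row
bad_dup <- duplicated(wide[id_cols])

which(bad_na)
which(bad_dup)

# Keep only rows that can be identified uniquely, then reshape
clean <- wide[!bad_na & !bad_dup, ]
long <- reshape(clean, varying = c("status1", "status2"),
                direction = "long", idvar = id_cols, sep = "")
```

Whether dropping such rows is appropriate depends on the data. If the ids are missing for real cases rather than blank rows, they need to be repaired at the source instead.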
You can work around the "duplicate 'row.names' are not allowed" error by specifying new row names through the new.row.names option of reshape:
spell <- reshape(data,
                 varying = names(data)[4:60],
                 direction = "long",
                 idvar = c("id","sex","group"),
                 sep = "",
                 new.row.names = 1:1000)
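Rather than hard-coding 1:1000, the row names can be sized from the data. This is a minimal sketch on a made-up data frame wide, where n_times is an assumed name for the number of repeated measurements per variable:

```r
# Hypothetical wide data with two measurements of one variable
wide <- data.frame(
  id      = c(1000L, 1001L, 1004L),
  sex     = c(1L, 1L, 1L),
  group   = c("a", "a", "a"),
  status1 = c("Vocational", "Vocational", "HE"),
  status2 = c("HE", "HE", "999"),
  stringsAsFactors = FALSE
)

n_times <- 2  # number of status columns per person

# One row name per row of the long result: rows in wide times measurements
long <- reshape(wide,
                varying = c("status1", "status2"),
                direction = "long",
                idvar = c("id", "sex", "group"),
                sep = "",
                new.row.names = seq_len(nrow(wide) * n_times))
nrow(long)
```

This only avoids the row-name clash. If the id variables contain missing or duplicated values, those rows are likely to remain in the long result and may still need cleaning.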