dcast 不将变量类型保留为字符,当变量为 NA 时 vapply 出错

dcast not retaining variable type as character, error in vapply when a variable is NA

在尝试使用 resphape2::dcast 重塑数据时,我遇到了一个涉及 NA 条目的错误。示例数据在最后。

数据从长变宽,但有时某些参数包含所有 NA 个条目,这似乎是导致问题的原因。或者至少,我认为是。如果我删除任何类似的参数,在此示例中为 Ammonia,错误就会消失。

在调试中dcast,好像是固定在这一行:

ordered <- vaggregate(.value = value, .group = overall, 
  .fun = fun.aggregate, ..., .default = fill, .n = n)

导致错误:

Error in vapply(indices, fun, .default) : 
values must be type 'character',
but FUN(X[[1]]) result is type 'integer'

看到 NA 变量排在第一位,我认为 aggregate 函数可能默认为整数,即使整个列都是 character,但移动那些行确实不解决它。我能找到 solve it is by using na.omit 的唯一方法,它完全删除了该参数。如果可能,我的预期输出将保留所有 NA 的任何参数。第二个原因是如果 day/depth 没有被抽样,它应该被保留并且那些条目应该是 ns(没有被抽样)。 有没有一种方法可以解决这个错误,而不必删除所有 NA 将被重塑的参数?

可重现示例(数据在 dcast 代码下方):

library(reshape2)
dcast(df, station + date + depth ~ parmcode, value.var = "value_qualif", fill="ns")

dcast(na.omit(df), station + date + depth ~ parmcode,
 value.var = "value_qualif", fill="ns") # solves error, but removes parameter completely

示例数据:

df <- structure(list(station = c("A", "A", "A", "A", "A", "A", "A", 
"A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", 
"A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", 
"A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", 
"A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", 
"A", "A", "A", "A", "A", "A", "A", "A", "A", "A"), date = c("7/2/2018", 
"7/2/2018", "7/2/2018", "7/2/2018", "7/2/2018", "7/2/2018", "7/2/2018", 
"7/2/2018", "7/2/2018", "7/2/2018", "7/2/2018", "7/2/2018", "7/2/2018", 
"7/2/2018", "7/2/2018", "7/2/2018", "7/2/2018", "7/2/2018", "7/2/2018", 
"7/2/2018", "7/2/2018", "7/2/2018", "7/2/2018", "7/2/2018", "7/2/2018", 
"7/2/2018", "7/2/2018", "7/2/2018", "7/2/2018", "7/2/2018", "7/1/2018", 
"7/1/2018", "7/1/2018", "7/1/2018", "7/1/2018", "7/1/2018", "7/1/2018", 
"7/1/2018", "7/1/2018", "7/9/2018", "7/9/2018", "7/9/2018", "7/9/2018", 
"7/9/2018", "7/9/2018", "7/9/2018", "7/9/2018", "7/9/2018", "7/9/2018", 
"7/9/2018", "7/9/2018", "7/9/2018", "7/9/2018", "7/9/2018", "7/9/2018", 
"7/9/2018", "7/9/2018", "7/9/2018", "7/9/2018", "7/9/2018", "7/9/2018", 
"7/9/2018", "7/9/2018", "7/9/2018", "7/9/2018", "7/9/2018", "7/9/2018", 
"7/9/2018", "7/9/2018"), depth = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 
18L, 18L, 18L, 18L, 18L, 18L, 18L, 18L, 18L, 18L, 1L, 1L, 1L, 
12L, 12L, 12L, 18L, 18L, 18L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 18L, 
18L, 18L, 18L, 18L, 18L, 18L, 18L, 18L, 18L), parmcode = c("CDOM", 
"DENSITY", "DO", "ENTERO", "PH", "TOTAL", "XMS", "TEMP", "SAL", 
"FECAL", "TOTAL", "FECAL", "ENTERO", "CDOM", "XMS", "TEMP", "SAL", 
"PH", "DO", "DENSITY", "DO", "DENSITY", "TOTAL", "FECAL", "PH", 
"CDOM", "XMS", "TEMP", "SAL", "ENTERO", "AMMONIA AS N", "AMMONIA AS N", 
"AMMONIA AS N", "AMMONIA AS N", "AMMONIA AS N", "AMMONIA AS N", 
"AMMONIA AS N", "AMMONIA AS N", "AMMONIA AS N", "TOTAL", "XMS", 
"TEMP", "SAL", "PH", "DO", "DENSITY", "CDOM", "FECAL", "ENTERO", 
"CDOM", "FECAL", "ENTERO", "PH", "DO", "TEMP", "XMS", "TOTAL", 
"DENSITY", "SAL", "TOTAL", "FECAL", "ENTERO", "XMS", "TEMP", 
"SAL", "PH", "DO", "DENSITY", "CDOM"), value_qualif = c(NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, "<2", "<2", "<2", "1.3", "69.67", 
"16.6", "33.7", "8.1", "7.6", "24.622", "5.5", "25.279", "<2", 
"<2", "7.8", "1.38", "72.96", "13.2", "33.61", "<2", NA, NA, 
NA, NA, NA, NA, NA, NA, NA, "<2", "77.82", "20.8", "33.72", "8.2", 
"8.8", "23.58", "1.01", "<2", "<2", "1.78", "<2", "<2", "8", 
"6.5", "13.5", "67.19", "2e", "25.197", "33.58", "2e", "2e", 
"<2", "75.53", "12.9", "33.61", "7.9", "5.5", "25.34", "1.77"
)), class = "data.frame", row.names = c(NA, -69L))

一些没有回答我的问题的切线相关问题是 POSIXct values become numeric in reshape2 dcast and Error with custom aggregate function for a cast() call in R reshape2

使用 na.omit 我的输出是:

  station     date depth CDOM DENSITY  DO ENTERO FECAL  PH   SAL TEMP TOTAL   XMS
1       A 7/2/2018    12  1.3  24.622 7.6     <2    <2 8.1  33.7 16.6    <2 69.67
2       A 7/2/2018    18 1.38  25.279 5.5     <2    <2 7.8 33.61 13.2    <2 72.96
3       A 7/9/2018     1 1.01   23.58 8.8     <2    <2 8.2 33.72 20.8    <2 77.82
4       A 7/9/2018    12 1.78  25.197 6.5     <2    <2   8 33.58 13.5    2e 67.19
5       A 7/9/2018    18 1.77   25.34 5.5     <2    2e 7.9 33.61 12.9    2e 75.53

不使用 na.omit 的预期输出是:

  station     date depth AMMONIA CDOM  DENSITY  DO  ENTERO FECAL  PH   SAL TEMP TOTAL   XMS
1       A 7/2/2018     1    ns    ns      ns    ns     ns    ns   ns    ns   ns    ns    ns
2       A 7/2/2018    12    ns    1.3   24.622  7.6    <2    <2  8.1  33.7 16.6    <2 69.67
3       A 7/2/2018    18    ns    1.38  25.279  5.5    <2    <2  7.8 33.61 13.2    <2 72.96
4       A 7/9/2018     1    ns    1.01  23.58   8.8    <2    <2  8.2 33.72 20.8    <2 77.82
5       A 7/9/2018    12    ns    1.78  25.197  6.5    <2    <2    8 33.58 13.5    2e 67.19
6       A 7/9/2018    18    ns    1.77  25.34   5.5    <2    2e  7.9 33.61 12.9    2e 75.53

实际问题是所有参数对于(站点、日期、深度)的每个三元组只有一个值 除了 对于 AMMONIA AS N,它有三个 NA 个条目。

例如,

dcast(df, station + date + depth ~ parmcode, value.var = "value_qualif")
# Aggregation function missing: defaulting to length
#   station     date depth AMMONIA AS N CDOM DENSITY DO ENTERO FECAL PH SAL TEMP TOTAL XMS
# 1       A 7/1/2018     1            3    0       0  0      0     0  0   0    0     0   0
# 2       A 7/1/2018    12            3    0       0  0      0     0  0   0    0     0   0
# 3       A 7/1/2018    18            3    0       0  0      0     0  0   0    0     0   0
# 4       A 7/2/2018     1            0    1       1  1      1     1  1   1    1     1   1
# 5       A 7/2/2018    12            0    1       1  1      1     1  1   1    1     1   1
# 6       A 7/2/2018    18            0    1       1  1      1     1  1   1    1     1   1
# 7       A 7/9/2018     1            0    1       1  1      1     1  1   1    1     1   1
# 8       A 7/9/2018    12            0    1       1  1      1     1  1   1    1     1   1
# 9       A 7/9/2018    18            0    1       1  1      1     1  1   1    1     1   1

一旦我们删除了重复的行,一切都会顺利进行

dcast(df[!duplicated(df), ], station + date + depth ~ parmcode, value.var = "value_qualif")
#   station     date depth AMMONIA AS N CDOM DENSITY   DO ENTERO FECAL   PH   SAL TEMP TOTAL   XMS
# 1       A 7/1/2018     1         <NA> <NA>    <NA> <NA>   <NA>  <NA> <NA>  <NA> <NA>  <NA>  <NA>
# 2       A 7/1/2018    12         <NA> <NA>    <NA> <NA>   <NA>  <NA> <NA>  <NA> <NA>  <NA>  <NA>
# 3       A 7/1/2018    18         <NA> <NA>    <NA> <NA>   <NA>  <NA> <NA>  <NA> <NA>  <NA>  <NA>
# 4       A 7/2/2018     1         <NA> <NA>    <NA> <NA>   <NA>  <NA> <NA>  <NA> <NA>  <NA>  <NA>
# 5       A 7/2/2018    12         <NA>  1.3  24.622  7.6     <2    <2  8.1  33.7 16.6    <2 69.67
# 6       A 7/2/2018    18         <NA> 1.38  25.279  5.5     <2    <2  7.8 33.61 13.2    <2 72.96
# 7       A 7/9/2018     1         <NA> 1.01   23.58  8.8     <2    <2  8.2 33.72 20.8    <2 77.82
# 8       A 7/9/2018    12         <NA> 1.78  25.197  6.5     <2    <2    8 33.58 13.5    2e 67.19
# 9       A 7/9/2018    18         <NA> 1.77   25.34  5.5     <2    2e  7.9 33.61 12.9    2e 75.53

dcast(df[!duplicated(df), ], station + date + depth ~ parmcode, value.var = "value_qualif", fill = "ns")
#   station     date depth AMMONIA AS N CDOM DENSITY  DO ENTERO FECAL  PH   SAL TEMP TOTAL   XMS
# 1       A 7/1/2018     1           ns   ns      ns  ns     ns    ns  ns    ns   ns    ns    ns
# 2       A 7/1/2018    12           ns   ns      ns  ns     ns    ns  ns    ns   ns    ns    ns
# 3       A 7/1/2018    18           ns   ns      ns  ns     ns    ns  ns    ns   ns    ns    ns
# 4       A 7/2/2018     1           ns   ns      ns  ns     ns    ns  ns    ns   ns    ns    ns
# 5       A 7/2/2018    12           ns  1.3  24.622 7.6     <2    <2 8.1  33.7 16.6    <2 69.67
# 6       A 7/2/2018    18           ns 1.38  25.279 5.5     <2    <2 7.8 33.61 13.2    <2 72.96
# 7       A 7/9/2018     1           ns 1.01   23.58 8.8     <2    <2 8.2 33.72 20.8    <2 77.82
# 8       A 7/9/2018    12           ns 1.78  25.197 6.5     <2    <2   8 33.58 13.5    2e 67.19
# 9       A 7/9/2018    18           ns 1.77   25.34 5.5     <2    2e 7.9 33.61 12.9    2e 75.53

或者,您可以 运行

dcast(df, station + date + depth ~ parmcode, value.var = "value_qualif", 
      fill = NA_character_, fun.aggregate = head, n = 1)
#   station     date depth AMMONIA AS N CDOM DENSITY   DO ENTERO FECAL   PH   SAL TEMP TOTAL   XMS
# 1       A 7/1/2018     1         <NA> <NA>    <NA> <NA>   <NA>  <NA> <NA>  <NA> <NA>  <NA>  <NA>
# 2       A 7/1/2018    12         <NA> <NA>    <NA> <NA>   <NA>  <NA> <NA>  <NA> <NA>  <NA>  <NA>
# 3       A 7/1/2018    18         <NA> <NA>    <NA> <NA>   <NA>  <NA> <NA>  <NA> <NA>  <NA>  <NA>
# 4       A 7/2/2018     1         <NA> <NA>    <NA> <NA>   <NA>  <NA> <NA>  <NA> <NA>  <NA>  <NA>
# 5       A 7/2/2018    12         <NA>  1.3  24.622  7.6     <2    <2  8.1  33.7 16.6    <2 69.67
# 6       A 7/2/2018    18         <NA> 1.38  25.279  5.5     <2    <2  7.8 33.61 13.2    <2 72.96
# 7       A 7/9/2018     1         <NA> 1.01   23.58  8.8     <2    <2  8.2 33.72 20.8    <2 77.82
# 8       A 7/9/2018    12         <NA> 1.78  25.197  6.5     <2    <2    8 33.58 13.5    2e 67.19
# 9       A 7/9/2018    18         <NA> 1.77   25.34  5.5     <2    2e  7.9 33.61 12.9    2e 75.53

参见 this answer 关于 NA_character_