Transforming a data frame from long to wide with duplicate identifiers in R

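This is the third or fourth time I have come back to this topic. I thought I had found a solution, but I had not. I am having a lot of trouble transforming data frames like this one (an excerpt, as an example):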

https://megastore.uni-augsburg.de/get/TXLoameX7G/ (I hope sharing the data frame through my university's site works)

The original data frame looks like the one on the left, and I want it to look like the one on the right:

I have this code, which works for most of my data frames (the real data frames have 31 day columns, not just 3):

library(tidyverse)
trans_df <- df %>%
  gather(Day, value, Day01:Day31) %>%
  spread(Station, value)

But for some reason it does not work for all of my data frames. Some of them, such as the one I uploaded at the link above, produce this error:

Error: Duplicate identifiers for rows (2893, 2905), (19333, 19345), (35773, 35785), (52213, 52225), (68653, 68665), (85093, 85105), (101533, 101545), (117973, 117985), (134413, 134425), (150853, 150865), (167293, 167305), (183733, 183745), (200173, 200185), (216613, 216625), (233053, 233065), (249493, 249505), (265933, 265945), (282373, 282385), (298813, 298825), (315253, 315265), (331693, 331705), (348133, 348145), (364573, 364585), (381013, 381025), (397453, 397465), (413893, 413905), (430333, 430345), (446773, 446785), (463213, 463225), (479653, 479665), (496093, 496105), (2894, 2906), (19334, 19346), (35774, 35786), (52214, 52226), (68654, 68666), (85094, 85106), (101534, 101546), (117974, 117986), (134414, 134426), (150854, 150866), (167294, 167306), (183734, 183746), (200174, 200186), (216614, 216626), (233054, 233066), (249494, 249506), (265934, 265946), (282374, 282386), (298814, 298826), (315254, 315266), (331694, 331706), (348134, 348146), (364574, 364586), (381014, 381026),

I have already asked here how to solve this:

and I got this answer:

data2 <- data %>%
gather(Day, value, Day01:Day31) %>%
tibble::rowid_to_column() %>%
spread(Station, value)

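At first I thought it worked, because I no longer got the duplicate identifiers error, but the files became much larger, and it looks as if every row is repeated 4 times!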

Any idea how to finally solve this?

If you look at the original data frame, some combinations of Station and Date occur more than once:

df.summary <- group_by(df, Station, Date) %>% count()
df.summary[which(df.summary$n > 1), ]

# A tibble: 396 x 3
# Groups:   Station, Date [396]
   Station       Date     n
    <fctr>     <fctr> <int>
 1 DEBW001 2001-01-01     2
 2 DEBW001 2001-02-01     2
 3 DEBW001 2001-03-01     2
 4 DEBW001 2001-04-01     2
 5 DEBW001 2001-05-01     2
 6 DEBW001 2001-06-01     2
 7 DEBW001 2001-07-01     2
 8 DEBW001 2001-08-01     2
 9 DEBW001 2001-09-01     2
10 DEBW001 2001-10-01     2
# ... with 386 more rows
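
To see why such duplicates matter, here is a minimal sketch on a made-up data frame (the `toy` data and its values are hypothetical, not taken from the uploaded file). It assumes `tidyr` and `tibble` are installed. Station "A" appears twice for the same Date, so `spread()` cannot decide which value goes into the cell and stops with an error. Adding a row id makes every row unique, which silences the error but also prevents rows from being collapsed, so the result has one row per long-format row, padded with `NA`.

```r
library(tidyr)
library(tibble)

# Hypothetical toy data: station "A" is listed twice for the same Date.
toy <- data.frame(
  Station = c("A", "A", "B"),
  Date    = c("2001-01-01", "2001-01-01", "2001-01-01"),
  Day01   = c(1, 3, 5),
  Day02   = c(2, 4, 6)
)

# Long format: 3 rows x 2 day columns = 6 rows.
long <- gather(toy, Day, value, Day01:Day02)

# spread() needs each (Date, Day, Station) combination to be unique.
# Here it is not, so the call fails; we only record that it failed.
failed <- tryCatch(
  {
    spread(long, Station, value)
    FALSE
  },
  error = function(e) TRUE
)
print(failed)

# With a row id, every row is its own key, so nothing is merged:
# the "wide" result keeps all 6 rows instead of the 2 we want.
with_id <- spread(rowid_to_column(long), Station, value)
print(with_id)
```

This would explain the growing files: the row id workaround hides the duplicates instead of resolving them.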

How to fix this depends on what you want to do with the duplicates. Assuming you want to take the mean of the duplicated values:

df2 <- reshape2::melt(df, id.vars=c("Station", "Date"), variable.name="Day")
df3 <- reshape2::dcast(df2, Date+Day~Station, value.var="value", fun.aggregate=mean)

The resulting data frame looks like this:

df3[1:10, 1:10]
         Date   Day AT0ACH1 AT0ENK1 AT0ILL1 AT0PIL1 AT0SIG1 AT0SON1 AT0STO1 AT0VOR1
1  2001-01-01 Day01  53.696  44.727  47.826  40.955  85.500  94.455  64.739  62.455
2  2001-01-01 Day02  42.048  28.609  39.435  42.435  78.000  89.261      NA  71.348
3  2001-01-01 Day03  38.565  28.957  19.522  28.304  72.500  88.625      NA  47.130
4  2001-01-01 Day04  39.304  23.739  16.522  20.870  85.625  95.870      NA  52.913
5  2001-01-01 Day05  67.375  29.864  22.421  21.174  82.875  93.087      NA  61.652
6  2001-01-01 Day06  58.478  32.478  28.708  26.870  67.043  79.391      NA  55.957
7  2001-01-01 Day07  49.652  21.217  29.042  48.609  55.870  76.174      NA  52.435
8  2001-01-01 Day08  48.217  16.739  27.591  41.217  59.522  79.435      NA  55.696
9  2001-01-01 Day09  52.000  30.391  44.542  46.783  67.609  82.583      NA  54.455
10 2001-01-01 Day10  37.087  33.174  28.522  30.182  80.750  94.478      NA  52.818
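
If you would rather stay within the tidyverse, the same idea can probably be expressed with `pivot_longer()` and `pivot_wider()`. This is only a sketch on a made-up data frame (the `toy` data is hypothetical), and it assumes tidyr 1.1.0 or later, where `values_fn` accepts a single function. Duplicated Station/Date/Day combinations are averaged, as in the `dcast()` call above.

```r
library(tidyr)

# Hypothetical toy data: station "A" is listed twice for the same Date.
toy <- data.frame(
  Station = c("A", "A", "B"),
  Date    = c("2001-01-01", "2001-01-01", "2001-01-01"),
  Day01   = c(1, 3, 5),
  Day02   = c(2, 4, 6)
)

# Stack the day columns into a Day/value pair.
long <- pivot_longer(
  toy,
  cols      = starts_with("Day"),
  names_to  = "Day",
  values_to = "value"
)

# One column per station; duplicated cells are reduced with mean().
wide <- pivot_wider(
  long,
  names_from  = Station,
  values_from = value,
  values_fn   = mean
)
print(wide)
```

If the duplicates contain missing values, passing `function(x) mean(x, na.rm = TRUE)` as `values_fn` would ignore them. Any other summary, such as `max` or taking the first value, can be swapped in the same way.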