Transforming a data frame from long to wide with duplicate identifiers in R
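I have now asked about this topic three or four times. I thought I had found a solution, but I had not. I am having real trouble transforming data frames like this one (one more example):
https://megastore.uni-augsburg.de/get/TXLoameX7G/ (I hope the data frame can be downloaded through my university's site)
The original data frame looks like the one on the left, and I want it to look like the one on the right:
I have this code, which works for most of my data frames (the real data frames have 31 days, not just 3).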
library(tidyverse)
trans_df= df %>% gather(Day, value, Day01:Day31) %>% spread(Station, value)
But for some reason it does not work for all of my data frames. Some of them produce this error (for example, the one I uploaded at the link):
Error: Duplicate identifiers for rows (2893, 2905), (19333, 19345), (35773, 35785), (52213, 52225), (68653, 68665), (85093, 85105), (101533, 101545), (117973, 117985), (134413, 134425), (150853, 150865), (167293, 167305), (183733, 183745), (200173, 200185), (216613, 216625), (233053, 233065), (249493, 249505), (265933, 265945), (282373, 282385), (298813, 298825), (315253, 315265), (331693, 331705), (348133, 348145), (364573, 364585), (381013, 381025), (397453, 397465), (413893, 413905), (430333, 430345), (446773, 446785), (463213, 463225), (479653, 479665), (496093, 496105), (2894, 2906), (19334, 19346), (35774, 35786), (52214, 52226), (68654, 68666), (85094, 85106), (101534, 101546), (117974, 117986), (134414, 134426), (150854, 150866), (167294, 167306), (183734, 183746), (200174, 200186), (216614, 216626), (233054, 233066), (249494, 249506), (265934, 265946), (282374, 282386), (298814, 298826), (315254, 315266), (331694, 331706), (348134, 348146), (364574, 364586), (381014, 381026),
I already asked how to solve this here:
and I got this answer:
data2 <- data %>%
gather(Day, value, Day01:Day31) %>%
tibble::rowid_to_column() %>%
spread(Station, value)
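At first I thought it worked, because I no longer got the duplicate identifiers error, but the output file became much larger and every row seems to be repeated 4 times!
Any idea how to finally solve this?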
If you look at the original data frame, some Station and Date combinations are duplicated:
df.summary <- group_by(df, Station, Date) %>% count()
df.summary[which(df.summary$n > 1), ]
# A tibble: 396 x 3
# Groups: Station, Date [396]
Station Date n
<fctr> <fctr> <int>
1 DEBW001 2001-01-01 2
2 DEBW001 2001-02-01 2
3 DEBW001 2001-03-01 2
4 DEBW001 2001-04-01 2
5 DEBW001 2001-05-01 2
6 DEBW001 2001-06-01 2
7 DEBW001 2001-07-01 2
8 DEBW001 2001-08-01 2
9 DEBW001 2001-09-01 2
10 DEBW001 2001-10-01 2
# ... with 386 more rows
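As a minimal sketch of why the row-id workaround from the question removes the error but inflates the result, consider the toy data frame below. It is hypothetical (two stations, two day columns, station B entered twice for the same Date) and is not the real data. Once every long row carries its own id, spread() has nothing it can merge, so the output keeps one row per long row and pads the other station columns with NA.

```r
library(tidyr)

# Hypothetical toy data: station "B" appears twice for the same Date.
toy <- data.frame(
  Station = c("A", "B", "B"),
  Date    = c("2001-01-01", "2001-01-01", "2001-01-01"),
  Day01   = c(1, 2, 4),
  Day02   = c(10, 20, 40)
)

# Long format: one row per Station / Date / Day.
long <- gather(toy, Day, value, Day01:Day02)

# The workaround: a unique row id silences the duplicate-identifier error...
long_id <- tibble::rowid_to_column(long)
wide_id <- spread(long_id, Station, value)

# ...but no rows can be combined any more.
nrow(long)     # 6
nrow(wide_id)  # 6, although a Date x Day table should have only 2 rows
```

So the duplicates have to be resolved (dropped, or aggregated) rather than hidden behind an id column.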
It depends on how you want to handle these duplicates. Suppose you want to take the mean of the duplicated values:
df2 <- reshape2::melt(df, id.vars=c("Station", "Date"), variable.name="Day")
df3 <- reshape2::dcast(df2, Date+Day~Station, value.var="value", fun.aggregate=mean)
The resulting data frame looks like this:
df3[1:10, 1:10]
Date Day AT0ACH1 AT0ENK1 AT0ILL1 AT0PIL1 AT0SIG1 AT0SON1 AT0STO1 AT0VOR1
1 2001-01-01 Day01 53.696 44.727 47.826 40.955 85.500 94.455 64.739 62.455
2 2001-01-01 Day02 42.048 28.609 39.435 42.435 78.000 89.261 NA 71.348
3 2001-01-01 Day03 38.565 28.957 19.522 28.304 72.500 88.625 NA 47.130
4 2001-01-01 Day04 39.304 23.739 16.522 20.870 85.625 95.870 NA 52.913
5 2001-01-01 Day05 67.375 29.864 22.421 21.174 82.875 93.087 NA 61.652
6 2001-01-01 Day06 58.478 32.478 28.708 26.870 67.043 79.391 NA 55.957
7 2001-01-01 Day07 49.652 21.217 29.042 48.609 55.870 76.174 NA 52.435
8 2001-01-01 Day08 48.217 16.739 27.591 41.217 59.522 79.435 NA 55.696
9 2001-01-01 Day09 52.000 30.391 44.542 46.783 67.609 82.583 NA 54.455
10 2001-01-01 Day10 37.087 33.174 28.522 30.182 80.750 94.478 NA 52.818
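If you would rather stay within tidyr, the same idea (average the duplicates while widening) can be sketched with pivot_longer() and pivot_wider(). This assumes tidyr 1.1.0 or later, where values_fn accepts a single function. The toy data frame is hypothetical and only stands in for the real data.

```r
library(tidyr)

# Hypothetical toy data: station "B" appears twice for the same Date.
toy <- data.frame(
  Station = c("A", "B", "B"),
  Date    = c("2001-01-01", "2001-01-01", "2001-01-01"),
  Day01   = c(1, 2, 4),
  Day02   = c(10, 20, 40)
)

# Long format: one row per Station / Date / Day.
long <- pivot_longer(toy, Day01:Day02, names_to = "Day", values_to = "value")

# Wide format: one column per station; values that share the same
# Date / Day / Station are collapsed with mean().
wide <- pivot_wider(long, names_from = Station, values_from = value,
                    values_fn = mean)

wide$A  # 1 10
wide$B  # 3 30  (means of 2 and 4, and of 20 and 40)
```

Note that mean() returns NA as soon as one of the duplicated values is NA. If the missing values should be ignored instead, pass values_fn = function(x) mean(x, na.rm = TRUE).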