Spread with duplicate identifiers for rows
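There have been questions on this topic before, but I am still struggling to spread this data. I would like each state to have its own column of temperature values.
Here is the dput() of my data. I'll call it df: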
structure(list(date = c("2018-01-21", "2018-01-21", "2018-01-20",
"2018-01-20", "2018-01-19", "2018-01-19", "2018-01-18", "2018-01-18",
"2018-01-17", "2018-01-17", "2018-01-16", "2018-01-16", "2018-01-15",
"2018-01-15", "2018-01-14", "2018-01-14", "2018-01-12", "2018-01-12",
"2018-01-11", "2018-01-11", "2018-01-10", "2018-01-10", "2018-01-09",
"2018-01-09", "2018-01-08", "2018-01-08", "2018-01-07", "2018-01-07",
"2018-01-06", "2018-01-06", "2018-01-05", "2018-01-05", "2018-01-04",
"2018-01-04", "2018-01-03", "2018-01-03", "2018-01-03", "2018-01-03",
"2018-01-02", "2018-01-02"), tmin = c(24, 31, 31, 29, 44, 17,
32, 7, 31, 7, 31, 6, 30, 13, 30, 1, 43, 20, 33, 52, 42, 29, 30,
29, 26, 32, 33, -2, 29, 0, 23, 3, 19, 11, NA, -3, 22, -3, 24,
-4), state = c("UT", "OH", "UT", "OH", "UT", "OH", "UT", "OH",
"UT", "OH", "UT", "OH", "UT", "OH", "UT", "OH", "UT", "OH", "UT",
"OH", "UT", "OH", "UT", "OH", "UT", "OH", "UT", "OH", "UT", "OH",
"UT", "OH", "UT", "OH", "UT", "OH", "UT", "OH", "UT", "OH")), class = "data.frame", row.names = c(NA,
-40L), .Names = c("date", "tmin", "state"))
The code I am running is:
df %>% spread(state,tmin)
I want it to give me the following format:
date UT OH
... ... ...
But I receive the error message:
Error: Duplicate identifiers for rows (36, 38), (35, 37)
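I have tried a few different approaches. One was grouping by date, since I thought the rows sharing a date were causing the problem with spread. I also tried creating new row names with add_rownames() and then calling spread(state,tmin), but that did not resolve the problem either.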
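For spread to work as intended, the resulting data frame must have uniquely identified rows and columns. With your data, the "date" column is the only identifier left after spreading, so each date/state pair may appear only once. However, rows 36 and 38 are identical: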
         date tmin state
36 2018-01-03   -3    OH
38 2018-01-03   -3    OH
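This puts tidyr in an impossible situation: it has to resolve two data points into the same row and column. In addition, rows 35 and 37 have the same date and state, which again makes it impossible to place two different values in the same position in the new data frame: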
         date tmin state
35 2018-01-03   NA    UT
37 2018-01-03   22    UT
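To locate clashes like these without reading the data by eye, one option is to count the rows per date/state pair. The following is a minimal sketch, not part of the original answer. It assumes dplyr is installed, and `df_tail` is a hypothetical name for rows 33 to 40 of the question's df, copied from the dput() output so the example runs on its own:

```r
library(dplyr)

# Rows 33-40 of the question's df, copied from the dput() output.
# `df_tail` is a hypothetical name used only for this illustration.
df_tail <- data.frame(
  date  = c("2018-01-04", "2018-01-04", "2018-01-03", "2018-01-03",
            "2018-01-03", "2018-01-03", "2018-01-02", "2018-01-02"),
  tmin  = c(19, 11, NA, -3, 22, -3, 24, -4),
  state = c("UT", "OH", "UT", "OH", "UT", "OH", "UT", "OH"),
  stringsAsFactors = FALSE
)

# Count rows per date/state pair; any pair with n > 1 would have to
# share a single cell after spreading.
dupes <- df_tail %>%
  count(date, state) %>%
  filter(n > 1)

dupes
```

Running the same count on the full df should flag the same two pairs, both on 2018-01-03, since every other date/state pair occurs once.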
The following data cleaning makes the spread possible:
library(dplyr)
library(tidyr)

df %>%
  filter(!is.na(tmin)) %>%  # remove NA values
  unique() %>%              # remove duplicated rows
  spread(state, tmin)
        date OH UT
1 2018-01-02 -4 24
2 2018-01-03 -3 22
3 2018-01-04 11 19
4 2018-01-05  3 23
5 2018-01-06  0 29
...
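As a possible alternative to dropping rows, the repeated date/state pairs can be collapsed to a single value before reshaping. The sketch below is an assumption-laden illustration, not the original answer's method. It uses `pivot_wider()`, which supersedes `spread()` in tidyr 1.0.0 and later, assumes dplyr 1.0.0 or later for the `.groups` argument, and takes the mean as the summary, which is only one of several reasonable choices. `df_tail` is again a hypothetical subset (rows 33 to 40 of df):

```r
library(dplyr)
library(tidyr)

# Rows 33-40 of the question's df, copied from the dput() output.
# `df_tail` is a hypothetical name used only for this illustration.
df_tail <- data.frame(
  date  = c("2018-01-04", "2018-01-04", "2018-01-03", "2018-01-03",
            "2018-01-03", "2018-01-03", "2018-01-02", "2018-01-02"),
  tmin  = c(19, 11, NA, -3, 22, -3, 24, -4),
  state = c("UT", "OH", "UT", "OH", "UT", "OH", "UT", "OH"),
  stringsAsFactors = FALSE
)

wide <- df_tail %>%
  group_by(date, state) %>%
  # Collapse each date/state pair to one value, ignoring NAs.
  # Taking the mean is an assumption; min(), max() or first() may fit better.
  summarise(tmin = mean(tmin, na.rm = TRUE), .groups = "drop") %>%
  # One column per state, one row per date.
  pivot_wider(names_from = state, values_from = tmin)

wide
```

For 2018-01-03 this gives -3 for OH (the mean of two identical readings) and 22 for UT (the NA is ignored). Note that a pair whose readings are all NA would come out as NaN under this summary.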