Spread by Two Columns
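I am trying to spread a data frame, but I am not very familiar with spread() and gather().
Below is a sample of my data. It has 9 rows, all with the same Application.Number. I want to end up with one row per Application.Number-Decision combination. The remaining variables (date_generated, date_decided, time_to_decision and text) must either be repeated for each Application.Number-Decision combination, or else the last one should be taken. The data is already sorted by Application.Number and date_generated.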
structure(list(Application.Number = c(80749L, 80749L, 80749L,
80749L, 80749L, 80749L, 80749L, 80749L, 80749L), Decision = c("Invalid",
"Invalid", "Invalid", "Invalid", "Invalid", "Invalid", "Approved",
"Approved", "Approved"), date_generated = structure(c(1521810060,
1521810060, 1523523840, 1523536500, 1524036720, 1524136380, 1524137460,
1524137460, 1524137460), class = c("POSIXct", "POSIXt"), tzone = ""),
date_decided = structure(c(1522155960, 1522155660, 1523534400,
1523600520, 1524127140, 1524136740, 1524211800, 1524211740,
1524211200), class = c("POSIXct", "POSIXt"), tzone = ""),
time_to_decision = c(4.00347222222222, 4, 0.122222222222222,
0.740972222222222, 1.04652777777778, 0.00416666666666667,
0.860416666666667, 0.859722222222222, 0.853472222222222),
text = c("rIUQRmOkyZ", "ZxdYUr16NR", "8IIipoleOV", "nLuIgToxcT",
"xYFksrws87", "N2oECMtgQo", "RKcrBcBFI2", "jaH438byVt", "80ggA2hZr7"
)), row.names = 15880:15888, class = "data.frame")
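Edit: I decided the output should have only one row, with all rows spread around Application.Number.
I ended up making a separate data frame with the duplicates and joining it back to the unique rows.
There must be a better way.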
Well, you can do it, but I agree with user 42- that the resulting data format will cause problems later:
> gather(x, "key", "val", -Application.Number, -Decision)
Application.Number Decision key val
1 80749 Invalid date_generated 1521810060
2 80749 Invalid date_generated 1521810060
3 80749 Invalid date_generated 1523523840
4 80749 Invalid date_generated 1523536500
5 80749 Invalid date_generated 1524036720
6 80749 Invalid date_generated 1524136380
7 80749 Approved date_generated 1524137460
8 80749 Approved date_generated 1524137460
9 80749 Approved date_generated 1524137460
10 80749 Invalid date_decided 1522155960
11 80749 Invalid date_decided 1522155660
12 80749 Invalid date_decided 1523534400
13 80749 Invalid date_decided 1523600520
14 80749 Invalid date_decided 1524127140
15 80749 Invalid date_decided 1524136740
16 80749 Approved date_decided 1524211800
17 80749 Approved date_decided 1524211740
18 80749 Approved date_decided 1524211200
19 80749 Invalid time_to_decision 4.00347222222222
20 80749 Invalid time_to_decision 4
21 80749 Invalid time_to_decision 0.122222222222222
22 80749 Invalid time_to_decision 0.740972222222222
23 80749 Invalid time_to_decision 1.04652777777778
24 80749 Invalid time_to_decision 0.00416666666666667
25 80749 Approved time_to_decision 0.860416666666667
26 80749 Approved time_to_decision 0.859722222222222
27 80749 Approved time_to_decision 0.853472222222222
28 80749 Invalid text rIUQRmOkyZ
29 80749 Invalid text ZxdYUr16NR
30 80749 Invalid text 8IIipoleOV
31 80749 Invalid text nLuIgToxcT
32 80749 Invalid text xYFksrws87
33 80749 Invalid text N2oECMtgQo
34 80749 Approved text RKcrBcBFI2
35 80749 Approved text jaH438byVt
36 80749 Approved text 80ggA2hZr7
Warning:
attributes are not identical across measure variables;
they will be dropped
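The warning is already a hint: you have converted all the value columns (date_generated, date_decided, time_to_decision and text) to the most general data type that can hold all of these values: character strings. Look at how your dates were converted to seconds since the epoch: you have lost the time zone information, for example.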
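The coercion described above can be checked on a small example. This is only a sketch: the two-row data frame is a reduced, hypothetical stand-in for the question's data, and the explicit UTC time zone is an assumption made here so the result does not depend on the local machine.

```r
library(tidyr)

# Reduced stand-in for the question's data (two rows, one date column).
# tz = "UTC" is an assumption for reproducibility; the original used tzone = "".
x <- data.frame(
  Application.Number = c(80749L, 80749L),
  Decision = c("Invalid", "Approved"),
  date_generated = as.POSIXct(c(1521810060, 1524137460),
                              origin = "1970-01-01", tz = "UTC"),
  text = c("rIUQRmOkyZ", "80ggA2hZr7"),
  stringsAsFactors = FALSE
)

# Same call as in the answer; it emits the "attributes are not identical" warning.
long <- gather(x, "key", "val", -Application.Number, -Decision)

class(x$date_generated)  # date-time class before gathering
class(long$val)          # a single character column afterwards

# The dates are still recoverable, but only by hand and with an assumed time zone.
restored <- as.POSIXct(
  as.numeric(long$val[long$key == "date_generated"]),
  origin = "1970-01-01", tz = "UTC"
)
```

Every value column would need a manual conversion like this one after gathering, which is the kind of later problem the answer is warning about.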
In short, you can do it, but I don't think you should. Since you haven't shown your use case or any context, I can't suggest a better solution.
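For the requirement as literally stated in the question (keep the last row per Application.Number-Decision combination, then put everything on one row per Application.Number), one possible sketch avoids gather() altogether and uses only base R. It is not the answerer's method. The four-row data frame is a reduced, hypothetical subset of the question's data, and the code assumes the rows are already sorted as the question says.

```r
# Reduced, hypothetical subset of the question's data, already sorted by
# Application.Number and date_generated (tz = "UTC" is an assumption).
x <- data.frame(
  Application.Number = c(80749L, 80749L, 80749L, 80749L),
  Decision = c("Invalid", "Invalid", "Approved", "Approved"),
  date_generated = as.POSIXct(
    c(1524036720, 1524136380, 1524137460, 1524137460),
    origin = "1970-01-01", tz = "UTC"
  ),
  time_to_decision = c(1.04652777777778, 0.00416666666666667,
                       0.859722222222222, 0.853472222222222),
  text = c("xYFksrws87", "N2oECMtgQo", "jaH438byVt", "80ggA2hZr7"),
  stringsAsFactors = FALSE
)

# Keep the last row of each Application.Number-Decision combination.
keys <- x[c("Application.Number", "Decision")]
last_rows <- x[!duplicated(keys, fromLast = TRUE), ]

# One row per Application.Number, with one set of columns per Decision,
# named e.g. text.Invalid and text.Approved.
wide <- reshape(
  last_rows,
  idvar = "Application.Number",
  timevar = "Decision",
  direction = "wide"
)
```

Because each variable gets its own column per Decision, the values are never funnelled through a single shared character column the way they are with gather().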