How do you tidy and combine two data frames with slightly different primary keys?
stream table
experiment | protocol | test | stream_size | metric | value |
---|---|---|---|---|---|
1 | tcp | stream | 64 | throughput Gbps | 10 |
1 | tcp | stream | 64 | cpu utilization | .5 |
2 | tcp | stream | 64 | throughput Gbps | 40 |
2 | tcp | stream | 64 | cpu utilization | .9 |
3 | udp | stream | 64 | throughput Gbps | 20 |
3 | udp | stream | 64 | cpu utilization | .5 |
4 | udp | stream | 64 | throughput Gbps | 60 |
4 | udp | stream | 64 | cpu utilization | .8 |
rr table
experiment | protocol | test | request_size | response_size | metric | value |
---|---|---|---|---|---|---|
5 | tcp | request and response | 64 | 64 | transactions per second | 10 |
5 | tcp | request and response | 64 | 64 | cpu utilization | .6 |
6 | tcp | request and response | 64 | 1024 | transactions per second | 8 |
6 | tcp | request and response | 64 | 1024 | cpu utilization | .5 |
7 | udp | request and response | 64 | 64 | transactions per second | 30 |
7 | udp | request and response | 64 | 64 | cpu utilization | .4 |
8 | udp | request and response | 64 | 1024 | transactions per second | 29 |
8 | udp | request and response | 64 | 64 | cpu utilization | .75 |
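To make the example reproducible, the two tables can be entered as tibbles. This is only a sketch: the object names `stream` and `rr` are taken from the answer's code, and the integer types for `experiment` and the size columns are an assumption based on the printed output.

```r
library(tibble)

# Stream tests: one row per (experiment, metric) pair.
stream <- tribble(
  ~experiment, ~protocol, ~test,    ~stream_size, ~metric,           ~value,
  1L,          "tcp",     "stream", 64L,          "throughput Gbps", 10,
  1L,          "tcp",     "stream", 64L,          "cpu utilization", 0.5,
  2L,          "tcp",     "stream", 64L,          "throughput Gbps", 40,
  2L,          "tcp",     "stream", 64L,          "cpu utilization", 0.9,
  3L,          "udp",     "stream", 64L,          "throughput Gbps", 20,
  3L,          "udp",     "stream", 64L,          "cpu utilization", 0.5,
  4L,          "udp",     "stream", 64L,          "throughput Gbps", 60,
  4L,          "udp",     "stream", 64L,          "cpu utilization", 0.8
)

# Request/response tests: two test-specific size columns instead of one.
# Values are copied as given in the question, including the last row.
rr <- tribble(
  ~experiment, ~protocol, ~test,                  ~request_size, ~response_size, ~metric,                   ~value,
  5L,          "tcp",     "request and response", 64L,           64L,            "transactions per second", 10,
  5L,          "tcp",     "request and response", 64L,           64L,            "cpu utilization",         0.6,
  6L,          "tcp",     "request and response", 64L,           1024L,          "transactions per second", 8,
  6L,          "tcp",     "request and response", 64L,           1024L,          "cpu utilization",         0.5,
  7L,          "udp",     "request and response", 64L,           64L,            "transactions per second", 30,
  7L,          "udp",     "request and response", 64L,           64L,            "cpu utilization",         0.4,
  8L,          "udp",     "request and response", 64L,           1024L,          "transactions per second", 29,
  8L,          "udp",     "request and response", 64L,           64L,            "cpu utilization",         0.75
)
```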
So far, the results of each experiment are listed in the `metric` column and their values in the `value` column.
I know I can drop the test-specific columns such as `stream_size`, `request_size`, and `response_size`, then bind the rows to create one data frame.
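For reference, that drop-and-bind approach might look like the sketch below. It uses a two-row excerpt of each table and assumes the data frames are named `stream` and `rr`; the result name `combined` is made up for illustration.

```r
library(dplyr)
library(tibble)

# Two-row excerpts of the tables above (assumed names: stream, rr).
stream <- tribble(
  ~experiment, ~protocol, ~test,    ~stream_size, ~metric,           ~value,
  1L,          "tcp",     "stream", 64L,          "throughput Gbps", 10,
  1L,          "tcp",     "stream", 64L,          "cpu utilization", 0.5
)
rr <- tribble(
  ~experiment, ~protocol, ~test,                  ~request_size, ~response_size, ~metric,                   ~value,
  5L,          "tcp",     "request and response", 64L,           64L,            "transactions per second", 10,
  5L,          "tcp",     "request and response", 64L,           64L,            "cpu utilization",         0.6
)

# Stack the rows first (missing columns become NA), then drop every
# column whose name ends in "_size".
combined <- bind_rows(stream, rr) %>%
  select(-ends_with("_size"))
```

This keeps one row per measurement, but the size information is lost.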
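Using R and tidyverse tools, how would you combine the two data frames into one long format, so that the combined data frame no longer has the test-specific columns `stream_size`, `request_size`, and `response_size`?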
Is there a better or cleaner way to design a schema for this experiment data so that the data frames are easier to combine?
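You can bind the two data frames together, then pivot only the columns ending in `size` into long format. `bind_rows()` fills the columns that are missing from either table with `NA`, and `values_drop_na = TRUE` then drops the resulting `NA` rows.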
library(tidyverse)

# Stack both tables, then gather every *size column into name/value pairs
bind_rows(stream, rr) %>%
  pivot_longer(ends_with("size"), names_to = "test_specific", values_to = "size", values_drop_na = TRUE)
Output
   experiment protocol test                 metric                  value test_specific  size
        <int> <chr>    <chr>                <chr>                   <dbl> <chr>         <int>
 1          1 tcp      stream               throughput Gbps          10   stream_size      64
 2          1 tcp      stream               cpu utilization           0.5 stream_size      64
 3          2 tcp      stream               throughput Gbps          40   stream_size      64
 4          2 tcp      stream               cpu utilization           0.9 stream_size      64
 5          3 udp      stream               throughput Gbps          20   stream_size      64
 6          3 udp      stream               cpu utilization           0.5 stream_size      64
 7          4 udp      stream               throughput Gbps          60   stream_size      64
 8          4 udp      stream               cpu utilization           0.8 stream_size      64
 9          5 tcp      request and response transactions per second  10   request_size     64
10          5 tcp      request and response transactions per second  10   response_size    64
# … with 14 more rows
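Note that rows 9 and 10 carry the same measurement twice, once per size column. The sketch below shows how that duplication could be checked, and undone with `distinct()` before summarising `value`. It runs on a two-row excerpt of each table; the names `long`, `copies`, and `measurements` are made up for illustration.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Two-row excerpts of the question's tables (assumed names: stream, rr).
stream <- tribble(
  ~experiment, ~protocol, ~test,    ~stream_size, ~metric,           ~value,
  1L,          "tcp",     "stream", 64L,          "throughput Gbps", 10,
  1L,          "tcp",     "stream", 64L,          "cpu utilization", 0.5
)
rr <- tribble(
  ~experiment, ~protocol, ~test,                  ~request_size, ~response_size, ~metric,                   ~value,
  5L,          "tcp",     "request and response", 64L,           64L,            "transactions per second", 10,
  5L,          "tcp",     "request and response", 64L,           64L,            "cpu utilization",         0.6
)

# Same pipeline as in the answer.
long <- bind_rows(stream, rr) %>%
  pivot_longer(ends_with("size"), names_to = "test_specific",
               values_to = "size", values_drop_na = TRUE)

# How many rows each measurement occupies: 1 for stream, 2 for rr,
# because rr has two size columns.
copies <- count(long, experiment, metric, name = "copies")

# Back to one row per measurement, without the size information.
measurements <- distinct(long, experiment, protocol, test, metric, value)
```

Whether the repeated `value` matters depends on what is done next: filtering or plotting by `size` is unaffected, while a sum or mean over `value` would count each rr measurement twice.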