How do you tidy and combine two data frames with slightly different primary keys?

stream table

experiment  protocol  test    stream_size  metric           value
1           tcp       stream  64           throughput Gbps  10
1           tcp       stream  64           cpu utilization  .5
2           tcp       stream  64           throughput Gbps  40
2           tcp       stream  64           cpu utilization  .9
3           udp       stream  64           throughput Gbps  20
3           udp       stream  64           cpu utilization  .5
4           udp       stream  64           throughput Gbps  60
4           udp       stream  64           cpu utilization  .8

rr table

experiment  protocol  test                  request_size  response_size  metric                   value
5           tcp       request and response  64            64             transactions per second  10
5           tcp       request and response  64            64             cpu utilization          .6
6           tcp       request and response  64            1024           transactions per second  8
6           tcp       request and response  64            1024           cpu utilization          .5
7           udp       request and response  64            64             transactions per second  30
7           udp       request and response  64            64             cpu utilization          .4
8           udp       request and response  64            1024           transactions per second  29
8           udp       request and response  64            64             cpu utilization          .75

So far, the results of each experiment are listed in the metric column, and their values are in the value column.

I know I can drop the test-specific columns (stream_size, request_size, and response_size) and then bind the rows to create a single data frame.
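For reference, the drop-and-bind approach described above might look something like the following. This is only a sketch: the two-row `stream` and `rr` tibbles are hypothetical stand-ins for the full tables, and `combined` is an illustrative name.

```r
library(dplyr)

# Hypothetical two-row stand-ins for the stream and rr tables above
stream <- tibble(
  experiment = c(1L, 1L),
  protocol = "tcp",
  test = "stream",
  stream_size = 64L,
  metric = c("throughput Gbps", "cpu utilization"),
  value = c(10, 0.5)
)
rr <- tibble(
  experiment = c(5L, 5L),
  protocol = "tcp",
  test = "request and response",
  request_size = 64L,
  response_size = 64L,
  metric = c("transactions per second", "cpu utilization"),
  value = c(10, 0.6)
)

# Drop every column whose name ends in "size", then stack the rows
combined <- bind_rows(
  select(stream, -ends_with("size")),
  select(rr, -ends_with("size"))
)
```

The drawback is that the size information is discarded, so two request-and-response runs with different response sizes can no longer be told apart except by experiment number.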

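Using R and tidyverse tools, how would you combine the two data frames into one long format so that the combined data frame has no test-specific columns (stream_size, request_size, and response_size)?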

Is there a better or more concise way to design a schema for this experimental data so that the data frames are easier to merge?

You can bind the two data frames together and then pivot only the columns whose names end in "size" to long format.

library(tidyverse)

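# bind_rows() fills columns missing from either table with NA;
# values_drop_na = TRUE then discards those NA rows after pivoting
bind_rows(stream, rr) %>%
  pivot_longer(
    ends_with("size"),
    names_to = "test_specific",
    values_to = "size",
    values_drop_na = TRUE
  )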

Output

   experiment protocol test                 metric                  value test_specific  size
        <int> <chr>    <chr>                <chr>                   <dbl> <chr>         <int>
 1          1 tcp      stream               throughput Gbps          10   stream_size      64
 2          1 tcp      stream               cpu utilization           0.5 stream_size      64
 3          2 tcp      stream               throughput Gbps          40   stream_size      64
 4          2 tcp      stream               cpu utilization           0.9 stream_size      64
 5          3 udp      stream               throughput Gbps          20   stream_size      64
 6          3 udp      stream               cpu utilization           0.5 stream_size      64
 7          4 udp      stream               throughput Gbps          60   stream_size      64
 8          4 udp      stream               cpu utilization           0.8 stream_size      64
 9          5 tcp      request and response transactions per second  10   request_size     64
10          5 tcp      request and response transactions per second  10   response_size    64
# … with 14 more rows
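
To try this without the full data, here is a small self-contained sketch. The two-row `stream` and `rr` tibbles are hypothetical subsets of the tables in the question (experiments 1 and 6), and `long` is just an illustrative name. It loads dplyr and tidyr directly rather than the whole tidyverse.

```r
library(dplyr)
library(tidyr)

# Hypothetical subsets of the question's tables
stream <- tribble(
  ~experiment, ~protocol, ~test, ~stream_size, ~metric, ~value,
  1L, "tcp", "stream", 64L, "throughput Gbps", 10,
  1L, "tcp", "stream", 64L, "cpu utilization", 0.5
)
rr <- tribble(
  ~experiment, ~protocol, ~test, ~request_size, ~response_size, ~metric, ~value,
  6L, "tcp", "request and response", 64L, 1024L, "transactions per second", 8,
  6L, "tcp", "request and response", 64L, 1024L, "cpu utilization", 0.5
)

# Stack the tables, then move all *_size columns into name/value pairs
long <- bind_rows(stream, rr) %>%
  pivot_longer(
    ends_with("size"),
    names_to = "test_specific",
    values_to = "size",
    values_drop_na = TRUE
  )
```

Note that each request-and-response row appears twice in the result, once for request_size and once for response_size, as rows 9 and 10 of the output above show. Keep that in mind before summing or averaging the value column.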