Is there a function in R to efficiently transform a raw data file into the desired format?

Question

I have a test dataset of vehicle probe data (see below).

Vehicle ID,Trip ID,Link ID,GenTime
7351,95263521,100,20200108141411
7351,95263521,101,20200108141421
7351,95263521,102,20200108141431
7351,95263521,110,20200108141441
7363,95263553,123,20200108141403
7363,95263553,125,20200108141413
7363,95263553,157,20200108141423
7363,95263553,168,20200108141433
7363,95270158,121,20200108160458
7363,95270158,324,20200108160508
7363,95270158,568,20200108160518
7351,95270151,325,20200108160441
7351,95270151,628,20200108160451
7351,95270151,576,20200108160501
7351,95270151,231,20200108160511
7363,95270158,432,20200108160738
7363,95270158,231,20200108160748
7363,95270158,981,20200108160758
7351,95270151,954,20200108160721
7351,95270151,950,20200108160731
7351,95270151,958,20200108160741
7351,95270151,957,20200108160751

I would like to transform it into the following format:

Vehicle ID, Trip ID, Link ID (From), GenTime (From), Link ID (To), GenTime (To)
7351,95263521,100,20200108141411,101,20200108141421
7351,95263521,101,20200108141421,102,20200108141431
7351,95263521,102,20200108141431,110,20200108141441
...

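Question 1: Is there an efficient way to do this in R?

Question 2: Potentially, I may receive data from a million vehicles, which could generate datasets of up to billions of rows per day. Can R handle that volume of data?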

Answer 1

With dplyr, we can group_by Vehicle.ID and Trip.ID and use lead to pull the next row's values within each trip:

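library(dplyr)

# Assumes df was read with read.csv(), so "Vehicle ID" became Vehicle.ID, etc.
df %>%
  group_by(Vehicle.ID, Trip.ID) %>%
  # lead() returns the following row's value within each vehicle/trip group
  mutate(Link.ID_from = Link.ID, Link.ID_to = lead(Link.ID), 
         GenTime_from = GenTime, GenTime_to = lead(GenTime)) %>%
  ungroup() %>%
  # The last row of each trip has NA in the *_to columns;
  # add filter(!is.na(Link.ID_to)) if only complete transitions are wanted
  select(-GenTime, -Link.ID)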
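
To check the approach on a small input, here is a minimal sketch that builds a subset of the sample data inline. The `df` construction and the `transitions` name are my own assumptions for illustration. It also sorts by GenTime within each trip (in case the file is not already in time order), drops the final row of each trip, and puts the columns in the order asked for in the question.

```r
library(dplyr)

# Hypothetical subset of the sample data; read.csv() turns "Vehicle ID" into Vehicle.ID
df <- read.csv(text = "Vehicle ID,Trip ID,Link ID,GenTime
7351,95263521,100,20200108141411
7351,95263521,101,20200108141421
7351,95263521,102,20200108141431
7351,95263521,110,20200108141441
7363,95263553,123,20200108141403
7363,95263553,125,20200108141413")

transitions <- df %>%
  arrange(Vehicle.ID, Trip.ID, GenTime) %>%   # make sure rows are in time order
  group_by(Vehicle.ID, Trip.ID) %>%
  mutate(Link.ID_from = Link.ID, GenTime_from = GenTime,
         Link.ID_to = lead(Link.ID), GenTime_to = lead(GenTime)) %>%
  ungroup() %>%
  filter(!is.na(Link.ID_to)) %>%              # drop the last row of each trip
  select(Vehicle.ID, Trip.ID, Link.ID_from, GenTime_from, Link.ID_to, GenTime_to)

print(as.data.frame(transitions))
```

Each remaining row is one link-to-link transition, so a trip with n probe records yields n - 1 rows.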

The same can be done in data.table, which may be faster for larger datasets:

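library(data.table)

# setDT() converts df to a data.table by reference; := adds the four columns per group
setDT(df)[, c('Link.ID_from', 'Link.ID_to', 'GenTime_from', 'GenTime_to') := 
        list(Link.ID, shift(Link.ID, type = "lead"),
        GenTime, shift(GenTime, type = "lead")), by = .(Vehicle.ID, Trip.ID)][, -c(3, 4)]
# -c(3, 4) drops the original Link.ID and GenTime columns by position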
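
Note that `:=` modifies `df` in place and the positional `-c(3, 4)` depends on the column order. As an alternative, here is a minimal sketch that builds a new table instead and selects columns by name. The `dt` and `res` names and the inline sample are assumptions for illustration, not part of the original data pipeline.

```r
library(data.table)

# Hypothetical subset of the sample data; read.csv() turns "Vehicle ID" into Vehicle.ID
dt <- as.data.table(read.csv(text = "Vehicle ID,Trip ID,Link ID,GenTime
7351,95263521,100,20200108141411
7351,95263521,101,20200108141421
7351,95263521,102,20200108141431
7351,95263521,110,20200108141441
7363,95263553,123,20200108141403
7363,95263553,125,20200108141413
7363,95263553,157,20200108141423"))

# Sort in place so that shift() pairs each record with the next one in time
setorder(dt, Vehicle.ID, Trip.ID, GenTime)

res <- dt[, .(Link.ID_from = Link.ID,
              GenTime_from = GenTime,
              Link.ID_to   = shift(Link.ID, type = "lead"),
              GenTime_to   = shift(GenTime, type = "lead")),
          by = .(Vehicle.ID, Trip.ID)][!is.na(Link.ID_to)]  # drop last row per trip

print(res)
```

Here `dt` keeps its original columns, and `res` holds only the complete from/to pairs.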

