Data Cleaning in R

Question

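I have a CSV file, and I only want to extract the timestamps of the sentences that contain "toward", together with the fruit name in each of those sentences. How can I do this in R (or, if there is a faster way, what is it)?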

rosbagTimestamp,data
1438293900729698553,robot is in motion toward [strawberry]
1438293900730571638,Found a plan for avocado in 1.36400008202 seconds
1438293900731434815,current probability is greater than EXECUTION_THRESHOLD
1438293900731554567,ready to execute am original plan of len = 33
1438293900731586463,len of sub plan 1 = 24
1438293900731633713,len of sub plan 2 = 9
1438293900732910799,put in an execution request; now updating the dict
1438293900732949576,current_prediciton_item = avocado
1438293900733070339,current_item_probability = 0.880086981207
1438293901677787230,current probability is greater than PLANNING_THRESHOLD
1438293901681590725,robot is in motion toward [avocado]
1438293902689233770,we have received verbal request [avocado]
1438293902689314002,we already have a plan for the verbal request
1438293902689377800,debug
1438293902690529516,put in the final motion request
1438293902691076051,Found a plan for avocado in 1.95595788956 seconds
1438293902691084147,current predicted item != motion target; calc a new plan
1438293902691110642,current probability is greater than EXECUTION_THRESHOLD
1438293902691885974,have existing requests
1438293904496769068,robot is in motion toward [avocado]
1438293907737142498,ready to pick up the item

Ideally, I would like the output to look like this:

1438293900729698553, strawberry
1438293901681590725, avocado
1438293904496769068, avocado

Apparently I need to combine subset with grep in R, but I'm not quite sure how to do it!

Answer 1

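# Rows whose message contains "toward [" (backslashes are doubled inside R strings)
hits <- grep("toward \\[", df$data)
stamps <- df$rosbagTimestamp[hits]
# Capture the word inside the square brackets
fruits <- gsub(".*\\[(\\w+)\\].*", "\\1", df$data[hits])
data.frame(stamps, fruits)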
               stamps     fruits
1 1438293900729698560 strawberry
2 1438293901681590784    avocado
3 1438293904496769024    avocado

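I match the regular expression toward \[ (the word "toward", a space, and a literal opening bracket) to locate the rows that name a fruit. The pattern can be extended if the log messages vary. The stamps variable holds the timestamps of the rows whose data column matches the pattern. The fruits variable isolates the fruit name inside the brackets.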
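Note that the timestamps printed above differ from the ones in the CSV in their last digits (for example ...560 versus ...553), because 19-digit integers are larger than a double can represent exactly. A minimal sketch of one way around this, assuming it is acceptable to keep the timestamps as text: read both columns as character so every digit is preserved. The inline `csv_text` sample and the names `hits` and `result` are illustrative and not part of the original answer.

```r
# A few rows copied from the question; with a real file, replace
# `text = csv_text` with the file path (hypothetical, e.g. "log.csv").
csv_text <- "rosbagTimestamp,data
1438293900729698553,robot is in motion toward [strawberry]
1438293900730571638,Found a plan for avocado in 1.36400008202 seconds
1438293901681590725,robot is in motion toward [avocado]
1438293902689233770,we have received verbal request [avocado]
1438293904496769068,robot is in motion toward [avocado]"

# colClasses = "character" keeps the 19-digit timestamps as strings,
# so they are never rounded to the nearest double.
df <- read.csv(text = csv_text, colClasses = "character")

# Logical index of the rows that contain "toward ["
hits <- grepl("toward \\[", df$data)

result <- data.frame(
  stamps = df$rosbagTimestamp[hits],
  # Replace the whole line with the word captured inside the brackets
  fruits = sub(".*\\[(\\w+)\\].*", "\\1", df$data[hits]),
  stringsAsFactors = FALSE
)
print(result)
```

If the timestamps are needed for arithmetic rather than just display, a 64-bit integer type from an add-on package would be required instead; that is outside the scope of this sketch.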
