R: splitting a text file into a workable data frame
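I'm working with a text file that contains localization data. Every 5 minutes there are multiple reports, which may result in a calculated zone. If a zone is resolved, the file contains an Identified RoomId (4260 and 4256 in the example):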
[08/14/2021 05:05:59 600] - TagId: 4194912 Identified RoomId:4260
[08/14/2021 05:05:59 616] - TagId: 4194912 Last Monitorid:4195283
[08/14/2021 05:05:59 631] - TagId: 4194912 After RoomId:2199
[08/14/2021 05:05:59 631] - Localization RoomId: 2199
[08/14/2021 05:05:59 663] - TagId: 4194912 Reporting RoomId:2199
[08/14/2021 05:05:59 663] - MacId: F0_5C_19_C6_88_A4 RSSI: -72
[08/14/2021 05:05:59 678] - MacId: F0_5C_19_C7_86_54 RSSI: -82
[08/14/2021 05:05:59 678] - MacId: F0_5C_19_C6_89_3C RSSI: -45
[08/14/2021 05:05:59 694] - MacId: F0_5C_19_C6_88_22 RSSI: -80
[08/14/2021 05:05:59 709] - MacId: F0_5C_19_C6_88_12 RSSI: -60
[08/14/2021 05:05:59 709] - MacId: F0_5C_19_C6_88_A8 RSSI: -83
[08/14/2021 05:05:59 709] - MacId: F0_5C_19_C6_88_90 RSSI: -89
[08/14/2021 05:05:59 709] - MacId: F0_5C_19_C6_88_2E RSSI: -54
[08/14/2021 05:05:59 913] - MacId: 40_E3_D6_CA_56_5C RSSI: -92
[08/14/2021 05:05:59 913] - MacId: F0_5C_19_C6_88_52 RSSI: -92
[08/14/2021 05:05:59 928] - MacId: F0_5C_19_C6_88_B8 RSSI: -80
[08/14/2021 05:06:00 288] - MacId: F0_5C_19_C6_88_A4 RSSI: -72
[08/14/2021 05:06:00 288] - MacId: F0_5C_19_C7_86_54 RSSI: -82
[08/14/2021 05:06:00 288] - MacId: 40_E3_D6_CA_57_0A RSSI: -90
[08/14/2021 05:06:00 288] - MacId: F0_5C_19_C6_89_3C RSSI: -45
[08/14/2021 05:06:00 413] - MacId: F0_5C_19_C6_88_90 RSSI: -90
[08/14/2021 05:06:00 413] - MacId: F0_5C_19_C6_88_12 RSSI: -60
[08/14/2021 05:06:00 428] - MacId: F0_5C_19_C6_88_22 RSSI: -80
[08/14/2021 05:06:00 428] - MacId: F0_5C_19_C6_88_A8 RSSI: -83
[08/14/2021 05:06:00 428] - MacId: F0_5C_19_C6_88_2E RSSI: -55
[08/14/2021 05:11:00 974] - MacId: F0_5C_19_C6_88_A4 RSSI: -72
[08/14/2021 05:11:01 006] - TagId: 4194912 Identified RoomId:4256
[08/14/2021 05:11:01 021] - TagId: 4194912 Last Monitorid:4195283
[08/14/2021 05:11:01 037] - TagId: 4194912 After RoomId:2199
[08/14/2021 05:11:01 052] - Localization RoomId: 2199
[08/14/2021 05:11:01 084] - TagId: 4194912 Reporting RoomId:2199
[08/14/2021 05:11:01 084] - MacId: F0_5C_19_C7_86_54 RSSI: -83
[08/14/2021 05:11:01 084] - MacId: F0_5C_19_C6_88_78 RSSI: -90
[08/14/2021 05:11:01 099] - MacId: F0_5C_19_C6_89_3C RSSI: -45
[08/14/2021 05:11:01 349] - MacId: F0_5C_19_C6_88_12 RSSI: -60
[08/14/2021 05:11:01 349] - MacId: F0_5C_19_C6_88_2E RSSI: -55
[08/14/2021 05:11:01 349] - MacId: F0_5C_19_C6_88_A8 RSSI: -84
[08/14/2021 05:11:01 349] - MacId: F0_5C_19_C6_88_90 RSSI: -89
[08/14/2021 05:11:01 365] - MacId: F0_5C_19_C6_88_22 RSSI: -80
[08/14/2021 05:11:01 474] - MacId: 40_E3_D6_CA_56_5C RSSI: -93
[08/14/2021 05:11:01 490] - MacId: F0_5C_19_C6_88_52 RSSI: -90
[08/14/2021 05:11:01 490] - MacId: F0_5C_19_C6_88_BE RSSI: -89
[08/14/2021 05:11:01 802] - MacId: F0_5C_19_C6_88_A4 RSSI: -72
[08/14/2021 05:11:01 802] - MacId: 40_E3_D6_CA_57_0A RSSI: -90
[08/14/2021 05:11:01 802] - MacId: F0_5C_19_C6_89_3C RSSI: -45
[08/14/2021 05:11:01 802] - MacId: F0_5C_19_C6_88_78 RSSI: -89
[08/14/2021 05:11:01 802] - MacId: F0_5C_19_C7_86_54 RSSI: -82
[08/14/2021 05:11:02 006] - MacId: F0_5C_19_C6_88_90 RSSI: -89
[08/14/2021 05:11:02 006] - MacId: F0_5C_19_C6_88_22 RSSI: -80
[08/14/2021 05:11:02 021] - MacId: F0_5C_19_C6_88_A8 RSSI: -84
[08/14/2021 05:11:02 021] - MacId: F0_5C_19_C6_88_2E RSSI: -55
[08/14/2021 05:11:02 021] - MacId: F0_5C_19_C6_88_12 RSSI: -60
[08/14/2021 05:11:02 115] - MacId: F0_5C_19_C6_88_52 RSSI: -91
[08/14/2021 05:11:02 115] - MacId: F0_5C_19_C6_88_BE RSSI: -88
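I would like to get the data in the following form (an image in the original post):
If the RoomId is not resolved within the 5-minute time frame (from the original text file), the RoomId column can just be NA.
A very helpful member has already shown how to split the columns the right way in an earlier post.
So the main question is: how can I structure this raw text file, in a way similar to the image, into a workable data frame, even though the lines in the raw text file are not all alike?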
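Assuming the data is stored in a text file named 'temp.txt', you can read it with readLines. Keep only the lines that have MacId and RSSI values, plus the 'Identified RoomId' lines, which carry the RoomId. Split the data into sets, starting a new set at each 'Identified RoomId' line. From each set, extract Datetime, MacId and RSSI using the code from the previous post, and extract the room ID by removing everything up to and including 'RoomId:'. You can then combine the output into a single data frame.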
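To see how the sets are formed, here is a minimal sketch using five lines copied from the sample log. cumsum(grepl(...)) increases by one at every 'Identified RoomId' line, so each reading gets the same set number as the marker line that precedes it. The names lines, is_marker, set_id and sets are illustrative only, and split() is used here just to display the sets.

```r
# Five lines taken from the sample log: two marker lines and three readings.
lines <- c(
  "[08/14/2021 05:05:59 600] - TagId: 4194912 Identified RoomId:4260",
  "[08/14/2021 05:05:59 663] - MacId: F0_5C_19_C6_88_A4 RSSI: -72",
  "[08/14/2021 05:05:59 678] - MacId: F0_5C_19_C7_86_54 RSSI: -82",
  "[08/14/2021 05:11:01 006] - TagId: 4194912 Identified RoomId:4256",
  "[08/14/2021 05:11:01 084] - MacId: F0_5C_19_C7_86_54 RSSI: -83"
)

# TRUE on every marker line, FALSE on every reading.
is_marker <- grepl("Identified RoomId", lines)

# The running count of markers seen so far acts as a set number per line.
set_id <- cumsum(is_marker)
print(set_id)
# [1] 1 1 1 2 2

# One character vector per set, with the marker line first.
sets <- split(lines, set_id)
```

Because the marker is always the first element of its set, x[1] can be used for the RoomId and x[-1] for the readings.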
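data <- readLines('temp.txt')
# Keep only the MacId/RSSI readings and the 'Identified RoomId' marker lines.
req_data <- grep('MacId.*RSSI|Identified RoomId', data, value = TRUE)
# Start a new set at every 'Identified' line, then process each set.
result <- do.call(rbind, by(req_data, cumsum(grepl('Identified', req_data)),
  function(x) {
    # The first line of a set is the marker; strip everything up to 'RoomId:'.
    room_id <- sub('.*RoomId:\\s*', '', x[1])
    # The remaining lines are readings; capture Datetime, MacId and RSSI.
    cbind(strcapture('\\[(.*)\\] - MacId: (.*) RSSI: (.*)', x[-1],
      proto = list(Datetime = character(), MacId = character(),
                   RSSI = numeric())), RoomId = room_id)
  }))
rownames(result) <- NULL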
For the shared text data, I get the following output:
result
Datetime MacId RSSI RoomId
1 08/14/2021 05:05:59 663 F0_5C_19_C6_88_A4 -72 4260
2 08/14/2021 05:05:59 678 F0_5C_19_C7_86_54 -82 4260
3 08/14/2021 05:05:59 678 F0_5C_19_C6_89_3C -45 4260
4 08/14/2021 05:05:59 694 F0_5C_19_C6_88_22 -80 4260
5 08/14/2021 05:05:59 709 F0_5C_19_C6_88_12 -60 4260
6 08/14/2021 05:05:59 709 F0_5C_19_C6_88_A8 -83 4260
7 08/14/2021 05:05:59 709 F0_5C_19_C6_88_90 -89 4260
8 08/14/2021 05:05:59 709 F0_5C_19_C6_88_2E -54 4260
9 08/14/2021 05:05:59 913 40_E3_D6_CA_56_5C -92 4260
10 08/14/2021 05:05:59 913 F0_5C_19_C6_88_52 -92 4260
11 08/14/2021 05:05:59 928 F0_5C_19_C6_88_B8 -80 4260
12 08/14/2021 05:06:00 288 F0_5C_19_C6_88_A4 -72 4260
13 08/14/2021 05:06:00 288 F0_5C_19_C7_86_54 -82 4260
14 08/14/2021 05:06:00 288 40_E3_D6_CA_57_0A -90 4260
15 08/14/2021 05:06:00 288 F0_5C_19_C6_89_3C -45 4260
16 08/14/2021 05:06:00 413 F0_5C_19_C6_88_90 -90 4260
17 08/14/2021 05:06:00 413 F0_5C_19_C6_88_12 -60 4260
18 08/14/2021 05:06:00 428 F0_5C_19_C6_88_22 -80 4260
19 08/14/2021 05:06:00 428 F0_5C_19_C6_88_A8 -83 4260
20 08/14/2021 05:06:00 428 F0_5C_19_C6_88_2E -55 4260
21 08/14/2021 05:11:00 974 F0_5C_19_C6_88_A4 -72 4260
22 08/14/2021 05:11:01 084 F0_5C_19_C7_86_54 -83 4256
23 08/14/2021 05:11:01 084 F0_5C_19_C6_88_78 -90 4256
24 08/14/2021 05:11:01 099 F0_5C_19_C6_89_3C -45 4256
25 08/14/2021 05:11:01 349 F0_5C_19_C6_88_12 -60 4256
26 08/14/2021 05:11:01 349 F0_5C_19_C6_88_2E -55 4256
27 08/14/2021 05:11:01 349 F0_5C_19_C6_88_A8 -84 4256
28 08/14/2021 05:11:01 349 F0_5C_19_C6_88_90 -89 4256
29 08/14/2021 05:11:01 365 F0_5C_19_C6_88_22 -80 4256
30 08/14/2021 05:11:01 474 40_E3_D6_CA_56_5C -93 4256
31 08/14/2021 05:11:01 490 F0_5C_19_C6_88_52 -90 4256
32 08/14/2021 05:11:01 490 F0_5C_19_C6_88_BE -89 4256
33 08/14/2021 05:11:01 802 F0_5C_19_C6_88_A4 -72 4256
34 08/14/2021 05:11:01 802 40_E3_D6_CA_57_0A -90 4256
35 08/14/2021 05:11:01 802 F0_5C_19_C6_89_3C -45 4256
36 08/14/2021 05:11:01 802 F0_5C_19_C6_88_78 -89 4256
37 08/14/2021 05:11:01 802 F0_5C_19_C7_86_54 -82 4256
38 08/14/2021 05:11:02 006 F0_5C_19_C6_88_90 -89 4256
39 08/14/2021 05:11:02 006 F0_5C_19_C6_88_22 -80 4256
40 08/14/2021 05:11:02 021 F0_5C_19_C6_88_A8 -84 4256
41 08/14/2021 05:11:02 021 F0_5C_19_C6_88_2E -55 4256
42 08/14/2021 05:11:02 021 F0_5C_19_C6_88_12 -60 4256
43 08/14/2021 05:11:02 115 F0_5C_19_C6_88_52 -91 4256
44 08/14/2021 05:11:02 115 F0_5C_19_C6_88_BE -88 4256
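The question mentions that RoomId may be NA. The code above relies on the first kept line of every set being an 'Identified RoomId' line, so readings that appear before the first marker would not get a sensible RoomId. A possible variant is sketched below. It assumes that "not resolved" means a reading with no preceding 'Identified RoomId' line, which the question does not spell out. The function name parse_log is hypothetical, and tz = "UTC" is also an assumption, since the log does not state a time zone.

```r
# Hypothetical helper: same filtering as above, but vectorised, and readings
# seen before any 'Identified RoomId' line get RoomId = NA.
parse_log <- function(lines) {
  keep <- grep("MacId.*RSSI|Identified RoomId", lines, value = TRUE)
  is_marker <- grepl("Identified RoomId", keep)

  # RoomId announced by each marker line, in order of appearance.
  rooms <- sub(".*RoomId:\\s*", "", keep[is_marker])

  # Number of markers seen so far for every kept line (0 = none yet).
  set_id <- cumsum(is_marker)
  # Index 1 is the NA placeholder, index k + 1 is the k-th RoomId.
  room_for_line <- c(NA_character_, rooms)[set_id + 1L]

  readings <- strcapture(
    "\\[(.*)\\] - MacId: (.*) RSSI: (.*)",
    keep[!is_marker],
    proto = data.frame(Datetime = character(), MacId = character(),
                       RSSI = numeric(), stringsAsFactors = FALSE)
  )
  readings$RoomId <- room_for_line[!is_marker]
  readings
}

# Usage on four lines from the sample, starting with a reading that has
# no marker before it.
demo <- c(
  "[08/14/2021 05:11:00 974] - MacId: F0_5C_19_C6_88_A4 RSSI: -72",
  "[08/14/2021 05:11:01 006] - TagId: 4194912 Identified RoomId:4256",
  "[08/14/2021 05:11:01 021] - TagId: 4194912 Last Monitorid:4195283",
  "[08/14/2021 05:11:01 084] - MacId: F0_5C_19_C7_86_54 RSSI: -83"
)
out <- parse_log(demo)

# Optional: turn the trailing milliseconds into a fraction and parse the
# timestamp. The time zone is assumed, not taken from the log.
when <- as.POSIXct(sub(" ([0-9]{3})$", ".\\1", out$Datetime),
                   format = "%m/%d/%Y %H:%M:%OS", tz = "UTC")
```

Here out has two rows: the first reading has RoomId NA and the second has RoomId "4256". RoomId stays a character column, as in the original code; wrap it in as.integer() if a numeric ID is preferred.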