How to read a .txt in Pandas that isn't properly delimited
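I have a .txt file that is very similar to a .csv, but not quite the same. As you can see, the first 4 columns can be separated by a space, but the final string would be split into a varying number of columns. I need that last string to be a single column.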
09 4 10/11/2021 22:21:17 The PLC reported that sorter SS02 has E-stopped.
08 4 10/11/2021 22:21:17 The PLC reported that sorter SS02 has stopped.
08 4 10/11/2021 22:21:18 The PLC reported that sorter SS01 has stopped.
20 5 10/11/2021 22:21:18 The PLC reported that purge mode was disabled for sorter SS02.
20 5 10/11/2021 22:21:18 The PLC reported that purge mode was disabled for sorter SS01.
23 5 10/11/2021 22:21:19 AUX Sortation has been enabled for sorter SS02.
23 5 10/11/2021 22:21:20 AUX Sortation has been enabled for sorter SS01.
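How can I read this so that I end up with just 5 consistent columns? Later on I will probably merge the date and time into one column.
You could pre-parse each line yourself and then create the DataFrame, for example: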
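import pandas as pd

with open('input.txt') as f_input:
    # Split on the first 4 spaces only, so the message stays in one piece;
    # blank lines are skipped so they don't produce empty rows
    data = [line.strip().split(' ', 4) for line in f_input if line.strip()]

df = pd.DataFrame(data, columns=['c1', 'c2', 'date', 'time', 'desc'])
print(df)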
This gives you:
c1 c2 date time desc
0 09 4 10/11/2021 22:21:17 The PLC reported that sorter SS02 has E-stopped.
1 08 4 10/11/2021 22:21:17 The PLC reported that sorter SS02 has stopped.
2 08 4 10/11/2021 22:21:18 The PLC reported that sorter SS01 has stopped.
3 20 5 10/11/2021 22:21:18 The PLC reported that purge mode was disabled for sorter SS02.
4 20 5 10/11/2021 22:21:18 The PLC reported that purge mode was disabled for sorter SS01.
5 23 5 10/11/2021 22:21:19 AUX Sortation has been enabled for sorter SS02.
6 23 5 10/11/2021 22:21:20 AUX Sortation has been enabled for sorter SS01.
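If you would rather do the split inside pandas, the same idea can probably be expressed with `Series.str.split`, which takes a limit on the number of splits. This is only a sketch: the two sample lines are copied from the question and held in a list so the snippet runs on its own, and the column names are the same illustrative ones used above.

```python
import pandas as pd

# Two sample lines from the question, kept in memory so the sketch is self-contained
lines = [
    "09 4 10/11/2021 22:21:17 The PLC reported that sorter SS02 has E-stopped.",
    "23 5 10/11/2021 22:21:19 AUX Sortation has been enabled for sorter SS02.",
]

# n=4 limits the split to four cuts, so the message stays in the fifth part;
# expand=True spreads the parts over separate columns
parts = pd.Series(lines).str.split(' ', n=4, expand=True)
parts.columns = ['c1', 'c2', 'date', 'time', 'desc']
print(parts)
```

Either way every column comes back as strings, so `c1` keeps its leading zero (`09`) unless you convert it yourself.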
A datetime column can be added by combining the date and time columns and converting them to a datetime:
df['datetime'] = pd.to_datetime(df['date'] + ' ' + df['time'])
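One thing to watch: a value like `10/11/2021` is ambiguous, and without a format pandas has to guess which number is the month. Passing an explicit `format` avoids the guess. The sketch below assumes the log uses month/day/year, which the question does not actually state, so swap the directives if your file is day-first. The small frame is just a stand-in for the `df` built earlier.

```python
import pandas as pd

# Stand-in for the date/time columns of the DataFrame built above
df = pd.DataFrame({
    'date': ['10/11/2021', '10/11/2021'],
    'time': ['22:21:17', '22:21:18'],
})

# ASSUMPTION: dates are month/day/year.
# Use '%d/%m/%Y %H:%M:%S' instead if the day comes first.
df['datetime'] = pd.to_datetime(
    df['date'] + ' ' + df['time'],
    format='%m/%d/%Y %H:%M:%S',
)

# Once combined, the separate string columns are no longer needed
df = df.drop(columns=['date', 'time'])
print(df)
```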