How can I split a text file into different columns using line.split()
I'd like to be able to split my text file into separate columns.
The data in my text file looks like this:
023004 1997/11/14 15:00 2.971
023004 1997/11/14 18:00 3.175
023004 1997/11/14 21:00 3.300
023004 1997/11/15 00:00 AR
023004 1997/11/15 03:00 AR
But when I try to split the columns, I get this:
['023002', '2008/11/20', '23:15', '1.076']
['023002', '2008/11/20', '23:30', '1.083']
['023002', '2008/11/20', '23:45', '1.089']
['023002', '2008/11/21', '00:00', 'AR']
['023002', '2008/11/21', '00:15', 'AR']
['023002', '2008/11/21', '00:30', 'AR']
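AR ends up in the same column as my data. I don't know how to specify that when there's an 'AR', it belongs in a new column. I don't want to use pandas. I need to be able to convert my strings to floats.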
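It looks like you're splitting on whitespace, and that doesn't work here because some rows have no value in one of the columns. split() treats a run of spaces as a single separator, so the empty column disappears and AR ends up as your 4th field instead of your 5th.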
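A small illustration of that behaviour, assuming (this is a guess, not confirmed by the question) that the original file is padded so the value column is blank on the AR rows:

```python
# Hypothetical padded line: the value column is blank, so a run of spaces precedes "AR"
line = "023004 1997/11/15 00:00          AR"

fields = line.split()   # no argument: any run of whitespace counts as one separator
print(fields)           # ['023004', '1997/11/15', '00:00', 'AR']
print(len(fields))      # 4, so the empty value column has vanished

# split(" ") keeps empty strings between consecutive spaces, but how many you get
# depends on the padding width, so it does not give a fixed column count either
print(line.split(" "))
```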
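I think the simplest approach is to split the lines as they are and put the rows in a list. Then, for every row whose 4th field is 'AR', we insert an empty string in front of it.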
# Rows as produced by line.split(); the "AR" rows are one field short
data = [['023002', '2008/11/20', '23:15', '1.076'],
        ['023002', '2008/11/20', '23:30', '1.083'],
        ['023002', '2008/11/20', '23:45', '1.089'],
        ['023002', '2008/11/21', '00:00', 'AR'],
        ['023002', '2008/11/21', '00:15', 'AR'],
        ['023002', '2008/11/21', '00:30', 'AR']]

# If the 4th field is "AR", insert an empty value so "AR" moves to the 5th column
for row in data:
    if row[3] == "AR":
        row.insert(3, "")

for row in data:
    print(row)
>>
['023002', '2008/11/20', '23:15', '1.076']
['023002', '2008/11/20', '23:30', '1.083']
['023002', '2008/11/20', '23:45', '1.089']
['023002', '2008/11/21', '00:00', '', 'AR']
['023002', '2008/11/21', '00:15', '', 'AR']
['023002', '2008/11/21', '00:30', '', 'AR']
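The rows above are still strings, and the numeric rows have only 4 fields. A minimal sketch of turning them into uniform 5-field records with a float value, assuming an empty value should become None (that choice, and the helper name to_record, are this sketch's own, not part of the answer):

```python
def to_record(row):
    """Hypothetical helper: turn a split row (4 fields, or 5 after the "" insert)
    into [station, date, time, value, flag], with value as a float or None."""
    station, date, time_of_day = row[:3]
    value_text = row[3]
    flag = row[4] if len(row) > 4 else ""
    value = float(value_text) if value_text else None
    return [station, date, time_of_day, value, flag]

rows = [['023002', '2008/11/20', '23:15', '1.076'],
        ['023002', '2008/11/21', '00:00', '', 'AR']]
records = [to_record(r) for r in rows]
for rec in records:
    print(rec)
```

Using None rather than 0.0 for the missing value keeps "no measurement" distinguishable from a real zero.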
You can also do this with a regular expression:
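import re

data = []
# this regular expression captures each column as a separate group;
# the value (group 4) and the AR flag (group 5) are both optional,
# and it works whether the fields are separated by one space or padded
cols = re.compile(r"(\d+)\s+(\S+)\s+(\S+)\s+(\d+\.\d+)?\s*(AR)?\s*$")
with open(yourfile) as fh:  # yourfile: path to your data file
    for line in fh:
        col = cols.match(line)
        # if there's no match, skip the line
        if not col:
            continue
        # missing groups come back as None; store them as empty strings
        data.append([x if x is not None else '' for x in col.groups()])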
[['023004', '1997/11/14', '15:00', '2.971', ''],
['023004', '1997/11/14', '18:00', '3.175', ''],
['023004', '1997/11/14', '21:00', '3.300', ''],
['023004', '1997/11/15', '00:00', '', 'AR'],
['023004', '1997/11/15', '03:00', '', 'AR']]
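Neither answer converts the value column to a float, which the question asks for. A minimal sketch building on the regex approach, using a pattern with the same column layout (value and AR flag optional); parse_lines is a hypothetical helper name, io.StringIO stands in for the real file, and mapping a missing value to None is an assumption of this sketch:

```python
import io
import re

# Columns: station, date, time, optional value, optional AR flag
ROW_RE = re.compile(r"(\d+)\s+(\S+)\s+(\S+)\s+(\d+\.\d+)?\s*(AR)?\s*$")

def parse_lines(lines):
    """Hypothetical helper: yield (station, date, time, value, flag) tuples.

    value is a float, or None when the column is empty; flag is 'AR' or ''.
    Lines that do not match the pattern are skipped.
    """
    for line in lines:
        m = ROW_RE.match(line)
        if not m:
            continue
        station, date, time_of_day, value, flag = m.groups()
        yield (station, date, time_of_day,
               float(value) if value is not None else None,
               flag or "")

# io.StringIO stands in for open(yourfile); any iterable of lines works
sample = io.StringIO(
    "023004 1997/11/14 15:00 2.971\n"
    "023004 1997/11/15 00:00 AR\n"
)
rows = list(parse_lines(sample))
for r in rows:
    print(r)
```

Because parse_lines is a generator, it can be fed an open file handle directly without loading the whole file into memory.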