Reading non-uniform lines of ASCII data - Python
I am trying to read ASCII data whose lines are not uniform, for example:
4 0.0790926412 -0.199457773 0.325952223 0.924105917 48915.3072 -2086.17061
73540.4807 10
4 0.0245689377 -0.805261448 -0.152373497 0.573006386 -39801.696 49084.2418
16665.3857 10
4 0.0427767979 -0.0185129676 -0.143135691 -0.989529911 38770.6518
-70784.7024 32640.6307 10
4 0.0262684678 0.137741 -0.820259709 -0.555158921 25293.3918 -51148.4003
-126522.859 10
4 0.145932295 0.466618154 -0.00805648931 -0.88442218 90951.8483 19221.4234
-40205.3438 10
4 0.0907820906 0.584060054 -0.671576188 0.455915866 -78193.2124 -31269.5848
47260.338 10
4 0.0794897928 0.654042761 0.537625452 0.532153117 24643.9195 39614.3788
97184.4856 10
4 0.0896920622 -0.517384933 -0.609729743 -0.600451889 -17455.9074 -17601.0439
-13991.5163 10
4 0.0295554749 -0.53757783 -0.3710939 0.757165368 20106.124 -171013.738
-14052.1145 10
4 0.0189505245 -0.773354757 -0.0747623556 -0.629549847 -71468.2726
-53145.1259 36948.4058 10
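The problem is that I need to read every two lines as a single row. I have tried pandas.read_csv and numpy.genfromtxt, but they read each physical line as a separate row. I also tried merging every 2 lines, but without success because, as you can see, a row is sometimes split into 7 and 2 columns and sometimes into 6 and 3. There are 9 columns to read in total.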
Something like this should work.
Put your data into a string or a file, then manipulate it with Python. Once the data is in the shape you want, you can use pandas.
string1 = '''4 0.0790926412 -0.199457773 0.325952223 0.924105917 48915.3072 -2086.17061
73540.4807 10
4 0.0245689377 -0.805261448 -0.152373497 0.573006386 -39801.696 49084.2418
16665.3857 10
4 0.0427767979 -0.0185129676 -0.143135691 -0.989529911 38770.6518
-70784.7024 32640.6307 10
4 0.0262684678 0.137741 -0.820259709 -0.555158921 25293.3918 -51148.4003
-126522.859 10
4 0.145932295 0.466618154 -0.00805648931 -0.88442218 90951.8483 19221.4234
-40205.3438 10
4 0.0907820906 0.584060054 -0.671576188 0.455915866 -78193.2124 -31269.5848
47260.338 10
4 0.0794897928 0.654042761 0.537625452 0.532153117 24643.9195 39614.3788
97184.4856 10
4 0.0896920622 -0.517384933 -0.609729743 -0.600451889 -17455.9074 -17601.0439
-13991.5163 10
4 0.0295554749 -0.53757783 -0.3710939 0.757165368 20106.124 -171013.738
-14052.1145 10
4 0.0189505245 -0.773354757 -0.0747623556 -0.629549847 -71468.2726
-53145.1259 36948.4058 10'''
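splitted = string1.splitlines()
result = ""
for index, item in enumerate(splitted):
    if index % 2 != 0:
        # odd index: second half of a record, so the row ends here
        result += item + "\n"
    else:
        # even index: first half of a record; add a space so the values do not run together
        result += item.rstrip() + " "
print(result)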
4 0.0790926412 -0.199457773 0.325952223 0.924105917 48915.3072 -2086.17061 73540.4807 10
4 0.0245689377 -0.805261448 -0.152373497 0.573006386 -39801.696 49084.2418 16665.3857 10
4 0.0427767979 -0.0185129676 -0.143135691 -0.989529911 38770.6518 -70784.7024 32640.6307 10
4 0.0262684678 0.137741 -0.820259709 -0.555158921 25293.3918 -51148.4003 -126522.859 10
4 0.145932295 0.466618154 -0.00805648931 -0.88442218 90951.8483 19221.4234 -40205.3438 10
4 0.0907820906 0.584060054 -0.671576188 0.455915866 -78193.2124 -31269.5848 47260.338 10
4 0.0794897928 0.654042761 0.537625452 0.532153117 24643.9195 39614.3788 97184.4856 10
4 0.0896920622 -0.517384933 -0.609729743 -0.600451889 -17455.9074 -17601.0439 -13991.5163 10
4 0.0295554749 -0.53757783 -0.3710939 0.757165368 20106.124 -171013.738 -14052.1145 10
4 0.0189505245 -0.773354757 -0.0747623556 -0.629549847 -71468.2726 -53145.1259 36948.4058 10
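The same pairing of even and odd lines can also be written with slicing and zip. This is only a minimal sketch, assuming every record spans exactly two physical lines; the names `raw` and `merged` are illustrative, and the sample uses two records from the question.

```python
# Sample input: each record is split over two physical lines (values from the question).
raw = """4 0.0790926412 -0.199457773 0.325952223 0.924105917 48915.3072 -2086.17061
73540.4807 10
4 0.0427767979 -0.0185129676 -0.143135691 -0.989529911 38770.6518
-70784.7024 32640.6307 10"""

lines = raw.splitlines()

# Pair each even-indexed line with the odd-indexed line that follows it,
# and join the two halves with a single space.
merged = [
    f"{first.strip()} {second.strip()}"
    for first, second in zip(lines[0::2], lines[1::2])
]

for row in merged:
    print(row)
```

Note that zip stops at the shorter sequence, so a trailing unpaired line would be dropped silently; if that matters, check that len(lines) is even first.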
Or, if you are reading it from a file:
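# read the whole input file; the with block closes it afterwards
with open('/path/original.txt', 'r') as data:
    string1 = data.read()
splitted = string1.splitlines()
result = ""
for index, item in enumerate(splitted):
    if index % 2 != 0:
        result += item + "\n"
    else:
        # add a space so the two halves of a record do not run together
        result += item.rstrip() + " "
with open('/path/new_data.txt', 'w') as new_data:
    new_data.write(result)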
If it were me, I would do it like this:
import re
with open('data.txt') as f:
    s = f.read().strip()
L = [float(i) for i in re.split(r'\s+', s)]
LL = [L[i:i+9] for i in range(0, len(L), 9)]
print(LL)
[[4.0, 0.0790926412, -0.199457773, 0.325952223, 0.924105917, 48915.3072, -2086.17061, 73540.4807, 10.0], [4.0, 0.0245689377, -0.805261448, -0.152373497, 0.573006386, -39801.696, 49084.2418, 16665.3857, 10.0], [4.0, 0.0427767979, -0.0185129676, -0.143135691, -0.989529911, 38770.6518, -70784.7024, 32640.6307, 10.0], [4.0, 0.0262684678, 0.137741, -0.820259709, -0.555158921, 25293.3918, -51148.4003, -126522.859, 10.0], [4.0, 0.145932295, 0.466618154, -0.00805648931, -0.88442218, 90951.8483, 19221.4234, -40205.3438, 10.0], [4.0, 0.0907820906, 0.584060054, -0.671576188, 0.455915866, -78193.2124, -31269.5848, 47260.338, 10.0], [4.0, 0.0794897928, 0.654042761, 0.537625452, 0.532153117, 24643.9195, 39614.3788, 97184.4856, 10.0], [4.0, 0.0896920622, -0.517384933, -0.609729743, -0.600451889, -17455.9074, -17601.0439, -13991.5163, 10.0], [4.0, 0.0295554749, -0.53757783, -0.3710939, 0.757165368, 20106.124, -171013.738, -14052.1145, 10.0], [4.0, 0.0189505245, -0.773354757, -0.0747623556, -0.629549847, -71468.2726, -53145.1259, 36948.4058, 10.0]]
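Because this approach ignores line breaks entirely, it does not matter whether a record is split 7+2 or 6+3. A possible variation, sketched below, wraps the same idea in a function and checks that the number of values is a multiple of the row width. The names `read_records` and `width` are hypothetical, and the sample is two records from the question.

```python
def read_records(text, width=9):
    """Split whitespace-separated text into rows of `width` floats.

    `read_records` and `width` are illustrative names, not from the answer above.
    """
    # str.split() with no argument splits on any run of whitespace,
    # including newlines, so physical line breaks are ignored.
    values = [float(tok) for tok in text.split()]
    if len(values) % width != 0:
        raise ValueError(
            f"expected a multiple of {width} values, got {len(values)}"
        )
    return [values[i:i + width] for i in range(0, len(values), width)]


sample = """4 0.0790926412 -0.199457773 0.325952223 0.924105917 48915.3072 -2086.17061
73540.4807 10
4 0.0427767979 -0.0185129676 -0.143135691 -0.989529911 38770.6518
-70784.7024 32640.6307 10"""

rows = read_records(sample)
print(len(rows), len(rows[0]))  # prints: 2 9
```

The length check is there so that a truncated or malformed file raises an error instead of quietly producing a short last row.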
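Or like this, since you know that each record spans two lines.
Read the input two lines at a time in a loop. When the first line comes back empty, there are no more lines in the input file. Each time a pair of lines is read, concatenate them, first discarding the line ending from the first line.
Pandas can read 'csv' files that use whitespace instead of commas.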
>>> import pandas as pd
>>> with open('temp.txt') as infile, open('temp.csv', 'w') as the_csv:
...     while True:
...         first = infile.readline()
...         if not first:
...             break
...         second = infile.readline()
...         # strip the first line's newline and join the halves with a space
...         r = the_csv.write(first.strip() + ' ' + second)
...
>>> df = pd.read_csv('temp.csv', sep=r'\s+')
>>> df
4 0.0790926412 -0.199457773 0.325952223 0.924105917 48915.3072 \
0 4 0.024569 -0.805261 -0.152373 0.573006 -39801.6960
1 4 0.042777 -0.018513 -0.143136 -0.989530 38770.6518
2 4 0.026268 0.137741 -0.820260 -0.555159 25293.3918
3 4 0.145932 0.466618 -0.008056 -0.884422 90951.8483
4 4 0.090782 0.584060 -0.671576 0.455916 -78193.2124
5 4 0.079490 0.654043 0.537625 0.532153 24643.9195
6 4 0.089692 -0.517385 -0.609730 -0.600452 -17455.9074
7 4 0.029555 -0.537578 -0.371094 0.757165 20106.1240
8 4 0.018951 -0.773355 -0.074762 -0.629550 -71468.2726
-2086.17061 73540.4807 10
0 49084.2418 16665.3857 10
1 -70784.7024 32640.6307 10
2 -51148.4003 -126522.8590 10
3 19221.4234 -40205.3438 10
4 -31269.5848 47260.3380 10
5 39614.3788 97184.4856 10
6 -17601.0439 -13991.5163 10
7 -171013.7380 -14052.1145 10
8 -53145.1259 36948.4058 10
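In the output above, the first record has been taken as the column header, so only nine of the ten rows remain as data. A minimal sketch of one way around this, assuming the merged file has no header line, is to pass header=None to read_csv. The snippet uses io.StringIO with two merged records from the question in place of temp.csv; the name `merged` is illustrative.

```python
import io

import pandas as pd

# Two already-merged records, standing in for the contents of temp.csv.
merged = (
    "4 0.0790926412 -0.199457773 0.325952223 0.924105917 48915.3072 -2086.17061 73540.4807 10\n"
    "4 0.0245689377 -0.805261448 -0.152373497 0.573006386 -39801.696 49084.2418 16665.3857 10\n"
)

# header=None tells pandas that the first line is data, not column names;
# the columns are then labelled 0..8.
df = pd.read_csv(io.StringIO(merged), sep=r"\s+", header=None)
print(df.shape)  # prints: (2, 9)
```

With a real file, the same call would take the file path instead of the StringIO object.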