Compare each line in two text files
I hope you can help me with this problem:
I have two text files (let's call them File 1 and File 2) from an FEM analysis, each consisting of about 10,000 lines. The files are structured as follows:
File 1
....
Element Facet Node CNORMF.Magnitude CNORMF.CNF1 CNORMF.CNF2 CNORMF.CNF3 CPRESS CSHEAR1 CSHEAR2 CSHEARF.Magnitude CSHEARF.CSF1 CSHEARF.CSF2 CSHEARF.CSF3
881 3 6619 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
881 3 6648 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
881 3 6653 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
930 3 6452 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
930 3 6483 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
930 3 6488 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
1244 2 7722 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
1244 2 7724 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
1244 2 7754 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
2380 2 3757 304.326E-06 -123.097E-06 -203.689E-06 -189.663E-06 564.697E-06 -281.448E-06 22.5357E-06 152.710E-06 144.843E-06 -26.7177E-06 -40.3387E-06
2380 2 3826 226.603E-06 -85.9859E-06 -161.270E-06 -133.967E-06 270.594E-06 -134.865E-06 10.7988E-06 117.700E-06 116.217E-06 -4.67318E-06 -18.0298E-06
2380 2 3848 10.4740E-03 -2.01174E-03 -6.63900E-03 -7.84743E-03 771.739E-06 -384.638E-06 30.7983E-06 5.24148E-03 5.12795E-03 -541.446E-06 -940.251E-06
2894 2 8253 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
2894 2 8255 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
2894 2 8270 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
3372 2 5920 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
3372 2 5961 52.7705E-03 12.2948E-03 -40.8019E-03 -31.1251E-03 7.36309E-03 -2.56505E-03 -502.055E-06 18.8167E-03 17.9038E-03 2.12060E-03 5.38774E-03
3372 2 5996 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
3936 3 6782 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
3936 3 6852 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
3936 3 6857 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
3937 4 6410 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
3937 4 6452 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
3937 4 6488 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
3955 2 6940 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
3955 2 6941 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
3955 2 6993 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
4024 2 8027 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
4024 2 8050 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
....
File 2
....
Node COORD.Magnitude COORD.COOR1 COORD.COOR2 COORD.COOR3 U.Magnitude U.U1 U.U2 U.U3
1 131.691 14.5010 -92.2190 -92.8868 1.93638 188.252E-03 -1.64949 -996.662E-03
2 131.336 10.9038 -92.2281 -92.8663 1.93341 188.250E-03 -1.64672 -995.468E-03
3 132.130 18.7534 -92.4681 -92.5002 1.93968 188.190E-03 -1.65258 -997.959E-03
4 130.769 1.97638 -92.5186 -92.3953 1.92580 188.179E-03 -1.63965 -992.387E-03
5 130.560 -4.04517 -93.1433 -91.3993 1.92030 188.026E-03 -1.63459 -990.122E-03
6 132.422 24.0768 -93.9662 -90.1454 1.94282 187.819E-03 -1.65564 -999.062E-03
7 130.377 -8.39503 -94.1640 -89.7827 1.91586 187.774E-03 -1.63054 -988.235E-03
8 126.321 13.6556 -88.0641 -89.5278 1.93579 192.554E-03 -1.64736 -998.202E-03
9 125.963 4.31065 -88.6558 -89.3771 1.92786 192.145E-03 -1.64012 -994.852E-03
10 130.037 3.02359 -94.4877 -89.2894 1.92501 187.692E-03 -1.63909 -991.871E-03
11 126.692 18.5888 -88.1164 -89.1107 1.93970 192.653E-03 -1.65097 -999.810E-03
12 125.751 -1.96189 -89.1238 -88.6928 1.92231 192.010E-03 -1.63500 -992.572E-03
13 125.719 -3.46723 -89.2798 -88.4437 1.92094 191.971E-03 -1.63373 -992.005E-03
14 130.026 7.42596 -95.0372 -88.4289 1.92818 187.556E-03 -1.64210 -993.086E-03
15 130.736 16.3557 -95.3755 -87.9092 1.93527 187.472E-03 -1.64873 -995.891E-03
16 130.251 -12.8122 -95.5572 -87.5783 1.91105 187.430E-03 -1.62618 -986.163E-03
17 130.250 12.8770 -95.6602 -87.4548 1.93216 187.401E-03 -1.64586 -994.616E-03
18 125.609 -7.73838 -90.1949 -87.0785 1.91668 191.718E-03 -1.62985 -990.191E-03
19 124.466 -6.21492 -88.8834 -86.9075 1.91827 192.783E-03 -1.63095 -991.270E-03
20 126.958 23.9470 -89.5421 -86.7584 1.94289 192.337E-03 -1.65406 -1.00096
21 121.210 6.64491 -84.7929 -86.3587 1.92993 196.112E-03 -1.64059 -997.316E-03
22 121.369 12.5781 -84.3620 -86.3434 1.93495 196.450E-03 -1.64514 -999.468E-03
....
I want to perform the following steps:
- Remove the first two columns from File1
- Compare the node labels of the two files
- Write an output text file in "rpt" format that contains the lines with the same node label side by side
This is the code I have used. It seems to work for small files, but for large files it takes a huge amount of time.
nodEl = open("P:/File1.rpt", "r")
uniNod = open("P:/File2.rpt", "r")
row_nodEl = nodEl.readlines()
row_uniNod = uniNod.readlines()
nodEl.close()
uniNod.close()
output = open("P:/output.rpt", "w")
for index, line in enumerate(row_nodEl):
    # Only the data block of File1 (hard-coded line range, skipping header lines)
    if index > 23081 and index < 40572 and index != 23083 and index != 23084:
        temp = line.strip()
        temp2 = " ".join(temp.split())
        var = temp2.split(" ", 3)           # var[2] is the node label, var[3] the remaining values
        for index2, line2 in enumerate(row_uniNod):
            # Only the data block of File2 (hard-coded line range, skipping header lines)
            if index2 > 11412 and index2 < 21258 and index2 != 11414 and index2 != 11415:
                temp3 = line2.strip()
                temp4 = " ".join(temp3.split())
                var2 = temp4.split(" ", 1)  # var2[0] is the node label, var2[1] the remaining values
                if var[2] == var2[0]:
                    output.write("%s" % var[2] + " " + "%s" % var[3] + " " + "%s" % var2[1])
Any suggestions are welcome!
You are comparing every line of one file (m lines) against every line of the other file (n lines). That gives a time complexity of O(m*n), which means that two files of 10,000 lines each result in 100,000,000 comparisons.
You can speed up your code by changing the way you read the values in. Consider reading the files into dictionaries instead of lists, where each key is a node number and each value is the complete line.
With this approach you would:
- Load the first file into a dictionary
- Load the second file into a dictionary
- For each node in the first dictionary, look up the corresponding node in the second dictionary
In Python, it would look something like this:
file_contents_1 = load_file("P:/File1.rpt")
file_contents_2 = load_file("P:/File2.rpt")

for node_label in file_contents_1:
    # Skip nodes which don't have corresponding values in the second file
    if node_label not in file_contents_2:
        continue
    # Do something
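The sketch above leaves load_file unspecified. A minimal sketch of one possible implementation, assuming the node label sits in a whitespace-separated field whose zero-based position is passed in as a hypothetical label_column parameter (2 for File1, 0 for File2 in the layouts shown above):

def load_file(path, label_column):
    # Hypothetical helper: read an .rpt file into a dict keyed by node label.
    # label_column is the zero-based index of the node-label field
    # (assumption: 2 for File1, 0 for File2 in the layouts above).
    contents = {}
    with open(path, "r") as f:
        for line in f:
            fields = line.split()
            # Skip header, separator and blank lines: the label field must be an integer
            if len(fields) <= label_column or not fields[label_column].isdigit():
                continue
            contents[fields[label_column]] = fields
    return contents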
The advantage of this approach is that you load the files independently of each other, which means the time complexity becomes linear, O(m+n). When looking up the corresponding node in the second file, the lookup takes constant time thanks to the way dictionaries are implemented (hash tables).
This should make your code much faster.
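Putting the pieces together for the original task (drop the first two columns of File1 and write matching rows side by side), the "# Do something" part could be filled in roughly as follows. This is only a sketch built on the hypothetical load_file above, and the exact output format is an assumption:

file_contents_1 = load_file("P:/File1.rpt", label_column=2)  # Element Facet Node ...
file_contents_2 = load_file("P:/File2.rpt", label_column=0)  # Node COORD ...

with open("P:/output.rpt", "w") as output:
    for node_label, fields_1 in file_contents_1.items():
        # Skip nodes that have no corresponding line in the second file
        if node_label not in file_contents_2:
            continue
        fields_2 = file_contents_2[node_label]
        # fields_1[2:] drops the Element and Facet columns; fields_2[1:] drops the
        # repeated node label, so the two rows end up side by side on one line
        output.write(" ".join(fields_1[2:] + fields_2[1:]) + "\n")

Note that a plain dict keeps only the last line per node label; if File1 lists the same node under several elements (as in the sample data, e.g. node 6452), the values would need to be lists of rows instead.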