Collapse multiple lines when reading a flat file in Python
I want to parse a flat file like the one shown below in Python:
Element ID Element Type Result Jacobian Sign
============== ================= ========= =====================
1 Parabolic Warning 1.000000
Hexahedron
2 Parabolic Warning 1.000000
Hexahedron
3 Parabolic Warning 1.000000
Hexahedron
4 Parabolic Warning 1.000000
I tried the mechanism used in this answer, as follows:
import pandas as pd

def parse_file(file):
    col_spec = [(0, 15), (16, 33), (34, 43), (44, 65)]
    return pd.read_fwf(file, colspecs=col_spec)
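However, it reads the first line as one record, and then the next line as a second record that is empty except for the word 'Hexahedron' in Element Type.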
>>> data = parse_file("example.txt")
>>> data.head()
Element ID Element Type Result Jacobian Sign
0 NaN NaN NaN NaN
1 ============== ================ ======== ====================
2 1 Parabolic Warning 1.000000
3 NaN Hexahedron NaN NaN <= Extra record
4 2 Parabolic Warning 1.000000
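As the output shows, the first two lines of data are captured as two records (rows 2 and 3). I want the parser to capture those two lines as a single record, so that the phrase 'Parabolic Hexahedron' is captured as the Element Type. How can I do this?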
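Some post-processing should do the trick. Below is some code that uses the shift method. Also note that there is no need to open the file yourself; just pass the file name to pd.read_fwf.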
import pandas as pd

col_spec = [(0, 15), (15, 32), (32, 42), (43, 65)]
df = pd.read_fwf("example.txt", colspecs=col_spec, comment="=")

# combine each record row with the continuation row below it
# (joined with a space so the result reads 'Parabolic Hexahedron')
df["combined"] = (
    df["Element Type"] + " " + df["Element Type"].shift(-1)
).where(df["Element ID"].notnull(), df["Element Type"])

# remove the extra rows
df = df[df["Element ID"].notnull()]
This should give a DataFrame like the following:
Element ID Element Type Result Jacobian Sign combined
2 1 Parabolic Warning 1.000000 Parabolic Hexahedron
4 2 Parabolic Warning 1.000000 Parabolic Hexahedron
6 3 Parabolic Warning 1.000000 Parabolic Hexahedron
8 4 Parabolic Warning 1.000000 Parabolic Hexahedron
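The shift-based approach above assumes every record is followed by exactly one continuation line. If some records might have no continuation line, or more than one, a grouping-based variant may be more forgiving. The following is only a sketch: the function name collapse_continuation_rows is hypothetical, and it assumes a continuation row is any row whose Element ID is empty. The small demo frame is made-up data standing in for what pd.read_fwf would return.

```python
import pandas as pd


def collapse_continuation_rows(df, id_col="Element ID", text_col="Element Type"):
    """Merge rows with an empty ID into the record above them (hypothetical helper)."""
    # Drop rows that are completely empty (e.g. blank or commented-out lines).
    df = df.dropna(how="all")
    # Continuation rows have no ID; forward-fill so they share the ID of their record.
    key = df[id_col].ffill()
    # Join all text fragments belonging to the same record with a single space.
    joined = df.groupby(key)[text_col].agg(
        lambda s: " ".join(s.dropna().astype(str))
    )
    # Keep only the real records and replace their text with the joined version.
    records = df[df[id_col].notna()].copy()
    records[text_col] = records[id_col].map(joined)
    return records


# Made-up frame mimicking the parsed file: two empty rows, two records with a
# continuation row each, and one record without any continuation row.
demo = pd.DataFrame(
    {
        "Element ID": [None, None, 1, None, 2, None, 3],
        "Element Type": [None, None, "Parabolic", "Hexahedron",
                         "Parabolic", "Hexahedron", "Linear"],
        "Result": [None, None, "Warning", None, "Warning", None, "OK"],
    }
)
result = collapse_continuation_rows(demo)
```

Here the Element Type column itself is overwritten instead of adding a combined column; whether that is preferable depends on whether the original fragments are still needed.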