How to read scientific-notation data from a text file into a pandas DataFrame
I have a text file (input.txt) containing two columns of data in scientific notation, as shown below.
input.txt file contents:
4.6277245181485196e-02 -3.478992280123e-02
5.147225314664928553e-02 -3.626645537995224627e-02
5.719622597261836416e-02 -3.778369677696073736e-02
6.351385032440140521e-02 -3.9348512512335400e-02
7.049988917103996999e-02 -4.096034949794334634e-02
7.822948857937785105e-02 -4.261684461302106541e-02
8.67649433797989394455e-02 -4.77e-02
9.614380281036348508e-02 -4.604114963738591831e-02
1.063651118650106309e-01 -4.777947266421164740e-02
1.173824105396738815e-01 -4.950717696170207904e-02
1.291006932795119577e-01 -5.119743181445588626e-02
I use the code below to read the data into a DataFrame.
import pandas as pd
from tabulate import tabulate
df = pd.read_csv('input.txt',delim_whitespace=True,engine='python',header=None,skip_blank_lines=True)
f=open('output.txt','w')
f.write(tabulate(df.values,tablefmt="plain"))
f.close()
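But the data is not read in scientific format. I am using tabulate to write the same data to another output file (so the columns are evenly spaced, like a table). The output is not in scientific format either, and the numbers are truncated, as shown.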
output.txt file contents:
0.0462772 -0.0347899
0.0514723 -0.0362665
0.0571962 -0.0377837
0.0635139 -0.0393485
0.0704999 -0.0409603
0.0782295 -0.0426168
0.0867649 -0.0477
0.0961438 -0.0460411
0.106365 -0.0477795
0.117382 -0.0495072
0.129101 -0.0511974
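I need to read the data as is, i.e. in scientific format in this case, and write it to another file using tabulate. What needs to change in the code above?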
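Specify dtype=str when reading the CSV so that pandas keeps each value as the original text instead of converting it to a float, and pass disable_numparse=True so that tabulate does not parse the strings back into numbers: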
df = pd.read_csv("input.txt", sep=r"\s+", engine="python", dtype=str, header=None)
print(tabulate(df.values, tablefmt="plain", disable_numparse=True))
Prints:
4.6277245181485196e-02 -3.478992280123e-02
5.147225314664928553e-02 -3.626645537995224627e-02
5.719622597261836416e-02 -3.778369677696073736e-02
6.351385032440140521e-02 -3.9348512512335400e-02
7.049988917103996999e-02 -4.096034949794334634e-02
7.822948857937785105e-02 -4.261684461302106541e-02
8.67649433797989394455e-02 -4.77e-02
9.614380281036348508e-02 -4.604114963738591831e-02
1.063651118650106309e-01 -4.777947266421164740e-02
1.173824105396738815e-01 -4.950717696170207904e-02
1.291006932795119577e-01 -5.119743181445588626e-02