Find different values in two dataframes exported to and imported from the same CSV

Question

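I have a pandas v1.3.4 dataframe, df_final, which I export to a CSV file so that I don't have to repeat the dataframe construction steps every time I run an analysis. df_final will eventually be a 13000 x 91 dataframe, but I am first testing the process on a smaller 689 x 91 dataframe.

I want to confirm that the df_final_csv dataframe produced by reading the df_final CSV is identical to the df_final dataframe. Based on the output below, they appear to differ, but I can't tell how. I copied some Stack Overflow code (below, adapted from ) but some other solutions ( ) did not work because my df_final contains list objects. How can I find the values that are causing the problem?

Please let me know if any other information would help.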

import pandas as pd

# 689 rows x 91 columns
df_final = pd.DataFrame.from_dict(results)
print(f'NaN are present: {df_final.isnull().values.any()}')  # False

# export to csv
df_final.to_csv('integrated_df.csv')

# read in csv
df_final_csv = pd.read_csv('integrated_df.csv', index_col=0)
print(f'NaN are present: {df_final_csv.isnull().values.any()}')  # False
print(f'imported df is same as exported df: {df_final.equals(df_final_csv)}')  # False

# try to find discrepancies (--> empty df with only column headers)
different_values = df_final_csv[~df_final_csv.isin(df_final)].dropna()

Cheers!

Answer 1

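Some values may have been altered by the CSV round trip, for example special characters. Try writing to a .pkl file instead; pickle stores the Python objects themselves, so you get back exactly the same data.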

import pickle

# write the dataframe to a pickle file (the with-block closes the file)
with open("df.pkl", "wb") as f:
    pickle.dump(df, f)

# then read it back
with open("df.pkl", "rb") as f:
    df_new = pickle.load(f)
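
To see which values actually changed, a cell-by-cell comparison in plain Python may help, since it does not choke on list objects. The following is only a sketch under assumptions: the small `df_final` here is a made-up stand-in for the real 689 x 91 dataframe, the column names `score` and `tags` and the helper `find_changed_columns` are hypothetical, and the CSV is written to an in-memory buffer instead of `integrated_df.csv`. It relies on the fact that a CSV file stores only text, so a list cell is read back as its string representation.

```python
import ast
import io

import pandas as pd

# Hypothetical stand-in for df_final: one numeric column, one column of lists.
df_final = pd.DataFrame({"score": [1.5, 2.0], "tags": [["a", "b"], ["c"]]})

# Round-trip through CSV, using an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
df_final.to_csv(buffer)
buffer.seek(0)
df_final_csv = pd.read_csv(buffer, index_col=0)


def find_changed_columns(left, right):
    """Return the columns in which at least one cell differs (hypothetical helper).

    Assumes both frames have the same columns and row order, and no NaN values
    (NaN != NaN would otherwise be reported as a difference).
    """
    changed = []
    for col in left.columns:
        if any(a != b for a, b in zip(left[col], right[col])):
            changed.append(col)
    return changed


changed_columns = find_changed_columns(df_final, df_final_csv)
print(changed_columns)

# The list cells came back as strings such as "['a', 'b']".
print(type(df_final_csv.loc[0, "tags"]))

# One possible repair: parse the strings back into lists.
restored = df_final_csv.copy()
for col in changed_columns:
    restored[col] = restored[col].apply(ast.literal_eval)
```

In this toy case only the list column is reported, because `['a', 'b']` never compares equal to the string `"['a', 'b']"`. Whether `ast.literal_eval` is the right repair for the real data depends on what the list columns contain; it only handles Python literals.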
