pandas: Read xlsx file to dict with column1 as key and column2 as values

Question

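I am new to pandas. I need to read an xlsx file with pandas and turn the first column into the keys of a dict and the second column into its values. I also need to skip/exclude the first row, which holds the headers.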

There is an answer here for pymysql and another here for csv, but I need to use pandas.

Here is some sample Excel data:

dict_key    dict_value  
key1        str_value1  
key2        str_value2  
key3         None  
key4         int_value3

My code so far is below.

import pandas as pd

excel_file = "file.xlsx"
xls = pd.ExcelFile(excel_file)
df = xls.parse(xls.sheet_names[0], skiprows=1, index_col=None, na_values=['None'])
data_dict = df.to_dict()

However, it gives me a dict whose keys are the row numbers and whose values are the column1 data followed by the column2 data.

>>> data_dict
{u'Chg_Parms': {0: u'  key1 ', 1: u'   key2 ', 2: u'   key3 ', 3: u'   key4 ', 4: u'   str_value1 ', 
                5: u'   str_value2 ', 6: u'   Nan ', 6: u'   int_value3 '}}

What I want is the column 1 data as keys and the column 2 data as values, with NaN replaced by None:

data_dict = {'key1': 'str_value1', 'key2': 'str_value2', 'key3': None, 'key4': int_value3}

Thanks for your help.

Answer 1

You can use collections.OrderedDict to keep the keys in order. You'll note that pd.read_excel loads the first sheet by default. EDIT: then you say you want to encode the items in the dict and evaluate 'None' to None...

import collections as co
import pandas as pd

df = pd.read_excel('file.xlsx')
df = df.where(pd.notnull(df), None)
od = co.OrderedDict((k.strip().encode('utf8'),v.strip().encode('utf8')) 
                    for (k,v) in df.values)

Result:

>>> od
OrderedDict([(u'key1', u'str_value1'), (u'key2', u'str_value2'), (u'key3', u'None'), (u'key4', u'int_value3')])

General note: you should keep strings as Unicode inside your Python program.
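For readers on Python 3, the same idea can be sketched without OrderedDict or encode, since plain dicts keep insertion order from Python 3.7 on and strings are already Unicode. This is only an illustration under assumptions: the DataFrame below is an in-memory stand-in for what pd.read_excel('file.xlsx') might return for the sample sheet, and the helper name clean is hypothetical.

```python
import pandas as pd

# Assumed stand-in for pd.read_excel('file.xlsx'): the sample sheet with
# stray whitespace around the text and one empty cell (read as NaN).
df = pd.DataFrame({
    "dict_key": ["  key1 ", " key2 ", " key3 ", " key4 "],
    "dict_value": [" str_value1 ", " str_value2 ", float("nan"), " int_value3 "],
})

def clean(value):
    """Hypothetical helper: None for missing cells, stripped text for strings."""
    if pd.isna(value):
        return None
    return value.strip() if isinstance(value, str) else value

# Pair the two columns row by row; the dict keeps the row order.
data_dict = {clean(k): clean(v) for k, v in zip(df["dict_key"], df["dict_value"])}
print(data_dict)
```

Checking each cell with pd.isna before calling strip avoids an AttributeError on missing values, which is the main thing to watch for when a column mixes text and empty cells.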

Answer 2

You can use pandas' read_excel function to read the Excel file more conveniently. It accepts an index_col argument that specifies which column of the xlsx should become the index.

How to change NaN to None is explained in this question.

Given an xlsx file named example.xlsx that is laid out the way you described above, the following code should give the result you expect:

import pandas as pd

df = pd.read_excel("example.xlsx", index_col=0)
df = df.where(pd.notnull(df), None)

# Parenthesized form works on both Python 2.7 and Python 3
print(df.to_dict()["dict_value"])
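To show why indexing by the first column helps, here is a small sketch of the shapes involved. It is not a drop-in replacement: the DataFrame is an assumed in-memory stand-in for what read_excel("example.xlsx", index_col=0) might return, and the NaN handling uses an explicit comprehension because the effect of df.where(..., None) can depend on column dtype and pandas version.

```python
import pandas as pd

# Assumed stand-in for pd.read_excel("example.xlsx", index_col=0).
df = pd.DataFrame(
    {"dict_value": ["str_value1", "str_value2", float("nan"), "int_value3"]},
    index=pd.Index(["key1", "key2", "key3", "key4"], name="dict_key"),
)

# DataFrame.to_dict() defaults to {column: {index: value}}, i.e. nested.
nested = df.to_dict()

# Selecting the column first yields a Series, whose to_dict() is flat:
# {index: value}.
flat = df["dict_value"].to_dict()

# Replace missing values explicitly so the result does not rely on
# how where() treats None for a given dtype.
result = {k: (None if pd.isna(v) else v) for k, v in flat.items()}
print(result)
```

Note that the header row is consumed as column names by default, so no skiprows argument is needed to keep it out of the dict.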

python

xlsx

python-2.7

pandas