Pandas groupby is giving "keyError", even when the key exists
I'm new to Python, and for one of my projects I need to convert a csv to nested JSON. Searching online, I found that pandas can help with this.
I followed the approach given in
but I am getting a KeyError exception: KeyError: 'state'
df info
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 3 columns):
country 4 non-null object
state 4 non-null object
city 4 non-null object
dtypes: object(3)
memory usage: 176.0+ bytes
None
Traceback (most recent call last):
File "csvToJson.py", line 31, in <module>
grouped = df.groupby(['country', 'state'])
File "/home/simarpreet/Envs/j/lib/python3.7/site-packages/pandas/core/generic.py", line 7632, in groupby
observed=observed, **kwargs)
File "/home/simarpreet/Envs/j/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 2110, in groupby
return klass(obj, by, **kwds)
File "/home/simarpreet/Envs/j/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 360, in __init__
mutated=self.mutated)
File "/home/simarpreet/Envs/j/lib/python3.7/site-packages/pandas/core/groupby/grouper.py", line 578, in _get_grouper
raise KeyError(gpr)
KeyError: 'state'
Input csv:
country, state, city
India, Delhi, Tilak nagar
India, Mumbai, Bandra
Australia, Queensland, Gold Coast
US, California, Los Angeles
My code:
import pandas as pd
import json

csvFilePath = "/home/simarpreet/sampleCsv.csv"
jsonFilePath = "/home/simarpreet/sampleJson.json"
jsonFile = open(jsonFilePath, 'w')
df = pd.read_csv(csvFilePath, encoding='utf-8-sig')
print("df info")
print(df.info())
finalList = []
grouped = df.groupby(['country', 'state'])
for key, value in grouped:
    dictionary = {}
    j = grouped.get_group(key).reset_index(drop=True)
    dictionary['country'] = j.at[0, 'country']
    dictionary['state'] = j.at[0, 'state']
    dictList = []
    anotherDict = {}
    for i in j.index:
        anotherDict['city'] = j.at[i, 'city']
        dictList.append(anotherDict)
    dictionary['children'] = dictList
    finalList.append(dictionary)
json.dumps(finalList)
The problem is in your csv file: the column names have leading spaces, which is why you get the KeyError.
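A quick way to confirm this is to print the parsed column names; with the sample csv above, the leading spaces show up in the names (illustrative output shown as a comment):
print(df.columns.tolist())
# ['country', ' state', ' city']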
As @cs95 pointed out, you can do
df.columns = df.columns.str.strip()
Or you can handle the whitespace in read_csv itself:
pd.read_csv(csvFilePath, encoding='utf-8-sig', sep=r'\s*,\s*', engine='python')
PS: A bad way to work around it:
grouped = df.groupby(['country', ' state'])
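For completeness, a minimal sketch of the fix, assuming the same sample csv and the csvFilePath variable from the question: strip the headers right after reading, and the original groupby no longer raises.
import pandas as pd

df = pd.read_csv(csvFilePath, encoding='utf-8-sig')
df.columns = df.columns.str.strip()          # remove leading/trailing spaces from the header names
grouped = df.groupby(['country', 'state'])   # works: 'state' now matches the cleaned column name
print(list(grouped.groups.keys()))
Note that str.strip() on the columns only cleans the header row; data cells such as ' Delhi' keep their leading space, which is why the sep=r'\s*,\s*' variant can be preferable when the values need cleaning too.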