Pandas: Group by a dict key in a column that contains dictionaries
My data
I have the following pandas DataFrame:
import pandas as pd

df = pd.DataFrame({
    'c1': range(5),
    'c2': [
        {'k1': 'x-1', 'k2': 'z'},
        {'k1': 'x-2', 'k2': 'z1'},
        {'k1': 'x-3', 'k2': 'z1'},
        {'k1': 'y-1', 'k2': 'z'},
        {'k1': 'y-2', 'k2': 'z1'}
    ]
})
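My goal
Now I want to group by 'k1', a key that every dictionary in column 'c2' has in common. The grouping function would be lambda x: x.split('-')[0], which cuts off the dash and the number after it.
The desired output is: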
'x' 3
'y' 2
My attempt
>>> df.groupby(df['c2']['k1'].str.split('-')[0]).count()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Library/Python/2.7/site-packages/pandas/core/series.py", line 601, in __getitem__
result = self.index.get_value(self, key)
File "/Library/Python/2.7/site-packages/pandas/core/indexes/base.py", line 2477, in get_value
tz=getattr(series.dtype, 'tz', None))
File "pandas/_libs/index.pyx", line 98, in pandas._libs.index.IndexEngine.get_value (pandas/_libs/index.c:4404)
File "pandas/_libs/index.pyx", line 106, in pandas._libs.index.IndexEngine.get_value (pandas/_libs/index.c:4087)
File "pandas/_libs/index.pyx", line 156, in pandas._libs.index.IndexEngine.get_loc (pandas/_libs/index.c:5210)
KeyError: 'k1'
Apparently, I cannot access the key k1 of the dictionaries in column c2 via df['c2']['k1'].
How can I do this?
You are close; you only need to convert the column of dicts into a new DataFrame:
print(pd.DataFrame(df['c2'].values.tolist()))
    k1  k2
0  x-1   z
1  x-2  z1
2  x-3  z1
3  y-1   z
4  y-2  z1
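For context, here is a minimal sketch of why the attempt in the question raised KeyError, assuming the same df as in the question. df['c2'] is a Series indexed by the row labels 0 to 4, so ['k1'] is looked up as a row label and not as a dictionary key. The names lookup_failed, first_k1 and expanded are chosen here only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'c1': range(5),
    'c2': [
        {'k1': 'x-1', 'k2': 'z'},
        {'k1': 'x-2', 'k2': 'z1'},
        {'k1': 'x-3', 'k2': 'z1'},
        {'k1': 'y-1', 'k2': 'z'},
        {'k1': 'y-2', 'k2': 'z1'},
    ],
})

# Indexing the Series with 'k1' searches the row labels (0..4), not the dicts.
try:
    df['c2']['k1']
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Selecting a row first returns the dict itself, which can then be indexed by key.
first_k1 = df['c2'].loc[0]['k1']

# Expanding the dicts turns every key into a real column that pandas can address.
expanded = pd.DataFrame(df['c2'].tolist(), index=df.index)

print(lookup_failed, first_k1, list(expanded.columns))
```

Passing index=df.index is optional with the default index used here; it only keeps the expanded frame aligned with df if df has other row labels.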
a = pd.DataFrame(df['c2'].values.tolist())['k1'].str.split('-').str[0]
print(a)
0    x
1    x
2    x
3    y
4    y
Name: k1, dtype: object
# store the result under a new name so that df and its 'c2' column stay available
df1 = df.groupby(a).size().reset_index(name='len')
print(df1)
  k1  len
0  x    3
1  y    2
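The grouping function from the question can probably also be applied to the dict column directly with Series.map, without building an intermediate DataFrame. The following is only a sketch of that variant, assuming the data from the question; key and counts are names chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'c1': range(5),
    'c2': [
        {'k1': 'x-1', 'k2': 'z'},
        {'k1': 'x-2', 'k2': 'z1'},
        {'k1': 'x-3', 'k2': 'z1'},
        {'k1': 'y-1', 'k2': 'z'},
        {'k1': 'y-2', 'k2': 'z1'},
    ],
})

# Apply the lambda from the question to the 'k1' value of every dict.
# rename('k1') gives the resulting group column its name.
key = df['c2'].map(lambda d: d['k1'].split('-')[0]).rename('k1')

# Group the original frame by the derived key and count rows per group.
counts = df.groupby(key).size().reset_index(name='len')
print(counts)
```

Because key shares its index with df, groupby aligns the two row by row, so this should give the same counts as the DataFrame-based version above.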
Another solution uses a list comprehension for the groupby key:
L = [x['k1'].split('-')[0] for x in df['c2']]
print(L)
['x', 'x', 'x', 'y', 'y']
df1 = df.groupby(L).size().rename_axis('k1').reset_index(name='len')
print(df1)
  k1  len
0  x    3
1  y    2
Solutions with value_counts:
df1 = a.value_counts().rename_axis('k1').reset_index(name='len')
print(df1)
  k1  len
0  x    3
1  y    2
df1 = pd.Series(L).value_counts().rename_axis('k1').reset_index(name='len')
print(df1)
  k1  len
0  x    3
1  y    2
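One caveat that may be worth keeping in mind: the list comprehension uses x['k1'], so it raises KeyError if some dictionary has no 'k1', while the DataFrame conversion fills such rows with NaN. Below is a hedged sketch of a tolerant variant built on dict.get. The data is hypothetical and not from the question (the second dict deliberately lacks 'k1'), and key and counts are illustrative names.

```python
import pandas as pd

# Hypothetical data (not from the question): the second dict has no 'k1'.
df = pd.DataFrame({
    'c1': range(4),
    'c2': [
        {'k1': 'x-1', 'k2': 'z'},
        {'k2': 'z1'},
        {'k1': 'x-2', 'k2': 'z1'},
        {'k1': 'y-1', 'k2': 'z'},
    ],
})

# dict.get returns None instead of raising KeyError when the key is absent;
# the .str methods then propagate the missing value.
key = df['c2'].map(lambda d: d.get('k1')).str.split('-').str[0].rename('k1')

# groupby drops missing keys by default (dropna=True), so the row
# without 'k1' is not counted in any group.
counts = df.groupby(key).size().reset_index(name='len')
print(counts)
```

If rows without 'k1' should form their own group instead, groupby accepts dropna=False in pandas 1.1 and later.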