Dataframe Pandas aggregation and/or groupby
I have a DataFrame like this:
import numpy as np
import pandas as pd

serie = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]
values = [2, 2, 2, 1, 2, 2, 1, 1, 1, 1, 1, 2]
series_X_values = {'series': serie, 'values': values}
df_mytest = pd.DataFrame.from_dict(series_X_values)
df_mytest
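I need to create a third column (for example, most_frequent)
df_mytest['most_frequent'] = np.nan
whose value is the most frequently observed value in the 'values' column within each 'series' group. Alternatively, the 'values' column could be replaced by the most common value itself, giving one row per series, as in the DataFrame below: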
serie = [1, 2, 3]
values = [2, 2, 1]
series_X_values = {'series': serie, 'values': values}
df_mytest = pd.DataFrame.from_dict(series_X_values)
df_mytest
I tried a few options without success, for example:
def personal_most_frequent(col_name):
    from sklearn.impute import SimpleImputer
    imp = SimpleImputer(strategy="most_frequent")
    return imp

df_result = df_mytest.groupby('series').apply(personal_most_frequent('values'))
But...
TypeError                                 Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/pandas/core/groupby/groupby.py in apply(self, func, *args, **kwargs)
    688             try:
--> 689                 result = self._python_apply_general(f)
    690             except Exception:

5 frames
/usr/local/lib/python3.6/dist-packages/pandas/core/groupby/groupby.py in _python_apply_general(self, f)
    706         keys, values, mutated = self.grouper.apply(f, self._selected_obj,
--> 707                                                    self.axis)
    708

/usr/local/lib/python3.6/dist-packages/pandas/core/groupby/ops.py in apply(self, f, data, axis)
    189             group_axes = _get_axes(group)
--> 190             res = f(group)
    191             if not _is_indexed_like(res, group_axes):

TypeError: 'SimpleImputer' object is not callable

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
 in ()
      5     return imp
      6
----> 7 df_result = df_mytest.groupby('series').apply(personal_most_frequent('values'))

/usr/local/lib/python3.6/dist-packages/pandas/core/groupby/groupby.py in apply(self, func, *args, **kwargs)
    699
    700                 with _group_selection_context(self):
--> 701                     return self._python_apply_general(f)
    702
    703         return result

/usr/local/lib/python3.6/dist-packages/pandas/core/groupby/groupby.py in _python_apply_general(self, f)
    705     def _python_apply_general(self, f):
    706         keys, values, mutated = self.grouper.apply(f, self._selected_obj,
--> 707                                                    self.axis)
    708
    709         return self._wrap_applied_output(

/usr/local/lib/python3.6/dist-packages/pandas/core/groupby/ops.py in apply(self, f, data, axis)
    188             # group might be modified
    189             group_axes = _get_axes(group)
--> 190             res = f(group)
    191             if not _is_indexed_like(res, group_axes):
    192                 mutated = True

TypeError: 'SimpleImputer' object is not callable
And...
df_mytest.groupby(['series', 'values']).agg(lambda x: x.value_counts().index[0])
But again...
IndexError                                Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/pandas/core/groupby/ops.py in agg_series(self, obj, func)
    589         try:
--> 590             return self._aggregate_series_fast(obj, func)
    591         except Exception:

12 frames
pandas/_libs/reduction.pyx in pandas._libs.reduction.SeriesGrouper.get_result()

pandas/_libs/reduction.pyx in pandas._libs.reduction.SeriesGrouper.get_result()

IndexError: index 0 is out of bounds for axis 0 with size 0

During handling of the above exception, another exception occurred:

IndexError                                Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/pandas/core/indexes/base.py in __getitem__(self, key)
   3956         if is_scalar(key):
   3957             key = com.cast_scalar_indexer(key)
-> 3958             return getitem(key)
   3959
   3960         if isinstance(key, slice):

IndexError: index 0 is out of bounds for axis 0 with size 0
I would appreciate the community's help in getting this to work.
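Assuming you are happy to break ties by taking the largest value, you can do this:
# Per-group mode; Series.mode() returns all tied values, so max() picks one
df_mf = df_mytest.groupby('series')['values'].apply(lambda ds: ds.mode().max()).to_frame('most_frequent')
# Attach the per-group result to every row of the original frame
df_mytest.merge(df_mf, how='left', left_on='series', right_index=True)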
Output:
    series  values  most_frequent
0        1       2              2
1        1       2              2
2        1       2              2
3        1       1              2
4        2       2              2
5        2       2              2
6        2       1              2
7        2       1              2
8        3       1              1
9        3       1              1
10       3       1              1
11       3       2              1
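As a possible variation on the same idea, the merge can be skipped by using groupby(...).transform, which broadcasts each group's result back onto its rows. The sketch below also produces the collapsed one-row-per-series form from the question. The helper name most_frequent and the variable df_summary are made up for illustration, and ties are again broken by taking the largest value.

```python
import pandas as pd

# Same sample data as in the question
df_mytest = pd.DataFrame({
    'series': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    'values': [2, 2, 2, 1, 2, 2, 1, 1, 1, 1, 1, 2],
})


def most_frequent(ds):
    """Return the mode of a Series; ties are broken by the largest value."""
    # Series.mode() returns every value that shares the top count
    return ds.mode().max()


# Option 1: keep all rows and add the per-group mode as a new column
df_mytest['most_frequent'] = (
    df_mytest.groupby('series')['values'].transform(most_frequent)
)

# Option 2: collapse to one row per series, holding only the mode
df_summary = (
    df_mytest.groupby('series')['values'].agg(most_frequent).reset_index()
)

print(df_mytest)
print(df_summary)
```

Series 2 is the only group with a tie here (two 1s and two 2s), so it is the one place where the tie-breaking rule matters; swapping max() for min() would change its result from 2 to 1.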