Summarizing dataframe string values to count in Python 3
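In the screenshot below you will find a dataframe that contains a string value in every cell. What I want to do is create a new dataframe from it with 3 columns: 'Very interested', 'Somewhat interested' and 'Not interested'. I can't figure out how to transform the original df into this new df. I tried simply counting the values that match 'Very interested' and so on and putting them into the new df, but the numbers don't look right.
Any help would be appreciated. Thanks.
Edit: here is code that reproduces a dataframe similar to the one in the screenshot: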
import pandas as pd

# Every respondent column (1-6) holds the same six answers, one per topic.
answers = ['Very interested', 'Not interested', 'Somewhat interested', 'Very interested', 'Not interested', 'Somewhat interested']
df = pd.DataFrame({col: answers for col in range(1, 7)},
                  index=['Big Data', 'Data Analysis', 'Data Journalism', 'Data Visualization', 'Deep Learning', 'Machine Learning'])
The desired output should look like this:
I think you need to reshape with melt, then get the counts with GroupBy.size and pivot them into columns with Series.unstack:
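df = (df.rename_axis('val')                         # name the index so it becomes column 'val'
        .reset_index()
        .melt('val', var_name='a', value_name='b')  # long format: one row per cell
        .groupby(['val', 'b'])
        .size()                                     # count rows per (topic, answer)
        .unstack(fill_value=0))                     # answers become columns, missing counts are 0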
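To make the intermediate steps of that chain easier to follow, here is a minimal sketch on a smaller frame. The frame `small` and the names `long_form` and `counts` are made up for illustration and are not part of the question's data.

```python
import pandas as pd

# Hypothetical miniature of the survey frame: rows are topics, columns are respondents.
small = pd.DataFrame(
    {1: ['Very interested', 'Not interested'],
     2: ['Very interested', 'Somewhat interested'],
     3: ['Not interested', 'Somewhat interested']},
    index=['Big Data', 'Deep Learning'],
)

# Long format: one row per (topic, respondent) cell, with the answer in column 'b'.
long_form = (small.rename_axis('val')
                  .reset_index()
                  .melt('val', var_name='a', value_name='b'))

# Count rows per (topic, answer), then pivot the answers into columns.
counts = long_form.groupby(['val', 'b']).size().unstack(fill_value=0)
print(counts)
```

Answers that never occur for a topic (here 'Somewhat interested' for 'Big Data') show up as 0 because of `fill_value=0` rather than as NaN.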
Another solution uses stack, counts with SeriesGroupBy.value_counts, and reshapes with Series.unstack:
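# Start from the original df, not the counts table produced above.
df = (df.stack()                   # Series indexed by (topic, respondent)
        .groupby(level=0)          # group by topic
        .value_counts()            # count each answer within a topic
        .unstack(fill_value=0))    # answers become columns, missing counts are 0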
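Both snippets reassign `df`, so running them back to back would feed the second one an already-counted table. The sketch below keeps the input untouched and checks that the two approaches give the same counts. The frame `small` and the names `via_melt`, `via_stack` and `order` are illustrative assumptions, not taken from the question.

```python
import pandas as pd

# Hypothetical miniature of the survey frame: rows are topics, columns are respondents.
small = pd.DataFrame(
    {1: ['Very interested', 'Not interested'],
     2: ['Very interested', 'Somewhat interested'],
     3: ['Not interested', 'Somewhat interested']},
    index=['Big Data', 'Deep Learning'],
)

# Approach 1: melt to long format, then count with groupby/size.
via_melt = (small.rename_axis('val')
                 .reset_index()
                 .melt('val', var_name='a', value_name='b')
                 .groupby(['val', 'b'])
                 .size()
                 .unstack(fill_value=0))

# Approach 2: stack to a Series, then count answers per topic.
via_stack = (small.stack()
                  .groupby(level=0)
                  .value_counts()
                  .unstack(fill_value=0))

# Put the columns in the order asked for in the question and align the rows,
# then compare the raw values (the axis names differ between the two results).
order = ['Very interested', 'Somewhat interested', 'Not interested']
rows = list(via_melt.index)
same = (via_melt.loc[rows, order].to_numpy()
        == via_stack.loc[rows, order].to_numpy()).all()
print(same)
```

Selecting the columns with a list such as `order` is also a simple way to get the three columns in the order described in the question, since `unstack` returns them sorted alphabetically.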