Make a dataframe with grouped questions from three columns
I have the following dataframe:
A                 B                  C
I am motivated    Agree              4
I am motivated    Strongly Agree     5
I am motivated    Disagree           6
I am open-minded  Agree              4
I am open-minded  Disagree           4
I am open-minded  Strongly Disagree  3
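Column A holds the question, column B the answer, and column C the frequency of "Strongly Agree", "Agree", "Disagree", and "Strongly Disagree" for the question in column A.
How can I convert it into the following dataframe?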
                  Strongly Agree  Agree  Disagree  Strongly Disagree
I am motivated                 5      4         6                  0
I am open-minded               0      4         4                  3
I looked at groupby() on columns in other posts, but couldn't figure it out. Using Python 3.
In [250]: df.pivot_table(index='A', columns='B', values='C', aggfunc='sum', fill_value=0)
Out[250]:
B                 Agree  Disagree  Strongly Agree  Strongly Disagree
A
I am motivated        4         6               5                  0
I am open-minded      4         4               0                  3
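For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample data from the question and applies the same pivot_table call. The final reindex step is not part of the answer above; it is an assumption about how one might get the exact column order the question asks for, and the names `wanted` and `wide` are only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "A": ["I am motivated"] * 3 + ["I am open-minded"] * 3,
    "B": ["Agree", "Strongly Agree", "Disagree",
          "Agree", "Disagree", "Strongly Disagree"],
    "C": [4, 5, 6, 4, 4, 3],
})

# Column order requested in the question (pivot_table sorts alphabetically).
wanted = ["Strongly Agree", "Agree", "Disagree", "Strongly Disagree"]

# One row per question, one column per answer; missing pairs become 0.
wide = df.pivot_table(index="A", columns="B", values="C",
                      aggfunc="sum", fill_value=0)

# Reorder the columns; fill_value covers any answer that never occurs.
wide = wide.reindex(columns=wanted, fill_value=0)
print(wide)
```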
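Because these are already frequency counts, we can assume that each Question / Opinion pair is unique. That means we can use set_index and unstack, since no aggregation is needed, which should save some time. We could get the same result with pivot; however, pivot has no fill_value option that would let us preserve the dtype.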
df.set_index(['A', 'B']).C.unstack(fill_value=0)
B                 Agree  Disagree  Strongly Agree  Strongly Disagree
A
I am motivated        4         6               5                  0
I am open-minded      4         4               0                  3
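To illustrate the dtype point, here is a small sketch comparing the two routes on the question's sample data. The variable names are only illustrative. The pivot route leaves NaN for the missing pairs, so filling them afterwards still yields float columns, while unstack with fill_value keeps the integers.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "A": ["I am motivated"] * 3 + ["I am open-minded"] * 3,
    "B": ["Agree", "Strongly Agree", "Disagree",
          "Agree", "Disagree", "Strongly Disagree"],
    "C": [4, 5, 6, 4, 4, 3],
})

# unstack fills the missing pairs directly, so the columns stay integer.
via_unstack = df.set_index(["A", "B"])["C"].unstack(fill_value=0)

# pivot introduces NaN first, which forces float columns;
# fillna(0) replaces the NaN but does not convert back to integers.
via_pivot_filled = df.pivot(index="A", columns="B", values="C").fillna(0)

print(via_unstack.dtypes)
print(via_pivot_filled.dtypes)
```

If the pivot route is preferred anyway, appending `.astype(int)` after `fillna(0)` is one way to recover integer columns.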
Extra credit
Turn 'B' into an ordered pd.Categorical and the columns will come out sorted in that order.
df.B = pd.Categorical(
    df.B, categories=['Strongly Disagree', 'Disagree', 'Agree', 'Strongly Agree'], ordered=True)
df.set_index(['A', 'B']).C.unstack(fill_value=0)
B                 Strongly Disagree  Disagree  Agree  Strongly Agree
A
I am motivated                    0         6      4               5
I am open-minded                  3         4      4               0
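A self-contained sketch of the categorical approach, for reference. It uses assign so the original frame is left untouched, which is a choice made here rather than something the answer requires; `order`, `df_cat`, and `wide_sorted` are illustrative names.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "A": ["I am motivated"] * 3 + ["I am open-minded"] * 3,
    "B": ["Agree", "Strongly Agree", "Disagree",
          "Agree", "Disagree", "Strongly Disagree"],
    "C": [4, 5, 6, 4, 4, 3],
})

# Logical order of the answers, from most negative to most positive.
order = ["Strongly Disagree", "Disagree", "Agree", "Strongly Agree"]

# Work on a copy whose 'B' column is an ordered categorical.
df_cat = df.assign(B=pd.Categorical(df["B"], categories=order, ordered=True))

# unstack now lays the columns out in category order, not alphabetically.
wide_sorted = df_cat.set_index(["A", "B"])["C"].unstack(fill_value=0)
print(wide_sorted)
```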