Count the number of Likert scale results from multiple column questions in pandas
I have the following DataFrame:
       Question1  Question2  Question3          Question4
User1      Agree      Agree   Disagree  Strongly Disagree
User2   Disagree      Agree      Agree           Disagree
User3      Agree      Agree      Agree              Agree
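To reproduce the answers below, the sample frame can be built directly. This is a minimal sketch: the cell values are copied from the table above, and the name df matches the one the answers use.

```python
import pandas as pd

# Sample survey from the question: one row per user, one column per question
df = pd.DataFrame(
    {
        "Question1": ["Agree", "Disagree", "Agree"],
        "Question2": ["Agree", "Agree", "Agree"],
        "Question3": ["Disagree", "Agree", "Agree"],
        "Question4": ["Strongly Disagree", "Disagree", "Agree"],
    },
    index=["User1", "User2", "User3"],
)
print(df)
```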
Is there a way to convert the DataFrame above into the following DataFrame?
           Agree  Disagree  Strongly Disagree
Question1      2         1                  0
Question2      3         0                  0
Question3      2         1                  0
Question4      1         1                  1
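This is similar to my previous question:
I tried looking through previous questions using stack/pivot but couldn't figure it out. The actual DataFrame has 20+ questions, answered on a Likert scale of Strongly Agree, Agree, Neutral, Disagree and Strongly Disagree.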
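With a five-point scale, some levels (for example Neutral) may never be chosen for a given question, and counting only the observed answers will then leave those levels out entirely. A minimal sketch, assuming the level names are spelled exactly as listed in the question and that this column order is wanted, which reindexes each column's counts so every level appears (missing levels become 0) and transposes to put questions on the rows:

```python
import pandas as pd

# Assumed spelling and order of the full five-point scale
LEVELS = ["Strongly Agree", "Agree", "Neutral", "Disagree", "Strongly Disagree"]

df = pd.DataFrame(
    {
        "Question1": ["Agree", "Disagree", "Agree"],
        "Question2": ["Agree", "Agree", "Agree"],
        "Question3": ["Disagree", "Agree", "Agree"],
        "Question4": ["Strongly Disagree", "Disagree", "Agree"],
    },
    index=["User1", "User2", "User3"],
)

# Count each column, force every scale level to be present, then put questions on rows
counts = df.apply(lambda s: s.value_counts().reindex(LEVELS, fill_value=0)).T
print(counts)
```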
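You can apply pd.Series.value_counts to each column. If you do this with apply, the indexes are aligned automatically (an answer that never occurs in a column shows up as NaN there):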
df.apply(pd.Series.value_counts)
Out:
                   Question1  Question2  Question3  Question4
Agree                    2.0        3.0        2.0          1
Disagree                 1.0        NaN        1.0          1
Strongly Disagree        NaN        NaN        NaN          1
Some post-processing, replacing NaN with 0 and converting back to integers:
df.apply(pd.Series.value_counts).fillna(0).astype('int')
Out:
                   Question1  Question2  Question3  Question4
Agree                      2          3          2          1
Disagree                   1          0          1          1
Strongly Disagree          0          0          0          1
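The layout requested in the question has questions as rows and answers as columns, while value_counts per column produces the transpose of that. A minimal sketch that simply adds .T to the post-processed result (the sample frame is rebuilt so the snippet runs on its own):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Question1": ["Agree", "Disagree", "Agree"],
        "Question2": ["Agree", "Agree", "Agree"],
        "Question3": ["Disagree", "Agree", "Agree"],
        "Question4": ["Strongly Disagree", "Disagree", "Agree"],
    },
    index=["User1", "User2", "User3"],
)

# Answers end up on the rows after apply; .T moves the questions onto the rows
counts = df.apply(pd.Series.value_counts).fillna(0).astype(int).T
print(counts)
```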
Equivalently, with a lambda:
df.apply(lambda x: x.value_counts()).fillna(0).astype(int)
#                    Question1  Question2  Question3  Question4
# Agree                      2          3          2          1
# Disagree                   1          0          1          1
# Strongly Disagree          0          0          0          1
And with pd.get_dummies:
pd.get_dummies(df.stack()).groupby(level=1).sum()
           Agree  Disagree  Strongly Disagree
Question1      2         1                  0
Question2      3         0                  0
Question3      2         1                  0
Question4      1         1                  1
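The get_dummies one-liner works because stack() turns the users-by-questions frame into a single Series indexed by (user, question); get_dummies one-hot encodes each answer, and grouping on index level 1 (the question) sums those indicators into counts. A minimal sketch that exposes the intermediate Series; passing dtype=int is a choice made here so the sums are plain integers regardless of the pandas version's default dummy dtype:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Question1": ["Agree", "Disagree", "Agree"],
        "Question2": ["Agree", "Agree", "Agree"],
        "Question3": ["Disagree", "Agree", "Agree"],
        "Question4": ["Strongly Disagree", "Disagree", "Agree"],
    },
    index=["User1", "User2", "User3"],
)

# One entry per answered cell, indexed by (user, question)
stacked = df.stack()
print(stacked.head())

# One indicator column per distinct answer, summed per question (index level 1)
counts = pd.get_dummies(stacked, dtype=int).groupby(level=1).sum()
print(counts)
```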
Taking it a step further
We can use numpy.bincount to speed this up, but we have to be careful with the dimensions.
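import numpy as np
import pandas as pd

v = df.values
f, u = pd.factorize(v.ravel())  # integer code per cell, distinct answers
n, m = u.size, v.shape[1]  # n = number of distinct answers, m = number of questions
r = np.tile(np.arange(m), v.shape[0])  # column index of each cell, repeated once per row
b0 = np.bincount(r * n + f)  # one bin per (question, answer) pair
pad = np.zeros(n * m - b0.size, dtype=int)  # bincount stops at the largest bin it sees
b = np.append(b0, pad)
pd.DataFrame(b.reshape(m, n), df.columns, u)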
           Agree  Disagree  Strongly Disagree
Question1      2         1                  0
Question2      3         0                  0
Question3      2         1                  0
Question4      1         1                  1
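The bincount trick gives every (question, answer) pair its own bin, column * n_answers + answer_code, so a single bincount counts everything at once. Because ravel() walks the cells row by row, the column index has to be repeated once per row; in the sample, rows and distinct answers are both 3, which hides any mix-up between the two. A minimal sketch wrapped in a hypothetical helper, likert_counts_bincount, assuming there are no missing answers (factorize codes NaN as -1, which bincount rejects) and using bincount's minlength argument in place of manual padding, checked on a hypothetical survey with 4 users and 3 distinct answers:

```python
import numpy as np
import pandas as pd


def likert_counts_bincount(df):
    """Per-column answer counts via np.bincount (sketch; assumes no missing answers)."""
    v = df.to_numpy()
    n_rows, n_cols = v.shape
    # Integer code for every cell in row-major (ravel) order, plus the distinct answers
    codes, uniques = pd.factorize(v.ravel())
    n_uniq = uniques.size
    # Column index of each cell: 0..n_cols-1, repeated once per row
    cols = np.tile(np.arange(n_cols), n_rows)
    # One bin per (column, answer) pair; minlength also creates trailing empty bins
    flat = np.bincount(cols * n_uniq + codes, minlength=n_cols * n_uniq)
    return pd.DataFrame(flat.reshape(n_cols, n_uniq), index=df.columns, columns=uniques)


# Hypothetical survey: 4 users, 2 questions, 3 distinct answers
survey = pd.DataFrame(
    {
        "Q1": ["Agree", "Agree", "Disagree", "Neutral"],
        "Q2": ["Neutral", "Agree", "Agree", "Agree"],
    }
)
result = likert_counts_bincount(survey)
print(result)
```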
Another numpy option
v = df.values
n, m = v.shape
f, u = pd.factorize(v.ravel())
pd.DataFrame(
np.eye(u.size, dtype=int)[f].reshape(n, m, -1).sum(0),
df.columns, u
)
           Agree  Disagree  Strongly Disagree
Question1      2         1                  0
Question2      3         0                  0
Question3      2         1                  0
Question4      1         1                  1
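In the np.eye variant, indexing the identity matrix with the factorized codes turns each cell into a one-hot row; reshaping restores the (user, question) layout, and summing over axis 0 collapses the users. A minimal sketch that prints the intermediate shapes (the sample frame is rebuilt so it runs on its own). Note that the one-hot array holds users × questions × answers integers, so memory grows with all three dimensions:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "Question1": ["Agree", "Disagree", "Agree"],
        "Question2": ["Agree", "Agree", "Agree"],
        "Question3": ["Disagree", "Agree", "Agree"],
        "Question4": ["Strongly Disagree", "Disagree", "Agree"],
    },
    index=["User1", "User2", "User3"],
)

v = df.to_numpy()
n_rows, n_cols = v.shape
codes, uniques = pd.factorize(v.ravel())

# Row k of the identity matrix is the one-hot vector for answer code k
one_hot = np.eye(uniques.size, dtype=int)[codes]  # (n_rows * n_cols, n_answers)
# Restore the (user, question) layout, then sum over users
per_question = one_hot.reshape(n_rows, n_cols, -1).sum(axis=0)  # (n_cols, n_answers)
counts = pd.DataFrame(per_question, index=df.columns, columns=uniques)
print(one_hot.shape, per_question.shape)
print(counts)
```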
Speed difference
%%timeit
v = df.values
f, u = pd.factorize(v.ravel())
n, m = u.size, v.shape[1]
r = np.tile(np.arange(m), v.shape[0])
b0 = np.bincount(r * n + f)
pad = np.zeros(n * m - b0.size, dtype=int)
b = np.append(b0, pad)
pd.DataFrame(b.reshape(m, n), df.columns, u)
1000 loops, best of 3: 194 µs per loop
%%timeit
v = df.values
n, m = v.shape
f, u = pd.factorize(v.ravel())
pd.DataFrame(
np.eye(u.size, dtype=int)[f].reshape(n, m, -1).sum(0),
df.columns, u
)
1000 loops, best of 3: 195 µs per loop
%timeit pd.get_dummies(df.stack()).groupby(level=1).sum()
1000 loops, best of 3: 1.2 ms per loop
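On a frame as small as the 3×4 sample, timings are dominated by fixed per-call overhead. To compare the approaches on something closer to the real survey, here is a sketch using the standard timeit module on a hypothetical synthetic frame (10,000 users × 20 questions, uniformly random answers; the sizes, seed and function names are all assumptions for illustration). It prints its own measurements; no particular numbers are implied:

```python
import timeit

import numpy as np
import pandas as pd

LEVELS = ["Strongly Agree", "Agree", "Neutral", "Disagree", "Strongly Disagree"]

# Hypothetical synthetic survey with random answers
rng = np.random.default_rng(0)
big = pd.DataFrame(
    rng.choice(LEVELS, size=(10_000, 20)),
    columns=[f"Question{i}" for i in range(1, 21)],
)


def via_value_counts(df):
    return df.apply(pd.Series.value_counts).fillna(0).astype(int).T


def via_get_dummies(df):
    return pd.get_dummies(df.stack(), dtype=int).groupby(level=1).sum()


def via_bincount(df):
    v = df.to_numpy()
    n_rows, n_cols = v.shape
    codes, uniques = pd.factorize(v.ravel())
    cols = np.tile(np.arange(n_cols), n_rows)  # column index per cell, once per row
    flat = np.bincount(cols * uniques.size + codes, minlength=n_cols * uniques.size)
    return pd.DataFrame(flat.reshape(n_cols, uniques.size), index=df.columns, columns=uniques)


for fn in (via_value_counts, via_get_dummies, via_bincount):
    # Best of 3 repeats, each timing 5 calls
    secs = min(timeit.repeat(lambda: fn(big), number=5, repeat=3)) / 5
    print(f"{fn.__name__}: {secs * 1e3:.2f} ms per call")
```

Row and column order differ between the three results (groupby sorts question names as strings, factorize keeps first-seen order), so sort both axes before comparing them.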