How to add a column with a counter based on a condition
I have a dataset that looks like this:
import pandas as pd

cols = ['question_nummber', 'Answer', 'Avg_Score']
data = [['Q1', 'w1', 'N/A'],
        ['Q1', 'w2', 4.3],
        ['Q1', 'w3', 1.2],
        ['Q1', 'w4', 3.5],
        ['Q2', 'w5', 'N/A'],
        ['Q2', 'w6', 3.1],
        ['Q2', 'w7', 2.4],
        ['Q2', 'w8', 1.7],
        ['Q2', 'w9', 4.6],
        ['Q3', 'w10', 'N/A'],
        ['Q3', 'w11', 3.0]]
df = pd.DataFrame(data, columns=cols)
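I want to add a column with the answer number within each of the questions Q1 to Q3. Every time the loop finds the string N/A in column 'Avg_Score', the counter must be reset to 1. My desired output is: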
cols = ['question_nummber', 'answers-options', 'Answer', 'Avg_Score']
data = [['Q1', 'A1', 'w1', 'N/A'],
        ['Q1', 'A2', 'w2', 4.3],
        ['Q1', 'A3', 'w3', 1.2],
        ['Q1', 'A4', 'w4', 3.5],
        ['Q2', 'A1', 'w5', 'N/A'],
        ['Q2', 'A2', 'w6', 3.1],
        ['Q2', 'A3', 'w7', 2.4],
        ['Q2', 'A4', 'w8', 1.7],
        ['Q2', 'A5', 'w9', 4.6],
        ['Q3', 'A1', 'w10', 'N/A'],
        ['Q3', 'A2', 'w11', 3.0]]
df = pd.DataFrame(data, columns=cols)
I tried the code below, but it doesn't work: the counter is not reset to 1 each time 'N/A' is found, it just keeps counting. How can I get my desired output?
c=1
for x, row in df.iterrows():
    if df.loc[x, 'Avg_Score'] == 'N/A':
        # c is never reset here, so the numbering keeps growing across questions
        df.loc[x,'question_alternative'] = 'Null'
    else:
        c=c+1
        df.loc[x,'question_alternative'] = 'A{}'.format(c)
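For comparison, here is a minimal sketch of how the loop itself could be repaired. This is not part of the original answer. The only missing piece is resetting `c` to 1 in the `N/A` branch. The sketch assumes every question starts with an `N/A` row. It also collects the labels in a list and assigns the column once, rather than writing cell by cell with `df.loc`.

```python
import pandas as pd

cols = ['question_nummber', 'Answer', 'Avg_Score']
data = [['Q1', 'w1', 'N/A'], ['Q1', 'w2', 4.3], ['Q1', 'w3', 1.2], ['Q1', 'w4', 3.5],
        ['Q2', 'w5', 'N/A'], ['Q2', 'w6', 3.1], ['Q2', 'w7', 2.4], ['Q2', 'w8', 1.7],
        ['Q2', 'w9', 4.6], ['Q3', 'w10', 'N/A'], ['Q3', 'w11', 3.0]]
df = pd.DataFrame(data, columns=cols)

labels = []
c = 0
for score in df['Avg_Score']:
    if score == 'N/A':
        c = 1      # an 'N/A' row opens a new question: restart at 1
    else:
        c += 1     # a real score: continue numbering within the current question
    labels.append('A{}'.format(c))

df['answers-options'] = labels
print(df)
```

The vectorized answers that follow avoid the Python-level loop entirely. For small frames like this one, the loop is still perfectly usable.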
If each group starts with an N/A value, replace the strings with missing values, test for non-missing values with Series.notna, and create the counter with GroupBy.cumsum:
import numpy as np
import pandas as pd

df['Avg_Score'] = df['Avg_Score'].replace('N/A', np.nan)
df['new'] = 'A' + (df['Avg_Score'].notna()
                   .groupby(df['question_nummber'])
                   .cumsum()
                   .add(1)
                   .astype(str))
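To see why this works, the chain can be split into its intermediate steps. This is a small illustration built on the same sample data; the names `mask` and `step` are only for display. The `N/A` row of each question is `False` in the mask. Its per-group cumulative sum is therefore 0, and adding 1 makes it `A1`.

```python
import numpy as np
import pandas as pd

cols = ['question_nummber', 'Answer', 'Avg_Score']
data = [['Q1', 'w1', 'N/A'], ['Q1', 'w2', 4.3], ['Q1', 'w3', 1.2], ['Q1', 'w4', 3.5],
        ['Q2', 'w5', 'N/A'], ['Q2', 'w6', 3.1], ['Q2', 'w7', 2.4], ['Q2', 'w8', 1.7],
        ['Q2', 'w9', 4.6], ['Q3', 'w10', 'N/A'], ['Q3', 'w11', 3.0]]
df = pd.DataFrame(data, columns=cols)
df['Avg_Score'] = df['Avg_Score'].replace('N/A', np.nan)

# Step 1: True for every real score, False for the NaN row that opens a question
mask = df['Avg_Score'].notna()
# Step 2: running count of True values, restarted inside every question group
step = mask.groupby(df['question_nummber']).cumsum()
# Step 3: shift by one so the opening row becomes 1, then prefix with 'A'
df['new'] = 'A' + step.add(1).astype(str)

print(pd.DataFrame({'mask': mask, 'cumsum': step, 'new': df['new']}))
```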
Alternative without converting to missing values (compare against the string 'N/A' directly):
# True for every real score; the 'N/A' row counts as 0, so each group starts at A1
df['new'] = ('A' + df['Avg_Score'].ne('N/A')
                   .groupby(df['question_nummber'])
                   .cumsum()
                   .add(1)
                   .astype(str))
print (df)
question_nummber answers-options Answer Avg_Score new
0 Q1 A1 w1 NaN A1
1 Q1 A2 w2 4.3 A2
2 Q1 A3 w3 1.2 A3
3 Q1 A4 w4 3.5 A4
4 Q2 A1 w5 NaN A1
5 Q2 A2 w6 3.1 A2
6 Q2 A3 w7 2.4 A3
7 Q2 A4 w8 1.7 A4
8 Q2 A5 w9 4.6 A5
9 Q3 A1 w10 NaN A1
10 Q3 A2 w11 3.0 A2
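Another option for the conversion step, not used in the original answer, is `pd.to_numeric` with `errors='coerce'`. It turns 'N/A' (and any other non-numeric text) into NaN and makes `Avg_Score` a proper float column in one call. The notna/cumsum counter then works unchanged. This is a sketch assuming that coercing every non-numeric value to NaN is acceptable for your data.

```python
import pandas as pd

cols = ['question_nummber', 'Answer', 'Avg_Score']
data = [['Q1', 'w1', 'N/A'], ['Q1', 'w2', 4.3], ['Q1', 'w3', 1.2], ['Q1', 'w4', 3.5],
        ['Q2', 'w5', 'N/A'], ['Q2', 'w6', 3.1], ['Q2', 'w7', 2.4], ['Q2', 'w8', 1.7],
        ['Q2', 'w9', 4.6], ['Q3', 'w10', 'N/A'], ['Q3', 'w11', 3.0]]
df = pd.DataFrame(data, columns=cols)

# Coerce 'N/A' (and any other non-numeric string) to NaN; the column becomes float64
df['Avg_Score'] = pd.to_numeric(df['Avg_Score'], errors='coerce')

# Same counter as above: NaN rows count 0 within each question, real scores count 1
df['new'] = 'A' + (df['Avg_Score'].notna()
                   .groupby(df['question_nummber'])
                   .cumsum()
                   .add(1)
                   .astype(str))
print(df)
```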
If you only need a counter within each group of column question_nummber (no test for N/A required), use GroupBy.cumcount as the counter:
df['new'] = ('A' + df.groupby('question_nummber')
.cumcount()
.add(1)
.astype(str))
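If the counter should restart at every N/A row regardless of `question_nummber`, the grouping key has to come from the N/A rows themselves. That covers the case where the question column is missing or does not change. The original answer does not cover this. One possible approach is to turn the N/A rows into block ids with a cumulative sum and number the rows inside each block with `cumcount`. This sketch assumes `Avg_Score` still holds the literal 'N/A' strings. The name `block` is only illustrative.

```python
import pandas as pd

cols = ['question_nummber', 'Answer', 'Avg_Score']
data = [['Q1', 'w1', 'N/A'], ['Q1', 'w2', 4.3], ['Q1', 'w3', 1.2], ['Q1', 'w4', 3.5],
        ['Q2', 'w5', 'N/A'], ['Q2', 'w6', 3.1], ['Q2', 'w7', 2.4], ['Q2', 'w8', 1.7],
        ['Q2', 'w9', 4.6], ['Q3', 'w10', 'N/A'], ['Q3', 'w11', 3.0]]
df = pd.DataFrame(data, columns=cols)

# Every 'N/A' row opens a new block; the running sum of the mask is the block id
block = df['Avg_Score'].eq('N/A').cumsum()
# Number rows inside each block from 0, shift to start at 1, prefix with 'A'
df['new'] = 'A' + df.groupby(block).cumcount().add(1).astype(str)
print(df)
```

With this sample data the result matches the per-question counter, because each question starts with exactly one N/A row.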