pandas if-else conditions for multiple columns using a DataFrame
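I have a DataFrame, and I want to use apply or a lambda function on its string column values to apply an if-else condition to the column. I have tried iterating with a for loop.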
Input Dataframe
text1                        output_column
['bread','bread','bread']    ['bread']   --> [if count values >2]
['bread','butter','jam']     ['butter']  --> [if all 3 values are unique select 1st element value as output]
['bread','jam','jam']        ['jam']     --> [if count values >2]
['unknown']                  ['unknown'] --> [if any of the value came as blank or null mark it as 'unknown']
# I tried the lines of code below
output_column = []
df_value = df[['text_col1','text_col2','text_col3']].values.tolist()
if np.all(df_value <= 1):
    output_column.append(df_value[1])
else:
    output_column.append(max_count[np.argmax(df_value)])
Output Dataframe
text1                        output_column
['bread','bread','bread']    ['bread']
['bread','butter','jam']     ['butter']
['bread','jam','jam']        ['jam']
['unknown']                  ['unknown']
import pandas as pd
df = pd.DataFrame({'text1': [['bread', 'bread', 'bread'],
                             ['bread', 'butter', 'jam'],
                             ['bread', 'jam', 'jam'],
                             ['unknown']]})
List-valued cells are awkward to work with, so let's explode them:
df = df.explode('text1')
>>> df.head()
text1
0 bread
0 bread
0 bread
1 bread
1 butter
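A small sketch of what explode does to the index, and to cells that are empty or missing, may help here, since the question asks for blank or null values to be marked 'unknown'. The `sample` frame below is hypothetical and not part of the original data.

```python
import pandas as pd

# Hypothetical sample: one ordinary row, one empty list, one missing cell.
sample = pd.DataFrame({'text1': [['bread', 'jam'], [], None]})

long_form = sample.explode('text1')

# Index labels repeat once per list element, so they still identify the source row.
print(long_form.index.tolist())            # [0, 0, 1, 2]

# An empty list becomes NaN and a missing cell stays missing.
print(long_form['text1'].isna().tolist())  # [False, False, True, True]
```

Because the original row label survives the reshape, grouping on the index recovers each original list, and missing entries can be detected with `isna()`.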
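Now you can use groupby to apply a function to each document (by grouping on index level 0).
The details of the heuristic are up to you, but here is something to start with: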
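def get_values(s):
    # Count how often each value occurs within one document
    counts = s.value_counts()
    # Any 'unknown' entry marks the whole document as unknown
    if "unknown" in counts:
        return "unknown"
    # All values distinct: take the second element, matching the expected output
    if counts.eq(1).all():
        return s.iloc[1]
    # Otherwise return the most frequent value
    if counts.max() >= 2:
        return counts.idxmax()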
Apply it to each group:
>>> df.groupby(level=0).text1.apply(get_values)
0 bread
1 butter
2 jam
3 unknown
Name: text1, dtype: object
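The attempt in the question reads from three separate columns (`text_col1`, `text_col2`, `text_col3`) rather than from a single list column. If the data really is shaped that way, a row-wise apply is one possible alternative. The following is only a sketch: the `wide` frame and the `pick_value` helper are hypothetical, and treating blank strings and nulls as 'unknown' is an assumption based on the rule stated in the question.

```python
import pandas as pd

def pick_value(row):
    """Return 'unknown', the majority value, or the second element of the row."""
    values = row.tolist()
    # Assumption: null, blank, or literal 'unknown' entries make the row unknown.
    if any(pd.isna(v) or str(v).strip() in ('', 'unknown') for v in values):
        return 'unknown'
    counts = pd.Series(values).value_counts()  # sorted, most frequent first
    if counts.iloc[0] >= 2:
        return counts.index[0]
    # All values distinct: second element, mirroring s.iloc[1] in get_values.
    return values[1] if len(values) > 1 else values[0]

cols = ['text_col1', 'text_col2', 'text_col3']
wide = pd.DataFrame({
    'text_col1': ['bread', 'bread', 'bread', None],
    'text_col2': ['bread', 'butter', 'jam', 'jam'],
    'text_col3': ['bread', 'jam', 'jam', ''],
})
wide['output_column'] = wide[cols].apply(pick_value, axis=1)
print(wide['output_column'].tolist())  # ['bread', 'butter', 'jam', 'unknown']
```

A row-wise apply is slower than groupby on large frames, but it keeps the result aligned with the original rows without any reshaping.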