Checking whether a dataframe column is filled and searching by string
I have the following dataframe:
import pandas as pd
import re
df = pd.DataFrame({'Column_01': ['Press', 'Temp', '', 'Strain gauge', 'Ultrassonic', ''],
'Column_02': ['five', 'two', 'five', 'five', 'three', 'three']})
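First, I want to check whether 'Column_01' is filled in.
If 'Column_01' is filled in, or 'Column_02' contains one of the words 'one', 'two' or 'three', the new column (Classifier) should receive 'SENSOR'.
To identify the strings in 'Column_02', I ran the following code: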
df['Classifier'] = df.apply(lambda x: 'SENSOR'
if re.search(r'one|two|three', x['Column_02'])
else 'Nan', axis = 1)
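This code works. It finds the strings in the dataframe rows perfectly. However, I also need to check whether 'Column_01' is filled in, and I could not get notnull() to solve the problem.
I would like the output to be: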
Column_01     Column_02  Classifier
Press         five       SENSOR      #current line of Column_01 completed
Temp          two        SENSOR      #current line of Column_01 completed; string 'two'
              five       Nan
Strain gauge  five       SENSOR      #current line of Column_01 completed
Ultrassonic   three      SENSOR      #current line of Column_01 completed; string 'three'
              three      SENSOR      #string 'three'
In general, you should avoid .apply().
This should do the trick:
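import numpy as np
# A row is a sensor if Column_01 is non-empty or Column_02 contains one of the keywords.
# Plain alternation (no capture groups) avoids the pandas warning about match groups.
df["Classifier"] = np.where(df["Column_01"].fillna('').ne('') | df["Column_02"].str.contains("one|two|three"), "SENSOR", np.nan)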
Output:
   Column_01     Column_02  Classifier
0  Press         five       SENSOR
1  Temp          two        SENSOR
2                five       nan
3  Strain gauge  five       SENSOR
4  Ultrassonic   three      SENSOR
5                three      SENSOR
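As a side note on why notnull() did not help in the question: the blank cells in 'Column_01' are empty strings, not missing values, so notnull() reports every row as filled. The following is a minimal sketch of one way to handle that, assuming blank or whitespace-only strings should count as "not filled" and that the fallback label should be the literal string 'Nan' used in the question. The names `filled` and `has_keyword` are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Column_01': ['Press', 'Temp', '', 'Strain gauge', 'Ultrassonic', ''],
    'Column_02': ['five', 'two', 'five', 'five', 'three', 'three'],
})

# Empty strings are not missing values, so notnull() is True for every row.
print(df['Column_01'].notnull().tolist())

# Treat both real missing values and blank/whitespace-only strings as "not filled".
filled = df['Column_01'].fillna('').str.strip().ne('')

# Search Column_02 for any of the keywords; na=False treats missing cells as no match.
has_keyword = df['Column_02'].str.contains(r'one|two|three', na=False)

# Label the row 'SENSOR' when either condition holds, otherwise the string 'Nan'.
df['Classifier'] = np.where(filled | has_keyword, 'SENSOR', 'Nan')
print(df)
```

Because both branches of np.where are strings here, the fallback stays exactly 'Nan' rather than being converted from a float.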