Identifying only numeric values from a column in a DataFrame - Python

Question

I want a separate column that returns "Yes" if the value in the "ID" column is entirely numeric, and "No" if it contains letters or is alphanumeric.

ID      Result
3965      Yes
wyq8      No
RO_123    No
CMD_      No
2976      Yes

Answer 1

Check whether ID contains any non-digit characters and invert the boolean selection with ~. Then use np.where to assign the labels.

df['Result'] = np.where(~df['ID'].str.contains(r'\D'), 'Yes', 'No')

       ID Result
0    3965    Yes
1    wyq8     No
2  RO_123     No
3    CMD_     No
4    2976    Yes

As @Cameron Riddell noted, you can also skip inverting the boolean and do the following:

df['Result'] = np.where(df['ID'].str.contains(r'\D'), 'No', 'Yes')
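
For reference, here is a minimal end-to-end sketch of this approach on the sample data from the question. It assumes the ID column holds only strings with no missing values. The pattern r'\D' is a simplified stand-in for '(\D+)' that avoids the capture group, and the name has_non_digit is only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"ID": ["3965", "wyq8", "RO_123", "CMD_", "2976"]})

# True where the ID holds at least one non-digit character
has_non_digit = df["ID"].str.contains(r"\D")

# Label rows: any non-digit character means "No"
df["Result"] = np.where(has_non_digit, "No", "Yes")
print(df)
```

Note that an empty string contains no non-digit character, so this check would label it "Yes".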

Answer 2

You can use the .isnumeric() method:

df3["Result"] = df3["ID"].str.isnumeric().apply(lambda x: "Yes" if x else "No")

[Update]: This method only works for integers; for other cases, see @Ch3steR's answer.

Answer 3

You can use pd.Series.str.isnumeric here.

df['Result'] = np.where(df['ID'].str.isnumeric(), 'YES', 'NO')

       ID Result
0    3965    YES
1    wyq8     NO
2  RO_123     NO
3    CMD_     NO
4    2976    YES

One caveat with isnumeric: it does not recognize float numbers.

test = pd.Series(["9.0", "9"])
test.str.isnumeric()

0    False
1     True
dtype: bool
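
The same caveat extends a little further. As a small sketch (the sample values are made up for illustration), isnumeric also rejects a leading minus sign, and it accepts some Unicode numeric characters such as a vulgar fraction. The series is built with dtype=object so that the result follows Python's own str.isnumeric.

```python
import pandas as pd

# dtype=object keeps plain Python strings in the Series
test = pd.Series(["9", "9.0", "-9", "½", ""], dtype=object)

# Element-wise check: every character must be a Unicode numeric character
flags = test.str.isnumeric()
print(flags.tolist())
```

If only ASCII digits should count, a regex-based check is probably the safer choice.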

If you want to mark strictly integers as YES, use isnumeric; otherwise you can use pd.Series.str.fullmatch (available since pandas 1.1.0).

df['Result'] = np.where(df['ID'].str.fullmatch(r"\d+|\d+\.\d+"), 'YES', 'NO')
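
A sketch of the fullmatch approach on a few sample values, assuming pandas 1.1.0 or later. The optional leading sign in the pattern is an addition for illustration and is not part of the pattern above; the sample values are made up.

```python
import numpy as np
import pandas as pd

ids = pd.Series(["3965", "9.0", "-2.5", "wyq8", "1.", "CMD_"])

# Optional sign, one or more digits, optional fractional part
pattern = r"[+-]?\d+(?:\.\d+)?"

# fullmatch is True only when the whole string matches the pattern
result = np.where(ids.str.fullmatch(pattern), "YES", "NO")
print(result.tolist())
```

Because the fractional part requires at least one digit after the dot, a value such as "1." is labelled NO under this pattern.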

For versions < 1.1.0, use re.fullmatch:

import re

numeric_pat = re.compile(r"\d+|\d+\.\d+")

def numeric(val):
    # fullmatch succeeds only if the entire string matches the pattern
    return 'YES' if numeric_pat.fullmatch(val) else 'NO'

df['Result'] = df['ID'].apply(numeric)

Or we can use pd.to_numeric with boolean masking using pd.Series.isna:

m = pd.to_numeric(df['ID'], errors='coerce').isna()
df['Result'] = np.where(m, 'NO', 'YES')

With the errors parameter set to 'coerce', values that cannot be converted to a number are set to NaN.

test = pd.Series(['3965', 'wyq8', 'RO_123', 'CMD_', '2976'])
pd.to_numeric(test, errors='coerce')

0    3965.0
1       NaN
2       NaN
3       NaN
4    2976.0
dtype: float64
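
To show how the coerced values feed into the labels, here is a small sketch on made-up sample values (the name converted is illustrative). Unlike isnumeric, pd.to_numeric also parses decimal and scientific-notation strings.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"ID": ["3965", "wyq8", "2.5", "1e3", "CMD_"]})

# Unparseable strings become NaN instead of raising an error
converted = pd.to_numeric(df["ID"], errors="coerce")

# Anything that survived the conversion is numeric
df["Result"] = np.where(converted.notna(), "YES", "NO")
print(df)
```

One thing to keep in mind: a value that is already missing in the ID column is also labelled NO by this mask.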

Or you can build a custom function:

def numeric(val):
    try:
        # float() alone suffices: strings holding an int or a float
        # both convert without raising an error
        float(val)
        return 'YES'
    except ValueError:
        return 'NO'

df['Result'] = df['ID'].apply(numeric)

Note: float also handles scientific notation, e.g. float("1e6") -> 1000000.0.

test = pd.Series(['1e6', '1', 'a 10', '1E6'])
test.apply(numeric)

0    YES
1    YES
2     NO
3    YES
dtype: object
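
Beyond scientific notation, float also accepts strings such as "nan" and "inf" and ignores surrounding whitespace. If those should not count as numeric, a stricter variant might look like the sketch below. The function name is_finite_number is hypothetical, and catching TypeError for non-string input such as None is an added assumption.

```python
import math

def is_finite_number(val):
    """Return 'YES' if val parses as a finite float, else 'NO'."""
    try:
        # float() accepts 'nan' and 'inf', so filter them out explicitly
        return 'YES' if math.isfinite(float(val)) else 'NO'
    except (TypeError, ValueError):
        # ValueError: unparseable string; TypeError: e.g. None
        return 'NO'

samples = ['1e6', ' 12 ', 'nan', 'inf', 'a 10', None]
labels = [is_finite_number(s) for s in samples]
print(labels)
```

It can be applied the same way as above, for example df['ID'].apply(is_finite_number).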
