In Pandas, how do I check whether three combined string columns total 10 characters and, if so, insert them into a new column?
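I want to combine three Pandas string columns into a new column if the combined value is exactly 10 characters long.
If it is not 10 characters, the next set of combined columns should be checked.
If Phone1Area is 3 characters, Phone1Prefix is 3 characters, and Phone1NumberPart is 4 characters (in other words, ten characters in total), I want to add those columns together into a new column. I have tried adding the columns when they are 3+3+4 characters, df.loc, and several other things.
Here is a sample of the dataset:
Here is the code: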
dfp['p1'] = df[(df['Phone1Area'].str.len() == 3.0)]['Phone1Area'] +
df[(df['Phone1Exchange'].str.len() == 3.0)]['Phone1Exchange'] +
df[(df['Phone1NumberPart'].str.len() == 4.0)]['Phone1NumberPart']
dfp['p2'] = df[(df['Phone2Area'].str.len() == 3.0)]['Phone2Area'] +
df[(df['Phone2Exchange'].str.len() == 3.0)]['Phone2Exchange'] +
df[(df['Phone2NumberPart'].str.len() == 4.0)]['Phone2NumberPart']
df_phone.loc[df_phone['p1'].str.len() == 10, 'phone'] = df_phone['p1']
df_phone.loc[df_phone['p2'].str.len() == 10, 'phone'] = df_phone['p2']
This is what I want it to do, but in Pandas:
if df_phone['p1'].str.len() == 10:
then insert df_phone['p1'] into df_phone['phone']
elif df_phone['p2'].str.len() == 10:
then insert df_phone['p2'] into df_phone['phone']
elif df_phone['p3'].str.len() == 10:
then insert df_phone['p3'] into df_phone['phone']
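I want the phone column to contain the 10 characters from phone 1; if phone 1 is not 10 characters, the phone column should contain the 10 characters from phone 2, and so on.
But one of the results is: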
AttributeError: 'DataFrame' object has no attribute 'str'
Any idea how to solve this?
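A note on the error itself: the .str accessor exists on a Series, not on a DataFrame, so this message usually means an indexing expression returned a DataFrame. One way that can happen (an assumption here, since the thread does not show how df_phone was built) is when two columns share the same label. A minimal sketch with made-up data:

```python
import pandas as pd

# Made-up frame in which two columns share the label 'p1' (hypothetical data).
dup = pd.DataFrame([["2125551234", "212555"]], columns=["p1", "p1"])

selected = dup["p1"]  # duplicate labels give back a DataFrame, not a Series
is_frame = isinstance(selected, pd.DataFrame)

try:
    selected.str.len()
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

# A single, uniquely named column is a Series and does have the .str accessor.
ok = pd.DataFrame({"p1": ["2125551234", "212555"]})
lengths = ok["p1"].str.len().tolist()
```

Checking type(df_phone['p1']) and df_phone.columns is a quick way to see whether this is what is going on.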
This should help:
df['phone'] = ''
# Try phone 1 first; .loc avoids chained assignment, which may not write back to df
df['test_phone'] = df['Phone1Area'] + df['Phone1Exchange'] + df['Phone1NumberPart']
df.loc[df['test_phone'].str.len() == 10, 'phone'] = df['test_phone']
# Fall back to phone 2 only where 'phone' is still empty
df['test_phone'] = df['Phone2Area'] + df['Phone2Exchange'] + df['Phone2NumberPart']
df.loc[(df['test_phone'].str.len() == 10) & (df['phone'] == ''), 'phone'] = df['test_phone']
# Same again for phone 3
df['test_phone'] = df['Phone3Area'] + df['Phone3Exchange'] + df['Phone3NumberPart']
df.loc[(df['test_phone'].str.len() == 10) & (df['phone'] == ''), 'phone'] = df['test_phone']
etc.
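The repeated steps can also be written as a loop. This is only a sketch under assumptions: the sample data is made up, the column names follow the question's Phone{n}Area / Phone{n}Exchange / Phone{n}NumberPart pattern, and missing parts are treated as empty strings.

```python
import pandas as pd

# Made-up sample data; column names follow the pattern used in the question.
df = pd.DataFrame({
    "Phone1Area": ["212", "21", "30"],
    "Phone1Exchange": ["555", "555", "555"],
    "Phone1NumberPart": ["1234", "1234", "123"],
    "Phone2Area": ["646", "646", "64"],
    "Phone2Exchange": ["555", "555", "555"],
    "Phone2NumberPart": ["9876", "9876", "9876"],
})

df["phone"] = ""
for n in (1, 2):  # extend the tuple for Phone3, Phone4, ...
    # fillna("") so a missing part shortens the string instead of producing NaN
    candidate = (
        df[f"Phone{n}Area"].fillna("")
        + df[f"Phone{n}Exchange"].fillna("")
        + df[f"Phone{n}NumberPart"].fillna("")
    )
    # Only fill rows that are exactly 10 characters and not already filled
    mask = (candidate.str.len() == 10) & (df["phone"] == "")
    df.loc[mask, "phone"] = candidate[mask]
```

With this data the first row takes phone 1, the second falls back to phone 2, and the third stays empty because neither candidate is 10 characters long.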
Another solution, using np.select, should be faster:
import numpy as np

# One boolean condition per candidate column, in priority order
conditions = [df_phone['p1'].str.len() == 10,
              df_phone['p2'].str.len() == 10,
              df_phone['p3'].str.len() == 10]
choices = [df_phone['p1'], df_phone['p2'], df_phone['p3']]
df_phone['phone'] = np.select(conditions, choices, default='')
Documentation:
- np.select: pick the choice for the first True value encountered in conditions. If only False, fill with default.
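For a self-contained illustration, here is a sketch that builds the conditions from the raw part columns and then applies np.select. The sample data and the helper valid_and_joined are made up for this example. It also checks the stricter 3+3+4 part lengths mentioned in the question, since a combined length of 10 alone would also accept splits such as 4+2+4.

```python
import numpy as np
import pandas as pd

# Made-up sample data; column names follow the pattern used in the question.
df_phone = pd.DataFrame({
    "Phone1Area": ["212", "21", None],
    "Phone1Exchange": ["555", "555", "555"],
    "Phone1NumberPart": ["1234", "1234", "1234"],
    "Phone2Area": ["646", "646", "6466"],
    "Phone2Exchange": ["555", "555", "55"],
    "Phone2NumberPart": ["9876", "9876", "9876"],
})

def valid_and_joined(frame, n):
    """Hypothetical helper: return (mask, joined) for phone n.

    mask is True where the parts are 3, 3 and 4 characters long.
    """
    area = frame[f"Phone{n}Area"].fillna("")
    exchange = frame[f"Phone{n}Exchange"].fillna("")
    number = frame[f"Phone{n}NumberPart"].fillna("")
    mask = (area.str.len() == 3) & (exchange.str.len() == 3) & (number.str.len() == 4)
    return mask, area + exchange + number

pairs = [valid_and_joined(df_phone, n) for n in (1, 2)]
conditions = [mask for mask, _ in pairs]
choices = [joined for _, joined in pairs]

# The first condition that is True wins; rows with no valid phone get ''.
df_phone["phone"] = np.select(conditions, choices, default="")
```

The third row ends up empty: phone 1 has a missing area code, and phone 2 adds up to 10 characters but with a 4+2+4 split.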