Multiple conditions for list comprehension in Python

Question

I want to create a column based on the values in the index:

If the index value starts with a letter, but not with 'I0', return "P"; otherwise return "C".

I tried:

df['new_col'] = ['P' if (x[0].isalpha() and not x[0].startswith("I0"))  else 'C' for x in df.index]

But it returned 'P' for rows whose index starts with 'I0':


         A           B           C       new_col
Index           
I00001  1.325337    4.692308    1.615385    P
I00002  1.614780    3.615385    0.769231    P
I00003  1.141453    5.461538    2.000000    P
I00004  0.918300    8.538462    2.769231    P
I00005  1.189606    11.846154   2.692308    P
I00006  0.941459    7.153846    2.153846    P
I00007  0.466383    12.153846   9.384615    P
I00008  0.308627    198.692308  23.461538   P
I00011  0.537142    23.384615   6.846154    P
I00012  1.217390    11.923077   1.230769    P
I00013  1.052840    3.384615    2.000000    P
...

Reproducible example:

import pandas as pd

df = pd.DataFrame({'A': {'I00001': 1.3253365856660808,
  'I00002': 1.6147800817881086,
  'I00003': 1.1414534979918203,
  'I00004': 0.9183004454646491,
  'I00005': 1.1896061362142527,
  'I00006': 0.941459102789141,
  'I00007': 0.46638312473267185,
  'I00008': 0.3086270976042302,
  'I00011': 0.5371419441302684,
  'I00012': 1.2173904641254587,
  'I00013': 1.052839529263679,
  'I00014': 1.3587324409735149,
  'I00015': 3.464101615137755,
  'I00016': 1.1989578808281798,
  'I00018': 0.2433560755649686,
  'I00019': 0.5510000980337852,
  'I00020': 3.464101615137755,
  'I00022': 1.0454523047666737,
  'I00023': 1.3850513878332371,
  'I00024': 1.3314720972390754},
 'B': {'I00001': 4.6923076923076925,
  'I00002': 3.6153846153846154,
  'I00003': 5.461538461538462,
  'I00004': 8.538461538461538,
  'I00005': 11.846153846153847,
  'I00006': 7.153846153846154,
  'I00007': 12.153846153846153,
  'I00008': 198.69230769230768,
  'I00011': 23.384615384615383,
  'I00012': 11.923076923076923,
  'I00013': 3.3846153846153846,
  'I00014': 1.0,
  'I00015': 0.07692307692307693,
  'I00016': 0.6153846153846154,
  'I00018': 481.7692307692308,
  'I00019': 7.3076923076923075,
  'I00020': 0.07692307692307693,
  'I00022': 1.6153846153846154,
  'I00023': 0.5384615384615384,
  'I00024': 12.538461538461538},
 'C': {'I00001': 1.6153846153846154,
  'I00002': 0.7692307692307693,
  'I00003': 2.0,
  'I00004': 2.769230769230769,
  'I00005': 2.6923076923076925,
  'I00006': 2.1538461538461537,
  'I00007': 9.384615384615385,
  'I00008': 23.46153846153846,
  'I00011': 6.846153846153846,
  'I00012': 1.2307692307692308,
  'I00013': 2.0,
  'I00014': 0.38461538461538464,
  'I00015': 0.07692307692307693,
  'I00016': 0.46153846153846156,
  'I00018': 79.07692307692308,
  'I00019': 3.6923076923076925,
  'I00020': 0.07692307692307693,
  'I00022': 1.1538461538461537,
  'I00023': 0.46153846153846156,
  'I00024': 2.3076923076923075}}
)

Answer 1

A non-loop solution with numpy.where:

df['new_col'] = np.where(df.index.str[0].str.isalpha() &
                         ~df.index.str.startswith("I0"), 'P', 'C')

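In your solution, remove the x[0] from x[0].startswith("I0"). As written, it tests only the first character, which can never start with "I0", so not x[0].startswith("I0") is always True: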

df['new_col'] = ['P' if (x[0].isalpha() and not x.startswith("I0"))   
                     else 'C' for x in df.index]
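
To make the point concrete, here is a minimal plain-Python sketch of why the original condition never excludes anything. The variable names (label, first_char, buggy, fixed) are only illustrative and not part of the question's code:

```python
# Illustrative label; any index value beginning with "I0" behaves the same way.
label = "I00001"

first_char = label[0]  # "I", a one-character string

# A one-character string cannot start with a two-character prefix.
print(first_char.startswith("I0"))  # False
# The full label is what should be tested.
print(label.startswith("I0"))       # True

# Original condition: the "not ..." part is always True, so any letter gives 'P'.
buggy = "P" if (label[0].isalpha() and not label[0].startswith("I0")) else "C"
# Corrected condition: test the whole label against the prefix.
fixed = "P" if (label[0].isalpha() and not label.startswith("I0")) else "C"

print(buggy, fixed)  # P C
```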

Test:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': {'AA00001': 1.3253365856660808,
  'I00002': 1.6147800817881086,
  'IR0003': 1.1414534979918203,
  '00004': 0.9183004454646491,
  '**00005': 1.1896061362142527,
  'I00007': 0.46638312473267185}}
)

df['new_col'] = np.where(df.index.str[0].str.isalpha() &
                         ~df.index.str.startswith("I0"), 'P', 'C')

df['new_col1'] = ['P' if (x[0].isalpha() and not x.startswith("I0"))   
                      else 'C' for x in df.index]
print(df)
                A new_col new_col1
**00005  1.189606       C        C
00004    0.918300       C        C
AA00001  1.325337       P        P
I00002   1.614780       C        C
I00007   0.466383       C        C
IR0003   1.141453       P        P

Answer 2

Your code checks x[0].startswith("I0"), which is incorrect. Try this instead (check x.startswith("I0")):

df['new_col'] = ['P' if (x[0].isalpha() and not x.startswith("I0"))  else 'C' for x in df.index]
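
If the condition grows further, it may be easier to read as a small helper function. The sketch below is one possible way to write it; classify and labels are hypothetical names, and the slice label[:1] is an added assumption so that an empty string falls through to 'C' instead of raising IndexError:

```python
def classify(label: str) -> str:
    """Return 'P' if label starts with a letter but not with 'I0', else 'C'."""
    # label[:1] is "" for an empty string, and "".isalpha() is False.
    starts_with_letter = label[:1].isalpha()
    return "P" if starts_with_letter and not label.startswith("I0") else "C"


# Stand-in for df.index; the values mirror the kinds of labels tested above.
labels = ["AA00001", "I00002", "IR0003", "00004", "**00005", ""]

new_col = [classify(x) for x in labels]
print(new_col)  # ['P', 'C', 'P', 'C', 'C', 'C']
```

With a DataFrame, the same helper would be used as df['new_col'] = [classify(x) for x in df.index].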

