Check if a column value is in a list and report to a new column

Question

Following a discussion, I have the following dataframe:

import pandas as pd

data = {'Item': ['1', '2', '3', '4', '5'],
        'Len': [142, 11, 50, 60, 12],
        'Hei': [55, 65, 130, 14, 69],
        'C': [68, -18, 65, 16, 17],
        'Thick': [60, 0, -150, 170, 130],
        'Vol': [230, 200, -500, 10, 160],
        'Fail': [['Len', 'Thick'], ['Thick'], ['Hei', 'Thick', 'Vol'], ['Vol'], ""]}

df = pd.DataFrame(data)

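It represents different items and the corresponding values of some of their parameters (Len, Hei, C, ...). The Fail column reports the parameters that failed, e.g. item 1 failed on parameters Len and Thick, item 3 failed on parameters Hei, Thick and Vol, while item 5 shows no failure. For each item I need a new column that reports each failed parameter together with its value, in the format failed parameter = value. So, for the first item I should get Len=142 and Thick=60. So far, I have exploded the Fail column into multiple columns: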

failed_param = df['Fail'].apply(pd.Series)
failed_param = failed_param.rename(columns = lambda x : 'Failed_param_' + str(x +1 ))
df2_list = failed_param.columns.values.tolist()
df2 = pd.concat([df[:], failed_param[:]], axis=1)

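Then, if I do the following:

for name in df2_list:
    df2.loc[df2[name] == "Thick", "new"] = "Thick" + "=" + df2["Thick"].map(str)

I get what I need, but only for a single parameter (Thick in this case). How can I get the same result for all parameters at once?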

Answer 1

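As stated in the question, you need to insert a new column (e.g. FailParams) holding one string per item. Each string lists that item's failures (e.g. Len=142,Thick=60). A quick solution can be: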

import pandas as pd

data = {
  'Item' : ['1', '2', '3', '4', '5'],
  'Len'  : [142, 11, 50, 60, 12],
  'Hei'  : [55, 65, 130, 14, 69],
  'C'    : [68, -18, 65, 16, 17],
  'Thick': [60, 0, -150, 170, 130],
  'Vol'  : [230, 200, -500, 10, 160],
  'Fail' : [['Len', 'Thick'], ['Thick'], ['Hei', 'Thick', 'Vol'], ['Vol'], []]
}

# Convert the dictionary into a DataFrame.
df = pd.DataFrame(data)

# The first solution: using list comprehension.
column = [
  ",".join(  # Add commas between the list items.
    # Find the target items and their values.
    [el + "=" + str(df.loc[int(L[0]) - 1, el]) for el in L[1]]
  )
  if (len(L[1]) > 0) else ""  # If the Fail inner is empty, return an empty string.
  for L in zip(df['Item'].values, df['Fail'].values)  # Loop on the Fail items.
]

# Insert the new column.
df['FailParams'] = column

# Print the DF after insertion.
print(df)

The previous solution uses a list comprehension. Another solution, using loops, can be:

# The second solution: using loops.
records = []
for L in zip(df['Item'].values, df['Fail'].values):
  if (len(L[1]) <= 0):
    record = ""
  else:
    record = ",".join([el + "=" + str(df.loc[int(L[0]) - 1, el]) for el in L[1]])
  records.append(record)
print(records)

# Insert the new column.
df['FailParams'] = records

# Print the DF after insertion.
print(df)

The sample output should be:

  Item  Len  Hei   C  Thick  Vol               Fail                   FailParams
0    1  142   55  68     60  230       [Len, Thick]             Len=142,Thick=60
1    2   11   65 -18      0  200            [Thick]                      Thick=0
2    3   50  130  65   -150 -500  [Hei, Thick, Vol]  Hei=130,Thick=-150,Vol=-500
3    4   60   14  16    170   10              [Vol]                       Vol=10
4    5   12   69  17    130  160                 []
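
Both variants above look up values with df.loc[int(Item) - 1, ...], which only works while Item is numbered 1..n in row order. As a minimal sketch that does not rely on that numbering, each value can be read from the row itself with a row-wise apply. The helper name join_failed is hypothetical, and Fail is assumed to hold lists (with [] for "no failure"):

```python
import pandas as pd

df = pd.DataFrame({
    'Item': ['1', '2', '3', '4', '5'],
    'Len': [142, 11, 50, 60, 12],
    'Hei': [55, 65, 130, 14, 69],
    'C': [68, -18, 65, 16, 17],
    'Thick': [60, 0, -150, 170, 130],
    'Vol': [230, 200, -500, 10, 160],
    'Fail': [['Len', 'Thick'], ['Thick'], ['Hei', 'Thick', 'Vol'], ['Vol'], []],
})


def join_failed(row):
    # Hypothetical helper: build "name=value" for every name listed in Fail.
    # An empty Fail list yields an empty string.
    return ",".join(f"{name}={row[name]}" for name in row["Fail"])


# Apply the helper to each row; the result is one string per item.
df["FailParams"] = df.apply(join_failed, axis=1)
print(df[["Item", "FailParams"]])
```

Because the lookup uses the row's own values, this should keep working if the rows are reordered or Item is not a running number.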

Answer 2

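It might be a good idea to build an intermediate representation first, like this (I am assuming that the empty cell in the Fail column is an empty list [], so that it matches the data type of the other values):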

# create a Boolean mask to filter failed values
m = df.apply(lambda row: row.index.isin(row.Fail), 
             axis=1, 
             result_type='broadcast')

>>> df[m]
  Item    Len    Hei   C  Thick    Vol Fail
0  NaN  142.0    NaN NaN   60.0    NaN  NaN
1  NaN    NaN    NaN NaN    0.0    NaN  NaN
2  NaN    NaN  130.0 NaN -150.0 -500.0  NaN
3  NaN    NaN    NaN NaN    NaN   10.0  NaN
4  NaN    NaN    NaN NaN    NaN    NaN  NaN

This also lets you perform actual operations on the failed values.
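
As an illustration of such operations, here is a minimal sketch that counts the failed parameters per item and finds the smallest failed value. It builds the mask with a plain nested comprehension rather than the apply call above, and it first normalises the question's empty-string cell to an empty list. The names params, fail_lists, n_failed and lowest are assumptions made for this example:

```python
import pandas as pd

df = pd.DataFrame({
    'Item': ['1', '2', '3', '4', '5'],
    'Len': [142, 11, 50, 60, 12],
    'Hei': [55, 65, 130, 14, 69],
    'C': [68, -18, 65, 16, 17],
    'Thick': [60, 0, -150, 170, 130],
    'Vol': [230, 200, -500, 10, 160],
    'Fail': [['Len', 'Thick'], ['Thick'], ['Hei', 'Thick', 'Vol'], ['Vol'], ""],
})

# Assumed list of parameter columns that can appear in Fail.
params = ["Len", "Hei", "C", "Thick", "Vol"]

# Normalise Fail so every cell is a list (the question uses "" for "no failure").
fail_lists = [f if isinstance(f, list) else [] for f in df["Fail"]]

# Boolean mask: True where the column name appears in that row's Fail list.
mask = pd.DataFrame(
    [[p in fails for p in params] for fails in fail_lists],
    index=df.index,
    columns=params,
)

# Keep only the failed values; every other cell becomes NaN.
failed = df[params].where(mask)

n_failed = mask.sum(axis=1)   # number of failed parameters per item
lowest = failed.min(axis=1)   # smallest failed value per item (NaN if none failed)

print(n_failed.tolist())
print(lowest)
```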

With this, generating the list of values can be done in a way similar to Hossam Magdy Balaha's answer, perhaps with a small function:

def join_params(row):
    row = row.dropna().to_dict()
    return ', '.join(f'{k}={v}' for k,v in row.items())

>>> df[m].apply(join_params, axis=1)
0                  Len=142.0, Thick=60.0
1                              Thick=0.0
2    Hei=130.0, Thick=-150.0, Vol=-500.0
3                               Vol=10.0
4                                       
dtype: object
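
The values print as floats (142.0) because filling the non-failed cells with NaN upcasts the integer columns. If the original integer formatting matters, one possible workaround is to skip the NaN step and pair names with values directly from a Boolean mask. This is only a sketch: the mask is built with a nested comprehension, Fail is assumed to hold lists, and the names params and joined are introduced here for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'Item': ['1', '2', '3', '4', '5'],
    'Len': [142, 11, 50, 60, 12],
    'Hei': [55, 65, 130, 14, 69],
    'C': [68, -18, 65, 16, 17],
    'Thick': [60, 0, -150, 170, 130],
    'Vol': [230, 200, -500, 10, 160],
    'Fail': [['Len', 'Thick'], ['Thick'], ['Hei', 'Thick', 'Vol'], ['Vol'], []],
})

# Assumed list of parameter columns that can appear in Fail.
params = ["Len", "Hei", "C", "Thick", "Vol"]

# Boolean mask: True where the column name appears in that row's Fail list.
mask = pd.DataFrame(
    [[p in fails for p in params] for fails in df["Fail"]],
    index=df.index,
    columns=params,
)

# Walk values and mask row by row; the integers are never converted to float.
joined = pd.Series(
    [
        ", ".join(f"{p}={v}" for p, v, hit in zip(params, values, hits) if hit)
        for values, hits in zip(df[params].to_numpy(), mask.to_numpy())
    ],
    index=df.index,
)
print(joined)
```

The pairs come out in column order (Len, Hei, C, Thick, Vol) rather than in the order given in each Fail list.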

