Check if a column value is in a list and report to a new column
Following an earlier discussion, I have the following dataframe:
import pandas as pd
data = {'Item':['1', '2', '3', '4', '5'],
        'Len':[142, 11, 50, 60, 12],
        'Hei':[55, 65, 130, 14, 69],
        'C':[68, -18, 65, 16, 17],
        'Thick':[60, 0, -150, 170, 130],
        'Vol':[230, 200, -500, 10, 160],
        'Fail':[['Len', 'Thick'], ['Thick'], ['Hei', 'Thick', 'Vol'], ['Vol'], ""]}
df = pd.DataFrame(data)
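It represents different items and the corresponding values for some of their parameters (Len, Hei, C, ...). The Fail column reports the parameters that failed, e.g. item 1 fails for parameters Len and Thick, item 3 fails for parameters Hei, Thick and Vol, while item 5 shows no failure.
For each item I need a new column that reports the failed parameters together with their values, in the format failed parameter = value. So, for the first item, I should get Len=142 and Thick=60.
So far, I have exploded the Fail column into multiple columns: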
failed_param = df['Fail'].apply(pd.Series)
failed_param = failed_param.rename(columns = lambda x : 'Failed_param_' + str(x +1 ))
df2_list = failed_param.columns.values.tolist()
df2 = pd.concat([df[:], failed_param[:]], axis=1)
Then, if I do the following:
for name in df2_list:
    df2.loc[df2[f"{name}"] == "D", "new"] = "D"+ "=" + df2["D"].map(str)
I get what I need, but only for one parameter (D in this case). How can I obtain the same for all the parameters at once?
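As stated in the question, you need to insert a new column (e.g., FailParams) that holds one string per item. Each string lists the failures of that item (e.g., Len=142,Thick=60). A quick solution can be: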
import pandas as pd
data = {
    'Item' : ['1', '2', '3', '4', '5'],
    'Len'  : [142, 11, 50, 60, 12],
    'Hei'  : [55, 65, 130, 14, 69],
    'C'    : [68, -18, 65, 16, 17],
    'Thick': [60, 0, -150, 170, 130],
    'Vol'  : [230, 200, -500, 10, 160],
    'Fail' : [['Len', 'Thick'], ['Thick'], ['Hei', 'Thick', 'Vol'], ['Vol'], []]
}
# Convert the dictionary into a DataFrame.
df = pd.DataFrame(data)
# The first solution: using list comprehension.
# Note: df.loc[int(L[0]) - 1, el] assumes that Item holds 1-based row numbers
# matching the default integer index.
column = [
    ",".join(  # Add commas between the list items.
        # Find the target items and their values.
        [el + "=" + str(df.loc[int(L[0]) - 1, el]) for el in L[1]]
    )
    if (len(L[1]) > 0) else ""  # If the inner Fail list is empty, return an empty string.
    for L in zip(df['Item'].values, df['Fail'].values)  # Loop over the Fail items.
]
# Insert the new column.
df['FailParams'] = column
# Print the DF after insertion.
print(df)
The previous solution uses a list comprehension. Another solution, using a loop, can be:
# The second solution: using loops.
records = []
for L in zip(df['Item'].values, df['Fail'].values):
    if (len(L[1]) <= 0):
        record = ""
    else:
        record = ",".join([el + "=" + str(df.loc[int(L[0]) - 1, el]) for el in L[1]])
    records.append(record)
print(records)
# Insert the new column.
df['FailParams'] = records
# Print the DF after insertion.
print(df)
The sample output should be:
  Item  Len  Hei   C  Thick   Vol               Fail                   FailParams
0    1  142   55  68     60   230       [Len, Thick]             Len=142,Thick=60
1    2   11   65 -18      0   200            [Thick]                      Thick=0
2    3   50  130  65   -150  -500  [Hei, Thick, Vol]  Hei=130,Thick=-150,Vol=-500
3    4   60   14  16    170    10              [Vol]                       Vol=10
4    5   12   69  17    130   160                 []
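Both versions above locate a row through int(Item) - 1, which only works while Item happens to mirror the default index. A minimal sketch of a row-wise variant that avoids that lookup is shown below; the helper name format_failures is hypothetical and not part of the original answer.

```python
import pandas as pd

data = {
    'Item' : ['1', '2', '3', '4', '5'],
    'Len'  : [142, 11, 50, 60, 12],
    'Hei'  : [55, 65, 130, 14, 69],
    'C'    : [68, -18, 65, 16, 17],
    'Thick': [60, 0, -150, 170, 130],
    'Vol'  : [230, 200, -500, 10, 160],
    'Fail' : [['Len', 'Thick'], ['Thick'], ['Hei', 'Thick', 'Vol'], ['Vol'], []]
}
df = pd.DataFrame(data)

def format_failures(row):
    # Hypothetical helper: row is one line of the frame as a Series.
    # Each name listed in Fail is looked up in the same row, so the
    # result does not depend on Item or on the index values.
    return ",".join(f"{name}={row[name]}" for name in row['Fail'])

# An empty Fail list (or an empty string) yields an empty string.
df['FailParams'] = df.apply(format_failures, axis=1)
fail_params = df['FailParams'].tolist()
print(df)
```

Because the function receives the whole row, it should keep working if the frame is filtered or re-sorted before the new column is built.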
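It may be a good idea to build an intermediate representation first, like this (I assume that the empty cell in the Fail column is an empty list [], so that it matches the data type of the other values):
# create a Boolean mask to filter failed values
m = df.apply(lambda row: row.index.isin(row.Fail),
             axis=1,
             result_type='broadcast')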
>>> df[m]
Item Len Hei C Thick Vol Fail
0 NaN 142.0 NaN NaN 60.0 NaN NaN
1 NaN NaN NaN NaN 0.0 NaN NaN
2 NaN NaN 130.0 NaN -150.0 -500.0 NaN
3 NaN NaN NaN NaN NaN 10.0 NaN
4 NaN NaN NaN NaN NaN NaN NaN
This also allows you to perform actual operations on the failed values.
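As an illustration of such operations, here is a small sketch that counts the failures per item and finds the largest absolute failed value. It builds the mask explicitly from a list of parameter columns instead of using apply; the names params, n_failed and worst are assumptions introduced for this example only.

```python
import pandas as pd

df = pd.DataFrame({
    'Item' : ['1', '2', '3', '4', '5'],
    'Len'  : [142, 11, 50, 60, 12],
    'Hei'  : [55, 65, 130, 14, 69],
    'C'    : [68, -18, 65, 16, 17],
    'Thick': [60, 0, -150, 170, 130],
    'Vol'  : [230, 200, -500, 10, 160],
    'Fail' : [['Len', 'Thick'], ['Thick'], ['Hei', 'Thick', 'Vol'], ['Vol'], []]
})

# Assumed list of the numeric parameter columns.
params = ['Len', 'Hei', 'C', 'Thick', 'Vol']

# True where the column name is listed in that row's Fail entry.
mask = pd.DataFrame(
    [[name in failed for name in params] for failed in df['Fail']],
    index=df.index,
    columns=params,
)

failed_values = df[params].where(mask)   # NaN wherever the parameter passed
n_failed = mask.sum(axis=1)              # number of failed parameters per item
worst = failed_values.abs().max(axis=1)  # largest absolute failed value per item
```

Items without any failure end up with a count of 0 and a NaN in worst, since there is nothing to take the maximum of.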
With this, generating the list of values can be done in a way similar to Hossam Magdy Balaha's answer, perhaps with a small function:
def join_params(row):
    row = row.dropna().to_dict()
    return ', '.join(f'{k}={v}' for k, v in row.items())
>>> df[m].apply(join_params, axis=1)
0 Len=142.0, Thick=60.0
1 Thick=0.0
2 Hei=130.0, Thick=-150.0, Vol=-500.0
3 Vol=10.0
4
dtype: object
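The values above print as floats (142.0 rather than 142) because masking introduces NaN, which forces the integer columns to float. If the original integers should be kept, one possible sketch is to read the values from the unmasked frame and use the mask only as a filter. The mask is rebuilt here from an assumed column list params so that the block runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    'Item' : ['1', '2', '3', '4', '5'],
    'Len'  : [142, 11, 50, 60, 12],
    'Hei'  : [55, 65, 130, 14, 69],
    'C'    : [68, -18, 65, 16, 17],
    'Thick': [60, 0, -150, 170, 130],
    'Vol'  : [230, 200, -500, 10, 160],
    'Fail' : [['Len', 'Thick'], ['Thick'], ['Hei', 'Thick', 'Vol'], ['Vol'], []]
})

# Assumed list of the numeric parameter columns.
params = ['Len', 'Hei', 'C', 'Thick', 'Vol']

# True where the column name is listed in that row's Fail entry.
mask = pd.DataFrame(
    [[name in failed for name in params] for failed in df['Fail']],
    index=df.index,
    columns=params,
)

# Walk the original (integer) values and the mask side by side,
# keeping only the cells flagged as failed.
joined = [
    ", ".join(
        f"{name}={value}"
        for name, value, failed in zip(params, values, flags)
        if failed
    )
    for values, flags in zip(df[params].to_numpy(), mask.to_numpy())
]
result = pd.Series(joined, index=df.index)
print(result)
```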