Python Dataframe find closest matching value with a tolerance

Question

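I have a dataframe whose elements are lists. For each row, I want to find the value closest to a given value, but only if it lies within a percentage tolerance of that value. My code: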

import numpy as np
import pandas as pd

df = pd.DataFrame({'A':[[1,2],[4,5,6]]})
df
           A
0     [1, 2]
1  [4, 5, 6]

# in each row, find the value (and its index) closest to 5 within a 20% tolerance
val = 5
tol = 0.2 # accept values within 20% of 5, i.e. between 4 and 6
df['Matching_index'] = (df['A'].map(np.array)-val).map(abs).map(np.argmin)

Current result:

df
           A     Matching_index
0     [1, 2]     1                # 2 is the closest value to 5, but it is outside the tolerance, so this is wrong
1  [4, 5, 6]     1                # 5 matches with 5, correct.

Expected result:

df
           A     Matching_index
0     [1, 2]     NaN              # No matching value, hence NaN
1  [4, 5, 6]     1                # 5 matches with 5, correct.

Answer 1

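The idea is to take the absolute difference from val, replace the values that are outside the tolerance with missing values, and then take the position of the minimum with np.nanargmin. Because np.nanargmin raises an error when all values are missing, add an np.any check first: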

def f(x):
    a = np.abs(np.array(x)-val)
    m = a <= val * tol
    return np.nanargmin(np.where(m, a, np.nan)) if m.any() else np.nan
    
df['Matching_index']  = df['A'].map(f)

print (df)
           A  Matching_index
0     [1, 2]             NaN
1  [4, 5, 6]             1.0
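
The function f above reads val and tol from module-level globals. Below is a minimal sketch of the same idea with both values passed in explicitly, so it can be reused for other targets and tolerances. The name closest_index is hypothetical, and it swaps the NaN/np.nanargmin step for np.inf/np.argmin, which gives the same index:

```python
import numpy as np
import pandas as pd

def closest_index(x, val, tol):
    """Index of the element of x closest to val, or NaN if none is within val * tol."""
    a = np.abs(np.asarray(x, dtype=float) - val)  # distance of each element from val
    m = a <= val * tol                             # elements inside the tolerance window
    if not m.any():                                # no element within tolerance (or empty list)
        return np.nan
    # Push out-of-tolerance distances to +inf so argmin only considers matches
    return int(np.argmin(np.where(m, a, np.inf)))

df = pd.DataFrame({'A': [[1, 2], [4, 5, 6]]})
# Series.apply forwards the extra positional arguments via args
df['Matching_index'] = df['A'].apply(closest_index, args=(5, 0.2))
print(df)
```

As with np.argmin in general, if two matches are equally close, the first one wins.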

Pandas solution:

df1 = pd.DataFrame(df['A'].tolist(), index=df.index).sub(val).abs()

df['Matching_index'] = df1.where(df1 <= val * tol).dropna(how='all').idxmin(axis=1)
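
To see why dropna(how='all') is there, it can help to print the intermediate frames. The sketch below uses the question's data; the variable names wide, dist, within and matches are just for illustration. As far as I know, recent pandas versions warn about (and future versions raise on) idxmin over an all-NaN row, so that row is dropped first and then comes back as NaN through index alignment on assignment:

```python
import numpy as np
import pandas as pd

val, tol = 5, 0.2
df = pd.DataFrame({'A': [[1, 2], [4, 5, 6]]})

# Expand the lists into columns; shorter lists are padded with NaN
wide = pd.DataFrame(df['A'].tolist(), index=df.index)
# Absolute distance of every element from val
dist = wide.sub(val).abs()
# Keep only distances inside the tolerance window; everything else becomes NaN
within = dist.where(dist <= val * tol)
print(within)

# Rows with no match are all-NaN; drop them before idxmin
matches = within.dropna(how='all').idxmin(axis=1)
print(matches)

# Assignment aligns on the index, so the dropped rows come back as NaN
df['Matching_index'] = matches
print(df)
```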

Answer 2

I'm not sure whether you want all the matching indices or just a count of them.

Try this:

import pandas as pd
import numpy as np

df = pd.DataFrame({'A':[[1,2],[4,5,6,7,8]]})

val = 5
tol = 0.3

def closest(arr,val,tol):
    idxs = [ idx for idx,el in enumerate(arr) if (np.abs(el - val) < val*tol)]
    result = len(idxs) if len(idxs) != 0 else np.nan
    return result

df['Matching_index'] = df['A'].apply(closest, args=(val,tol,))
df

If you want all the indices, just return idxs instead of len(idxs).
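
A sketch of that variant, where closest_all is a hypothetical name. It keeps the answer's strict < comparison and still returns NaN when no element is within tolerance, matching the expected output in the question:

```python
import numpy as np
import pandas as pd

def closest_all(arr, val, tol):
    """All indices whose element lies strictly within val * tol of val, or NaN if none."""
    idxs = [idx for idx, el in enumerate(arr) if abs(el - val) < val * tol]
    return idxs if idxs else np.nan

df = pd.DataFrame({'A': [[1, 2], [4, 5, 6, 7, 8]]})
df['Matching_index'] = df['A'].apply(closest_all, args=(5, 0.3))
print(df)
```

Here the window is 5 * 0.3 = 1.5, so in the second row only 4, 5 and 6 qualify.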
