Python DataFrame: process two columns of lists and find the minimum
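I have a DataFrame whose elements are lists. I want to subtract a value from each list in column A, find the index of the element with the smallest absolute difference, and then pick the element at that index from the corresponding list in column B.
My code: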
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [[1, 2, 3], [1, 3, 5, 6]],
                   'B': [[10, 20, 30], [10, 30, 50, 60]]})
df
              A                 B
0     [1, 2, 3]      [10, 20, 30]
1  [1, 3, 5, 6]  [10, 30, 50, 60]
# Subtract val from A, find the index of the smallest absolute difference,
# and take the element of B at that index
val = 2
df['A_new_min'] = (df['A'].map(np.array) - val).map(abs).map(np.argmin)
df['B_new'] = df[['A_new_min', 'B']].apply(lambda x: x['B'][x['A_new_min']], axis=1)
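Current solution: this produces the correct result, but I don't want to store A_new_min, which is unnecessary. Is it possible to get this result in a single line of code?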
df =
              A                 B  A_new_min  B_new
0     [1, 2, 3]      [10, 20, 30]          1     20
1  [1, 3, 5, 6]  [10, 30, 50, 60]          0     10
Expected solution:
How can I get the result below directly, without creating the extra, unnecessary column A_new_min? Put simply, I want:
df =
              A                 B  B_new
0     [1, 2, 3]      [10, 20, 30]     20
1  [1, 3, 5, 6]  [10, 30, 50, 60]     10
One option is to use a list comprehension:
df['new'] = [arr[i] for i, arr in zip(df['A'].map(np.array).sub(2).abs().map(np.argmin), df['B'])]
Another option is to skip the conversion to NumPy arrays entirely and stick with lists:
df['new'] = [b[min(enumerate([abs(x-2) for x in a]), key=lambda x:x[1])[0]] for a,b in zip(df['A'], df['B'])]
Output:
              A                 B  new
0     [1, 2, 3]      [10, 20, 30]   20
1  [1, 3, 5, 6]  [10, 30, 50, 60]   10
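As a quick sanity check, the two options can be run side by side on the sample data. This is only a minimal sketch: column B is reconstructed from the output shown in the question, the column names new_numpy and new_list are made up for illustration, and option 1 is written with a single lambda rather than the chained Series methods.

```python
import numpy as np
import pandas as pd

# Sample data; column B is reconstructed from the output shown in the question.
df = pd.DataFrame({'A': [[1, 2, 3], [1, 3, 5, 6]],
                   'B': [[10, 20, 30], [10, 30, 50, 60]]})
val = 2

# Option 1: NumPy arrays, index of the smallest |a - val| per row.
idx = df['A'].map(lambda a: np.argmin(np.abs(np.array(a) - val)))
df['new_numpy'] = [b[i] for i, b in zip(idx, df['B'])]

# Option 2: plain Python lists only.
df['new_list'] = [b[min(enumerate([abs(x - val) for x in a]), key=lambda t: t[1])[0]]
                  for a, b in zip(df['A'], df['B'])]

print(df[['new_numpy', 'new_list']])
```

Both columns should hold the same values, since the two options compute the same index in different ways.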
With apply:
df["B_new"] = df.apply(lambda row: row["B"][np.argmin(abs(np.array(row["A"])-val))], axis=1)
>>> df
              A                 B  B_new
0     [1, 2, 3]      [10, 20, 30]     20
1  [1, 3, 5, 6]  [10, 30, 50, 60]     10
IMO, the most efficient approach is to use only a list comprehension.
B_new only:
df['B_new'] = [b[min(range(len(a)), key=lambda x: abs(a[x]-val))]
for a,b in zip(df['A'], df['B'])]
Output:
              A                 B  B_new
0     [1, 2, 3]      [10, 20, 30]     20
1  [1, 3, 5, 6]  [10, 30, 50, 60]     10
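One detail worth noting about the second row: with val = 2, both 1 and 3 are at distance 1, and the expected output picks 10, the element paired with 1. Python's min returns the first minimal item and np.argmin returns the first occurrence of the minimum, so both approaches resolve ties to the earliest index. A small sketch, where nearest_index is a hypothetical helper name introduced only for this illustration:

```python
import numpy as np

def nearest_index(a, val):
    """Index of the element of `a` closest to `val` (first one on ties).

    `nearest_index` is a hypothetical helper, not part of pandas or NumPy.
    """
    return min(range(len(a)), key=lambda i: abs(a[i] - val))

a = [1, 3, 5, 6]
val = 2

# |1 - 2| == |3 - 2| == 1, so indices 0 and 1 tie; both calls pick index 0.
i_list = nearest_index(a, val)
i_numpy = int(np.argmin(np.abs(np.array(a) - val)))
print(i_list, i_numpy)
```

If a different tie-breaking rule were needed (for example, preferring the larger value), the key function would have to encode it explicitly.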
Both columns:
df2 = pd.DataFrame([[(i:=min(range(len(a)), key=lambda x: abs(a[x]-val))), b[i]]
for a,b in zip(df['A'], df['B'])], columns=['A_new_min', 'B_new'])
df.join(df2)
Output:
              A                 B  A_new_min  B_new
0     [1, 2, 3]      [10, 20, 30]          1     20
1  [1, 3, 5, 6]  [10, 30, 50, 60]          0     10
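Note that df.join(df2) aligns rows on the index, and a df2 built from a plain list gets a default 0..n-1 index. If df had a different index, passing index=df.index when constructing df2 should keep the rows aligned. A sketch under that assumption (the index labels 'x' and 'y' are invented for the example, and the walrus operator requires Python 3.8+):

```python
import pandas as pd

# Same sample data as in the question, but with a non-default index
# (an assumption made here for illustration).
df = pd.DataFrame({'A': [[1, 2, 3], [1, 3, 5, 6]],
                   'B': [[10, 20, 30], [10, 30, 50, 60]]},
                  index=['x', 'y'])
val = 2

# One [index, value] pair per row, as in the "both columns" variant.
rows = [[(i := min(range(len(a)), key=lambda x: abs(a[x] - val))), b[i]]
        for a, b in zip(df['A'], df['B'])]

# Reuse df's index so that join() matches rows by label.
df2 = pd.DataFrame(rows, columns=['A_new_min', 'B_new'], index=df.index)
out = df.join(df2)
print(out)
```

Also note that join returns a new DataFrame, so the result has to be assigned if it is needed later.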
Timings (200k rows)
# @mozway (option #1)
290 ms ± 16.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# @enke (list comprehension)
340 ms ± 16.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# @enke list comprehension + numpy
968 ms ± 17 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# @not_speshal
4.12 s ± 246 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
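The exact benchmark setup is not shown. One way to run a comparable measurement is sketched below, assuming the large frame is built by repeating the two sample rows. The function names, the n_copies variable, and the use of timeit are choices made here, not the original setup, and the numbers will vary by machine and library versions.

```python
import timeit

import numpy as np
import pandas as pd

val = 2
base = pd.DataFrame({'A': [[1, 2, 3], [1, 3, 5, 6]],
                     'B': [[10, 20, 30], [10, 30, 50, 60]]})

# Assumption: the test data repeats the two sample rows.
# Kept small here; use 100_000 to get 200k rows.
n_copies = 1_000
big = pd.concat([base] * n_copies, ignore_index=True)

def list_comp(df):
    # Pure-Python list comprehension.
    return [b[min(range(len(a)), key=lambda x: abs(a[x] - val))]
            for a, b in zip(df['A'], df['B'])]

def with_apply(df):
    # Row-wise apply with NumPy.
    return df.apply(lambda row: row['B'][np.argmin(abs(np.array(row['A']) - val))],
                    axis=1).tolist()

for fn in (list_comp, with_apply):
    # Best of three single runs, in seconds.
    seconds = min(timeit.repeat(lambda: fn(big), number=1, repeat=3))
    print(f"{fn.__name__}: {seconds:.3f} s")
```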