Using zip and percentile together - Python

Question

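I am working with a pandas DataFrame in which some cells hold lists (in several columns). I want to check a condition on each element of the lists in one column and select the corresponding list elements from another column. I know this can be done easily with zip, for example: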

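p = 5; q = 6
DF['Column3'] = [[b for a, b in zip(x, y) if a > p and a < q] for x, y in zip(DF['Column1'], DF['Column2'])]

However, I am not sure how to use percentiles here. That is, instead of fixed p and q, I want to use percentiles of each list (say, the 50th percentile in place of p and the 90th percentile in place of q).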

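So, for each cell in one column (each cell holding a list), it should compute the percentile values of that list and select the corresponding list elements from the matching cell in the other column.

An example to illustrate the problem (assume this is DF):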

	A	B	Column1	Column2
0	3.4	5.7	[2.1, 2.9, 5.2, 6.8]	[2.5,3.4,1.2,5.1]
1	4	1.7	[1.1, 2.5, 5.6, 11.5, 15.6, 21.5]	[12.15,1.58,5.4,1.2,34.2,67.2]

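The 50th percentile of DF['Column1'][0] is 4.05 and the 90th percentile is 6.32. So only the third value in Column1 (5.2) satisfies the condition. The corresponding value in Column2 is 1.2, so Column3 should contain the output list [1.2]. The same procedure applies to the next row (p = 8.55, q = 18.55):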

	A	B	Column1	Column2	Column3
0	3.4	5.7	[2.1, 2.9, 5.2, 6.8]	[2.5,3.4,1.2,5.1]	[1.2]
1	4	1.7	[1.1, 2.5, 5.6, 11.5, 15.6, 21.5]	[12.15,1.58,5.4,1.2,34.2,67.2]	[1.2,34.2]
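
The bounds quoted above can be checked with a short sketch. It assumes numpy.percentile with its default linear interpolation, which is what reproduces 4.05 / 6.32 and 8.55 / 18.55; the variable names are only illustrative.

```python
import numpy as np

# The two Column1 lists from the example DataFrame.
col1_row0 = [2.1, 2.9, 5.2, 6.8]
col1_row1 = [1.1, 2.5, 5.6, 11.5, 15.6, 21.5]

# Passing a list of percentiles returns both bounds in one call.
p0, q0 = np.percentile(col1_row0, [50, 90])
p1, q1 = np.percentile(col1_row1, [50, 90])

# Rounded to avoid floating-point noise in the printed values.
print(round(p0, 2), round(q0, 2))
print(round(p1, 2), round(q1, 2))
```

With four values, the 90th percentile sits at fractional position 0.9 * 3 = 2.7, so it is interpolated as 5.2 + 0.7 * (6.8 - 5.2) = 6.32.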

Answer 1

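You can use numpy.percentile to get the two values that bound the range. Then apply a list comprehension row by row (by passing axis=1 to apply).

As a one-liner, you could do: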

df['Column3'] = (df.assign(Column3=df['Column1'].apply(lambda x: np.percentile(x, [50, 90])))
                   .apply(lambda x: [b for (a,b) in zip(x['Column1'], x['Column2']) 
                                     if x['Column3'][0] < a < x['Column3'][1]], axis=1))

Breaking the steps down in more detail:

import numpy as np
import pandas as pd

df = pd.DataFrame(
{'A' : [3.4,4],
'B' : [5.7, 1.7],
'Column1' : [[2.1, 2.9, 5.2, 6.8], [1.1, 2.5, 5.6, 11.5, 15.6, 21.5]],
'Column2' : [[2.5,3.4,1.2,5.1],[12.15,1.58,5.4,1.2,34.2,67.2]]})
# Lower and upper bounds: 50th and 90th percentile of each Column1 list
df['Column3'] = df['Column1'].apply(lambda x: np.percentile(x, 50))
df['Column4'] = df['Column1'].apply(lambda x: np.percentile(x, 90))
# Keep the Column2 items whose paired Column1 item lies strictly between the bounds
df['Column5'] = df.apply(lambda x: [b for (a,b) in zip(x['Column1'], x['Column2'])
                                    if x['Column3'] < a < x['Column4']], axis=1)
df
Out[1]: 
     A    B                            Column1  \
0  3.4  5.7               [2.1, 2.9, 5.2, 6.8]   
1  4.0  1.7  [1.1, 2.5, 5.6, 11.5, 15.6, 21.5]   

                               Column2  Column3  Column4      Column5  
0                 [2.5, 3.4, 1.2, 5.1]     4.05     6.32        [1.2]  
1  [12.15, 1.58, 5.4, 1.2, 34.2, 67.2]     8.55    18.55  [1.2, 34.2]

From there, you can drop the intermediate percentile columns:

 df = df.drop(['Column3', 'Column4'], axis=1)
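
If the intermediate columns are not needed at all, the same filtering can be sketched with a small helper, closer to the zip comprehension in the question. This is only one possible variant: the function name select_between_percentiles and its lower/upper parameters are made up for illustration and are not part of pandas or numpy.

```python
import numpy as np
import pandas as pd


def select_between_percentiles(values, others, lower=50, upper=90):
    """Return the items of `others` whose paired item in `values`
    lies strictly between the two percentiles of `values`."""
    lo, hi = np.percentile(values, [lower, upper])
    return [b for a, b in zip(values, others) if lo < a < hi]


df = pd.DataFrame({
    'A': [3.4, 4],
    'B': [5.7, 1.7],
    'Column1': [[2.1, 2.9, 5.2, 6.8], [1.1, 2.5, 5.6, 11.5, 15.6, 21.5]],
    'Column2': [[2.5, 3.4, 1.2, 5.1], [12.15, 1.58, 5.4, 1.2, 34.2, 67.2]],
})

# Pair up the cells of the two list columns row by row, as in the question.
df['Column3'] = [select_between_percentiles(x, y)
                 for x, y in zip(df['Column1'], df['Column2'])]
print(df['Column3'].tolist())
```

This computes the bounds once per row and writes only the final column, so no drop step is required afterwards. It assumes every Column1 list is non-empty.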

