Filter a NumPy array by another array
I am trying to filter my ndarray using another array I have collected (it contains some of the same values).
My main ndarray looks like this:
[['Name' 'Col1' 'Count']
['test' '' '413']
['erd' ' ' '60']
...,
['Td1' 'f' '904']
['Td2' 'K' '953']
['Td3' 'r' '111']]
I have another list containing the names I want to match:
names = ['Td1','test','erd']
What I want to do
I want to use the names list as a filter on the ndarray above.
What I have tried
name_filter = main_ndarray[:,0] == names
This doesn't work.
What I expect
[['Name' 'Col1' 'Count']
['test' '' '413']
['erd' ' ' '60']
['Td1' 'f' '904']]
You can also use the built-in filter function.
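import numpy

cats_array = numpy.array(
    [['Name', 'Col1', 'Count'],
     ['test', '', '413'],
     ['erd', ' ', '60'],
     ['Td1', 'f', '904'],
     ['Td2', 'K', '953'],
     ['Td3', 'r', '111']]
)
names = ['Td1', 'test', 'erd']
# keep the rows whose first element is in names;
# list() is needed on Python 3, where filter returns a lazy iterator
list(filter(lambda x: x[0] in names, cats_array))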
This gives (output from Python 2, hence dtype='|S5'; Python 3 reports a Unicode dtype instead):
[array(['test', '', '413'], dtype='|S5'),
 array(['erd', ' ', '60'], dtype='|S5'),
 array(['Td1', 'f', '904'], dtype='|S5')]
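If the result should stay a single ndarray, a boolean mask is another option. The following is only a minimal sketch: it assumes NumPy 1.13 or later (for np.isin), reuses the question's main_ndarray name with the elided rows left out, and keeps the header row by forcing mask[0] to True, which is my reading of the expected output rather than something the question states.

```python
import numpy as np

main_ndarray = np.array([
    ['Name', 'Col1', 'Count'],
    ['test', '', '413'],
    ['erd', ' ', '60'],
    ['Td1', 'f', '904'],
    ['Td2', 'K', '953'],
    ['Td3', 'r', '111'],
])
names = ['Td1', 'test', 'erd']

# Boolean mask: True where the first column is one of the wanted names.
mask = np.isin(main_ndarray[:, 0], names)
# Assumption: the header row (index 0) should always be kept.
mask[0] = True

# Boolean indexing keeps the rows in their original order.
filtered = main_ndarray[mask]
print(filtered)
```

Unlike filter, this returns one 2-D array instead of a list of row arrays, and the rows come back in the order they have in main_ndarray, not in the order of names.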
Consider using Pandas for this kind of data:
import pandas as pd
data = [['Name', 'Col1', 'Count'],
['test', '', '413'],
['erd', ' ', '60'],
['Td1', 'f', '904'],
['Td2', 'K', '953'],
['Td3', 'r', '111']]
df = pd.DataFrame(data[1:], columns=data[0])
names = ['Td1','test','erd']
result = df[df.Name.isin(names)]
Result:
>>> df
   Name Col1 Count
0  test        413
1   erd         60
2   Td1    f   904
3   Td2    K   953
4   Td3    r   111
>>> result
   Name Col1 Count
0  test        413
1   erd         60
2   Td1    f   904
References
- http://pandas.pydata.org/
- Filter dataframe rows if value in column is in a set list of values
I would also go with @YXD's Pandas solution, but for completeness here is a simple solution based on a list comprehension:
data = [['Name', 'Col1', 'Count'],
['test', '', '413'],
['erd', ' ', '60'],
['Td1', 'f', '904'],
['Td2', 'K', '953'],
['Td3', 'r', '111']]
names = ['Td1', 'test', 'erd']
# select every row of data whose first element is in names
res = [row for row in data if row[0] in names]
# put the header row (the first row of data) back at the front
res.insert(0, data[0])
which gives you the desired output:
[['Name', 'Col1', 'Count'],
['test', '', '413'],
['erd', ' ', '60'],
['Td1', 'f', '904']]
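The same idea can be wrapped in a small helper. This is just a sketch: filter_rows and its key_index parameter are names I made up for illustration, and converting names to a set is an optional tweak that only matters when the list of names is long.

```python
def filter_rows(rows, wanted, key_index=0):
    """Return the header row plus each data row whose key is in `wanted`.

    Hypothetical helper: `rows` is a list of lists whose first entry
    is the header row.
    """
    wanted_set = set(wanted)  # set membership tests are O(1) on average
    header, body = rows[0], rows[1:]
    return [header] + [row for row in body if row[key_index] in wanted_set]


data = [['Name', 'Col1', 'Count'],
        ['test', '', '413'],
        ['erd', ' ', '60'],
        ['Td1', 'f', '904'],
        ['Td2', 'K', '953'],
        ['Td3', 'r', '111']]
names = ['Td1', 'test', 'erd']

res = filter_rows(data, names)
print(res)
```

Splitting off the header before filtering means it appears exactly once in the result, even if 'Name' happened to be in the list of names.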