Select dataframe row from rowname using case-insensitive (like `grep -i`)
I have a dataframe that looks like this:
In [1]: mydict = {"1421293_at Hdgfl1":[2.140412,1.143337,3.260313],
"1429877_at Lrriq3":[9.019368,0.874524,2.051820]}
In [3]: import pandas as pd
In [4]: df = pd.DataFrame.from_dict(mydict, orient='index')
In [5]: df
Out[5]:
0 1 2
1421293_at Hdgfl1 2.140412 1.143337 3.260313
1429877_at Lrriq3 9.019368 0.874524 2.051820
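What I want to do is select rows by their row name using a case-insensitive query.
For example, given the query "hdgfl1" it should return:
0 1 2
1421293_at Hdgfl1 2.140412 1.143337 3.260313
"hdgfl1" is a case-insensitive query for "1421293_at Hdgfl1", basically the equivalent of `grep -i`.
Is there a way to do this?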
You can do it like this:
query = 'hdgfl1'
mask = df.index.to_series().str.contains(query, case=False)
df[mask]
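A self-contained version of this answer that rebuilds the example frame. Two details go beyond the original answer and are assumptions about what you want: it calls `.str.contains` directly on the index (the `Index.str` accessor exists, so `to_series()` is optional), and it passes `regex=False` so the query is matched as a literal substring, closer to `grep -i -F`.

```python
import pandas as pd

mydict = {"1421293_at Hdgfl1": [2.140412, 1.143337, 3.260313],
          "1429877_at Lrriq3": [9.019368, 0.874524, 2.051820]}
df = pd.DataFrame.from_dict(mydict, orient='index')

query = 'hdgfl1'
# Index.str.contains returns a boolean array aligned with the rows,
# so it can be used as a row mask without converting the index to a Series.
# regex=False treats the query literally; case=False ignores case.
mask = df.index.str.contains(query, case=False, regex=False)
result = df[mask]
print(result)
```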
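Another possibility is the following (its mask carries a plain 0..n-1 index rather than the row names, so apply it with `df[mask.to_numpy()]`):
mask = df.reset_index()['index'].str.contains(query, case=False)
but this is 2x slower.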
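A runnable check that the `reset_index` variant selects the same rows as the `to_series()` version. It assumes the unnamed index that `from_dict(orient='index')` produces, which is why `reset_index()` names the new column `'index'`; the timing claim is not reproduced here.

```python
import pandas as pd

mydict = {"1421293_at Hdgfl1": [2.140412, 1.143337, 3.260313],
          "1429877_at Lrriq3": [9.019368, 0.874524, 2.051820]}
df = pd.DataFrame.from_dict(mydict, orient='index')
query = 'hdgfl1'

# reset_index() moves the unnamed row labels into a column called 'index';
# the resulting mask is indexed 0..n-1, not by the row labels.
mask = df.reset_index()['index'].str.contains(query, case=False)

# to_numpy() drops that integer index so the mask is applied by position.
via_reset = df[mask.to_numpy()]

# The to_series() version keeps the row labels, so it aligns directly.
via_series = df[df.index.to_series().str.contains(query, case=False)]
print(via_reset.equals(via_series))
```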
In [229]: df.filter(regex=r'(?i)hdgfl1', axis=0)
Out[229]:
0 1 2
1421293_at Hdgfl1 2.140412 1.143337 3.260313
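`filter(regex=...)` runs `re.search` against each label, so the inline `(?i)` flag makes the whole pattern case-insensitive. When the query lives in a variable instead of being written into the pattern, a sketch (assuming the query should be matched literally) escapes it first:

```python
import re
import pandas as pd

mydict = {"1421293_at Hdgfl1": [2.140412, 1.143337, 3.260313],
          "1429877_at Lrriq3": [9.019368, 0.874524, 2.051820]}
df = pd.DataFrame.from_dict(mydict, orient='index')

query = 'hdgfl1'
# (?i) turns on IGNORECASE for the whole pattern; re.escape keeps any
# regex metacharacters in the query (".", "(", ...) from being interpreted.
pattern = '(?i)' + re.escape(query)
result = df.filter(regex=pattern, axis=0)
print(result)
```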
And using `select()`:
import re
import pandas as pd

mydict = {
    "1421293_at Hdgfl1": [2.140412, 1.143337, 3.260313],
    "1429877_at Lrriq3": [9.019368, 0.874524, 2.051820],
    "1421293_at hDGFl1": [2.140412, 1.143337, 3.260313],
}
df = pd.DataFrame.from_dict(mydict, orient='index')

def create_match_func(a_str):
    # Build a predicate that is True when a label contains " <a_str>", ignoring case.
    # re.X is dropped: under VERBOSE the literal space in the pattern would be ignored.
    def match_func(x):
        pattern = r".* {}".format(re.escape(a_str))
        return re.search(pattern, x, flags=re.I) is not None
    return match_func

print(df)
print('-' * 20)
target = "hdgfl1"
# Older pandas: print(df.select(create_match_func(target), axis=0))
# select() was removed in pandas 1.0; map the predicate over the index instead.
print(df.loc[df.index.map(create_match_func(target))])
--output:--
0 1 2
1421293_at Hdgfl1 2.140412 1.143337 3.260313
1429877_at Lrriq3 9.019368 0.874524 2.051820
1421293_at hDGFl1 2.140412 1.143337 3.260313
--------------------
0 1 2
1421293_at Hdgfl1 2.140412 1.143337 3.260313
1421293_at hDGFl1 2.140412 1.143337 3.260313
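All of the approaches above do substring matching like `grep -i`, so "hdgfl1" would also match a label ending in, say, "Hdgfl10". If the query is meant to be the whole gene symbol after the space, one sketch (assuming every label has the form "<probe> <symbol>") compares only that last token, case-insensitively. The extra "Hdgfl10" row is hypothetical, added only to show the difference.

```python
import pandas as pd

mydict = {
    "1421293_at Hdgfl1": [2.140412, 1.143337, 3.260313],
    "1429877_at Lrriq3": [9.019368, 0.874524, 2.051820],
    "1421293_at hDGFl1": [2.140412, 1.143337, 3.260313],
    # Hypothetical label, present only to show it is NOT matched.
    "1400000_at Hdgfl10": [0.0, 0.0, 0.0],
}
df = pd.DataFrame.from_dict(mydict, orient='index')

target = "hdgfl1"
# Take the last whitespace-separated token of each label (the symbol)
# and compare it case-insensitively with the target.
symbols = df.index.to_series().str.split().str[-1]
mask = symbols.str.lower() == target.lower()
result = df[mask]
print(result)
```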
...
df.select(lambda x: x == 'A', axis=1)
`select()` takes a function that operates on the labels along `axis`, and the function should return a boolean.
http://pandas.pydata.org/pandas-docs/stable/indexing.html#the-select-method
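Current pandas no longer has `select()` (deprecated in 0.21, removed in 1.0). A hypothetical helper, `select_labels`, sketches the behaviour described above: it applies the predicate to each label along the chosen axis and keeps the labels for which it returns True. The small `A`/`B` frame is made up for illustration.

```python
import pandas as pd

def select_labels(df, func, axis=0):
    """Hypothetical stand-in for the removed DataFrame.select():
    keep the rows (axis=0) or columns (axis=1) whose label satisfies func."""
    labels = df.index if axis == 0 else df.columns
    keep = [bool(func(label)) for label in labels]
    return df.loc[keep] if axis == 0 else df.loc[:, keep]

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=['x', 'y'])
# Equivalent of the old df.select(lambda x: x == 'A', axis=1)
print(select_labels(df, lambda x: x == 'A', axis=1))
```

A boolean list works as a `.loc` indexer on either axis, which keeps the helper free of any regex or string-accessor assumptions; the predicate can be any callable, including `create_match_func(target)` from the answer above.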