How to mask a pandas DataFrame based on rows and columns
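I have the following DataFrame. I am looking for a way to automatically mask it and select values from specific rows and columns while ignoring NaN values, so that the selected block contains no NaN.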
data=pd.DataFrame([[ np.nan, 0. , np.nan, 3. , 77. ],
[ 5.6, 40. , 12. , 9. , np.nan],
[ 5.9, np.nan, 5. , 5. , 59. ],
[ 4.8, 30. , np.nan, 11. , 30. ],
[ 2.2, 6. , 15. , np.nan, 5. ]])
For example, select rows 0 and 3 and columns 1, 3 and 4, as shown below:
data_selected=pd.DataFrame([[ 0, 3, 77 ],[ 30, 11, 30 ]], index=[0,3],columns=[1,3,4])
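To make the target concrete: once the row and column labels are known, the block itself is a plain `.loc` lookup, and the hard part is finding those labels automatically. A minimal sketch, assuming the `data` frame from the question (the names `rows`, `cols`, `block` and `has_nan` are only illustrative):

```python
import numpy as np
import pandas as pd

data = pd.DataFrame([[np.nan, 0., np.nan, 3., 77.],
                     [5.6, 40., 12., 9., np.nan],
                     [5.9, np.nan, 5., 5., 59.],
                     [4.8, 30., np.nan, 11., 30.],
                     [2.2, 6., 15., np.nan, 5.]])

# Labels taken from the example in the question
rows, cols = [0, 3], [1, 3, 4]

# Label-based selection of the rectangular block
block = data.loc[rows, cols]

# True if any cell of the selected block is NaN
has_nan = block.isna().any().any()
```

Here `has_nan` is `False`, which is the property the automatic selection should guarantee.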
Here is my solution. If you have a better one, please post an answer.
import numpy as np
import pandas as pd

data = pd.DataFrame([[np.nan, 0., np.nan, 3., 77.],
                     [5.6, 40., 12., 9., np.nan],
                     [5.9, np.nan, 5., 5., 59.],
                     [4.8, 30., np.nan, 11., 30.],
                     [2.2, 6., 15., np.nan, 5.]])

# iterate over the data frame to find the best combination of rows and columns for extraction
size_list = []
index_list = []
dfs = []
for i in data.index:
    print(i)
    # keep only the columns that are not NaN in row i
    boolean_ind = data.loc[i, :].isnull()
    ex7 = data.loc[:, ~boolean_ind]
    # then use dropna to remove every row that still contains a NaN in the selected data frame (ex7)
    ex7_drop = ex7.dropna()
    # store each candidate data frame so the best one can be picked by size afterwards
    dfs.append(ex7_drop)
    # compute the number of cells (rows * columns) of the candidate
    size = ex7_drop.shape[0] * ex7_drop.shape[1]
    size_list.append(size)
    # store the row index of each candidate to easily select those rows from the base data frame
    index_list.append(ex7_drop.index)

# select the best candidate based on the maximum size
best = size_list.index(max(size_list))
max_size_index = index_list[best]
selected_df = dfs[best]
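The same row-by-row search can be written more compactly. The sketch below is not a different algorithm: it keeps the heuristic of the loop above (one candidate per row, largest cell count wins, first candidate wins ties), so it is not guaranteed to find the largest NaN-free block in general. The function name `largest_nan_free_block` is hypothetical.

```python
import numpy as np
import pandas as pd

def largest_nan_free_block(df):
    """Hypothetical helper: for each row, keep the columns that are
    non-NaN in that row, drop rows that still contain NaN, and return
    the candidate with the most cells."""
    candidates = (df.loc[:, df.loc[i].notna()].dropna() for i in df.index)
    # DataFrame.size is rows * columns; max() keeps the first of equal sizes
    return max(candidates, key=lambda block: block.size)

data = pd.DataFrame([[np.nan, 0., np.nan, 3., 77.],
                     [5.6, 40., 12., 9., np.nan],
                     [5.9, np.nan, 5., 5., 59.],
                     [4.8, 30., np.nan, 11., 30.],
                     [2.2, 6., 15., np.nan, 5.]])

selected = largest_nan_free_block(data)
```

For the example data, `selected` holds rows 0 and 3 and columns 1, 3 and 4, matching `data_selected`. An empty frame would make `max` raise a `ValueError`, so a caller may want to guard against that case.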