如何根据行和列屏蔽 Panda DataFrame

Question

我有以下数据框。我正在寻找一种方法来自动屏蔽并从特定行和列中选择值而不考虑 nan 值。

data=pd.DataFrame([[ np.nan,  0. ,  np.nan,  3. , 77. ],
   [ 5.6, 40. , 12. ,  9. ,  np.nan],
   [ 5.9,  np.nan,  5. ,  5. , 59. ],
   [ 4.8, 30. ,  np.nan, 11. , 30. ],
   [ 2.2,  6. , 15. ,  np.nan,  5. ]])

例如 select 第 0 行和第 3 行以及第 1、3 和 4 列，如下所示：

data_selected=pd.DataFrame([[ 0, 3, 77 ],[ 30,  11, 30 ]], index=[0,3],columns=[1,3,4])

Answer 1

这是我的解决方案。如果你有更好的解决方案请给我你的答案

import pandas as pd
from scipy.stats import pearsonr
from sklearn import metrics
import numpy as np

data=pd.DataFrame([[ np.nan,  0. ,  np.nan,  3. , 77. ],
   [ 5.6, 40. , 12. ,  9. ,  np.nan],
   [ 5.9,  np.nan,  5. ,  5. , 59. ],
   [ 4.8, 30. ,  np.nan, 11. , 30. ],
   [ 2.2,  6. , 15. ,  np.nan,  5. ]])


# interate over the data frame to find best combination of row and column for extraction
size_list=[]
index_list=[]
dfs=[]
for i in data.index:
    print(i)
    # interate over the dataframe and remove columns based on nan values in each row
    boolean_ind=data.loc[i, :].isnull()  
    ex7=data[data.columns[boolean_ind==False]]
    # after removing the columns with nan value in row, use dropna command to romove other nan values from the selected dataframe (ex7)
    ex7_drop=ex7.dropna()
    # append each dataframe into the list to choose it after finding the best dataframe based on the size
    dfs.append(ex7_drop)
    # compute shape of the selected dataframe to choose the beset dataframe based on the size
    size=ex7_drop.shape[0]*ex7_drop.shape[1]
    size_list.append(size)
    # put row index of each dataframe into the index_list to easily select the selected rows from the base dataframe
    index_list.append(ex7_drop.index)

# select best index based on the max size
max_size_index=index_list[size_list.index(max(size_list))]
selected_df=dfs[size_list.index(max(size_list))]

如何根据行和列屏蔽 Panda DataFrame

How to mask Panda DataFrame based on the row and columns

nan

masking

pandas